ABSTRACT
In this paper, we present a new multithreaded Java framework for information extraction in heterogeneous enterprise application environments. The framework frees the developer from the error-prone task of low-level thread programming. We demonstrate it by extracting product prices from web sites, although it is useful for numerous other purposes as well. Its strengths are performance, continuous feedback, and adherence to maximum response times. We describe the framework using UML modeling techniques for visualizing multithreading, and we address the problems Java poses when running threads have to be stopped.
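To make the three properties named above more concrete (parallel extraction, continuous feedback, and a maximum response time), the following is a minimal sketch and not the framework presented in the paper. It relies on the standard `java.util.concurrent` library, which is an assumption of this example. The names `DeadlineExtraction`, `Price`, `simulatedSource`, and `extractAll` are hypothetical, and the sources are simulated with `Thread.sleep` rather than real web requests. Tasks that miss the deadline are cancelled by interruption, since `Thread.stop()` is deprecated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class DeadlineExtraction {

    /** Hypothetical result type: one price extracted from one source. */
    static final class Price {
        final String source;
        final double amount;
        Price(String source, double amount) {
            this.source = source;
            this.amount = amount;
        }
    }

    /** Simulated extractor (assumption): sleeps instead of fetching a page. */
    static Callable<Price> simulatedSource(String name, long delayMillis, double amount) {
        return () -> {
            Thread.sleep(delayMillis); // throws InterruptedException when the task is cancelled
            return new Price(name, amount);
        };
    }

    /**
     * Runs all sources in parallel, reports each result as soon as it arrives,
     * and returns whatever has been collected once maxMillis have elapsed.
     */
    static Map<String, Double> extractAll(List<Callable<Price>> sources,
                                          long maxMillis,
                                          Consumer<Price> onResult) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, sources.size()));
        ExecutorCompletionService<Price> completion = new ExecutorCompletionService<>(pool);
        List<Future<Price>> pending = new ArrayList<>();
        for (Callable<Price> source : sources) {
            pending.add(completion.submit(source));
        }

        Map<String, Double> results = new LinkedHashMap<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxMillis);
        try {
            for (int i = 0; i < sources.size(); i++) {
                long remaining = deadline - System.nanoTime();
                Future<Price> done = completion.poll(remaining, TimeUnit.NANOSECONDS);
                if (done == null) {
                    break; // maximum response time reached
                }
                try {
                    Price price = done.get();
                    results.put(price.source, price.amount);
                    onResult.accept(price); // continuous feedback to the caller
                } catch (ExecutionException e) {
                    // A failing source is skipped; the remaining sources continue.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        } finally {
            for (Future<Price> future : pending) {
                future.cancel(true); // interrupts tasks that are still running
            }
            pool.shutdownNow();
        }
        return results;
    }

    public static void main(String[] args) {
        List<Callable<Price>> sources = Arrays.asList(
                simulatedSource("fast", 50, 19.99),
                simulatedSource("medium", 100, 18.49),
                simulatedSource("slow", 5000, 17.00));
        Map<String, Double> prices = extractAll(sources, 1000,
                p -> System.out.println("received " + p.source + ": " + p.amount));
        System.out.println("collected " + prices.size() + " of " + sources.size() + " prices");
    }
}
```

Interruption is cooperative: a task stops early only if it reaches an interruptible operation or checks its interrupt status. A real extractor that blocks in a classic `java.io` socket read would not react to `cancel(true)` and would need an additional measure, such as a read timeout on the connection.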
Index Terms
- A multithreaded Java framework for information extraction in the context of enterprise application integration