Abstract
In 2001 the U.S. Department of Labor was tasked with building a Web site that would help people find continuing education opportunities at community colleges, universities, and organizations across the country. The department wanted its Web site to support fielded Boolean searches over locations, dates, times, prerequisites, instructors, topic areas, and course descriptions. Ultimately it was also interested in mining its new database for patterns and educational trends. This was a major data-integration project, aiming to automatically gather detailed, structured information from tens of thousands of individual institutions every three months.
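A fielded Boolean search only works once each course listing has been broken into separate fields. The sketch below is illustrative and is not the department's system. The `Course` schema, the helper names (`field_eq`, `field_contains`, `starts_after`, `all_of`, `any_of`, `not_`, `search`), and the sample records are all hypothetical. It shows how per-field conditions combine with AND, OR, and NOT over records that an extraction step would have produced from free text.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass
class Course:
    # Hypothetical schema: the fields echo those named in the abstract.
    title: str
    location: str
    start: date
    instructor: str
    topic: str
    prerequisites: list[str]
    description: str


# A query is any predicate over one structured record.
Query = Callable[[Course], bool]


def field_eq(name: str, value: str) -> Query:
    """Exact, case-insensitive match on one named field."""
    return lambda c: str(getattr(c, name)).lower() == value.lower()


def field_contains(name: str, word: str) -> Query:
    """Case-insensitive substring match inside one named text field."""
    return lambda c: word.lower() in str(getattr(c, name)).lower()


def starts_after(d: date) -> Query:
    """Range condition on the start-date field."""
    return lambda c: c.start > d


def all_of(*qs: Query) -> Query:
    """Boolean AND of sub-queries."""
    return lambda c: all(q(c) for q in qs)


def any_of(*qs: Query) -> Query:
    """Boolean OR of sub-queries."""
    return lambda c: any(q(c) for q in qs)


def not_(q: Query) -> Query:
    """Boolean NOT of a sub-query."""
    return lambda c: not q(c)


def search(records: list[Course], q: Query) -> list[Course]:
    """Return the records that satisfy the query, in their original order."""
    return [c for c in records if q(c)]


# Invented sample records standing in for the output of an extraction step.
courses = [
    Course("Intro to Welding", "Pittsburgh, PA", date(2001, 9, 4), "J. Smith",
           "Manufacturing", [], "Hands-on arc welding basics."),
    Course("Database Design", "Austin, TX", date(2001, 10, 1), "A. Lee",
           "Computing", ["Intro to Programming"], "Relational modeling and SQL."),
    Course("Web Publishing", "Austin, TX", date(2001, 8, 20), "R. Diaz",
           "Computing", [], "HTML and site design."),
]

# Computing courses in Austin that start after 1 Sept 2001 and whose
# description does not mention HTML.
q = all_of(
    field_eq("location", "Austin, TX"),
    field_eq("topic", "Computing"),
    starts_after(date(2001, 9, 1)),
    not_(field_contains("description", "HTML")),
)
print([c.title for c in search(courses, q)])  # ['Database Design']
```

Representing queries as composable predicates keeps each field condition independent. The same approach cannot be applied to an unsegmented page of text, because there is no `location` or `start` field to test. Producing those fields is what information extraction has to supply.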
Index Terms
- Information Extraction: Distilling structured data from unstructured text