Although thesauri are commonly used in both commercial and experimental information retrieval systems, they have not demonstrated consistent benefits for retrieval performance, and constructing a thesaurus automatically for a large text database is difficult. This paper proposes PhraseFinder, an approach for automatically constructing collection-dependent association thesauri from large full-text document collections. The resulting association thesaurus can be accessed through natural language queries in INQUERY, an information retrieval system based on the probabilistic inference network model. Experiments in INQUERY evaluate different types of association thesauri, as well as thesauri constructed for a variety of collections.
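The abstract does not describe PhraseFinder's internals, so the following is only a minimal sketch of the general idea behind a collection-dependent association thesaurus: record which terms co-occur with each phrase within a fixed token window, then rank phrases against the terms of a query. The function names `build_association_thesaurus` and `rank_phrases`, the window size, the convention that phrases arrive as single underscore-joined tokens, and the log-weighted, idf-style scoring are all hypothetical assumptions for illustration. This is not PhraseFinder's published algorithm and not an INQUERY API.

```python
from collections import defaultdict
import math


def build_association_thesaurus(docs, phrases, window=3):
    """Map each phrase to counts of non-phrase terms found within `window` tokens of it.

    docs:    list of token lists (assumed already tokenized and lower-cased)
    phrases: set of phrase tokens; a multi-word phrase is assumed (hypothetically)
             to appear in the token stream as one token joined by '_'
    window:  hypothetical co-occurrence window size, in tokens on each side
    """
    assoc = defaultdict(lambda: defaultdict(int))
    for tokens in docs:
        for i, tok in enumerate(tokens):
            if tok not in phrases:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                # Count every neighboring term except the phrase itself and other phrases.
                if j != i and tokens[j] not in phrases:
                    assoc[tok][tokens[j]] += 1
    return assoc


def rank_phrases(assoc, query_terms, top_k=5):
    """Score each phrase by its log-weighted co-occurrence with the query terms.

    Each phrase's context counts are treated like a pseudo-document; a term that
    co-occurs with fewer phrases gets a higher idf-style weight (an assumption).
    """
    n_phrases = len(assoc)
    df = defaultdict(int)  # number of phrases each term co-occurs with
    for term_counts in assoc.values():
        for term in term_counts:
            df[term] += 1
    scores = {}
    for phrase, term_counts in assoc.items():
        score = 0.0
        for term in query_terms:
            if term in term_counts:
                idf = math.log(1 + n_phrases / df[term])
                score += math.log(1 + term_counts[term]) * idf
        if score > 0:
            scores[phrase] = score
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    docs = [
        "the information_retrieval system uses a thesaurus for query expansion".split(),
        "query expansion with an association_thesaurus improves recall".split(),
        "the neural_network learns weights".split(),
    ]
    phrases = {"information_retrieval", "association_thesaurus", "neural_network"}
    assoc = build_association_thesaurus(docs, phrases, window=3)
    print(rank_phrases(assoc, ["query", "expansion"]))
```

In the toy run, only `association_thesaurus` has "expansion" inside its window, so it is the single phrase returned for the query "query expansion". Treating each phrase's context as a pseudo-document is one plausible way to let ordinary query terms retrieve related phrases. A real system would also need phrase identification, stopword handling, and a weighting scheme tuned to the collection.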
Cited By
- Wei C, Yang C, Lee C, Shi H and Yang C (2014). Exploiting poly-lingual documents for improving text categorization effectiveness, Decision Support Systems, 57, (64-76), Online publication date: 1-Jan-2014.
- Campan A, Cooper N and Truta T On-the-fly generalization hierarchies for numerical attributes revisited Proceedings of the 8th VLDB international conference on Secure data management, (18-32)
- Chau R, Yeh C and Smith K A neural network model for hierarchical multilingual text categorization Proceedings of the Second international conference on Advances in neural networks - Volume Part II, (238-245)
- Chau R and Yeh C (2004). Fuzzy Conceptual Indexing for Concept-Based Cross-Lingual Text Retrieval, IEEE Internet Computing, 8:5, (14-21), Online publication date: 1-Sep-2004.
- Lin S, Chen M, Ho J and Huang Y (2002). ACIRD, IEEE Transactions on Knowledge and Data Engineering, 14:3, (599-614), Online publication date: 1-May-2002.
- Schiffman B and McKeown K Experiments in automated lexicon building for text searching Proceedings of the 18th conference on Computational linguistics - Volume 2, (719-725)
- van Doorn M and de Vries A The psychology of multimedia databases Proceedings of the fifth ACM conference on Digital libraries, (1-9)
- de Vries A, van Doorn M, Blanken H and Apers P The Mirror MMDBMS Architecture Proceedings of the 25th International Conference on Very Large Data Bases, (758-761)
- Aone C, Okurowski M and Gorlinsky J Trainable, scalable summarization using robust NLP and machine learning Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 1, (62-66)
- Lin S, Shih C, Chen M, Ho J, Ko M and Huang Y Extracting classification knowledge of Internet documents with mining term associations Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, (241-249)
- Gonzalez-Rubio R and Guizol J A multilingual information retrieval and filtering system Computer-Assisted Information Searching on Internet - Volume 2, (773-782)
- Golovchinsky G What the query told the link Proceedings of the eighth ACM conference on Hypertext, (67-74)
- Sheridan P and Ballerini J Experiments in multilingual information retrieval using the SPIDER system Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, (58-65)
- Turtle H (1995). Text retrieval in the legal world, Artificial Intelligence and Law, 3:1-2, (5-54), Online publication date: 1-Mar-1995.
Recommendations
An association thesaurus for information retrieval
RIAO '94: Intelligent Multimedia Information Retrieval Systems and Management - Volume 1
Thesaurus Performance with Information Retrieval: Schema Matching as a Case Study
SMC '13: Proceedings of the 2013 IEEE International Conference on Systems, Man, and Cybernetics
Thesaurus is used with many Information Retrieval (IR) models such as data integration, data warehousing, semantic query processing and classifiers. Considering the existence of various thesauri for a particular domain of knowledge, output quality of an ...