Term Weighting Approaches in Automatic Text Retrieval
1987 Technical Report
Publisher:
  • Cornell University
  • PO Box 250, 124 Roberts Place Ithaca, NY
  • United States
Published: 01 November 1987
Abstract

The experimental evidence accumulated over the past 20 years indicates that text indexing systems based on the assignment of appropriately weighted single terms produce retrieval results that are superior to those obtainable with other, more elaborate text representations. These results depend crucially on the choice of effective term weighting systems. This paper summarizes the insights gained in automatic term weighting and provides baseline single-term indexing models against which other, more elaborate content analysis procedures can be compared.
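The abstract does not state a particular weighting formula, so the following is only an illustrative sketch of what weighted single-term indexing can look like. It assumes the widely used tf-idf weighting (term frequency times the log of inverse document frequency) with cosine normalization; the function names `tfidf_vectors` and `cosine`, and the toy documents, are hypothetical and not taken from the report.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed scheme (not specified in the abstract):
    weight = tf * log(N / df), followed by cosine normalization.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        weights = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # Cosine normalization; an all-zero vector stays all zero.
        vectors.append({t: (w / norm if norm else 0.0) for t, w in weights.items()})
    return vectors


def cosine(u, v):
    """Inner product of two already-normalized sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


# Toy collection, whitespace-tokenized for simplicity.
docs = [
    "term weighting for text retrieval".split(),
    "text retrieval with weighted single terms".split(),
    "cooking pasta at home".split(),
]
vecs = tfidf_vectors(docs)
```

Under these assumptions, a query or document is compared with others by the inner product of the normalized vectors, so documents sharing rarer terms score higher than documents sharing only common ones.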
