
Inductive learning algorithms and representations for text categorization

Authors:
Susan Dumais, Microsoft Research, One Microsoft Way, Redmond, WA
John Platt, Microsoft Research, One Microsoft Way, Redmond, WA
David Heckerman, Microsoft Research, One Microsoft Way, Redmond, WA
Mehran Sahami, Computer Science Department, Stanford University, Stanford, CA

CIKM '98: Proceedings of the seventh international conference on Information and knowledge management, November 1998, Pages 148–155, https://doi.org/10.1145/288627.288651

Published: 01 November 1998




Published in
CIKM '98: Proceedings of the seventh international conference on Information and knowledge management
November 1998
450 pages
ISBN: 1581130619
DOI: 10.1145/288627
Chairmen:
Niki Pissinou, Univ. of Southwestern Louisiana
Charles Nicholas, Univ. of Maryland, Baltimore County
James French, Univ. of Virginia
George Gardarin, Univ. of Versailles SQ/INRIA
Editors:
K. Makki, L. Bouganim
Copyright © 1998 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
classification
information management
machine learning
support vector machines
text categorization