A sequential algorithm for training text classifiers

Authors:
David D. Lewis, AT&T Bell Laboratories, Murray Hill, NJ
William A. Gale, AT&T Bell Laboratories, Murray Hill, NJ

SIGIR '94: Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, August 1994, pages 3–12

Published: 1 August 1994

Published in: SIGIR '94 proceedings, August 1994, 363 pages
ISBN: 038719889X
Editors: W. Bruce Croft (Univ. of Massachusetts, Amherst) and C. J. van Rijsbergen (Univ. of Glasgow, Glasgow, Scotland)
Publisher: Springer-Verlag, Berlin, Heidelberg
Qualifiers: Article, Conference

Acceptance Rates
Overall acceptance rate: 792 of 3,983 submissions, 20%
Article Metrics
- Total citations: 352
- Total downloads: 3,048
- Downloads (last 12 months): 86
- Downloads (last 6 weeks): 13