ABSTRACT
Text classification is becoming increasingly important as the amount of information available online grows rapidly. This paper describes the text classification process. A single article cannot, of course, be a complete review of the text classification domain. Nevertheless, we hope that the cited references cover the major theoretical issues and guide researchers toward interesting research directions.
- {1} Bao Y. and Ishii N., "Combining Multiple kNN Classifiers for Text Categorization by Reducts", LNCS 2534, 2002, pp. 340-347.
- {2} Bi Y., Bell D., Wang H., Guo G., Greer K., "Combining Multiple Classifiers Using Dempster's Rule of Combination for Text Categorization", MDAI, 2004, pp. 127-138.
- {3} Brank J., Grobelnik M., Milic-Frayling N., Mladenic D., "Interaction of Feature Selection Methods and Linear Classification Models", Proc. of the 19th International Conference on Machine Learning, Australia, 2002.
- {4} Chawla N. V., Bowyer K. W., Hall L. O., Kegelmeyer W. P., "SMOTE: Synthetic Minority Over-sampling Technique", Journal of AI Research, 16, 2002, pp. 321-357.
- {5} Forman G., "An Experimental Study of Feature Selection Metrics for Text Categorization", Journal of Machine Learning Research, 3, 2003, pp. 1289-1305.
- {6} Fragoudis D., Meretakis D., Likothanassis S., "Integrating Feature and Instance Selection for Text Classification", SIGKDD '02, July 23-26, 2002, Edmonton, Alberta, Canada.
- {7} Guan J., Zhou S., "Pruning Training Corpus to Speedup Text Classification", DEXA 2002, pp. 831-840.
- {8} Johnson D. E., Oles F. J., Zhang T., Goetz T., "A decision-tree-based symbolic rule induction system for text categorization", IBM Systems Journal, September 2002.
- {9} Han X., Zu G., Ohyama W., Wakabayashi T., Kimura F., "Accuracy Improvement of Automatic Text Classification Based on Feature Transformation and Multiclassifier Combination", LNCS, Volume 3309, Jan 2004, pp. 463-468.
- {10} Ke H., Shaoping M., "Text categorization based on Concept indexing and principal component analysis", Proc. TENCON 2002 Conference on Computers, Communications, Control and Power Engineering, 2002, pp. 51-56.
- {11} Kehagias A., Petridis V., Kaburlasos V., Fragkou P., "A Comparison of Word- and Sense-Based Text Categorization Using Several Classification Algorithms", JIIS, Volume 21, Issue 3, 2003, pp. 227-247.
- {12} Kim S. B., Rim H. C., Yook D. S. and Lim H. S., "Effective Methods for Improving Naive Bayes Text Classifiers", LNAI 2417, 2002, pp. 414-423.
- {13} Klopotek M. and Woch M., "Very Large Bayesian Networks in Text Classification", ICCS 2003, LNCS 2657, 2003, pp. 397-406.
- {14} Leopold E. and Kindermann J., "Text Categorization with Support Vector Machines. How to Represent Texts in Input Space?", Machine Learning 46, 2002, pp. 423-444.
- {15} Lewis D., Yang Y., Rose T., Li F., "RCV1: A New Benchmark Collection for Text Categorization Research", Journal of Machine Learning Research 5, 2004, pp. 361-397.
- {16} Heui Lim, "Improving kNN Based Text Classification with Well Estimated Parameters", LNCS, Vol. 3316, Oct 2004, pp. 516-523.
- {17} Madsen R. E., Sigurdsson S., Hansen L. K. and Lansen J., "Pruning the Vocabulary for Better Context Recognition", 7th International Conference on Pattern Recognition, 2004.
- {18} Montanes E., Quevedo J. R. and Diaz I., "A Wrapper Approach with Support Vector Machines for Text Categorization", LNCS 2686, 2003, pp. 230-237.
- {19} Nardiello P., Sebastiani F., Sperduti A., "Discretizing Continuous Attributes in AdaBoost for Text Categorization", LNCS, Volume 2633, Jan 2003, pp. 320-334.
- {20} Qiang W., XiaoLong W., Yi G., "A Study of Semi-discrete Matrix Decomposition for LSI in Automated Text Categorization", LNCS, Volume 3248, Jan 2005, pp. 606-615.
- {21} Schneider K., "Techniques for Improving the Performance of Naive Bayes for Text Classification", LNCS, Vol. 3406, 2005, pp. 682-693.
- {22} Sebastiani F., "Machine Learning in Automated Text Categorization", ACM Computing Surveys, vol. 34 (1), 2002, pp. 1-47.
- {23} Shanahan J. and Roma N., "Improving SVM Text Classification Performance through Threshold Adjustment", LNAI 2837, 2003, pp. 361-372.
- {24} Soucy P. and Mineau G., "Feature Selection Strategies for Text Categorization", AI 2003, LNAI 2671, 2003, pp. 505-509.
- {25} Sousa P., Pimentao J. P., Santos B. R. and Moura-Pires F., "Feature Selection Algorithms to Improve Documents Classification Performance", LNAI 2663, 2003, pp. 288-296.
- {26} Torkkola K., "Discriminative Features for Text Document Classification", Proc. International Conference on Pattern Recognition, Canada, 2002.
- {27} Vinciarelli A., "Noisy Text Categorization", Proc. 17th International Conference on Pattern Recognition (ICPR'04), 2004, pp. 554-557.
- {28} Yang Y., Zhang J. and Kisiel B., "A scalability analysis of classifiers in text categorization", ACM SIGIR'03, 2003, pp. 96-103.
- {29} Zu G., Ohyama W., Wakabayashi T., Kimura F., "Accuracy improvement of automatic text classification based on feature transformation", Proc. of the 2003 ACM Symposium on Document Engineering, November 20-22, 2003, pp. 118-120.
Index Terms
- Text classification: a recent overview