ABSTRACT
Weblogs have become a prevalent medium through which people express themselves and share information. In general, weblog content falls into two genres. The first concerns webloggers' personal feelings, thoughts, or emotions; we call these affective articles. The second concerns technologies and various kinds of informative news; we call these informative articles. In this paper, we present a machine learning method for distinguishing informative from affective articles among weblogs, treating the task as a binary classification problem. Our approach achieves about 92% on information retrieval performance measures including precision, recall, and F1. We then set up three studies applying this classification approach in both research and industrial settings. First, we use it to improve the performance of emotion classification on weblog articles. Second, we develop an intent-driven weblog search engine based on the classification techniques to improve the satisfaction of Web users. Finally, we apply our approach to search for weblogs that contain a large number of informative articles.
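For readers unfamiliar with the measures named in the abstract, the following is a minimal sketch of how precision, recall, and F1 are computed for a binary informative/affective labeling. The function name, the label strings, and the toy data are illustrative assumptions, not the paper's code or data, and the numbers it produces have no relation to the reported 92%.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    gold: Sequence[str],
    predicted: Sequence[str],
    positive: str = "informative",
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for one class of a binary task.

    `positive` names the class being scored (hypothetical label string).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    # True positives: articles correctly labeled as the positive class.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: articles wrongly labeled as the positive class.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: positive-class articles the classifier missed.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Toy labels, invented purely for illustration.
gold = ["informative", "informative", "affective", "affective", "informative"]
predicted = ["informative", "affective", "affective", "informative", "informative"]

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Scoring the other class only requires passing `positive="affective"`; averaging the two per-class results is one common way to report a single figure, though the abstract does not say which convention was used.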
Index Terms
- Exploring in the weblog space by detecting informative and affective articles