DOI: 10.1145/1242572.1242611
Article

Exploring in the weblog space by detecting informative and affective articles

Published: 08 May 2007

ABSTRACT

Weblogs have become a prevalent medium for people to share information and express themselves. In general, weblog content falls into two genres. The first concerns the webloggers' personal feelings, thoughts, or emotions; we call these affective articles. The second concerns technologies and various kinds of informative news; we call these informative articles. In this paper, we present a machine learning method for classifying weblog articles as informative or affective, which we treat as a binary classification problem. Using machine learning approaches, we achieve about 92% on information retrieval performance measures including precision, recall, and F1. We set up three studies on applications of this classification approach in both research and industrial settings. First, the approach is used to improve the performance of emotion classification on weblog articles. Second, we develop an intent-driven weblog search engine based on the classification techniques to improve the satisfaction of Web users. Finally, the approach is applied to search for weblogs that contain a large number of informative articles.
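The abstract does not say which features or learning algorithm the authors used, so the following is only a minimal sketch of the general idea: a binary text classifier that labels an article as informative or affective, scored with precision, recall, and F1. It assumes a multinomial naive Bayes model with add-one smoothing and whitespace tokenization; the function names and the four toy training sentences are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, chosen only for brevity.
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Count documents per class and words per class (hypothetical helper)."""
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in zip(docs, labels):
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision_recall_f1(gold, predicted, positive):
    """Precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data, invented for illustration only.
train_docs = [
    "new phone release features faster processor",
    "software update fixes security bug",
    "i feel so happy and excited today",
    "i am sad and miss my friends",
]
train_labels = ["informative", "informative", "affective", "affective"]

model = train_naive_bayes(train_docs, train_labels)
test_docs = ["security update for the new processor", "i feel sad today"]
test_gold = ["informative", "affective"]
test_pred = [predict(model, doc) for doc in test_docs]
scores = precision_recall_f1(test_gold, test_pred, positive="informative")
```

A real experiment would need a labeled weblog corpus, held-out evaluation data, and a considered choice of features; the figures reported in the abstract cannot be reproduced from this toy example.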

      • Published in

        WWW '07: Proceedings of the 16th international conference on World Wide Web
        May 2007
        1382 pages
        ISBN:9781595936547
        DOI:10.1145/1242572

        Copyright © 2007 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • Article

        Acceptance Rates

        Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
