ABSTRACT
This paper analyzes Japanese splogs (spam blogs) based on various characteristics of the keywords they contain. By analyzing these keyword characteristics, we estimate how spammers behave when they create splogs from other sources. Since splogs often introduce noise into word occurrence statistics in the blogosphere, we assume that splogs can be collected efficiently (by hand) by sampling blog homepages that contain keywords of a certain type on the date when each keyword occurs most frequently. We manually examine various features of the collected blog homepages: whether their text content is excerpted from other sources, and whether they display affiliate advertisements or outgoing links to affiliated sites. Among various informative results, it is important to note that more than half of the collected splogs are created by a very small number of spammers.
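The sampling idea in the abstract (find the date on which a keyword occurs most often, then draw blog homepages from that date for manual inspection) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the post record layout (`date`, `homepage`, `text`), the function names, the substring match, and the tie-breaking rule are all assumptions made for the example.

```python
import random
from collections import Counter


def peak_date(posts, keyword):
    """Return the date on which `keyword` appears in the most posts, or None.

    `posts` is assumed to be a list of dicts with hypothetical keys
    'date', 'homepage', and 'text'.
    """
    counts = Counter(p["date"] for p in posts if keyword in p["text"])
    if not counts:
        return None
    # Highest count wins; ties go to the earliest date so the result is deterministic.
    return min(counts, key=lambda d: (-counts[d], d))


def sample_homepages(posts, keyword, k, seed=0):
    """Sample up to `k` distinct blog homepages that used `keyword` on its peak date."""
    day = peak_date(posts, keyword)
    if day is None:
        return []
    homepages = sorted(
        {p["homepage"] for p in posts if p["date"] == day and keyword in p["text"]}
    )
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(homepages, min(k, len(homepages)))
```

The sampled homepages would then be labeled by hand, for example as splog or non-splog, which is the manual step the abstract describes.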
Index Terms
- Analysing features of Japanese splogs and characteristics of keywords