DOI: 10.1145/2872427.2883008

Characterizing Long-tail SEO Spam on Cloud Web Hosting Services

Published: 11 April 2016

ABSTRACT

The popularity of long-tail search engine optimization (SEO) brings with it new security challenges: incidents of long-tail keyword poisoning to lower competition and increase revenue have been reported. The emergence of cloud web hosting services provides a new and effective platform for long-tail SEO spam attacks. There is growing evidence that large-scale long-tail SEO campaigns are being carried out on cloud hosting platforms because they offer low-cost, high-speed hosting services. In this paper, we take the first step toward understanding how long-tail SEO spam is implemented on cloud hosting platforms. After identifying 3,186 cloud directories and 318,470 doorway pages on the leading cloud platforms for long-tail SEO spam, we characterize their abusive behavior. One highlight of our findings is the effectiveness of cloud-based long-tail SEO spam, with 6% of the doorway pages successfully appearing in the top 10 search results of the poisoned long-tail keywords.

Other important discoveries include how such doorway pages monetize traffic and their ability to manage cloud platforms' countermeasures. These findings bring such abuse into the spotlight and provide some insights into eliminating this practice.


Published in

WWW '16: Proceedings of the 25th International Conference on World Wide Web
April 2016, 1482 pages
ISBN: 9781450341431

Copyright © 2016. Copyright is held by the International World Wide Web Conference Committee (IW3C2).

Publisher: International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland

Acceptance Rates

WWW '16 paper acceptance rate: 115 of 727 submissions, 16%. Overall acceptance rate: 1,899 of 8,196 submissions, 23%.
