ABSTRACT
The popularity of long-tail search engine optimization (SEO) brings new security challenges: incidents of long-tail keyword poisoning, used to lower competition and increase revenue, have been reported. The emergence of cloud web hosting services provides a new and effective platform for long-tail SEO spam attacks. There is growing evidence that large-scale long-tail SEO campaigns are being carried out on cloud hosting platforms because they offer low-cost, high-speed hosting services. In this paper, we take the first step toward understanding how long-tail SEO spam is implemented on cloud hosting platforms. After identifying 3,186 cloud directories and 318,470 doorway pages used for long-tail SEO spam on the leading cloud platforms, we characterize their abusive behavior. One highlight of our findings is the effectiveness of cloud-based long-tail SEO spam: 6% of the doorway pages appear in the top 10 search results for the poisoned long-tail keywords.
Other important discoveries include how such doorway pages monetize traffic and how they manage the cloud platforms' countermeasures. These findings bring this abuse into the spotlight and offer insights into eliminating the practice.
Index Terms
- Characterizing Long-tail SEO Spam on Cloud Web Hosting Services