ABSTRACT
Malicious web pages that host drive-by-download exploits have become a popular means for compromising hosts on the Internet and, subsequently, for creating large-scale botnets. In a drive-by-download exploit, an attacker embeds a malicious script (typically written in JavaScript) into a web page. When a victim visits this page, the script is executed and attempts to compromise the browser or one of its plugins. To detect drive-by-download exploits, researchers have developed a number of systems that analyze web pages for the presence of malicious code. Most of these systems use dynamic analysis. That is, they run the scripts associated with a web page either directly in a real browser (running in a virtualized environment) or in an emulated browser, and they monitor the scripts' executions for malicious activity. While the tools are quite precise, the analysis process is costly, often requiring in the order of tens of seconds for a single page. Therefore, performing this analysis on a large set of web pages containing hundreds of millions of samples can be prohibitive.
One approach to reduce the resources required for performing large-scale analysis of malicious web pages is to develop a fast and reliable filter that can quickly discard pages that are benign, forwarding to the costly analysis tools only the pages that are likely to contain malicious code. In this paper, we describe the design and implementation of such a filter. Our filter, called Prophiler, uses static analysis techniques to quickly examine a web page for malicious content. This analysis takes into account features derived from the HTML contents of a page, from the associated JavaScript code, and from the corresponding URL. We automatically derive detection models that use these features using machine-learning techniques applied to labeled datasets.
To demonstrate the effectiveness and efficiency of Prophiler, we crawled and collected millions of pages, which we analyzed for malicious behavior. Our results show that our filter is able to reduce the load on a more costly dynamic analysis tools by more than 85%, with a negligible amount of missed malicious pages.
- Alexa.com. Alexa Top Global Sites. http://www.alexa.com/topsites/.Google Scholar
- Clam AntiVirus. http://www.clamav.net/, 2010.Google Scholar
- A. Clark and M. Guillemot. CyberNeko HTML Parser. http://nekohtml.sourceforge.net/.Google Scholar
- M. Cova, C. Kruegel, and G. Vigna. Detection and Analysis of Drive-by-Download Attacks and Malicious JavaScript Code. In Proceedings of the International World Wide Web Conference (WWW), 2010. Google ScholarDigital Library
- B. Feinstein and D. Peck. Caffeine Monkey: Automated Collection, Detection and Analysis of Malicious JavaScript. In Proceedings of the Black Hat Security Conference, 2007.Google Scholar
- S. Garera, N. Provos, M. Chew, and A. D. Rubin. A Framework for Detection and Measurement of Phishing Attacks. In Proceedings of the Workshop on Rapid Malcode (WORM), 2007. Google ScholarDigital Library
- D. Goodin. SQL injection taints BusinessWeek.com. http://www.theregister.co.uk/2008/09/16/businessweek_hacked/, September 2008.Google Scholar
- D. Goodin. Potent malware link infects almost 300,000 webpages. http://www.theregister.co.uk/2009/12/10/mass_web_attack/, December 2010.Google Scholar
- M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The WEKA Data Mining Software: An Update. SIGKDD Explorations, 11(1):10--18. Google ScholarDigital Library
- Heritrix. http://crawler.archive.org/.Google Scholar
- M. Hines. Malware SEO: Gaming Google Trends and Big Bird. http://securitywatch.eweek.com/seo/malware_seo_gaming_google_trends_and_big_bird.html, November 2009.Google Scholar
- W. Hobson. Cyber-criminals use SEO on topical trends. http://www.vertical-leap.co.uk/news/cybercriminals-use-seo-on-topical-trends/, February 2010.Google Scholar
- HoneyClient Project Team. HoneyClient. http://www.honeyclient.org/, 2010.Google Scholar
- A. Ikinci, T. Holz, and F. Freiling. Monkey-Spider: Detecting Malicious Websites with Low-Interaction Honeyclients. In Proceedings of Sicherheit, Schutz und Zuverlässigkeit, 2008.Google Scholar
- JSUnpack. http://jsunpack.jeek.org, 2010.Google Scholar
- P. Likarish, E. Jung, and I. Jo. Obfuscated Malicious Javascript Detection using Classification Techniques. In Proceedings of the Conference on Malicious and Unwanted Software (Malware), 2009.Google ScholarCross Ref
- J. Ma, L. Saul, S. Savage, and G. Voelker. Beyond Blacklists: Learning to Detect Malicious Web Sites from Suspicious URLs. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2009. Google ScholarDigital Library
- A. Moshchuk, T. Bragin, S. Gribble, and H. Levy. A Crawler-based Study of Spyware in the Web. In Proceedings of the Symposium on Network and Distributed System Security (NDSS), 2006.Google Scholar
- Mozilla Foundation. Rhino: JavaScript for Java. http://www.mozilla.org/rhino/.Google Scholar
- J. Nazario. PhoneyC: A Virtual Client Honeypot. In Proceedings of the USENIX Workshop on Large-Scale Exploits and Emergent Threats (LEET), 2009. Google ScholarDigital Library
- D. Oswald. HTMLParser. http://htmlparser.sourceforge.net/.Google Scholar
- N. Provos, P. Mavrommatis, M. A. Rajab, and F. Monrose. All Your iFrames Point to Us. In Proceedings of the USENIX Security Symposium, 2008. Google ScholarDigital Library
- P. Ratanaworabhan, B. Livshits, B., and Zorn. Nozzle: a defense against heap-spraying code injection attacks. In Proceedings of the USENIX Security Symposium, 2009. Google ScholarDigital Library
- K. Rieck, T. Krueger, and A. Dewald. CUJO: Efficient Detection and Prevention of Drive-by-Download Attacks. In Proceedings of the Annual Computer Security Applications Conference (ACSAC), 2010. Google ScholarDigital Library
- C. Seifert and R. Steenson. Capture-HPC. https://projects.honeynet.org/capture-hpc, 2008.Google Scholar
- C. Seifert, I. Welch, and P. Komisarczuk. Identification of Malicious Web Pages Through Analysis of Underlying DNS and Web Server Relationships. In Proceedings of the LCN Workshop on Network Security (WNS), 2008.Google ScholarCross Ref
- C. Seifert, I. Welch, and P. Komisarczuk. Identification of Malicious Web Pages with Static Heuristics. In Proceedings of the Australasian Telecommunication Networks and Applications Conference (ATNAC), 2008.Google ScholarCross Ref
- R. Sommer and V. Paxson. Outside the Closed World: On Using Machine Learning For Network Intrusion Detection. In Proceedings of the IEEE Symposium on Security and Privacy, 2010. Google ScholarDigital Library
- B. Stone-Gross, M. Cova, L. Cavallaro, B. Gilbert, M. Szydlowski, R. Kemmerer, C. Kruegel, and G. Vigna. Your Botnet is My Botnet: Analysis of a Botnet Takeover. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), 2009. Google ScholarDigital Library
- Y.-M. Wang, D. Beck, X. Jiang, R. Roussev, C. Verbowski, S. Chen, and S. King. Automated Web Patrol with Strider HoneyMonkeys: Finding Web Sites that Exploit Browser Vulnerabilities. In Proceedings of the Symposium on Network and Distributed System Security (NDSS), 2006.Google Scholar
Recommendations
ZDVUE: prioritization of javascript attacks to discover new vulnerabilities
AISec '11: Proceedings of the 4th ACM workshop on Security and artificial intelligenceMalware writers are constantly looking for new vulnerabilities to exploit in popular software applications. A successful exploit of a previously unknown vulnerability, that evades state-of-the art anti-virus and intrusion-detection systems is called a ...
WormTerminator: an effective containment of unknown and polymorphic fast spreading worms
ANCS '06: Proceedings of the 2006 ACM/IEEE symposium on Architecture for networking and communications systemsThe fast spreading worm is becoming one of the most serious threats to today's networked information systems. A fast spreading worm could infect hundreds of thousands of hosts within a few minutes. In order to stop a fast spreading worm, we need the ...
Detecting, validating and characterizing computer infections in the wild
IMC '11: Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conferenceAlthough network intrusion detection systems (IDSs) have been studied for several years, their operators are still overwhelmed by a large number of false-positive alerts. In this work we study the following problem: from a large archive of intrusion ...
Comments