Prophiler: a fast filter for the large-scale detection of malicious web pages

Authors:
Davide Canali, Institute Eurecom, Sophia Antipolis, France
Marco Cova, University of Birmingham, Birmingham, United Kingdom
Giovanni Vigna, University of California, Santa Barbara, Santa Barbara, CA, USA
Christopher Kruegel, University of California, Santa Barbara, Santa Barbara, CA, USA

WWW '11: Proceedings of the 20th international conference on World wide web, March 2011, Pages 197–206
https://doi.org/10.1145/1963405.1963436

Published: 28 March 2011

ABSTRACT

Malicious web pages that host drive-by-download exploits have become a popular means for compromising hosts on the Internet and, subsequently, for creating large-scale botnets. In a drive-by-download exploit, an attacker embeds a malicious script (typically written in JavaScript) into a web page. When a victim visits this page, the script is executed and attempts to compromise the browser or one of its plugins. To detect drive-by-download exploits, researchers have developed a number of systems that analyze web pages for the presence of malicious code. Most of these systems use dynamic analysis. That is, they run the scripts associated with a web page either directly in a real browser (running in a virtualized environment) or in an emulated browser, and they monitor the scripts' execution for malicious activity. While the tools are quite precise, the analysis process is costly, often requiring on the order of tens of seconds for a single page. Therefore, performing this analysis on a large set of web pages containing hundreds of millions of samples can be prohibitive.

One approach to reduce the resources required for performing large-scale analysis of malicious web pages is to develop a fast and reliable filter that can quickly discard pages that are benign, forwarding to the costly analysis tools only the pages that are likely to contain malicious code. In this paper, we describe the design and implementation of such a filter. Our filter, called Prophiler, uses static analysis techniques to quickly examine a web page for malicious content. This analysis takes into account features derived from the HTML contents of a page, from the associated JavaScript code, and from the corresponding URL. We automatically derive detection models that use these features using machine-learning techniques applied to labeled datasets.
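As a rough illustration of the idea described above, the following Python sketch extracts a handful of static features from a page's HTML, its inline JavaScript, and its URL, and then applies a trivial decision rule. Everything in it is an assumption made for illustration: the feature names, the `extract_features` and `looks_suspicious` helpers, and the hand-picked scoring rule are hypothetical and are not taken from Prophiler, which derives its detection models with machine learning from labeled data rather than from fixed thresholds.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class TagCounter(HTMLParser):
    """Counts a few HTML elements and collects inline script text.

    The choice of tags is illustrative only.
    """

    def __init__(self):
        super().__init__()
        self.iframes = 0
        self.scripts = 0
        self._in_script = False
        self.script_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            self.iframes += 1
        elif tag == "script":
            self.scripts += 1
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only keep text that appears inside <script> elements.
        if self._in_script:
            self.script_text.append(data)


def extract_features(url, html):
    """Return a dict of hypothetical static features for one page."""
    counter = TagCounter()
    counter.feed(html)
    js = "".join(counter.script_text)
    host = urlparse(url).hostname or ""
    return {
        "iframe_count": counter.iframes,
        "script_count": counter.scripts,
        "eval_calls": len(re.findall(r"\beval\s*\(", js)),
        "js_length": len(js),
        "url_length": len(url),
        "host_is_ip": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
    }


def looks_suspicious(features):
    """Stand-in for a trained model: a hand-picked rule, NOT a learned one."""
    score = features["eval_calls"] + features["iframe_count"]
    score += 2 if features["host_is_ip"] else 0
    return score >= 2
```

In a real deployment the `looks_suspicious` rule would be replaced by a classifier trained on labeled benign and malicious pages, and only pages it flags would be forwarded to dynamic analysis.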

To demonstrate the effectiveness and efficiency of Prophiler, we crawled and collected millions of pages, which we analyzed for malicious behavior. Our results show that our filter is able to reduce the load on the more costly dynamic analysis tools by more than 85%, with a negligible number of missed malicious pages.
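To make the scale of that reduction concrete, here is a small back-of-the-envelope calculation in Python. The workload figures are assumptions chosen only to match the orders of magnitude mentioned in the abstract (100 million pages, 20 seconds of dynamic analysis per page); they are not measurements from the paper, and the `dynamic_analysis_days` helper is hypothetical. The estimate also ignores the cost of running the filter itself.

```python
def dynamic_analysis_days(pages, seconds_per_page, discarded_fraction=0.0):
    """Machine-days of dynamic analysis left after a filter discards some pages."""
    forwarded = pages * (1.0 - discarded_fraction)
    return forwarded * seconds_per_page / 86_400  # 86,400 seconds per day


# Assumed workload: 100 million pages at 20 seconds each (illustrative values).
without_filter = dynamic_analysis_days(100_000_000, 20)
with_filter = dynamic_analysis_days(100_000_000, 20, discarded_fraction=0.85)

print(f"without filter: {without_filter:,.0f} machine-days")
print(f"with filter:    {with_filter:,.0f} machine-days")
```

Under these assumptions, discarding 85% of pages cuts the dynamic-analysis workload from roughly 23,000 machine-days to about 3,500.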


Published in
WWW '11: Proceedings of the 20th international conference on World wide web
March 2011
840 pages
ISBN: 9781450306324
DOI: 10.1145/1963405
General Chairs: S. Sadagopan (IIIT-Bangalore, India), Krithi Ramamritham (IIT-Bombay, India), Arun Kumar (IBM Research, India), M. P. Ravindra (Infosys E & R, India)
Program Chairs: Elisa Bertino (Purdue University, USA), Ravi Kumar (Yahoo! Research, USA)
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
drive-by download exploits
efficient web page filtering
malicious web page analysis
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%