Following Passive DNS Traces to Detect Stealthy Malicious Domains Via Graph Inference

Authors:
Mohamed Nabeel, Qatar Computing Research Institute, Doha, Qatar
Issa M. Khalil, Qatar Computing Research Institute, Doha, Qatar
Bei Guan, Collaborative Innovation Center, Chinese Academy of Sciences, Beijing, China
Ting Yu, Qatar Computing Research Institute, Doha, Qatar

ACM Transactions on Privacy and Security, Volume 23, Issue 4, Article No. 17, pp. 1–36. https://doi.org/10.1145/3401897

Published: 06 July 2020

Abstract

Malicious domains, including phishing websites, spam servers, and command-and-control servers, underpin many of today's cyber attacks. Detecting them in a timely manner is therefore important, both to identify attacks and to take preventive measures. Many techniques have been proposed to detect malicious domains by analyzing Domain Name System (DNS) traffic data. DNS has traditionally been an Internet miscreant's best friend, but we observe that the subtle traces miscreants leave in DNS logs can be used against them to detect malicious domains. Our approach is to build a set of domain graphs by connecting "related" domains together and injecting known malicious and benign domains into these graphs, so that we can make inferences about the other domains in the graphs. A key challenge in building these graphs is accurately identifying related domains, so that incorrect associations are minimized while the number of domains connected from the dataset is maximized. Based on our observations, we first train two classifiers and then devise a set of association rules that help link domains together. We perform an in-depth empirical analysis of the graphs built with these association rules on passive DNS data and show that our techniques detect many more malicious domains than state-of-the-art approaches.
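The graph-inference idea described in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' method: the abstract does not specify the inference algorithm, and the association rule used here (link two domains if they ever resolved to the same IP address in passive DNS records) is a hypothetical stand-in for the paper's classifier-based association rules. On real data, shared hosting and CDNs would make this naive rule produce exactly the kind of incorrect associations the abstract warns about. Known malicious and benign seed domains are pinned to scores of 1.0 and 0.0, and every other domain repeatedly takes the mean score of its neighbors (plain label propagation, not belief propagation). The function names `build_domain_graph` and `propagate_scores`, and all domains and IP addresses below, are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def build_domain_graph(pdns_records):
    """Build an undirected domain graph from (domain, ip) passive DNS records.

    Hypothetical association rule: two domains are linked if they resolved
    to the same IP address. This naive rule is for illustration only and is
    not the association rule set described in the paper.
    """
    ip_to_domains = defaultdict(set)
    for domain, ip in pdns_records:
        ip_to_domains[ip].add(domain)

    # Every observed domain becomes a node, even if it ends up isolated.
    graph = {domain: set() for domain, _ in pdns_records}
    for domains in ip_to_domains.values():
        # Link every pair of domains that share this IP.
        for a, b in combinations(sorted(domains), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def propagate_scores(graph, seeds, iterations=20):
    """Seeded label propagation over the domain graph (simplified stand-in).

    `seeds` maps known domains to 1.0 (malicious) or 0.0 (benign); these stay
    fixed. Unknown domains start at 0.5 and repeatedly take the mean of their
    neighbors' scores; isolated unknown domains keep 0.5 (no evidence).
    """
    scores = {d: seeds.get(d, 0.5) for d in graph}
    for _ in range(iterations):
        updated = {}
        for domain, neighbors in graph.items():
            if domain in seeds:
                updated[domain] = seeds[domain]
            elif neighbors:
                updated[domain] = sum(scores[n] for n in neighbors) / len(neighbors)
            else:
                updated[domain] = scores[domain]
        scores = updated
    return scores


if __name__ == "__main__":
    records = [
        ("evil.example", "203.0.113.5"),
        ("unknown1.example", "203.0.113.5"),
        ("unknown1.example", "198.51.100.7"),
        ("good.example", "192.0.2.10"),
        ("unknown2.example", "192.0.2.10"),
        ("lonely.example", "192.0.2.99"),
    ]
    seeds = {"evil.example": 1.0, "good.example": 0.0}
    scores = propagate_scores(build_domain_graph(records), seeds)
    for domain, score in sorted(scores.items()):
        print(f"{domain}: {score:.2f}")
```

In this toy run, `unknown1.example` inherits the malicious seed's score of 1.00, `unknown2.example` inherits the benign seed's 0.00, and the isolated `lonely.example` stays at 0.50. Re-pinning the seeds on every iteration is what lets the injected ground truth flow outward; it also shows why association accuracy matters, since a single wrong edge pulls an unknown domain toward the wrong label.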

Index Terms

1. Networks
  1. Network properties
    1. Network security
      1. Web protocol security
2. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Malware and its mitigation
    2. Social engineering attacks
      1. Phishing

Published in: ACM Transactions on Privacy and Security, Volume 23, Issue 4 (November 2020), 196 pages
ISSN: 2471-2566
EISSN: 2471-2574
DOI: 10.1145/3409662
Editor: David Basin, ETH Zurich, Switzerland

Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States

Publication History
- Published: 6 July 2020
- Accepted: 1 May 2020
- Revised: 1 March 2020
- Received: 1 March 2019

Author Tags
Malicious domains
domain association
graph inference
passive DNS
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 12
- Total Downloads: 553
- Downloads (Last 12 months): 78
- Downloads (Last 6 weeks): 9