Abstract
Malicious domains, including phishing websites, spam servers, and command-and-control servers, are behind many of today's cyber attacks. Detecting them in a timely manner is therefore important not only for identifying cyber attacks but also for taking preventive measures. A plethora of techniques have been proposed to detect malicious domains by analyzing Domain Name System (DNS) traffic data. Traditionally, DNS has been an Internet miscreant’s best friend, but we observe that the subtle traces such miscreants leave in DNS logs can be used against them to detect malicious domains. Our approach is to build a set of domain graphs by connecting “related” domains and then injecting known malicious and benign domains into these graphs, so that we can make inferences about the other domains in them. A key challenge in building these graphs is accurately identifying related domains, so that incorrect associations are minimized and the number of domains connected from the dataset is maximized. Based on our observations, we first train two classifiers and then devise a set of association rules that assist in linking domains together. We perform an in-depth empirical analysis of the graphs built using these association rules on passive DNS data and show that our techniques can detect many more malicious domains than state-of-the-art approaches.
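The graph-based idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not spell out the two classifiers, the association rules, or the inference algorithm. The sketch assumes a single hypothetical association rule (two domains are related if they resolved to a common IP address) and a simple neighbor-averaging scheme for spreading scores from seed labels. The function names `build_domain_graph` and `propagate`, the sample domains, and the IP addresses are all invented for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_domain_graph(records):
    """Link two domains when they resolved to a common IP.

    This shared-IP rule is an illustrative assumption, not one of the
    paper's association rules.
    """
    by_ip = defaultdict(set)
    for domain, ip in records:
        by_ip[ip].add(domain)

    graph = defaultdict(set)
    for domains in by_ip.values():
        for a, b in combinations(sorted(domains), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def propagate(graph, seeds, rounds=20):
    """Spread maliciousness scores from seed domains to their neighbors.

    seeds maps a domain to 1.0 (known malicious) or 0.0 (known benign).
    Unlabeled domains start at 0.5 and repeatedly take the mean score of
    their neighbors; seed scores stay fixed. This averaging is a stand-in
    for whatever inference algorithm the paper actually uses.
    """
    scores = {d: seeds.get(d, 0.5) for d in graph}
    for _ in range(rounds):
        updated = {}
        for domain, neighbors in graph.items():
            if domain in seeds:
                updated[domain] = seeds[domain]
            else:
                updated[domain] = sum(scores[n] for n in neighbors) / len(neighbors)
        scores = updated
    return scores


# Hypothetical passive DNS observations: (domain, resolved IP).
records = [
    ("bad-seed.example", "203.0.113.5"),
    ("unknown-a.example", "203.0.113.5"),
    ("good-seed.example", "198.51.100.7"),
    ("unknown-b.example", "198.51.100.7"),
]
seeds = {"bad-seed.example": 1.0, "good-seed.example": 0.0}

graph = build_domain_graph(records)
scores = propagate(graph, seeds)
for domain in sorted(scores):
    print(domain, scores[domain])
```

In this toy input, each unlabeled domain has exactly one neighbor, so it simply inherits that neighbor's seed label. The example also hints at why the abstract calls accurate association a key challenge: a rule as naive as shared IPs would wrongly link unrelated domains on shared hosting or CDN infrastructure.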
Index Terms
- Following Passive DNS Traces to Detect Stealthy Malicious Domains Via Graph Inference