research-article

Following Passive DNS Traces to Detect Stealthy Malicious Domains Via Graph Inference

Published: 06 July 2020

Abstract

Malicious domains, including phishing websites, spam servers, and command-and-control servers, are behind many of today's cyber attacks. Detecting them in a timely manner is therefore important, not only to identify cyber attacks but also to take preventive measures. A plethora of techniques have been proposed to detect malicious domains by analyzing Domain Name System (DNS) traffic data. Traditionally, DNS has been an Internet miscreant’s best friend, but we observe that the subtle traces such miscreants leave in DNS logs can be used against them to detect malicious domains. Our approach is to build a set of domain graphs by connecting “related” domains and injecting known malicious and benign domains into these graphs, so that we can make inferences about the other domains in the graphs. A key challenge in building these graphs is identifying related domains accurately, so that incorrect associations are minimized while the number of domains connected from the dataset is maximized. Based on our observations, we first train two classifiers and then devise a set of association rules that assist in linking domains together. We perform an in-depth empirical analysis of the graphs built using these association rules on passive DNS data and show that our techniques can detect many more malicious domains than the state of the art.
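
To make the idea of seeding a domain graph and inferring labels for the remaining domains more concrete, the following is a minimal Python sketch. It is not the inference procedure used in the article, which the abstract does not specify. It swaps in a simple neighbor-averaging score propagation, and the domain names, the edge list, the neutral prior of 0.5, and the helper names `build_graph` and `propagate` are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (domain_a, domain_b) pairs.

    Each pair stands for an assumed "related domains" association.
    """
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def propagate(graph, seeds, rounds=20):
    """Iteratively average neighbor scores while keeping seed scores fixed.

    seeds maps a domain to 1.0 (known malicious) or 0.0 (known benign).
    Unlabeled domains start at a neutral prior of 0.5 (an assumption).
    """
    scores = {d: seeds.get(d, 0.5) for d in graph}
    for _ in range(rounds):
        updated = {}
        for d, neighbors in graph.items():
            if d in seeds:
                # Injected ground truth never changes.
                updated[d] = seeds[d]
            else:
                # Every node built from an edge has at least one neighbor.
                updated[d] = sum(scores[n] for n in neighbors) / len(neighbors)
        scores = updated
    return scores


# Hypothetical associations and seed labels.
edges = [
    ("bad-seed.example", "unknown-a.example"),
    ("unknown-a.example", "unknown-b.example"),
    ("good-seed.example", "unknown-c.example"),
]
seeds = {"bad-seed.example": 1.0, "good-seed.example": 0.0}
scores = propagate(build_graph(edges), seeds)

for domain, score in sorted(scores.items()):
    print(f"{domain}: {score:.3f}")
```

In this toy graph, the unlabeled domains reachable only from the malicious seed drift toward a score of 1, while the domain attached only to the benign seed stays at 0. A real system would need weighted edges and a more careful inference method, since a single wrong association can pull an entire component toward the wrong label.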

        • Published in

          ACM Transactions on Privacy and Security, Volume 23, Issue 4
          November 2020
          196 pages
          ISSN:2471-2566
          EISSN:2471-2574
          DOI:10.1145/3409662

          Copyright © 2020 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 6 July 2020
          • Accepted: 1 May 2020
          • Revised: 1 March 2020
          • Received: 1 March 2019

          Qualifiers

          • research-article
          • Research
          • Refereed
