DOI: 10.1145/3151137.3151142
research-article
Open Access

Fast Model Learning for the Detection of Malicious Digital Documents

Published: 05 December 2017

ABSTRACT

Modern cyber attacks are often conducted by distributing digital documents that contain malware. The approach detailed herein is a classifier that uses features derived from dynamic analysis of a document viewer as it renders the document in question. It classifies the disposition of digital documents (benign or malicious) with greater than 98% accuracy, even when its model is trained on small amounts of data. To keep the classification model small and thereby scalable, we employ an entity resolution strategy that merges features which differ syntactically because of programmatic randomness but are thought to be semantically equivalent. Entity resolution enables construction of a comprehensive model of benign functionality from relatively few training documents, and the model does not improve significantly with additional training data.
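To make the entity-resolution idea concrete, the Python sketch below canonicalizes feature strings by masking components that are assumed to vary randomly between renderings (GUIDs, temporary-file names, hex addresses). It then builds a benign model as the set of canonical features seen while rendering benign documents, and flags a document that exhibits features outside that model. The masking rules, the example traces, and the helpers `canonicalize`, `build_benign_model`, `novel_features`, and `classify` (including its `threshold` rule) are hypothetical illustrations, not the authors' feature set or classifier.

```python
import re
from collections.abc import Iterable

# Hypothetical canonicalization rules. Each regex matches a token that is
# assumed to vary randomly from one rendering to the next; every match is
# replaced by a fixed placeholder so equivalent features merge into one entity.
# These patterns are illustrative assumptions, not the rules used in the paper.
_RULES = [
    # GUIDs such as {3f2504e0-4f89-11d3-9a0c-0305e82c3301}
    (re.compile(r"\{[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\}"), "{GUID}"),
    # Temporary-file names such as ~df3a1b.tmp, directly after a backslash
    (re.compile(r"(?<=\\)~?[a-z]{2}[0-9a-f]{4,}\.tmp\b"), "TMPFILE.tmp"),
    # Hexadecimal addresses such as 0x7ffe0000
    (re.compile(r"\b0x[0-9a-f]+\b"), "0xADDR"),
]


def canonicalize(feature: str) -> str:
    """Map a raw feature string to a canonical entity by masking random parts."""
    # Lowercase first (assumes case-insensitive, Windows-style names),
    # so the patterns above only need to handle lowercase input.
    feature = feature.lower()
    for pattern, placeholder in _RULES:
        feature = pattern.sub(placeholder, feature)
    return feature


def build_benign_model(benign_traces: Iterable[Iterable[str]]) -> set[str]:
    """Union of canonical features observed while rendering benign documents."""
    model: set[str] = set()
    for trace in benign_traces:
        model.update(canonicalize(f) for f in trace)
    return model


def novel_features(model: set[str], trace: Iterable[str]) -> set[str]:
    """Canonical features in `trace` that the benign model has never seen."""
    return {canonicalize(f) for f in trace} - model


def classify(model: set[str], trace: Iterable[str], threshold: int = 1) -> str:
    """Hypothetical decision rule: malicious if >= `threshold` novel features."""
    return "malicious" if len(novel_features(model, trace)) >= threshold else "benign"


# Made-up feature strings (file paths, API calls with arguments) that a
# document viewer might produce; not data from the paper.
BENIGN_A = [
    r"C:\Temp\~DF3A1B.tmp",
    r"CreateMutexW Local\{3F2504E0-4F89-11D3-9A0C-0305E82C3301}",
    "VirtualAlloc 0x7ffe0000",
]
BENIGN_B = [
    r"C:\Temp\~DF90C2E4.tmp",
    r"CreateMutexW Local\{6B29FC40-CA47-1067-B31D-00DD010662DA}",
    "VirtualAlloc 0x1a2b0000",
]
MALICIOUS = [
    r"C:\Temp\~DF77AB01.tmp",
    r"CreateProcessW C:\Temp\payload.exe",
]

if __name__ == "__main__":
    model = build_benign_model([BENIGN_A, BENIGN_B])
    print(len(model))  # six raw features merge into three canonical entities
    print(classify(model, MALICIOUS))  # the unseen CreateProcessW call is flagged
```

In this toy run, six raw benign features collapse to three canonical entities, so the benign model stays small, which is the scalability argument the abstract makes. A real system would need resolution rules validated against actual viewer traces, since overly aggressive masking can merge features that are not in fact equivalent.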

Published in

ACM Other conferences
SSPREW-7: Proceedings of the 7th Software Security, Protection, and Reverse Engineering / Software Security and Protection Workshop
December 2017
68 pages
ISBN: 9781450353878
DOI: 10.1145/3151137

Copyright © 2017 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

• Research
• Refereed limited

Acceptance Rates

SSPREW-7 paper acceptance rate: 6 of 13 submissions, 46%
Overall acceptance rate: 6 of 13 submissions, 46%