ABSTRACT
Given a history of detected malware attacks, can we predict the number of malware infections in a country? Can we do this for different malware and countries? This is an important question with numerous implications for cyber security, from designing better anti-virus software, to designing and implementing targeted patches, to more accurately measuring the economic impact of breaches. The problem is compounded by the fact that, as external observers, we can only detect a fraction of actual malware infections. In this paper we address this problem using data from Symantec covering more than 1.4 million hosts and 50 malware, spanning 2 years and multiple countries. First, we carefully design domain-based features from both the malware and machine-host perspectives. Second, inspired by epidemiological and information diffusion models, we design a novel temporal non-linear model for malware spread and detection. Finally, we present ESM, an ensemble-based approach which combines both these methods to construct a more accurate algorithm. Using extensive experiments spanning multiple malware and countries, we show that ESM can effectively predict malware infection ratios over time (both the actual number and the trend) up to 4 times better than several baselines on various metrics. Furthermore, ESM's performance is stable and robust even when the number of detected infections is low.
Index Terms
- Ensemble Models for Data-driven Prediction of Malware Infections