ABSTRACT
Insider threat detection is a challenging problem for companies and organizations because the malicious actions are performed by authorized users. It is also a highly skewed data problem: the extreme class imbalance makes it very difficult to adapt learning algorithms to the real-world context. In this work, applications of genetic programming (GP) and stream active learning are evaluated for insider threat detection. Linear GP with lexicase or multi-objective selection is employed to address the problem under a stationary data assumption, while streaming GP is employed to address it under a non-stationary data assumption. Experiments conducted on a publicly available corporate data set show that the approaches can deal with extreme class imbalance, learn from a stream, and adapt to the real-world context.
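Lexicase selection is one of the selection schemes named above. It is relevant to class imbalance because it does not average errors over the whole training set. The sketch below follows the standard description of lexicase selection by Helmuth, Spector and Matheson (2015). It is not taken from this paper, and the paper's exact variant (for example, any class-balanced sampling of cases) is not stated here. The function name `lexicase_select` and the error-matrix layout (one row per individual, one column per training case, lower is better) are hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence


def lexicase_select(errors: Sequence[Sequence[float]], rng: random.Random) -> int:
    """Pick one parent index using lexicase selection (illustrative sketch).

    Assumption: errors[i][j] is the error of individual i on training case j,
    where lower is better (e.g. 0 = correct, 1 = misclassified).
    """
    candidates: List[int] = list(range(len(errors)))
    cases: List[int] = list(range(len(errors[0])))
    rng.shuffle(cases)  # a fresh random ordering of cases for every selection event
    for case in cases:
        # Keep only the candidates that achieve the best error on this case.
        best = min(errors[i][case] for i in candidates)
        candidates = [i for i in candidates if errors[i][case] == best]
        if len(candidates) == 1:
            break
    # If several candidates survive every case, choose uniformly among them.
    return rng.choice(candidates)


if __name__ == "__main__":
    # Individuals 0-2 are each the only one correct on a different case,
    # e.g. a rare "malicious" exemplar; individual 3 is never best.
    errors = [[0, 1, 1], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
    rng = random.Random(0)
    picks = [lexicase_select(errors, rng) for _ in range(300)]
    print(sorted(set(picks)))
```

Individuals 0 to 2 each have a total error of 2. Under an aggregate fitness they are indistinguishable. Under lexicase selection, each of them wins whenever the case it alone solves comes first in the shuffled order. As a result, a specialist that correctly classifies a scarce minority-class case can still be chosen as a parent.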
Index Terms
- Benchmarking evolutionary computation approaches to insider threat detection