ABSTRACT
Malware detection from network traffic flows is a challenging problem due to data irregularity issues such as imbalanced class distribution, noise, missing values, and heterogeneous types of features. To address these challenges, this paper presents a two-stage classification approach for malware detection. The framework initially employs random forest as a macro-level classifier to separate the malicious from non-malicious network flows, followed by a collection of one-class support vector machine classifiers to identify the specific type of malware. A novel tree-based feature construction approach is proposed to deal with data imperfection issues. As the performance of the support vector machine classifier often depends on the kernel function used to compute the similarity between every pair of data points, designing an appropriate kernel is essential for accurate identification of malware classes. We present a simple algorithm to construct a weighted linear kernel on the tree transformed features and demonstrate its effectiveness in detecting malware from real network traffic data.
- L. Breiman. Random forests. Machine Learning, 45(1):5--32, 2001. Google ScholarDigital Library
- P.-Y. Hao, J.-H. Chiang, and Y.-H. Lin. A new maximal margin spherical structured multi class support vector machine. Applied Intelligence, 30:98--111, 2009. Google ScholarDigital Library
Index Terms
- Weighted linear kernel with tree transformed features for malware detection
Recommendations
Malware detection using adaptive data compression
AISec '08: Proceedings of the 1st ACM workshop on Workshop on AISecA popular approach in current commercial anti-malware software detects malicious programs by searching in the code of programs for scan strings that are byte sequences indicative of malicious code. The scan strings, also known as the signatures of ...
Malware Detection Method Focusing on Anti-debugging Functions
CANDAR '14: Proceedings of the 2014 Second International Symposium on Computing and NetworkingMalware has received much attention in recent years. Antivirus software is widely used as a countermeasure against malware. However, some kinds of malware can evade detection by antivirus software, hence, a new detection method is required. In this ...
Visualization and deep-learning-based malware variant detection using OpCode-level features
AbstractMalicious software (malware) is a major threat to the systems and networks’ security. Although anti-malware products are used to protect systems and networks against malware attacks, obfuscated malware that is capable of evading ...
Highlights- Detecting newly obfuscated malware remained a major challenge.
- Semisupervised ...
Comments