ABSTRACT
Sampling techniques are widely used for traffic measurements at high link speeds to conserve router resources. Traditionally, sampled traffic data is used for network management tasks such as traffic matrix estimation, but recently it has also been used in numerous anomaly detection algorithms, as security analysis becomes increasingly critical for network providers. While the impact of sampling on traffic engineering metrics such as flow size and mean rate is well studied, its impact on anomaly detection remains an open question.

This paper presents a comprehensive study of whether existing sampling techniques distort traffic features critical for effective anomaly detection. We sampled packet traces captured from a Tier-1 IP backbone using four popular methods: random packet sampling, random flow sampling, smart sampling, and sample-and-hold. The sampled data is then used as input to detect two common classes of anomalies: volume anomalies and port scans. Since it is infeasible to enumerate all existing solutions, we study three representative algorithms: a wavelet-based volume anomaly detection algorithm and two portscan detection algorithms based on hypothesis testing. Our results show that all four sampling methods introduce fundamental bias that degrades the performance of the three detection schemes, although the degradation curves differ markedly. We also identify the traffic features critical for anomaly detection and analyze how they are affected by sampling. Our work demonstrates the need for better measurement techniques, since anomaly detection operates on a drastically different information region, one that is often overlooked by existing traffic accounting methods that target heavy hitters.
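To make the four sampling methods named above concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation. A trace is modeled as a list of flow identifiers, one per packet, and the function names, parameters (`p`, `z`), and the synthetic trace are all hypothetical. Sample-and-hold is simplified to a per-packet sampling probability, and smart sampling is reduced to its size-dependent selection rule (keep a flow of size x with probability min(1, x/z)).

```python
import random
from collections import Counter


def random_packet_sampling(trace, p, rng):
    """Keep each packet independently with probability p."""
    return Counter(flow for flow in trace if rng.random() < p)


def random_flow_sampling(trace, p, rng):
    """Select each flow with probability p; keep every packet of a selected flow."""
    selected = {}
    counts = Counter()
    for flow in trace:
        if flow not in selected:
            selected[flow] = rng.random() < p
        if selected[flow]:
            counts[flow] += 1
    return counts


def smart_sampling(trace, z, rng):
    """Keep a flow of size x (in packets) with probability min(1, x / z)."""
    sizes = Counter(trace)
    return Counter(
        {f: x for f, x in sizes.items() if rng.random() < min(1.0, x / z)}
    )


def sample_and_hold(trace, p, rng):
    """Simplified per-packet variant: once a flow is sampled, count all of
    its later packets; otherwise sample the packet with probability p."""
    counts = Counter()
    for flow in trace:
        if flow in counts or rng.random() < p:
            counts[flow] += 1
    return counts


# Hypothetical trace: one heavy flow plus 200 single-packet "probe" flows,
# loosely imitating scan-like traffic next to a heavy hitter.
rng = random.Random(0)
trace = ["heavy"] * 1000 + [f"probe-{i}" for i in range(200)]
rng.shuffle(trace)

results = {
    "random packet": random_packet_sampling(trace, 0.1, rng),
    "random flow": random_flow_sampling(trace, 0.1, rng),
    "smart (z=10)": smart_sampling(trace, 10, rng),
    "sample-and-hold": sample_and_hold(trace, 0.1, rng),
}
for name, sampled in results.items():
    probes = sum(1 for f in sampled if f.startswith("probe-"))
    print(f"{name}: heavy flow seen as {sampled.get('heavy', 0)} packets, "
          f"{probes}/200 probe flows retained")
```

In this toy setting, each method retains the heavy flow far more reliably than the single-packet flows, which gives some intuition for the abstract's closing point that accounting methods aimed at heavy hitters can overlook the traffic that anomaly detection depends on.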
Index Terms
- Is sampled data sufficient for anomaly detection?