GenMatcher: A Generic Clustering-Based Arbitrary Matching Framework

Authors:
Ping Wang
Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX

Luke McHale
Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX

Paul V. Gratz
Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX

Alex Sprintson
Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX

ACM Transactions on Architecture and Code Optimization, Volume 15, Issue 4, Article No. 51, pp. 1–22. https://doi.org/10.1145/3281663

Published: 16 November 2018

Abstract

Packet classification relies on matching packet content and headers against rules, so the throughput of matching operations is critical in many networking applications. Further, with the advent of Software Defined Networking (SDN), efficient software implementations of matching are critical to overall system performance.

This article presents GenMatcher, a generic, software-only, arbitrary matching framework for fast, efficient searches. The key idea of our approach is to represent arbitrary rules with efficient prefix-based tries. To support arbitrary wildcards, we rearrange bits within the rules such that wildcards accumulate to one side of the bitstring. Since many non-contiguous wildcards often remain, we use multiple prefix-based tries. The main challenge in this context is to generate efficient trie groupings and expansions that support all arbitrary rules. Finding an optimal mix of grouping and expansion is an NP-complete problem.
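The bit rearrangement described above can be illustrated with a small sketch. It is not the authors' implementation: the string encoding of rules (`'0'`, `'1'`, `'*'`), the function names, and the heuristic of ordering columns by wildcard count are all assumptions made for illustration.

```python
from typing import List


def wildcard_order(rules: List[str]) -> List[int]:
    """Return bit positions ordered so columns with the fewest '*' come first.

    Assumed heuristic: sorting columns by wildcard count pushes wildcards
    toward the right-hand side of every rule in the group.
    """
    width = len(rules[0])
    counts = [sum(rule[i] == '*' for rule in rules) for i in range(width)]
    return sorted(range(width), key=lambda i: (counts[i], i))


def permute(bits: str, order: List[int]) -> str:
    """Rearrange a rule or a packet key according to the column order."""
    return ''.join(bits[i] for i in order)


def is_prefix(rule: str) -> bool:
    """True if every '*' sits to the right of every concrete bit."""
    return '*' not in rule.rstrip('*')


def matches(rule: str, key: str) -> bool:
    """Ternary match: '*' matches either bit value."""
    return all(r == '*' or r == k for r, k in zip(rule, key))


# Hypothetical 4-bit rules with non-contiguous wildcards.
rules = ["1*0*", "0*1*", "1*1*"]
order = wildcard_order(rules)
permuted = [permute(rule, order) for rule in rules]

# The same permutation must be applied to each lookup key.
key = "1101"
permuted_key = permute(key, order)
```

In this toy group every permuted rule becomes a plain prefix, so a single prefix trie would suffice. Rules that still have non-contiguous wildcards after permutation would need to be expanded or placed in a different trie.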

Our contribution includes a novel, clustering-based grouping algorithm that groups rules based on their bit-level similarities. Our algorithm generates near-optimal trie groupings with low configuration times and provides significantly higher match throughput than prior techniques. Experiments with synthetic traffic show that our method can achieve a 58.9× speedup over the baseline on a single-core processor under a given memory constraint.
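To make the idea of grouping rules by bit-level similarity concrete, the following sketch uses a simple greedy leader clustering over wildcard positions. This is a stand-in chosen for brevity, not the clustering algorithm from the article; the distance measure, the `max_dist` threshold, and all names are assumptions.

```python
from typing import List


def wildcard_mask(rule: str) -> int:
    """Bitmask with a 1 wherever the rule has a '*'."""
    return int(''.join('1' if c == '*' else '0' for c in rule), 2)


def mask_distance(a: str, b: str) -> int:
    """Hamming distance between the wildcard masks of two rules."""
    return bin(wildcard_mask(a) ^ wildcard_mask(b)).count('1')


def greedy_groups(rules: List[str], max_dist: int) -> List[List[str]]:
    """Assign each rule to the first group whose leader (first member)
    has a wildcard mask within max_dist; otherwise start a new group."""
    groups: List[List[str]] = []
    for rule in rules:
        for group in groups:
            if mask_distance(rule, group[0]) <= max_dist:
                group.append(rule)
                break
        else:
            groups.append([rule])
    return groups


# Hypothetical 4-bit rules; each resulting group would get its own trie.
rules = ["10**", "0*1*", "11**", "*0*1", "*1*0"]
groups = greedy_groups(rules, max_dist=1)
```

Rules whose wildcards fall in the same positions land in the same group, so one bit permutation can turn all of them into prefixes. A smaller `max_dist` yields more groups (more tries to search), while a larger one yields fewer groups at the cost of more rule expansion, which mirrors the grouping versus expansion tradeoff named in the abstract.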

Index Terms

Networks

Published in
ACM Transactions on Architecture and Code Optimization Volume 15, Issue 4
December 2018
706 pages
ISSN: 1544-3566
EISSN: 1544-3973
DOI: 10.1145/3284745
Editor: Koen De Bosschere, Ghent University
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 16 November 2018
- Accepted: 1 September 2018
- Revised: 1 August 2018
- Received: 1 May 2018

Author Tags
Arbitrary matching
correlation clustering
performance tradeoff
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 2
- Total Downloads: 512
- Downloads (Last 12 months): 65
- Downloads (Last 6 weeks): 11