ABSTRACT
Packet forwarding is a memory-intensive application requiring multiple accesses through a trie structure. The efficiency of a cache for this application critically depends on the placement function to reduce conflict misses. Traditional placement functions use a one-level mapping that naively partitions trie-nodes into cache sets. However, as a significant percentage of trie nodes are not useful, these schemes suffer from a non-uniform distribution of useful nodes to sets. This in turn results in increased conflict misses. Newer organizations such as variable associativity caches achieve flexibility in placement at the expense of increased hit-latency. This makes them unsuitable for L1 caches.We propose a novel two-level mapping framework that retains the hit-latency of one-level mapping yet incurs fewer conflict misses. This is achieved by introducing a secondlevel mapping which reorganizes the nodes in the naive initial partitions into refined partitions with near-uniform distribution of nodes. Further as this remapping is accomplished by simply adapting the index bits to a given routing table the hit-latency is not affected. We propose three new schemes which result in up to 16% reduction in the number of misses and 13% speedup in memory access time. In comparison, an XOR-based placement scheme known to perform extremely well for general purpose architectures, can obtain up to 2% speedup in memory access time.
- J-L. Baer, D. Low, P. Crowley and N. Sidhwaney. Memory hierarchy design for a multiprocessor look-up engine. In Proc. of Int. Conf. on Parallel Architectures and Compilation Techniques, 2003. Google ScholarDigital Library
- A. Brodnik, S. Carlsson, M. Degermark, and S. Pink. Small forwarding tables for fast routing lookups.In Proc. of ACM SIGCOMM'97, 1997. Google ScholarDigital Library
- H.J.Chao. Next Generation Routers. Invited paper, IEEE Proceeding, vol. 90, no. 9, pp. 1518--1558, 2002.Google ScholarCross Ref
- T. Chieueh and K. Gopalan. Improving Route Lookup performance using network processor cache. In IEEE/ACM SC2002 Conf., 2002. Google ScholarDigital Library
- T. Chieueh and P. Pradhan. Cache memory design for network processors. In Proc.of Int. Symp. on High Performance Computer Architecture, 2000.Google Scholar
- J. Edler and M.D. Hill. Dinero IV trace-driven uniprocessor cache simulator. www.cs.wisc.edu/markhill/DineroIV/, 1998.Google Scholar
- A. González, M. Valero, N. Topham, and J.M. Parcerisa. Eliminating cache conflict misses through XOR-based placement functions. Int. Conf. on Supercomputing, 1997. Google ScholarDigital Library
- E.G.Hallnor and S.K.Reinhardt. A fully associative software-managed cache design. In Proc. of Int. Symp. Computer Architecture 2000, 2000. Google ScholarDigital Library
- J.L. Hennessey and D.A. Patterson. Computer architecture: a quantitative approach-3rd Ed. Morgan Kaufmann Publishers Inc., San Francisco, CA, 2003. Google ScholarDigital Library
- M.Kharbutli, K.Irwin, Y.Solilin, J.Lee. Using prime numbers for for cache indexing to eliminate conflict misses. In Proc. of IEEE Int. Symp. on High Performance Computer Architecture, 2004. Google ScholarDigital Library
- B.Lampson, V.Srinivasan, and G.Varghese. IP lookup using multiway and multicolumn search. Proc. of IEEE Infocom, 98Google Scholar
- J. Mudigonda, H.M. Vin and R. Yavatkar. Overcoming the memory wall in packet processing: hammers or ladders? .In Proc. of Int. Symp. on Architectures for Networking and Communication Systems, 2005. Google ScholarDigital Library
- G. Narlikar and F. Zane. Performance modeling for fast IP lookups. In Proc. of ACM SIGMETRICS, 2001 Google ScholarDigital Library
- S.Nilsson and G.Karlsson. IP-address lookup using LC-tries. IEEE Journal on Selected Areas in Communications, 1999. Google ScholarDigital Library
- M.K. Qureshi, D. Thompson, Y.N. Patt. The V-Way Cache: Demand Based Associativity via Global Replacement. In Proc. of Int. Symp. Computer Architecture, 2005. Google ScholarDigital Library
- K. Rajan and R. Govindarajan. A Heterogeneously Segmented Cache architecture for a packet forwarding engine. In Int. Conf. on Supercomputing, 2005. Google ScholarDigital Library
- V.C.Ravikumar, R.Mahapatra, J.C.Liu. Modified LC-trie based efficient routing lookup. IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems (MASCOTS'02), 2002. Google ScholarDigital Library
- M. Ruiz-Sanchez, E. Biersack, and W. Dabbous. Survey and taxonomy of IP address lookup algorithms. IEEE Network Magazine, 2001. Google ScholarDigital Library
- D. Thiebaut, J.L. Wolf, H.S.Stone. Synthetic traces for trace-driven simulation of cache memories. IEEE Transactions on Computers, 1992. Google ScholarDigital Library
- B.Talbot, T.Sherwood, and B.Lin. IP caching for terabit speed routers. Proc. of IEEE Globcom, 1999.Google ScholarCross Ref
- University of Oregon Route Views Project www.routeviews.org.Google Scholar
Index Terms
- Two-level mapping based cache index selection for packet forwarding engines
Recommendations
A Novel Cache Architecture and Placement Framework for Packet Forwarding Engines
Packet forwarding is a memory-intensive application requiring multiple accesses through a trie structure. With the requirement to process packets at line rates, high-performance routers need to forward millions of packets every second with each packet ...
Design and performance evaluation of a cache assist to implement selective caching
ICCD '97: Proceedings of the 1997 International Conference on Computer Design (ICCD '97)Conventional cache architectures exploit locality, but do so rather blindly. By forcing all references through a single structure, the cache's effectiveness on many references is reduced. This paper presents a cache assist namely the annex cache which ...
Randomized Cache Placement for Eliminating Conflicts
Special issue on cache memory and related problemsApplications with regular patterns of memory access can experience high levels of cache conflict misses. In shared-memory multiprocessors conflict misses can be increased significantly by the data transpositions required for parallelization. Techniques ...
Comments