To implement virtual memory efficiently, virtual-to-physical address translation information is stored in page tables and cached in translation-lookaside buffers (TLBs). In multiprocessors with multiple TLBs, page-table modifications can leave outdated entries in TLBs, and using such entries can cause erroneous memory accesses.
We propose three new solutions to this TLB consistency problem. Unlike other solutions for highly-parallel shared-memory multiprocessors, they do not require interprocessor synchronization and communication, and they neither interrupt processor execution nor introduce unnecessary serialization. The cost of these solutions is embodied in the cost of TLB reloads, which load into TLBs translation information for referenced pages. Two of the solutions assume TLBs at processors, and one assumes TLBs at memory.
We study their performance in scalable multiprocessor architectures with multi-stage interconnection networks via a trace-driven simulation system capable of simulating a range of architectures using just one address trace.
Our results show that system performance improves if TLBs are located at memory, rather than processors, provided that memory is organized as multiple paging arenas, where the mapping of pages to arenas is fixed.
A class of parallel workloads can produce a number of TLB reloads, R, that grows linearly with N. A set of our simulations for processor-based TLBs validates this model.
A processor-based TLB reload costs O(log N) because of network transit. Thus, the overhead of managing processor-based TLBs, whether or not the management scheme ensures consistency, grows as R log N.
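The growth rate stated above can be illustrated with a minimal sketch. It assumes that N denotes the number of processors, that the network is built from 2x2 switches so that a transit crosses log2(N) stages, and that each stage costs one time unit. The function name and both scaling parameters are hypothetical and serve only to make the R log N relationship concrete.

```python
import math


def reload_overhead(n_processors: int,
                    reloads_per_processor: float = 1.0,
                    cost_per_stage: float = 1.0) -> float:
    """Illustrative overhead of processor-based TLB reloads, R * log2(N).

    Assumptions (not taken from the paper): N is the number of
    processors, R = reloads_per_processor * N, and one reload pays
    cost_per_stage for each of the log2(N) network stages.
    """
    if n_processors < 2:
        raise ValueError("need at least two processors")
    total_reloads = reloads_per_processor * n_processors        # R grows linearly with N
    cost_per_reload = cost_per_stage * math.log2(n_processors)  # O(log N) network transit
    return total_reloads * cost_per_reload


if __name__ == "__main__":
    for n in (8, 64, 512):
        print(n, reload_overhead(n))
```

Under these assumptions, a workload whose R is linear in N incurs total reload overhead on the order of N log N.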
The cost of a memory-based TLB reload within a paging arena can be made smaller than that of a processor-based TLB reload, since additional network transits are not required. Simulation results show that memory-based TLBs with one paging arena exhibit generally larger miss rates than processor-based TLBs of equal size, and the related overhead is generally larger. Memory-based TLBs with two paging arenas produce smaller miss rates than processor-based TLBs of equal size, and the related overhead is generally smaller. For memory-based TLBs to maintain low overhead for large machines, it is likely that the number of paging arenas must grow as O(N).
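A fixed mapping of pages to paging arenas can be sketched as follows. The abstract does not specify the mapping function; the modulo interleaving, the 4 KiB page size, and the name `arena_of` are assumptions chosen only to show that an address determines its arena without any table lookup.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; not specified in the abstract


def arena_of(virtual_address: int, num_arenas: int) -> int:
    """Return the paging arena for an address under an assumed fixed mapping.

    Hypothetical scheme: virtual page numbers are interleaved across
    arenas by modulo, so a page always belongs to the same arena.
    """
    if num_arenas < 1:
        raise ValueError("need at least one arena")
    page_number = virtual_address // PAGE_SIZE
    return page_number % num_arenas


if __name__ == "__main__":
    for addr in (0, 4096, 8191, 8192):
        print(hex(addr), "-> arena", arena_of(addr, num_arenas=2))
```

Because the arena follows directly from the page number in this sketch, a reference can be routed to the memory-based TLB of its arena without first consulting a page table.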
Index Terms
- Translation-lookaside buffer consistency in highly-parallel shared-memory multiprocessors