This dissertation addresses the problem of how logic programs can be made to execute at high speeds. Prolog, chosen as a representative logic programming language, differs from procedural languages in that it is applicative, nondeterminate, and uses unification as its primary operation. Program performance is directly related to memory performance: high-speed processors are ultimately limited by memory bandwidth, so architectures that require less bandwidth have greater potential for high performance. This dissertation reports the dynamic data and instruction referencing characteristics of both sequential and parallel Prolog architectures and the corresponding uniprocessor and multiprocessor memory-hierarchy performance tradeoffs.
Initially, a family of canonical architectures, corresponding closely to Prolog, is defined from the principles of ideal machine architectures of Flynn, and is then refined into the realizable Warren Abstract Machine (WAM) architecture. The memory-referencing behavior of these architectures is examined by tracing memory references during emulation of a set of Prolog benchmarks. Measurements of the canonical architectures indicate the upper memory-performance bounds of sequential execution. Measurements of the WAM provide frequencies of memory references and indicate that the WAM approaches the performance of the canonical Prolog architectures on current hosts.
Two-level memory hierarchies for both sequential (WAM) and parallel (PWAM) Prolog architectures are modeled. PWAM is the Restricted-AND Parallel architecture of Hermenegildo. Local memory designs are simulated using memory traces, whereas main memory designs are analyzed with queueing models. The results show that small buffers (256 words or fewer) can significantly reduce Prolog's memory bandwidth requirement, primarily by capturing shallow backtracking information. Larger, more general local memories, such as caches, are necessary in high-performance systems to further reduce memory traffic. Local memory consistency protocols for a shared-memory PWAM multiprocessor are analyzed. Measurements indicate that the memory-referencing overheads of exploiting Restricted-AND Parallelism are minor. These results show, however, that as few as eight high-performance processing elements can saturate a shared bus. With emerging bus technology and properly interleaved shared memory, limited-size multiprocessors of this type have great potential for cost-effective speedups. This dissertation provides previously unavailable information concerning the memory-referencing characteristics of logic programming languages executing on hierarchical memory organizations, thus contributing to processor memory design.
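As a rough illustration of what trace-driven simulation of a local memory involves, the following minimal sketch replays a list of word addresses through a small fully associative buffer with least-recently-used replacement and reports the fraction of references that still reach main memory. It is not the dissertation's simulator: the function names, the one-word line size, the LRU policy, and the synthetic trace are all assumptions made for the example, and reads and writes are not distinguished.

```python
from collections import OrderedDict


def simulate_lru_buffer(trace, capacity_words):
    """Replay a trace of word addresses through a hypothetical local buffer.

    Assumes a fully associative buffer with one-word lines and LRU
    replacement. Returns (hits, misses).
    """
    buffer = OrderedDict()  # keys are resident addresses, ordered oldest to newest
    hits = misses = 0
    for address in trace:
        if address in buffer:
            hits += 1
            buffer.move_to_end(address)  # mark as most recently used
        else:
            misses += 1
            buffer[address] = True
            if len(buffer) > capacity_words:
                buffer.popitem(last=False)  # evict the least recently used word
    return hits, misses


def traffic_ratio(trace, capacity_words):
    """Fraction of references that miss the buffer and go to main memory."""
    if not trace:
        return 0.0
    _, misses = simulate_lru_buffer(trace, capacity_words)
    return misses / len(trace)


if __name__ == "__main__":
    # Synthetic trace (an assumption): four words referenced repeatedly,
    # loosely resembling a small working set being revisited.
    trace = [0, 1, 2, 3] * 4
    for size in (2, 4):
        print(size, traffic_ratio(trace, size))
```

With this synthetic trace, a four-word buffer holds the whole working set, so only the four first-touch references miss, whereas a two-word buffer thrashes under LRU and every reference misses. A real study would replace the synthetic list with addresses recorded during emulation and would also model write policy.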
Index Terms
- Studies in Prolog architectures