Active Messages Implementations for the Meiko CS-2
February 1995
1995 Technical Report
Publisher:
  • University of California at Santa Barbara
  • Computer Science Dept. College of Engineering Santa Barbara, CA
  • United States
Published: 09 February 1995
Abstract

Active messages provide a low-latency communication architecture which, on modern parallel machines, achieves more than an order of magnitude performance improvement over more traditional communication libraries. They are used by library and compiler writers to obtain the utmost performance and have been used to implement the novel parallel language Split-C. This paper discusses the experience we gained while implementing active messages on the Meiko CS-2 and considers implementations for similar architectures. The CS-2 is an interesting experimental platform, as it resembles a cluster of SPARC workstations, each equipped with a dedicated communication co-processor. During our work we identified two mismatches between the requirements of active messages and the Meiko CS-2 architecture. First, architectures which only support efficient remote write operations (or DMA transfers, as in the case of the CS-2) make it difficult to transfer both data and control, as required by active messages. Traditional network interfaces avoid this problem because they have a single point of entry which essentially acts as a queue. To efficiently support active messages on modern communication co-processors, hardware primitives are required which support this queue behavior. We overcame this problem by producing specialized code which runs on the communication co-processor and supports the active messages protocol. We also identify the hardware primitives which are required to efficiently support active messages. The second mismatch is that active messages do not provide a non-blocking form of send, which is required to achieve the highest possible bandwidth while allowing communication and computation to overlap when a communication co-processor is present. We propose to extend the current active message definition to include a non-blocking form of send. Our implementation of active messages results in a one-way latency of 12.3 µs and achieves up to 39 MB/s for bulk transfers. Both numbers are close to optimal for the current Meiko hardware and are competitive with the performance of active messages on other hardware platforms.
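The queue behavior and the two send styles mentioned in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the report's implementation: the names `Node`, `try_send`, `send_blocking`, and `store_handler` are hypothetical, the queue depth is arbitrary, and the abstract does not specify the interface of the proposed non-blocking send, so `try_send` shows just one plausible shape.

```python
from collections import deque


class Node:
    """One simulated processor with a single receive queue (its point of entry)."""

    def __init__(self, node_id, queue_slots=4):
        self.node_id = node_id
        self.queue_slots = queue_slots   # hypothetical fixed queue depth
        self.rx_queue = deque()          # messages deposited by senders
        self.handlers = {}               # handler name -> function
        self.memory = {}                 # state that handlers may modify

    def register(self, name, fn):
        """Make a handler callable by name from incoming messages."""
        self.handlers[name] = fn

    def poll(self):
        """Drain the queue, running each message's handler; return the count."""
        handled = 0
        while self.rx_queue:
            name, args = self.rx_queue.popleft()
            self.handlers[name](self, *args)
            handled += 1
        return handled


def try_send(dest, handler_name, *args):
    """Non-blocking style send (assumed shape): enqueue or report failure.

    Control (the handler name) and data (the arguments) travel together,
    which is what a plain remote write does not provide by itself.
    """
    if len(dest.rx_queue) >= dest.queue_slots:
        return False                     # queue full; caller may compute and retry
    dest.rx_queue.append((handler_name, args))
    return True


def send_blocking(dest, handler_name, *args):
    """Blocking style send: do not return until the message is queued."""
    while not try_send(dest, handler_name, *args):
        dest.poll()                      # simulation only: let the receiver make progress


def store_handler(node, key, value):
    """Example handler: write a value into the receiving node's memory."""
    node.memory[key] = value
```

In this sketch a caller of `try_send` can do useful work when the queue is full and retry later, whereas `send_blocking` stalls until a slot is free; on real hardware the receiver would drain its queue independently rather than being polled by the sender.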

Contributors
  • University of California, Santa Barbara
  • California Polytechnic State University, San Luis Obispo
