Active messages provide a low-latency communication architecture that, on modern parallel machines, achieves more than an order of magnitude performance improvement over more traditional communication libraries. They are used by library and compiler writers to obtain the utmost performance and have been used to implement the novel parallel language Split-C. This paper discusses the experience we gained while implementing active messages on the Meiko CS-2 and considers implementations for similar architectures. The CS-2 is an interesting experimental platform because it resembles a cluster of Sparc workstations, each equipped with a dedicated communication co-processor. During our work we identified two mismatches between the requirements of active messages and the Meiko CS-2 architecture. First, architectures that support only efficient remote write operations (or DMA transfers, as in the case of the CS-2) make it difficult to transfer both data and control, as active messages require. Traditional network interfaces avoid this problem because they have a single point of entry that essentially acts as a queue. To support active messages efficiently on modern communication co-processors, hardware primitives that provide this queue behavior are required. We overcame the problem by writing specialized code that runs on the communication co-processor and supports the active message protocol, and we identify the hardware primitives needed for efficient support. Second, active messages do not provide a non-blocking form of send, which is required to achieve the highest possible bandwidth while overlapping communication and computation when a communication co-processor is present. We propose extending the current active message definition to include a non-blocking form of send. Our implementation of active messages achieves a one-way latency of 12.3 µs and up to 39 MB/s for bulk transfers. Both numbers are close to optimal for the current Meiko hardware and are competitive with the performance of active messages on other hardware platforms.
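The two ideas the abstract relies on, a single receive queue that carries both data and control and a send that returns before delivery, can be illustrated with a toy model. This is a minimal sketch, not the Meiko CS-2 implementation or any real active message API: `Node`, `SendHandle`, `send_nonblocking`, `coprocessor_step`, and `poll` are hypothetical names chosen for illustration, and the "co-processor" is simulated by an explicit function call.

```python
from collections import deque


class SendHandle:
    """Completion flag the sender can test after a non-blocking send."""

    def __init__(self):
        self.done = False


class Node:
    """Toy endpoint (hypothetical): one inbox is the single point of entry."""

    def __init__(self):
        self.handlers = []     # handler table, indexed by small integers
        self.inbox = deque()   # FIFO of (handler_index, args)
        self.outbox = deque()  # sends accepted but not yet delivered

    def register(self, fn):
        """Add a handler and return the index that messages will name."""
        self.handlers.append(fn)
        return len(self.handlers) - 1

    def send_nonblocking(self, dest, handler_index, *args):
        """Queue the message and return at once with a completion handle."""
        handle = SendHandle()
        self.outbox.append((dest, handler_index, args, handle))
        return handle

    def coprocessor_step(self):
        """Stand-in for the co-processor: deliver one pending message."""
        if not self.outbox:
            return False
        dest, handler_index, args, handle = self.outbox.popleft()
        dest.inbox.append((handler_index, args))  # data and control together
        handle.done = True
        return True

    def poll(self):
        """Drain the inbox, running the named handler on each message."""
        handled = 0
        while self.inbox:
            handler_index, args = self.inbox.popleft()
            self.handlers[handler_index](*args)
            handled += 1
        return handled


def demo():
    a, b = Node(), Node()
    total = []
    add = b.register(lambda x, y: total.append(x + y))
    handle = a.send_nonblocking(b, add, 2, 3)
    done_before = handle.done  # sender may keep computing at this point
    a.coprocessor_step()       # simulated co-processor delivers the message
    b.poll()                   # receiver dispatches the handler
    return done_before, handle.done, total
```

In this model the sender regains control as soon as the message is queued, and the handle reports completion only after the simulated co-processor has delivered it, which is the overlap of communication and computation that a non-blocking send is meant to allow.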
Experience with active messages on the Meiko CS-2
IPPS '95: Proceedings of the 9th International Symposium on Parallel Processing