Computer vision has been identified as a Grand Challenge application by the High Performance Computing and Communication initiative. With advances in microprocessor and network technology, current massively parallel machines can achieve hundreds of Gigaflops. These machines have a distributed-memory architecture, so they can scale to large system sizes. Examples include the TMC CM-5, IBM SP-2, Intel Paragon, Meiko CS-2, and Cray T3D. Such high-performance platforms appear to open new avenues for meeting the computational challenge of vision. However, even though many "Gigaflops" machines have become available, straightforward approaches to parallelizing vision applications on these architectures do not yield satisfactory performance. In a distributed-memory architecture, communication operations incur considerable overhead. Because communication in intermediate- and high-level vision algorithms is irregular, this overhead can grow with the size of the parallel system, leading to poor performance. As a consequence, the algorithms do not scale to large system sizes. It is therefore necessary to develop efficient algorithmic techniques for the various vision processes in order to achieve larger speed-ups.
The focus of our work is to develop scalable and portable parallel algorithms for computer vision tasks on distributed-memory machines. We propose a computational model for distributed-memory machines that uses the communication startup cost and the data transmission rate to account for the cost of data communication. To illustrate our algorithms and implementations, we parallelize vision tasks in a building detection system and in an object recognition system. Based on the model, we present scalable algorithms for several key steps in the building detection system, including a linear feature extraction task and a perceptual grouping task, as well as for a high-level task in the object recognition system.
For portability, our code is written in C using the MPI message-passing standard. It runs on several high-performance platforms and has so far been ported to the CM-5, SP-2, and T3D. These implementations execute the vision tasks quickly. For example, given a 2048 x 2048 image, linear feature extraction on a 512-node CM-5 completes in 1.118 seconds. The same task takes more than 8 minutes on a state-of-the-art Sun SPARCstation.
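The abstract says the computational model charges each communication a startup cost plus a term governed by the data transmission rate, but it does not give a formula. The sketch below assumes the common linear form, cost = startup + bytes / rate, purely as an illustration. The names `CommModel`, `message_cost`, and `split_cost`, and any parameter values passed to them, are hypothetical and do not come from the original work.

```c
#include <assert.h>

/* Hypothetical parameters of a linear communication cost model.
 * The linear form is an assumption made for illustration only. */
typedef struct {
    double startup_s;   /* fixed startup cost per message, in seconds */
    double per_byte_s;  /* inverse of the transmission rate, in seconds per byte */
} CommModel;

/* Estimated time to send one message of `bytes` bytes:
 * the startup cost plus the size-proportional transmission time. */
double message_cost(const CommModel *m, double bytes) {
    return m->startup_s + bytes * m->per_byte_s;
}

/* Estimated time to send `total_bytes` as `n_messages` equal-sized
 * messages issued one after another by a single node. */
double split_cost(const CommModel *m, double total_bytes, int n_messages) {
    assert(n_messages > 0);
    return n_messages * message_cost(m, total_bytes / n_messages);
}
```

Under this assumed form, splitting a fixed amount of data into more messages leaves the transmission term unchanged but multiplies the startup term, which is one way to see why many small, irregular messages can dominate running time as the system grows.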
Index Terms
- High performance computing for vision on distributed-memory machines