Abstract
Clustering large numbers of data points is computationally demanding and often must be accelerated to be useful in practical applications. This work focuses on the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, one of the state-of-the-art clustering algorithms, and targets its acceleration on an FPGA device. The article presents an optimized, scalable, and parameterizable architecture that takes advantage of the internal memory structure of modern FPGAs to deliver a high-performance clustering system. Post-synthesis simulation results show that the developed system obtains mean speedups of 31× in real-world tests and 202× in synthetic tests compared with state-of-the-art software counterparts running on a quad-core 3.4 GHz Intel i7-2600K. Additionally, the implementation can cluster data with any number of dimensions without impacting performance.
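For readers unfamiliar with DBSCAN, the following is a minimal software sketch of the textbook algorithm, not the FPGA architecture described in the article. The names `dbscan`, `region_query`, `eps`, and `min_pts` are illustrative choices, and the brute-force neighbor search is an assumption made for brevity. Because the distance is computed over whole coordinate tuples, the same code accepts points of any dimensionality, although in software the cost of each distance computation still grows with the number of dimensions.

```python
from collections import deque
from math import dist  # Euclidean distance for any dimension (Python 3.8+)

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet processed


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    Brute-force O(n^2) sketch; illustrative only.
    """
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
    return labels


# Usage: two dense groups in 2D plus one isolated point
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(sample, eps=1.5, min_pts=3)
```

The neighbor search in `region_query` dominates the running time, which is why it is the usual target for acceleration, whether through spatial indexes in software or through parallel hardware.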
Index Terms
- ARC 2014: A Multidimensional FPGA-Based Parallel DBSCAN Architecture