This book presents an integrated collection of representative approaches for scaling up machine learning and data mining methods on parallel and distributed computing platforms. Demand for parallelizing learning algorithms is highly task-specific: in some settings it is driven by enormous dataset sizes, in others by model complexity or by real-time performance requirements. Making task-appropriate algorithm and platform choices for large-scale machine learning requires understanding the benefits, trade-offs, and constraints of the available options. The solutions presented in the book cover a range of parallelization platforms, from FPGAs and GPUs to multi-core systems and commodity clusters; concurrent programming frameworks, including CUDA, MPI, MapReduce, and DryadLINQ; and learning settings (supervised, unsupervised, semi-supervised, and online learning). Extensive coverage of the parallelization of boosted trees, SVMs, spectral clustering, belief propagation, and other popular learning algorithms, together with deep dives into several applications, makes the book equally useful for researchers, students, and practitioners.