Scaling up Machine Learning: Parallel and Distributed Approaches
Publisher:
  • Cambridge University Press
  • 40 W. 20 St. New York, NY
  • United States
ISBN: 978-0-521-19224-8
Published: 30 December 2011
Pages: 496
Abstract

This book presents an integrated collection of representative approaches for scaling up machine learning and data mining methods on parallel and distributed computing platforms. Demand for parallelizing learning algorithms is highly task-specific: in some settings it is driven by enormous dataset sizes, in others by model complexity or by real-time performance requirements. Making task-appropriate algorithm and platform choices for large-scale machine learning requires understanding the benefits, trade-offs, and constraints of the available options. The solutions presented in the book cover a range of parallelization platforms, from FPGAs and GPUs to multi-core systems and commodity clusters; concurrent programming frameworks, including CUDA, MPI, MapReduce, and DryadLINQ; and learning settings: supervised, unsupervised, semi-supervised, and online learning. Extensive coverage of the parallelization of boosted trees, SVMs, spectral clustering, belief propagation, and other popular learning algorithms, along with deep dives into several applications, makes the book equally useful for researchers, students, and practitioners.

Cited By

  1. Chen L, Liu W, Chen Y and Wang W (2024). Communication-Efficient Design for Quantized Decentralized Federated Learning, IEEE Transactions on Signal Processing, 72, (1175-1188), Online publication date: 1-Jan-2024.
  2. Ye H, He S and Chang X (2024). DINE: Decentralized Inexact Newton With Exact Linear Convergence Rate, IEEE Transactions on Signal Processing, 72, (143-156), Online publication date: 1-Jan-2024.
  3. Fereydounian M, Mokhtari A, Pedarsani R and Hassani H (2023). Provably Private Distributed Averaging Consensus: An Information-Theoretic Approach, IEEE Transactions on Information Theory, 69:11, (7317-7335), Online publication date: 1-Nov-2023.
  4. Maros M and Scutari G Acceleration in distributed sparse regression Proceedings of the 36th International Conference on Neural Information Processing Systems, (36832-36844)
  5. Kovalev D, Beznosikov A, Borodich E, Gasnikov A and Scutari G Optimal gradient sliding and its application to distributed optimization under similarity Proceedings of the 36th International Conference on Neural Information Processing Systems, (33494-33507)
  6. Wang B, Safaryan M and Richtárik P Theoretically better and numerically faster distributed optimization with smoothness-aware quantization techniques Proceedings of the 36th International Conference on Neural Information Processing Systems, (9841-9852)
  7. Jin C, Li F, Ma S and Wang Y (2022). Sampling scheme-based classification rule mining method using decision tree in big data environment, Knowledge-Based Systems, 244:C, Online publication date: 23-May-2022.
  8. Ghosh S, Aquino B and Gupta V (2022). EventGraD, Neurocomputing, 483:C, (474-487), Online publication date: 28-Apr-2022.
  9. Eetha S, P.K. S, Pant V, Vikram S, Mody M and Purnaprajna M (2021). TileNET, Microprocessors & Microsystems, 83:C, Online publication date: 1-Jun-2021.
  10. Šabić E, Keeley D, Henderson B and Nannemann S (2021). Healthcare and anomaly detection: using machine learning to predict anomalies in heart rate data, AI & Society, 36:1, (149-158), Online publication date: 1-Mar-2021.
  11. Quoc D, Gregor F, Arnautov S, Kunkel R, Bhatotia P and Fetzer C secureTF Proceedings of the 21st International Middleware Conference, (44-59)
  12. Heidarshenas A, Gangwani T, Yesil S, Morrison A and Torrellas J Snug Proceedings of the 34th ACM International Conference on Supercomputing, (1-13)
  13. Du B, Zhou J and Sun D (2020). Improving the Convergence of Distributed Gradient Descent via Inexact Average Consensus, Journal of Optimization Theory and Applications, 185:2, (504-521), Online publication date: 1-May-2020.
  14. Zhao Y and Liu Q (2020). A consensus algorithm based on collective neurodynamic system for distributed optimization with linear and bound constraints, Neural Networks, 122:C, (144-151), Online publication date: 1-Feb-2020.
  15. Yu Y, Wu J and Huang L Double quantization for communication-efficient distributed optimization Proceedings of the 33rd International Conference on Neural Information Processing Systems, (4438-4449)
  16. Bolón-Canedo V and Alonso-Betanzos A (2019). Ensembles for feature selection, Information Fusion, 52:C, (1-12), Online publication date: 1-Dec-2019.
  17. Wang H and He K (2019). Improving Test and Diagnosis Efficiency through Ensemble Reduction and Learning, ACM Transactions on Design Automation of Electronic Systems, 24:5, (1-26), Online publication date: 19-Oct-2019.
  18. Iakovidou C and Wei E Nested Distributed Gradient Methods with Stochastic Computation Errors 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton), (339-346)
  19. Vogel R, Bellet A, Clémençon S, Jelassi O and Papa G Trade-Offs in Large-Scale Distributed Tuplewise Estimation And Learning Machine Learning and Knowledge Discovery in Databases, (229-245)
  20. Bolón-Canedo V, Sechidis K, Sánchez-Maroño N, Alonso-Betanzos A and Brown G (2019). Insights into distributed feature ranking, Information Sciences: an International Journal, 496:C, (378-398), Online publication date: 1-Sep-2019.
  21. Kabra A, Xue Y and Gomes C GPU-accelerated principal-agent game for scalable citizen science Proceedings of the 2nd ACM SIGCAS Conference on Computing and Sustainable Societies, (165-173)
  22. Tavara S (2019). Parallel Computing of Support Vector Machines, ACM Computing Surveys, 51:6, (1-38), Online publication date: 27-Feb-2019.
  23. Zhu J, Xie P, Zhang M, Zheng R, Xing L, Wu Q and Bueno Á (2019). Distributed Stochastic Subgradient Projection Algorithms Based on Weight-Balancing over Time-Varying Directed Graphs, Complexity, 2019, Online publication date: 1-Jan-2019.
  24. Alistarh D, Allen-Zhu Z and Li J Byzantine stochastic gradient descent Proceedings of the 32nd International Conference on Neural Information Processing Systems, (4618-4628)
  25. Wang H, Li J, He K and Cai W Hierarchical ensemble learning for resource-aware FPGA computing Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, (1-2)
  26. Liu Y, Liu J and Basar T Gossip Gradient Descent Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, (1995-1997)
  27. Golubtsov P (2018). The Transition from A Priori to A Posteriori Information, Automatic Documentation and Mathematical Linguistics, 52:4, (203-213), Online publication date: 1-Jul-2018.
  28. Yang Z, Wang C, Zhang Z and Li J (2018). Random Barzilai–Borwein step size for mini-batch algorithms, Engineering Applications of Artificial Intelligence, 72:C, (124-135), Online publication date: 1-Jun-2018.
  29. Jo S, Yoo J and Kang U Fast and Scalable Distributed Loopy Belief Propagation on Real-World Graphs Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, (297-305)
  30. Barbos A, Caron F, Giovannelli J and Doucet A Clone MCMC Proceedings of the 31st International Conference on Neural Information Processing Systems, (5027-5035)
  31. Alistarh D, Grubic D, Li J, Tomioka R and Vojnovic M QSGD Proceedings of the 31st International Conference on Neural Information Processing Systems, (1707-1718)
  32. Zhang H, Hao C, Wu Y and Li M (2017). Towards a scalable and energy-efficient resource manager for coupling cluster computing with distributed embedded computing, Cluster Computing, 20:4, (3707-3720), Online publication date: 1-Dec-2017.
  33. Luo G (2017). Toward a Progress Indicator for Machine Learning Model Building and Data Mining Algorithm Execution, ACM SIGKDD Explorations Newsletter, 19:2, (13-24), Online publication date: 21-Nov-2017.
  34. Fathi F, Abghour N and Ouzzif M From Big data platforms to smarter solution, with intelligent learning Proceedings of the 2017 International Conference on Cloud and Big Data Computing, (11-16)
  35. Ai W, Chen W and Xie J (2017). Distributed learning for feedforward neural networks with random weights using an event-triggered communication scheme, Neurocomputing, 224:C, (184-194), Online publication date: 8-Feb-2017.
  36. Ai W, Chen W and Xie J (2016). A zero-gradient-sum algorithm for distributed cooperative learning using a feedforward neural network with random weights, Information Sciences: an International Journal, 373:C, (404-418), Online publication date: 10-Dec-2016.
  37. Petrou C and Paraskevas M Signal Processing Techniques Restructure The Big Data Era Proceedings of the 20th Pan-Hellenic Conference on Informatics, (1-6)
  38. Wu Z, Hahn E, Günay A, Zhang L and Liu Y GPU-accelerated value iteration for the computation of reachability probabilities in MDPs Proceedings of the Twenty-second European Conference on Artificial Intelligence, (1726-1727)
  39. Chen T and Guestrin C XGBoost Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (785-794)
  40. (2016). MapReduce based distributed learning algorithm for Restricted Boltzmann Machine, Neurocomputing, 198:C, (4-11), Online publication date: 19-Jul-2016.
  41. Martínez-Angeles C, Wu H, Dutra I, Costa V and Buenabad-Chávez J (2016). Relational Learning with GPUs, International Journal of Parallel Programming, 44:3, (663-685), Online publication date: 1-Jun-2016.
  42. (2015). Towards scalable fuzzy-rough feature selection, Information Sciences: an International Journal, 323:C, (1-15), Online publication date: 1-Dec-2015.
  43. Vranjković V, Struharik R and Novak L (2015). Hardware acceleration of homogeneous and heterogeneous ensemble classifiers, Microprocessors & Microsystems, 39:8, (782-795), Online publication date: 1-Nov-2015.
  44. Bolón-Canedo V, Sánchez-Maroño N and Alonso-Betanzos A (2015). Recent advances and emerging challenges of feature selection in the context of big data, Knowledge-Based Systems, 86:C, (33-45), Online publication date: 1-Sep-2015.
  45. Hadian A and Shahrivari S (2014). High performance parallel k-means clustering for disk-resident datasets on multi-core CPUs, The Journal of Supercomputing, 69:2, (845-863), Online publication date: 1-Aug-2014.
  46. Devooght R, Mantrach A, Kivimäki I, Bersini H, Jaimes A and Saerens M Random walks based modularity Proceedings of the 23rd international conference on World wide web, (213-224)
  47. Bordawekar R, Blainey B and Apte C (2014). Analyzing analytics, ACM SIGMOD Record, 42:4, (17-28), Online publication date: 28-Feb-2014.
  48. Miller L, Gazan R and Still S Unsupervised classification and visualization of unstructured text for the support of interdisciplinary collaboration Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing, (1033-1042)
  49. McMahan H, Holt G, Sculley D, Young M, Ebner D, Grady J, Nie L, Phillips T, Davydov E, Golovin D, Chikkerur S, Liu D, Wattenberg M, Hrafnkelsson A, Boulos T and Kubica J Ad click prediction Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, (1222-1230)
  50. Zheng L and Mengshoel O Optimizing parallel belief propagation in junction trees using regression Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, (757-765)
  51. Zheng L and Mengshoel O Exploring multiple dimensions of parallelism in junction tree message passing Proceedings of the 2013 UAI Conference on Application Workshops: Big Data meet Complex Models and Models for Spatial, Temporal and Network Data - Volume 1024, (87-96)
  52. Chrysos G, Dagritzikos P, Papaefstathiou I and Dollas A (2013). HC-CART, ACM Transactions on Architecture and Code Optimization, 9:4, (1-25), Online publication date: 1-Jan-2013.
  53. Daumé H, Phillips J, Saha A and Venkatasubramanian S Efficient protocols for distributed classification and optimization Proceedings of the 23rd international conference on Algorithmic Learning Theory, (154-168)
  54. Langford J (2012). Parallel machine learning on big data, XRDS: Crossroads, The ACM Magazine for Students, 19:1, (60-62), Online publication date: 1-Sep-2012.
  55. Yang Z and Bajwa W RD-SVM: A resilient distributed support vector machine 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (2444-2448)
Contributors
  • University of Haifa
  • Microsoft Research

Reviews

Joseph M. Arul

This book presents current research in parallel and distributed machine learning for large datasets. The applications discussed are mainly in the financial and petroleum industries. The preface states: "The book will be useful to the broad audience of researchers, practitioners, and anyone who wants to grasp the future of machine learning." However, the book will be more useful to current researchers in the field, and to those already familiar with the techniques, than to beginners. It is a highly technical collection of contemporary studies that can deepen the knowledge of those already established in parallel and distributed machine learning. Various scholars present contemporary topics in machine learning and scaling up, to motivate researchers to explore further, and an in-depth exploration of techniques such as supervised and unsupervised algorithms is included for anyone interested in delving deeper into the field.

Recent advances in hardware architectures and programming frameworks have made it convenient to exploit the parallelism inherent in many learning algorithms. Moreover, in many modern applications, large datasets are accumulated on distributed storage platforms, which further motivates adapting existing sequential learning algorithms to parallel and distributed environments. Anyone aiming to achieve speedup, efficiency, and scalability in their algorithms will find this book very useful.

The book is divided into four parts. The first part describes frameworks for scaling up machine learning, illustrated mostly with the k-means algorithm; the frameworks themselves apply to tasks as varied as decision tree ensembles, frequent pattern mining, and regression. The frameworks currently used for such scaling are MapReduce, DryadLINQ, the message passing interface (MPI), and CUDA. The second part of the book deals with supervised and unsupervised learning algorithms. Supervised learning algorithms use training data to construct a prediction function f, which is then applied to test instances. In unsupervised learning, the data is clustered to construct a function f that partitions an unlabeled dataset into k = |Y| clusters, where Y is the set of cluster indices. This part covers parallelizing support vector machines (SVMs), boosted decision trees, belief propagation (BP), and Markov chain Monte Carlo (MCMC) techniques. Part 3 moves beyond the traditional supervised and unsupervised formulations, focusing on parallelizing online, semi-supervised, and transfer learning. The final part presents several learning applications in distinct domains, with the main focus on scaling up, which is crucial to computational efficiency and to improving accuracy.

This book explains in detail the computing platforms, learning algorithms, prediction problems, and application domains for a variety of parallelization techniques for scaling up machine learning. It is not for anyone trying to understand the basic concepts of parallelization and distributed environments, but it could be an excellent resource for researchers in the field.

Online Computing Reviews Service
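To make the review's summary of Part 1 concrete: whether the framework is MapReduce, DryadLINQ, MPI, or CUDA, scaling k-means rests on the same decomposition of each iteration into a data-parallel assignment step over shards and a small global aggregation step. The following Python sketch illustrates that decomposition; it is not code from the book, and the function names and toy data are illustrative assumptions.

    from collections import defaultdict
    import math

    def nearest(point, centroids):
        # Index of the centroid closest to `point` (Euclidean distance).
        return min(range(len(centroids)),
                   key=lambda j: math.dist(point, centroids[j]))

    def map_shard(points, centroids):
        # "Map" step: one worker scans its shard and emits, per centroid,
        # a partial coordinate sum and a point count. These statistics are
        # tiny compared to the shard, so they are cheap to send over the network.
        dim = len(centroids[0])
        partial = defaultdict(lambda: ([0.0] * dim, 0))
        for p in points:
            j = nearest(p, centroids)
            s, n = partial[j]
            partial[j] = ([a + b for a, b in zip(s, p)], n + 1)
        return dict(partial)

    def reduce_centroids(partials, centroids):
        # "Reduce" step: merge the per-shard statistics and recompute the means.
        dim = len(centroids[0])
        total = defaultdict(lambda: ([0.0] * dim, 0))
        for part in partials:
            for j, (s, n) in part.items():
                ts, tn = total[j]
                total[j] = ([a + b for a, b in zip(ts, s)], tn + n)
        new_centroids = []
        for j, c in enumerate(centroids):
            s, n = total[j]
            new_centroids.append([x / n for x in s] if n else list(c))  # keep empty clusters
        return new_centroids

    # One iteration; in a real deployment each shard would live on a different worker.
    shards = [[(0.0, 0.0), (0.2, 0.1)], [(5.0, 5.0), (5.1, 4.9)]]
    centroids = [(0.0, 0.0), (5.0, 5.0)]
    partials = [map_shard(shard, centroids) for shard in shards]  # embarrassingly parallel
    print(reduce_centroids(partials, centroids))  # ~[[0.1, 0.05], [5.05, 4.95]]

The same contract carries over to the other frameworks the review names: in MPI the reduce step would typically become an Allreduce over the partial sums, while on a GPU the assignment loop would become a per-thread CUDA kernel.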
