Scaling up Machine Learning: Parallel and Distributed Approaches
Publisher:
  • Cambridge University Press
  • 40 W. 20 St. New York, NY
  • United States
ISBN: 978-0-521-19224-8
Published: 30 December 2011
Pages: 496
Abstract

This book presents an integrated collection of representative approaches for scaling up machine learning and data mining methods on parallel and distributed computing platforms. Demand for parallelizing learning algorithms is highly task-specific: in some settings it is driven by enormous dataset sizes, in others by model complexity or by real-time performance requirements. Making task-appropriate algorithm and platform choices for large-scale machine learning requires understanding the benefits, trade-offs, and constraints of the available options. The solutions presented in the book cover a range of parallelization platforms, from FPGAs and GPUs to multi-core systems and commodity clusters; concurrent programming frameworks, including CUDA, MPI, MapReduce, and DryadLINQ; and learning settings: supervised, unsupervised, semi-supervised, and online learning. Extensive coverage of the parallelization of boosted trees, SVMs, spectral clustering, belief propagation, and other popular learning algorithms, along with deep dives into several applications, makes the book equally useful for researchers, students, and practitioners.

Cited By

  1. Chen L, Liu W, Chen Y and Wang W (2024). Communication-Efficient Design for Quantized Decentralized Federated Learning, IEEE Transactions on Signal Processing, 72, (1175-1188), Online publication date: 1-Jan-2024.
  2. Ye H, He S and Chang X (2024). DINE: Decentralized Inexact Newton With Exact Linear Convergence Rate, IEEE Transactions on Signal Processing, 72, (143-156), Online publication date: 1-Jan-2024.
  3. Fereydounian M, Mokhtari A, Pedarsani R and Hassani H (2023). Provably Private Distributed Averaging Consensus: An Information-Theoretic Approach, IEEE Transactions on Information Theory, 69:11, (7317-7335), Online publication date: 1-Nov-2023.
  4. Maros M and Scutari G Acceleration in distributed sparse regression Proceedings of the 36th International Conference on Neural Information Processing Systems, (36832-36844)
  5. Kovalev D, Beznosikov A, Borodich E, Gasnikov A and Scutari G Optimal gradient sliding and its application to distributed optimization under similarity Proceedings of the 36th International Conference on Neural Information Processing Systems, (33494-33507)
  6. Wang B, Safaryan M and Richtárik P Theoretically better and numerically faster distributed optimization with smoothness-aware quantization techniques Proceedings of the 36th International Conference on Neural Information Processing Systems, (9841-9852)
  7. Jin C, Li F, Ma S and Wang Y (2022). Sampling scheme-based classification rule mining method using decision tree in big data environment, Knowledge-Based Systems, 244:C, Online publication date: 23-May-2022.
  8. Ghosh S, Aquino B and Gupta V (2022). EventGraD, Neurocomputing, 483:C, (474-487), Online publication date: 28-Apr-2022.
  9. Eetha S, P.K. S, Pant V, Vikram S, Mody M and Purnaprajna M (2021). TileNET, Microprocessors & Microsystems, 83:C, Online publication date: 1-Jun-2021.
  10. Šabić E, Keeley D, Henderson B and Nannemann S (2021). Healthcare and anomaly detection: using machine learning to predict anomalies in heart rate data, AI & Society, 36:1, (149-158), Online publication date: 1-Mar-2021.
  11. Quoc D, Gregor F, Arnautov S, Kunkel R, Bhatotia P and Fetzer C secureTF Proceedings of the 21st International Middleware Conference, (44-59)
  12. Heidarshenas A, Gangwani T, Yesil S, Morrison A and Torrellas J Snug Proceedings of the 34th ACM International Conference on Supercomputing, (1-13)
  13. Du B, Zhou J and Sun D (2020). Improving the Convergence of Distributed Gradient Descent via Inexact Average Consensus, Journal of Optimization Theory and Applications, 185:2, (504-521), Online publication date: 1-May-2020.
  14. Zhao Y and Liu Q (2020). A consensus algorithm based on collective neurodynamic system for distributed optimization with linear and bound constraints, Neural Networks, 122:C, (144-151), Online publication date: 1-Feb-2020.
  15. Yu Y, Wu J and Huang L Double quantization for communication-efficient distributed optimization Proceedings of the 33rd International Conference on Neural Information Processing Systems, (4438-4449)
  16. Bolón-Canedo V and Alonso-Betanzos A (2019). Ensembles for feature selection, Information Fusion, 52:C, (1-12), Online publication date: 1-Dec-2019.
  17. Wang H and He K (2019). Improving Test and Diagnosis Efficiency through Ensemble Reduction and Learning, ACM Transactions on Design Automation of Electronic Systems, 24:5, (1-26), Online publication date: 19-Oct-2019.
  18. Iakovidou C and Wei E Nested Distributed Gradient Methods with Stochastic Computation Errors 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton), (339-346)
  19. Vogel R, Bellet A, Clémençon S, Jelassi O and Papa G Trade-Offs in Large-Scale Distributed Tuplewise Estimation And Learning Machine Learning and Knowledge Discovery in Databases, (229-245)
  20. Bolón-Canedo V, Sechidis K, Sánchez-Maroño N, Alonso-Betanzos A and Brown G (2019). Insights into distributed feature ranking, Information Sciences: an International Journal, 496:C, (378-398), Online publication date: 1-Sep-2019.
  21. Kabra A, Xue Y and Gomes C GPU-accelerated principal-agent game for scalable citizen science Proceedings of the 2nd ACM SIGCAS Conference on Computing and Sustainable Societies, (165-173)
  22. Tavara S (2019). Parallel Computing of Support Vector Machines, ACM Computing Surveys, 51:6, (1-38), Online publication date: 27-Feb-2019.
  23. Zhu J, Xie P, Zhang M, Zheng R, Xing L, Wu Q and Bueno Á (2019). Distributed Stochastic Subgradient Projection Algorithms Based on Weight-Balancing over Time-Varying Directed Graphs, Complexity, 2019, Online publication date: 1-Jan-2019.
  24. Alistarh D, Allen-Zhu Z and Li J Byzantine stochastic gradient descent Proceedings of the 32nd International Conference on Neural Information Processing Systems, (4618-4628)
  25. Wang H, Li J, He K and Cai W Hierarchical ensemble learning for resource-aware FPGA computing Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, (1-2)
  26. Liu Y, Liu J and Basar T Gossip Gradient Descent Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, (1995-1997)
  27. Golubtsov P (2018). The Transition from A Priori to A Posteriori Information, Automatic Documentation and Mathematical Linguistics, 52:4, (203-213), Online publication date: 1-Jul-2018.
  28. Yang Z, Wang C, Zhang Z and Li J (2018). Random Barzilai–Borwein step size for mini-batch algorithms, Engineering Applications of Artificial Intelligence, 72:C, (124-135), Online publication date: 1-Jun-2018.
  29. Jo S, Yoo J and Kang U Fast and Scalable Distributed Loopy Belief Propagation on Real-World Graphs Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, (297-305)
  30. Barbos A, Caron F, Giovannelli J and Doucet A Clone MCMC Proceedings of the 31st International Conference on Neural Information Processing Systems, (5027-5035)
  31. Alistarh D, Grubic D, Li J, Tomioka R and Vojnovic M QSGD Proceedings of the 31st International Conference on Neural Information Processing Systems, (1707-1718)
  32. Zhang H, Hao C, Wu Y and Li M (2017). Towards a scalable and energy-efficient resource manager for coupling cluster computing with distributed embedded computing, Cluster Computing, 20:4, (3707-3720), Online publication date: 1-Dec-2017.
  33. Luo G (2017). Toward a Progress Indicator for Machine Learning Model Building and Data Mining Algorithm Execution, ACM SIGKDD Explorations Newsletter, 19:2, (13-24), Online publication date: 21-Nov-2017.
  34. Fathi F, Abghour N and Ouzzif M From Big data platforms to smarter solution, with intelligent learning Proceedings of the 2017 International Conference on Cloud and Big Data Computing, (11-16)
  35. Ai W, Chen W and Xie J (2017). Distributed learning for feedforward neural networks with random weights using an event-triggered communication scheme, Neurocomputing, 224:C, (184-194), Online publication date: 8-Feb-2017.
  36. Ai W, Chen W and Xie J (2016). A zero-gradient-sum algorithm for distributed cooperative learning using a feedforward neural network with random weights, Information Sciences: an International Journal, 373:C, (404-418), Online publication date: 10-Dec-2016.
  37. Petrou C and Paraskevas M Signal Processing Techniques Restructure The Big Data Era Proceedings of the 20th Pan-Hellenic Conference on Informatics, (1-6)
  38. Wu Z, Hahn E, Günay A, Zhang L and Liu Y GPU-accelerated value iteration for the computation of reachability probabilities in MDPs Proceedings of the Twenty-second European Conference on Artificial Intelligence, (1726-1727)
  39. Chen T and Guestrin C XGBoost Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (785-794)
  40. (2016). MapReduce based distributed learning algorithm for Restricted Boltzmann Machine, Neurocomputing, 198:C, (4-11), Online publication date: 19-Jul-2016.
  41. Martínez-Angeles C, Wu H, Dutra I, Costa V and Buenabad-Chávez J (2016). Relational Learning with GPUs, International Journal of Parallel Programming, 44:3, (663-685), Online publication date: 1-Jun-2016.
  42. (2015). Towards scalable fuzzy-rough feature selection, Information Sciences: an International Journal, 323:C, (1-15), Online publication date: 1-Dec-2015.
  43. Vranjković V, Struharik R and Novak L (2015). Hardware acceleration of homogeneous and heterogeneous ensemble classifiers, Microprocessors & Microsystems, 39:8, (782-795), Online publication date: 1-Nov-2015.
  44. Bolón-Canedo V, Sánchez-Maroño N and Alonso-Betanzos A (2015). Recent advances and emerging challenges of feature selection in the context of big data, Knowledge-Based Systems, 86:C, (33-45), Online publication date: 1-Sep-2015.
  45. Hadian A and Shahrivari S (2014). High performance parallel k-means clustering for disk-resident datasets on multi-core CPUs, The Journal of Supercomputing, 69:2, (845-863), Online publication date: 1-Aug-2014.
  46. Devooght R, Mantrach A, Kivimäki I, Bersini H, Jaimes A and Saerens M Random walks based modularity Proceedings of the 23rd international conference on World wide web, (213-224)
  47. Bordawekar R, Blainey B and Apte C (2014). Analyzing analytics, ACM SIGMOD Record, 42:4, (17-28), Online publication date: 28-Feb-2014.
  48. Miller L, Gazan R and Still S Unsupervised classification and visualization of unstructured text for the support of interdisciplinary collaboration Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing, (1033-1042)
  49. McMahan H, Holt G, Sculley D, Young M, Ebner D, Grady J, Nie L, Phillips T, Davydov E, Golovin D, Chikkerur S, Liu D, Wattenberg M, Hrafnkelsson A, Boulos T and Kubica J Ad click prediction Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, (1222-1230)
  50. Zheng L and Mengshoel O Optimizing parallel belief propagation in junction trees using regression Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, (757-765)
  51. Zheng L and Mengshoel O Exploring multiple dimensions of parallelism in junction tree message passing Proceedings of the 2013 UAI Conference on Application Workshops: Big Data meet Complex Models and Models for Spatial, Temporal and Network Data - Volume 1024, (87-96)
  52. Chrysos G, Dagritzikos P, Papaefstathiou I and Dollas A (2013). HC-CART, ACM Transactions on Architecture and Code Optimization, 9:4, (1-25), Online publication date: 1-Jan-2013.
  53. Daumé H, Phillips J, Saha A and Venkatasubramanian S Efficient protocols for distributed classification and optimization Proceedings of the 23rd international conference on Algorithmic Learning Theory, (154-168)
  54. Langford J (2012). Parallel machine learning on big data, XRDS: Crossroads, The ACM Magazine for Students, 19:1, (60-62), Online publication date: 1-Sep-2012.
  55. Yang Z and Bajwa W RD-SVM: A resilient distributed support vector machine 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (2444-2448)
Contributors
  • University of Haifa
  • Microsoft Research

Reviews

Joseph M. Arul

This book presents current research in parallel and distributed machine learning for large datasets. The applications discussed are mainly in the financial and petroleum industries. The preface states: "The book will be useful to the broad audience of researchers, practitioners, and anyone who wants to grasp the future of machine learning." However, the book will be more useful to current researchers in the field, and to those already familiar with the techniques, than to beginners. It is a highly technical collection of contemporary studies that can deepen the knowledge of those already established in parallel and distributed machine learning. Various scholars present contemporary topics in machine learning and scaling up, to motivate researchers to explore further, and an in-depth exploration of techniques such as supervised and unsupervised algorithms is included for anyone interested in delving deeper into the field.

Recent advances in hardware architectures and programming frameworks have made it convenient to exploit the parallelism inherent in many learning algorithms. Moreover, in many modern applications, large datasets are accumulated on distributed storage platforms, which further motivates adapting existing sequential learning algorithms to parallel and distributed environments. Anyone aiming to achieve speedup, efficiency, and scalability in their algorithms will find this book very useful.

The book is divided into four parts. The first part describes frameworks for scaling up machine learning, illustrated mostly with the k-means algorithm; the frameworks themselves apply to tasks as varied as decision tree ensembles, frequent pattern mining, and regression. The frameworks currently used for such scaling are MapReduce, DryadLINQ, the message passing interface (MPI), and CUDA. The second part of the book deals with supervised and unsupervised learning algorithms. Supervised learning algorithms use training data to construct a prediction function f, which is then applied to test instances. In unsupervised learning, the data is clustered to construct a function f that partitions an unlabeled dataset into k = |Y| clusters, where Y is the set of cluster indices. This part covers parallelizing support vector machines (SVMs), boosted decision trees, belief propagation (BP), and Markov chain Monte Carlo (MCMC) techniques. Part 3 moves beyond the traditional supervised and unsupervised formulations, focusing on parallelizing online, semi-supervised, and transfer learning. The final part presents several learning applications in distinct domains, with the main focus on scaling up, which is crucial to computational efficiency and to improving accuracy.

This book explains in detail the computing platforms, learning algorithms, prediction problems, and application domains for a variety of parallelization techniques for scaling up machine learning. It is not for anyone trying to understand the basic concepts of parallelization and distributed environments, but it could be an excellent resource for researchers in the field.

Online Computing Reviews Service
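To make the review's summary of Part 1 concrete: whether the framework is MapReduce, DryadLINQ, MPI, or CUDA, scaling k-means rests on the same decomposition of each iteration into a data-parallel assignment step over shards and a small global aggregation step. The following Python sketch illustrates that decomposition; it is not code from the book, and the function names and toy data are illustrative assumptions.

    from collections import defaultdict
    import math

    def nearest(point, centroids):
        # Index of the centroid closest to `point` (Euclidean distance).
        return min(range(len(centroids)),
                   key=lambda j: math.dist(point, centroids[j]))

    def map_shard(points, centroids):
        # "Map" step: one worker scans its shard and emits, per centroid,
        # a partial coordinate sum and a point count. These statistics are
        # tiny compared to the shard, so they are cheap to send over the network.
        dim = len(centroids[0])
        partial = defaultdict(lambda: ([0.0] * dim, 0))
        for p in points:
            j = nearest(p, centroids)
            s, n = partial[j]
            partial[j] = ([a + b for a, b in zip(s, p)], n + 1)
        return dict(partial)

    def reduce_centroids(partials, centroids):
        # "Reduce" step: merge the per-shard statistics and recompute the means.
        dim = len(centroids[0])
        total = defaultdict(lambda: ([0.0] * dim, 0))
        for part in partials:
            for j, (s, n) in part.items():
                ts, tn = total[j]
                total[j] = ([a + b for a, b in zip(ts, s)], tn + n)
        new_centroids = []
        for j, c in enumerate(centroids):
            s, n = total[j]
            new_centroids.append([x / n for x in s] if n else list(c))  # keep empty clusters
        return new_centroids

    # One iteration; in a real deployment each shard would live on a different worker.
    shards = [[(0.0, 0.0), (0.2, 0.1)], [(5.0, 5.0), (5.1, 4.9)]]
    centroids = [(0.0, 0.0), (5.0, 5.0)]
    partials = [map_shard(shard, centroids) for shard in shards]  # embarrassingly parallel
    print(reduce_centroids(partials, centroids))  # ~[[0.1, 0.05], [5.05, 4.95]]

The same contract carries over to the other frameworks the review names: in MPI the reduce step would typically become an Allreduce over the partial sums, while on a GPU the assignment loop would become a per-thread CUDA kernel.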
