Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2

March 2014

Publisher: Addison-Wesley Professional

ISBN: 978-0-321-93450-5

Published: 29 March 2014

Pages: 400

Abstract

"This book is a critically needed resource for the newly released Apache Hadoop 2.0, highlighting YARN as the significant breakthrough that broadens Hadoop beyond the MapReduce paradigm." From the Foreword by Raymie Stata, CEO of Altiscale

The Insider's Guide to Building Distributed, Big Data Applications with Apache Hadoop YARN

Apache Hadoop is helping drive the Big Data revolution. Now, its data processing has been completely overhauled: Apache Hadoop YARN provides resource management at data center scale and easier ways to create distributed applications that process petabytes of data. And now in Apache Hadoop YARN, two Hadoop technical leaders show you how to develop new applications and adapt existing code to fully leverage these revolutionary advances.

YARN project founder Arun Murthy and project lead Vinod Kumar Vavilapalli demonstrate how YARN increases scalability and cluster utilization, enables new programming models and services, and opens new options beyond Java and batch processing. They walk you through the entire YARN project lifecycle, from installation through deployment.

You'll find many examples drawn from the authors' cutting-edge experience, first as Hadoop's earliest developers and implementers at Yahoo! and now as Hortonworks developers moving the platform forward and helping customers succeed with it.

Coverage includes
- YARN's goals, design, architecture, and components, and how it expands the Apache Hadoop ecosystem
- Exploring YARN on a single node
- Administering YARN clusters and Capacity Scheduler
- Running existing MapReduce applications
- Developing a large-scale clustered YARN application
- Discovering new open source frameworks that run under YARN

Contributors

Arun C Murthy
Cloudera, Inc
- Publication Years: 2013 - 2015
- Publication counts: 3
- Citation count: 1,475
- Available for Download: 2
- Downloads (cumulative): 12,220
- Downloads (12 months): 469
- Downloads (6 weeks): 60
- Average Downloads per Article: 6,110
- Average Citation per Article: 492
Vinod Kumar Vavilapalli
- Publication Years: 2013 - 2014
- Publication counts: 2
- Citation count: 1,334
- Available for Download: 1
- Downloads (cumulative): 9,857
- Downloads (12 months): 433
- Downloads (6 weeks): 50
- Average Downloads per Article: 9,857
- Average Citation per Article: 667
Doug Eadline
- Publication Years: 2013 - 2016
- Publication counts: 5
- Citation count: 10
- Available for Download: 0
- Downloads (cumulative): 0
- Downloads (12 months): 0
- Downloads (6 weeks): 0
- Average Downloads per Article: 0
- Average Citation per Article: 2
Joseph Niemiec
- Publication Years: 2014 - 2014
- Publication counts: 1
- Citation count: 9
- Available for Download: 0
- Downloads (cumulative): 0
- Downloads (12 months): 0
- Downloads (6 weeks): 0
- Average Downloads per Article: 0
- Average Citation per Article: 9
Jeff Markham
- Publication Years: 2014 - 2014
- Publication counts: 1
- Citation count: 9
- Available for Download: 0
- Downloads (cumulative): 0
- Downloads (12 months): 0
- Downloads (6 weeks): 0
- Average Downloads per Article: 0
- Average Citation per Article: 9

Recommendations

Performance comparison of Apache Hadoop and Apache Spark
ICAICR '19: Proceedings of the Third International Conference on Advanced Informatics for Computing Research

The term 'Big Data' is a broad term used for the data sets, which is enormous and traditional data processing applications find it hard to process. Both Apache Spark and Apache Hadoop are one of the significant parts of the big data family. Some of the ...
Apache Hadoop YARN: yet another resource negotiator
SOCC '13: Proceedings of the 4th annual Symposium on Cloud Computing

The initial design of Apache Hadoop [1] was tightly focused on running massive, MapReduce jobs to process a web crawl. For increasingly diverse companies, Hadoop has become the data and computational agorá, the de facto place where data and ...
Pro Apache Hadoop

Reviews

Reviewer: Aake Edlund

In the next-generation MapReduce (MRv2, or YARN), the MapReduce of Apache Hadoop 1 (MRv1) has been divided into two components: the cluster resource management capabilities have become YARN (Yet Another Resource Negotiator), while the MapReduce-specific capabilities remain MapReduce. In the MRv1 architecture, the cluster was managed by a service called the JobTracker, with TaskTracker services on each host launching tasks on behalf of jobs and the JobTracker serving information about completed jobs. In MRv2, the functions of the JobTracker have been split among three services. First is the ResourceManager, a persistent YARN service that receives and runs applications on the cluster; it contains the scheduler, which is pluggable. Next, the MapReduce-specific capabilities of the JobTracker have been moved into the MapReduce Application Master, which is started to manage each MapReduce job and terminated when the job completes. Finally, the JobTracker function of serving information about completed jobs has been moved to the JobHistory Server. The TaskTracker, meanwhile, has been replaced with the NodeManager, a YARN service that manages resources and deployment on a host and is responsible for launching containers, each of which can house a map or reduce task.
The authors give a good background on the reasoning behind this move from MRv1 to MRv2, or YARN, and on the huge change it brings to the data stack ecosystem overall. The reader who wants more detail, for example on configuration and tuning, or who wants walk-through examples, needs to go to the web. This area is under constant development, and YARN is no exception; this is evident in the scripting parts and links. Practical details aside, the book is very useful for getting an overview of the architecture, its capabilities, feature set, and related frameworks.
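
The division of JobTracker and TaskTracker responsibilities described in this review can be summarized as a small lookup table. The following is only an illustrative sketch in Python: the table name, the function name, and the duty labels are hypothetical paraphrases of the review's wording, and none of it is part of any Hadoop API.

```python
# Illustrative lookup table only: it restates the review's description of
# the MRv1 -> MRv2 split and is not part of any Hadoop API.
# The duty labels are paraphrases chosen for this sketch.
MRV1_TO_MRV2 = {
    ("JobTracker", "resource management and scheduling"): "ResourceManager",
    ("JobTracker", "per-job MapReduce coordination"): "MapReduce Application Master",
    ("JobTracker", "completed-job information"): "JobHistory Server",
    ("TaskTracker", "launching tasks on a host"): "NodeManager",
}


def successors(mrv1_service):
    """Return the sorted MRv2 services that took over an MRv1 service's duties."""
    return sorted(
        {new for (old, _duty), new in MRV1_TO_MRV2.items() if old == mrv1_service}
    )


if __name__ == "__main__":
    # Print one row per responsibility that moved.
    for (old, duty), new in MRV1_TO_MRV2.items():
        print(f"{old:<12} {duty:<36} -> {new}")
```

Calling `successors("JobTracker")` lists the three services the review names as inheriting JobTracker functions, while `successors("TaskTracker")` lists only the NodeManager.
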
The current source code provided for the book needs to be updated; doing so would considerably increase the usability of the book, especially if all code (not only that from selected chapters) were added. In its current form, the text is less useful for actually testing the deployment and management of YARN; however, the core concepts of YARN are well described and explained in a pedagogical way, with an initial focus on the underlying motivations for the evolution toward YARN. The reader is introduced to the core concepts and a functional overview of the YARN components in a stepwise manner. The installation steps are described in detail, guiding the user through the machinery of setting up their own YARN environment; however, to actually get it in place, the reader needs to go to the web.
Throughout the book, the reader is helped to better understand what is needed, the components' functionality, and what to look for and consider when moving to YARN. A number of installation alternatives are described, and the user gets a good idea of the support that exists today for managing and tuning the environment. Further details on administration and monitoring are given, with source code for these specific chapters. Building on the initial functional descriptions of YARN, the authors add a deeper level of insight into the inner workings of YARN in a dedicated section on its architecture. The detailed YARN application would benefit from available (and updated) source code to help the user reproduce the examples as far as possible. The YARN frameworks section gives the user a hint of the importance of YARN, but could be further detailed and extended.
Overall, the book is best viewed as a guide to understanding YARN, and less as a hands-on guide to getting the details in place. When the authors update the source code for the book, the reader will find it even more useful.
Online Computing Reviews Service