"This book is a critically needed resource for the newly released Apache Hadoop 2.0, highlighting YARN as the significant breakthrough that broadens Hadoop beyond the MapReduce paradigm."
From the Foreword by Raymie Stata, CEO of Altiscale
The Insider's Guide to Building Distributed, Big Data Applications with Apache Hadoop YARN
Apache Hadoop is helping drive the Big Data revolution. Now, its data processing has been completely overhauled: Apache Hadoop YARN provides resource management at data center scale and easier ways to create distributed applications that process petabytes of data. And now in Apache Hadoop YARN, two Hadoop technical leaders show you how to develop new applications and adapt existing code to fully leverage these revolutionary advances.
YARN project founder Arun Murthy and project lead Vinod Kumar Vavilapalli demonstrate how YARN increases scalability and cluster utilization, enables new programming models and services, and opens new options beyond Java and batch processing. They walk you through the entire YARN project lifecycle, from installation through deployment.
You'll find many examples drawn from the authors' cutting-edge experience, first as Hadoop's earliest developers and implementers at Yahoo! and now as Hortonworks developers moving the platform forward and helping customers succeed with it.
Coverage includes:
- YARN's goals, design, architecture, and components: how it expands the Apache Hadoop ecosystem
- Exploring YARN on a single node
- Administering YARN clusters and Capacity Scheduler
- Running existing MapReduce applications
- Developing a large-scale clustered YARN application
- Discovering new open source frameworks that run under YARN
Index Terms
- Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2