Pro Apache Hadoop | Guide books | ACM Digital Library

September 2014

Publisher: Apress, 901 Grayson Street Suite 204, Berkeley, CA, United States

ISBN: 978-1-4302-4863-7

Published: 10 September 2014

Pages: 444

Abstract

Pro Apache Hadoop, Second Edition brings you up to speed on Hadoop, the framework of big data. Revised to cover Hadoop 2.0, the book describes the very latest developments, such as YARN (aka MapReduce 2.0), new HDFS high-availability features, and increased scalability in the form of HDFS Federation. All the old content has been revised too, giving the latest on the ins and outs of MapReduce, cluster design, the Hadoop Distributed File System, and more.

This book covers everything you need to build your first Hadoop cluster and begin analyzing and deriving value from your business and scientific data. Learn to solve big-data problems the MapReduce way: break a big problem into chunks and create small-scale solutions that can be flung across thousands upon thousands of nodes to analyze large data volumes in a short amount of wall-clock time. Learn how to let Hadoop take care of distributing and parallelizing your software: you just focus on the code; Hadoop takes care of the rest.

- Covers all that is new in Hadoop 2.0
- Written by a professional involved in Hadoop since day one
- Takes you quickly to the seasoned pro level on the hottest cloud-computing framework

What you'll learn

- Build a resilient and scalable Hadoop compute cluster.
- Analyze large volumes of data in an amazingly short time.
- Optimize Hadoop tasks like a seasoned professional.
- Implement bulletproof patterns that are proven successful.
- Scale out using the new HDFS Federation feature set.
- Chunk large problems into highly parallel MapReduce modules.

Who this book is for

This book is aimed at I.T. professionals investigating Hadoop and implementing it in their organizations. Existing Hadoop users will deepen their toolkits and come up to speed on what's new in Hadoop 2.0. New Hadoop users will quickly move to the seasoned professional level in their use of the toolset.
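To make the "MapReduce way" described in the abstract concrete, here is a minimal sketch of the classic word-count job. It is not code from the book and does not use Hadoop's Java API; it is an in-memory Python simulation under stated assumptions: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical, each inner list stands in for one input split, and the grouping step imitates what the framework does between the map and reduce phases. Nothing here is distributed.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Hypothetical mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, imitating the framework's shuffle step."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Hypothetical reducer: combine all values that share one key."""
    return key, sum(values)


def run_job(splits: list[list[str]]) -> dict[str, int]:
    """Run map over every split, shuffle, then reduce each key in sorted order."""
    intermediate: list[tuple[str, int]] = []
    for split in splits:  # on a cluster, each split would get its own map task
        for line in split:
            intermediate.extend(map_phase(line))
    grouped = shuffle(intermediate)
    # Keys are sorted before reduction, mirroring the sort that precedes reducers.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(run_job([["the quick brown fox"], ["the lazy dog", "the end"]]))
```

On a real cluster the map calls would run in parallel, ideally on the nodes that hold each split, and the shuffle would move the intermediate pairs over the network to the reducers; the sketch only shows the shape of the computation.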

Cited By

Idhammad M, Afdel K and Belouch M (2018). Distributed Intrusion Detection System for Cloud Environments based on Data Mining techniques, Procedia Computer Science, 127:C, (35-41), Online publication date: 1-May-2018.

Contributors

Sameer Wadkar
Madhu Siddalingaiah
Jason Venner

Recommendations

Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2

Performance comparison of Apache Hadoop and Apache Spark
ICAICR '19: Proceedings of the Third International Conference on Advanced Informatics for Computing Research

The term 'Big Data' is a broad term used for the data sets, which is enormous and traditional data processing applications find it hard to process. Both Apache Spark and Apache Hadoop are one of the significant parts of the big data family. Some of the ...

Hadoop Essentials - Tackling the Challenges of Big Data with Hadoop

Reviews

Reviewer: Shane Chang

Big data analysis is an emerging field. Every day a tremendous amount of data is generated in all areas, and such large volumes cannot be handled by traditional computing paradigms. This book introduces Apache Hadoop for processing big data. The first chapter introduces the needs of big data, its difficulties, and various analysis concepts. Chapter 2 introduces Hadoop 2.0, the YARN ("Yet Another Resource Negotiator") framework, and the fundamentals of Hadoop. Chapters 3 and 4 start with basic Hadoop exercises, including MapReduce scripts and how to manage the Hadoop platform. Chapters 5 to 7 focus on the core of Hadoop and detail the MapReduce framework. Furthermore, the book addresses the differences between structured query language (SQL) and Hadoop, and how to mimic commonly used SQL scripts with Hadoop MapReduce. The book also presents examples of big data processing, and multiple application programming interface (API) examples explain how to access and process data via Hadoop. Chapters 8 to 14 describe various advanced topics. Chapter 8 introduces how to test MapReduce programs. Chapter 9 describes monitoring MapReduce jobs by analyzing their log files. Chapter 10 teaches how to host a data warehouse (that is, the Hive framework) on top of MapReduce. Chapter 11 covers data processing pipelines based on Hadoop. Chapter 12 is tailored to enterprise users who can exploit Hadoop to access data stored in Hadoop systems. Chapter 13 is directed to streaming log analysis. Chapter 14 describes the NoSQL database within Hadoop systems. Chapter 15 switches to the topic of data science, which is important for big data analysis; it leads readers into this field with the use of Hadoop and introduces how to use the Spark and Hama frameworks for data science.
Since cloud computing can facilitate fast processing of big data, Chapter 16 presents Hadoop in the cloud environment. Chapter 17 closes the book by teaching readers how to create their own software applications based on Hadoop. This chapter is short, but the previous chapters have already laid the foundation for readers to become big data professionals. Another feature of this book is example-based teaching: sample code is provided from the first chapter to the last, so readers can learn by doing. In summary, this book is highly recommended for big data scientists and engineers.
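The review notes that the book shows how to mimic common SQL operations with MapReduce. The snippet below is not taken from the book; it is a minimal in-memory sketch of one well-known technique, the reduce-side join, expressing `SELECT c.cust_id, c.name, o.order_id, o.amount FROM customers c JOIN orders o ON c.cust_id = o.cust_id` as a map, a shuffle, and a reduce. The table names, columns, and function names are hypothetical, and plain Python stands in for Hadoop's Java API.

```python
from collections import defaultdict
from itertools import product

# Hypothetical schemas: customers(cust_id, name), orders(order_id, cust_id, amount).


def map_customer(row):
    """Mapper for customers: key by the join column and tag the record's source."""
    cust_id, name = row
    yield cust_id, ("C", name)


def map_order(row):
    """Mapper for orders: key by the same join column, tagged as an order record."""
    order_id, cust_id, amount = row
    yield cust_id, ("O", (order_id, amount))


def reduce_join(key, tagged_values):
    """Reducer: pair every customer record with every order record sharing the key."""
    names = [value for tag, value in tagged_values if tag == "C"]
    orders = [value for tag, value in tagged_values if tag == "O"]
    # product() yields nothing if either side is empty, which gives INNER JOIN semantics.
    for name, (order_id, amount) in product(names, orders):
        yield key, name, order_id, amount


def join(customers, orders):
    """In-memory stand-in for one MapReduce job that reads two inputs."""
    groups = defaultdict(list)  # the "shuffle": tagged values grouped by join key
    for row in customers:
        for key, value in map_customer(row):
            groups[key].append(value)
    for row in orders:
        for key, value in map_order(row):
            groups[key].append(value)
    rows = []
    for key in sorted(groups):  # sorted to mirror the sort that precedes reducers
        rows.extend(reduce_join(key, groups[key]))
    return rows


if __name__ == "__main__":
    customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
    orders = [(10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.0), (13, 4, 1.0)]
    for joined in join(customers, orders):
        print(joined)
```

Tagging each value with its source table is what lets a single reducer tell the two sides of the join apart; customer 3 (no orders) and order 13 (no matching customer) drop out, as they would in an SQL inner join.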
