Pro Apache Hadoop
September 2014
Publisher:
  • Apress
  • 901 Grayson Street, Suite 204, Berkeley, CA
  • United States
ISBN: 978-1-4302-4863-7
Published: 10 September 2014
Pages: 444
Abstract

Pro Apache Hadoop, Second Edition brings you up to speed on Hadoop, the framework of big data. Revised to cover Hadoop 2.0, the book covers the very latest developments such as YARN (aka MapReduce 2.0), new HDFS high-availability features, and increased scalability in the form of HDFS Federations. All the old content has been revised too, giving the latest on the ins and outs of MapReduce, cluster design, the Hadoop Distributed File System, and more.

This book covers everything you need to build your first Hadoop cluster and begin analyzing and deriving value from your business and scientific data. Learn to solve big-data problems the MapReduce way, by breaking a big problem into chunks and creating small-scale solutions that can be flung across thousands upon thousands of nodes to analyze large data volumes in a short amount of wall-clock time. Learn how to let Hadoop take care of distributing and parallelizing your software: you just focus on the code; Hadoop takes care of the rest.

  • Covers all that is new in Hadoop 2.0
  • Written by a professional involved in Hadoop since day one
  • Takes you quickly to the seasoned pro level on the hottest cloud-computing framework

What you'll learn

  • Build a resilient and scalable Hadoop compute cluster.
  • Analyze large volumes of data in amazingly short time.
  • Optimize Hadoop tasks like a seasoned professional.
  • Implement bulletproof patterns that are proven successful.
  • Scale out using the new HDFS Federations feature set.
  • Chunk large problems into highly parallel MapReduce modules.

Who this book is for

This book is aimed at IT professionals investigating Hadoop and implementing it in their organizations. Existing Hadoop users will deepen their toolkits and come up to speed on what's new in Hadoop 2.0. New Hadoop users will quickly move to the seasoned professional level in their use of the toolset.

Reviews

Shane Chang

Big data analysis is an emerging field. Every day, a tremendous amount of data is generated in all areas, and such large volumes cannot be handled by traditional computing paradigms. This book introduces Apache Hadoop as a way to process big data.

The first chapter introduces the need for big data, its difficulties, and various analysis concepts. Chapter 2 introduces Hadoop 2.0, the YARN ("Yet Another Resource Negotiator") framework, and the fundamentals of Hadoop. Chapters 3 and 4 start with basic Hadoop exercises, including MapReduce scripts and how to manage the Hadoop platform.

Chapters 5 to 7 focus on the core of Hadoop and cover the MapReduce framework in detail. The book also addresses the differences between structured query language (SQL) and Hadoop, and shows how to mimic commonly used SQL scripts using Hadoop. It presents examples of big data processing, with multiple application programming interface (API) examples that explain how to access and process data via Hadoop.

Chapters 8 to 14 describe various advanced topics. Chapter 8 explains how to test MapReduce programs. Chapter 9 describes monitoring MapReduce by analyzing the log files. Chapter 10 teaches how to host a data warehouse (that is, the Hive framework) based on MapReduce. Chapter 11 covers data processing pipelines based on Hadoop. Chapter 12 is tailored to enterprise users who can exploit Hadoop to access data stored in Hadoop systems. Chapter 13 is directed to streaming log analysis. Chapter 14 describes the NoSQL database within Hadoop systems.

Chapter 15 switches to the topic of data science, which is important for big data analysis. This chapter leads readers into the field with the use of Hadoop, and introduces how to use the Spark and Hama frameworks for data science.
Since big data needs cloud computing to facilitate fast processing, Chapter 16 presents Hadoop in the cloud environment. Chapter 17 closes the book by teaching readers how to create their own software applications based on Hadoop. This chapter is short, but the previous chapters have already laid the foundation for readers to become big data professionals. Another feature of this book is example-based teaching. Sample code is provided from the first chapter to the last, so readers can learn by doing. In summary, this book is highly recommended for big data scientists and engineers.

Online Computing Reviews Service
