This book is a practical guide to the Apache Hadoop ecosystem projects, including MapReduce, HDFS, Apache Hive, Apache HBase, Apache Kafka, Apache Mahout, and Apache Solr. From setting up the environment to running sample applications, each chapter is a practical tutorial on using an Apache Hadoop ecosystem project. While several books on Apache Hadoop are available, most focus on the core projects, MapReduce and HDFS, and none discusses the other Apache Hadoop ecosystem projects and how they all work together as a cohesive big data development platform.

What you'll learn:

- How to set up a Linux environment for Hadoop projects using the Cloudera Hadoop Distribution CDH 5
- How to run a MapReduce job
- How to store data with Apache Hive and Apache HBase
- How to index data in HDFS with Apache Solr
- How to develop a Kafka messaging system
- How to develop a Mahout user recommender system
- How to stream logs to HDFS with Apache Flume
- How to transfer data from a MySQL database to Hive, HDFS, and HBase with Sqoop
- How to create a Hive table over Apache Solr

Who this book is for:

The primary audience is Apache Hadoop developers. Prerequisite knowledge of Linux is required, along with some knowledge of Hadoop.
Index Terms
- Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools