Learning to maintain a Hadoop cluster? This book is a must-read for any Hadoop operations engineer. DevOps for Hadoop is not like traditional application maintenance: it requires you to maintain hundreds of nodes. Using the right tools and techniques is important when maintaining large Hadoop clusters.
This is one of the best books on Hadoop, and we highly recommend it to DevOps team members. It may also be a good book for Hadoop development managers and project leads.
(By: Eric Sammer)
As Hadoop steadily becomes the de facto standard software framework in large-scale data processing centers, its users need comprehensive, operations-specific material.
This is because administrators must maintain large, complex Hadoop clusters on a continuous basis and do so thoroughly. The author, Eric Sammer, works as a Principal Solution Architect at Cloudera.
He has been a technical lead and an engineering manager, and his background is mainly in the operations and development of highly concurrent, distributed processing and data ingest systems. He has been a member of the open source community for the past decade and is involved in many projects.
Who Should Read This Book?
The main focus of this book is maintaining Hadoop in a large-scale data center. It covers processes such as planning, installation, and configuration of the system.
Essentially, this is a pragmatic operations guide rather than one that runs through every possible scenario of regular software operation. The book describes what works in critical deployments. Following tradition, the author also offers war stories and accounts of mystery bottlenecks that make for a great narrative.
Although it is targeted specifically at operations and administration staff, the book is also good supporting material for developers, as it offers a high-level overview of MapReduce and HDFS.
What Is Good About This Book?
As Hadoop steadily becomes the standard software framework for large-scale data centers, demand for operations-specific Hadoop information is increasing across the world. Because this book is targeted specifically at administrators who regularly maintain large, complex Hadoop clusters, it is a go-to book for them.
The guide also maintains a pragmatic approach, which helps administrators prevent issues during critical deployments.
Drawing on his years of experience, the author offers comprehensive information covering a wide range of topics, from planning and installation to the actual configuration of the system. He also offers a complete overview of HDFS and MapReduce, including why they exist and how they work.
The planning section of the book ranges from the deployment of Hadoop through OS and hardware selection to network requirements. The next section deals with configuration and setup details, including a list of critical properties.
Readers can learn about resource management when one cluster is shared across various groups. They will also become familiar with a common runbook, or manual, of routine maintenance tasks. The author breaks the monotony of the book by interspersing war stories.
Besides being entertaining, these war stories help the reader understand the importance of troubleshooting during a real crisis. Readers also learn to keep their composure under stress and to use basic tools and techniques to prevent or resolve any catastrophic or backup failure.
The language of the book is very easy, and the author is experienced enough to provide comprehensive resolutions to the common issues that software professionals encounter.
The author also presents a vast range of information in a very systematic manner. After a first reading, readers can look up a specific concern by turning to the relevant page. In all, this book is a collectible for every Hadoop professional.
What Is Not So Good About This Book?
This book focuses entirely on operating Hadoop at large-scale data centers, which makes it the best ally of operations staff and administrators. Although it offers developers something, such as an overview of the system, that information is not in-depth.
Like any other book, this one is bound to have some downsides, but they are very few. The other prominent downside is how relevant the book will remain in the future, though this concern is common to all books in the software world.
As the software market continues to introduce upgraded versions of a technology, material on the earlier versions becomes less relevant.
An operational guide with a realistic approach, this book is an essential guide to large-scale data center projects that use Hadoop. It explores a vast range of topics, from planning, installation, and configuration to routine maintenance.
All in all, this book is primarily targeted at people who work as Hadoop administrators; it’s also ideal for developers who would like to learn about admin tasks.