Pro Apache Hadoop - Exclusive Book Review
Hadoop is one of the best available tools for processing big data. Many information technology companies deal with petabytes of data, and traditional approaches do not scale to that volume.
Hadoop has helped many organizations by simplifying large-scale data management. Many organizations now also have a dedicated Big Data Architect position to deal with large data problems.
Learning Hadoop is not a cakewalk. It is a huge ecosystem of multiple interrelated components, so learning only from Hadoop interview questions or free Hadoop tutorials may not be enough. You should pick up one or more books.
Below is a review of a good book written by experienced Hadoop engineers. Pro Apache Hadoop is a popular choice among big data books and a good companion for Hadoop engineers and big data architects.
(By: Sameer Wadkar, Madhu Siddalingaiah, Jason Venner)
If you are a Hadoop developer, the second edition of Pro Apache Hadoop is an essential tool for enhancing your knowledge and speeding up your Hadoop development skills. This revised edition covers topics such as Hadoop 2.0, improved scalability through HDFS Federation, and the high-availability features of the new HDFS.
Existing content, such as cluster design and MapReduce, has also been enhanced. One of the authors, Jason Venner, is a software engineer with 20 years of experience in designing, coding and managing software development. His interest in Hadoop, Java and cloud computing is clearly visible in this book.
Target Audience
This book is the best companion if you are a software engineer who is investigating Hadoop and wants to implement it in your organization. It is also a useful tool for those who want to deepen and sharpen their Hadoop knowledge. Since this edition adds the newer concepts of Hadoop 2.0, it is an excellent toolkit for expanding your knowledge of both Hadoop and Hadoop 2.0.
This book will prove useful for new Hadoop users who want to move quickly to the level of seasoned professionals with their existing Hadoop toolsets. If you hate reading books on Hadoop, this book may well change your attitude towards Hadoop coding as a whole. If you are new to the Hadoop world, especially to programming with the MapReduce toolkit, this is a great book to get started with. It is strongly recommended for anyone who has an SQL background and wants to enter the Big Data world.
The Good Part of the Pro Apache Hadoop Book
As far as the pros are concerned, a couple of things differentiate this book from others. The first is its thorough, detailed explanation of all Hadoop versions and their compatibility or incompatibility, along with detailed coverage of YARN. The second is its explanation of Hadoop's old and new APIs.
If you have struggled to run Hadoop in the past, the Hadoop Administrator chapter comes to your rescue. This chapter focuses on practical aspects that you can use at work. None of the setups are trivial, so the authors demonstrate several options, such as Amazon’s cloud and Cloudera’s VM. The configuration files and their functions are also well explained.
The code examples in this book are consistent and help illustrate the key points of Hadoop programming.
One of the strongest areas of the book is its treatment of Large Table Joins. It also covers Secondary Sorting and Partitioning, a practical and extremely important concept that other books often only mention in passing.
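Secondary sorting is easier to follow with a concrete picture. The following is a minimal, Hadoop-free sketch in plain Java (Java 16+ for records) that simulates what the framework does with a composite key: partition on the natural key only, sort on the natural key plus a secondary value, then group by the natural key so that each reduce call sees its values already ordered. The class, method and data names (station ids, temperatures) are hypothetical illustrations, not code from the book. In a real Hadoop 2 job these three roles are played by a custom Partitioner, a sort comparator and a grouping comparator, registered with `Job.setPartitionerClass`, `Job.setSortComparatorClass` and `Job.setGroupingComparatorClass`.

```java
import java.util.*;

// Hypothetical, Hadoop-free simulation of the secondary-sort pattern.
public class SecondarySortSketch {

    // Composite key: natural key (e.g. a station id) plus the value to sort on (e.g. a temperature).
    record CompositeKey(String naturalKey, int secondary) {}

    // Partitioner role: hash the natural key only, so all records for one key reach the same reducer.
    static int partition(CompositeKey key, int numPartitions) {
        return (key.naturalKey().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Sort comparator role: order by natural key first, then by the secondary value.
    static final Comparator<CompositeKey> SORT =
            Comparator.comparing(CompositeKey::naturalKey).thenComparingInt(CompositeKey::secondary);

    // Simulates shuffle, sort and grouping for one partition:
    // returns natural key -> secondary values in ascending order.
    static Map<String, List<Integer>> reduceInputs(List<CompositeKey> records, int partitionId, int numPartitions) {
        List<CompositeKey> mine = new ArrayList<>();
        for (CompositeKey k : records) {
            if (partition(k, numPartitions) == partitionId) {
                mine.add(k);
            }
        }
        mine.sort(SORT);
        // Grouping comparator role: consecutive keys with the same natural key form one reduce call.
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (CompositeKey k : mine) {
            groups.computeIfAbsent(k.naturalKey(), x -> new ArrayList<>()).add(k.secondary());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> records = List.of(
                new CompositeKey("station-b", 31), new CompositeKey("station-a", 25),
                new CompositeKey("station-b", 12), new CompositeKey("station-a", 7));
        // With a single partition, every record lands in partition 0.
        System.out.println(reduceInputs(records, 0, 1));
    }
}
```

Running `main` prints `{station-a=[7, 25], station-b=[12, 31]}`. Each station's values arrive already sorted, so a reducer could, for example, read the minimum from the first element without buffering the whole group.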
Two topics that set this book apart are Hadoop ecosystem components such as Pig UDFs and HBase. The book does not contain irrelevant stories about the authors' personal lives or anything similar; IT professionals generally dislike such digressions in technical books.
The detailed explanations of HDFS architecture are very helpful and are among the best parts of this book. As a practical guide to Hadoop 2, it strikes the right balance between depth and breadth.
Key Hadoop use cases include data analysis, data warehousing and ETL. Rather than merely summarizing them, the book covers them in detail. Much attention is paid to detail, especially to presenting SQL clauses in terms of MapReduce constructs.
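For readers coming from SQL, the mapping from a query to MapReduce can be sketched roughly as follows. This is a minimal plain-Java simulation, not the book's code and not the Hadoop API, and it assumes a hypothetical `employees` table. The query `SELECT dept, SUM(salary) FROM employees GROUP BY dept` becomes a map phase that emits `(dept, salary)` pairs, a shuffle that groups the pairs by key, and a reduce phase that applies `SUM` to each group.

```java
import java.util.*;

// Hypothetical, Hadoop-free sketch of how
//   SELECT dept, SUM(salary) FROM employees GROUP BY dept
// decomposes into map, shuffle and reduce phases.
public class GroupBySketch {

    record Employee(String name, String dept, long salary) {}
    record Pair(String key, long value) {}

    // Map phase: project each row to (GROUP BY column, column being aggregated).
    static List<Pair> map(List<Employee> rows) {
        List<Pair> out = new ArrayList<>();
        for (Employee e : rows) {
            out.add(new Pair(e.dept(), e.salary()));
        }
        return out;
    }

    // Shuffle phase: collect all values sharing a key; a TreeMap keeps keys sorted.
    static SortedMap<String, List<Long>> shuffle(List<Pair> pairs) {
        SortedMap<String, List<Long>> groups = new TreeMap<>();
        for (Pair p : pairs) {
            groups.computeIfAbsent(p.key(), k -> new ArrayList<>()).add(p.value());
        }
        return groups;
    }

    // Reduce phase: apply the aggregate function (SUM) to each group.
    static SortedMap<String, Long> reduce(SortedMap<String, List<Long>> groups) {
        SortedMap<String, Long> result = new TreeMap<>();
        groups.forEach((dept, salaries) ->
                result.put(dept, salaries.stream().mapToLong(Long::longValue).sum()));
        return result;
    }

    public static void main(String[] args) {
        List<Employee> rows = List.of(
                new Employee("ann", "eng", 120), new Employee("bob", "ops", 80),
                new Employee("cat", "eng", 100));
        System.out.println(reduce(shuffle(map(rows))));
    }
}
```

Running `main` prints `{eng=220, ops=80}`. A `TreeMap` stands in for the shuffle here because Hadoop likewise presents keys to each reducer in sorted order.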
The authors have also spent a good amount of time examining the Hadoop source code and offer readers useful insights from it. The book leaves you with an understanding not only of how Hadoop should be used but also of how it works internally.
Improvement Areas
Though this book is one of the best in its segment, one major drawback is that the authors assume readers are fluent Java developers. Many professionals who work in other languages also want to learn Hadoop.
The authors could have framed the content in a more language-neutral way, since there are large, active communities of users of other languages such as Python and Ruby. Although most Hadoop users are Java developers, Hadoop also supports other languages, for example through Hadoop Streaming.
Instead of targeting only the Java community, the book would have benefited from reaching out to the wider software community.