Data Science from Scratch: First Principles with Python: Exclusive Book Review

Python is one of the most popular programming languages for big data analysis and a favorite of many data scientists.
Its simplicity makes it a preferred choice for many developers. This book aims to help a Python beginner become a good data scientist.
In this exclusive book review we share the good and the bad parts of the book.

Data Science from Scratch: First Principles with Python

(By: Joel Grus)

The book covers the most fundamental tools and algorithms of data science. It builds these concepts from scratch and implements them through coded examples.

The author, Joel Grus, helps you get comfortable with the statistics and math at the core of any data science project. The book also helps learners dive into the fundamentals of machine learning.

The author is a software engineer at Google and has also worked as a data scientist at multiple startups.

Target Audience

The book is a good choice for professionals looking to take a crash course in Python. For any software professional interested in learning linear algebra, probability and statistics, along with when and how they are used in data science, this book will serve as a knowledge bank.

If you are a software professional working in Python and want to brush up on your knowledge of the language, the book is highly recommended. Its excellent pedagogy and clear Python code are reason enough to look into it.

This book is ideal for data scientists or programmers using the Python 2.6 or 2.7 branches, as it points readers to the Anaconda distribution, which bundles the appropriate libraries. It is among a handful of books that any software professional would love to have in their collection.

The book is an ideal choice for software professionals who have experience in statistical analysis and data mining.


The most positive thing about this book is its content, which is well written and covers a wide range of topics related to machine learning and data science. A few chapters are particularly eye-catching: the introduction or refresher sections for Python, probability and statistics, the explanations of ML techniques, and the methods of data manipulation.

The author has put together enough detail on the underlying theory, along with pointers for further reading. The simple, easy-to-follow steps for building the code will surely catch your attention.

The author places emphasis on understandable Python code and excellent pedagogy. Three introductory chapters, on Python, linear algebra and practical statistics, are the best ones for setting up the base of your understanding.

The book is true to its name “from Scratch”: it starts from the basics of every concept and builds on them so that readers can follow easily. Its objective of making you understand the fundamentals is well served.

After reading this book, you will be better able to use pre-packaged software meaningfully in ‘real life’, whether it’s scikit-learn, MATLAB, R or anything else. The knowledge it imparts is largely independent of any programming language.

The author explains the concepts of statistical analysis and data mining well as a whole. This data science starter book casts the reader as the chief data scientist of a hypothetical company and provides business context as and when required; the context is web-oriented but easy to understand.

What makes the book unique is its approach of building the tooling in plain Python before moving to the libraries, which helps explain the underlying math and avoids the "magic" effect of the libraries. The concepts, ideas and libraries explained in the book are in tune with current practice.
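
To give a flavor of this build-it-first approach, here is a minimal sketch of a dot product and a Euclidean distance written with plain Python lists. The function names are our own illustration and are not taken from the book, and the code assumes Python 3 rather than the 2.x branches mentioned above.

```python
import math
from typing import List

Vector = List[float]


def dot(v: Vector, w: Vector) -> float:
    """Sum of the componentwise products v_1*w_1 + ... + v_n*w_n."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same length")
    return sum(v_i * w_i for v_i, w_i in zip(v, w))


def distance(v: Vector, w: Vector) -> float:
    """Euclidean distance: square root of the squared differences summed."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same length")
    diff = [v_i - w_i for v_i, w_i in zip(v, w)]
    return math.sqrt(dot(diff, diff))
```

Once helpers like these are understood, a library routine such as NumPy's `numpy.dot` stops looking like a black box.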


Though the author has captured all the concepts and ideas in the book, it would have been even better if they were presented in a more structured manner. Although a lot of Python features and concepts are mentioned, specific variants are rarely used. The explanations could also have been less terse. For example, the description of standard deviation in Python is a one-liner in the Statistics section.
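
For readers who want to see what sits behind such a one-liner, the sample standard deviation can be spelled out step by step. This is a minimal sketch in plain Python 3, not the book's own code; the function names are illustrative, and it assumes the sample form of the variance, which divides by n - 1.

```python
import math
from typing import List


def mean(xs: List[float]) -> float:
    """Arithmetic mean of the data."""
    return sum(xs) / len(xs)


def variance(xs: List[float]) -> float:
    """Sample variance: squared deviations from the mean, summed, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("variance requires at least two data points")
    x_bar = mean(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def standard_deviation(xs: List[float]) -> float:
    """Square root of the sample variance, in the same units as the data."""
    return math.sqrt(variance(xs))
```

The result can be checked against `statistics.stdev` from the Python standard library, which uses the same n - 1 convention.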

Overall, the book is a good resource for data scientists and is recommended. This review should help you understand the book in a nutshell. Get your copy of the book now and refresh your knowledge of Python, statistical analysis, data mining and data science.
