A Post By: Gary Ernest Davis
Machine learning can be confusing for a beginner, and the path you take depends heavily on the skills you already have, for example in programming, mathematics, or statistics.
So where should a beginner begin?
“Begin at the beginning,” the King said, very gravely, “and go on till you come to the end: then stop.”
― Lewis Carroll, Alice in Wonderland
There is no definitive answer to the question of where to begin. However, Jason Brownlee (Ph.D. in Artificial Intelligence) has 3 very useful tips for machine learning beginners in his Self Study Guide to Machine Learning:
- Start with a small project that you can complete in one hour.
- Aim to complete one project per week in order to build up and maintain your momentum and a workspace of projects that you can build upon.
- Share your results on your blog, Facebook, Google+, GitHub (or wherever you can) to demonstrate your interest and your growing skills and knowledge, and to get feedback.
I also agree with Jason's recommendation of two books to get and read.
The first is:
Machine Learning for Hackers, by Drew Conway & John Myles White.
O’Reilly, ISBN-13: 978-1449303716
This book approaches machine learning through the R programming language. This, in my view, is a good thing because all wannabe data analysts and data scientists need to know R, and R is a relatively simple, open source route into programming for data analysis. The first two chapters of the book deal with getting, installing and using R, especially for exploratory data analysis. The book then gets into machine learning via classification – a fundamental topic. The book has very good examples, and a beginner could build off these examples with their own data. There are no exercises, and no applied projects, so the book falls down a little in that regard.
The second book is:
Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, by Ian H. Witten, Eibe Frank & Mark A. Hall.
Morgan Kaufmann, ISBN-13: 978-0123748560
This book has very good explanations. It gives a broad overview of machine learning, discusses the details through excellent examples, and presents those details in a very readable way. The mathematics is fairly light, and understandable by anyone with high school mathematics. The downside is that, while the book has many relevant examples, it has no exercises and no projects to work on. It’s a good book to have and to read for its detailed explanations of machine learning and its applications. It does not, however, promote learning through applied project work.
It’s clear to me that a lot more work needs to be done, especially from the perspective of evidence-based data analysis, in providing productive, orienting projects for beginners in machine learning. This is a topic we will come back to soon.
- Some machine learning projects by Ziv Bar-Joseph, School of Computer Science, Carnegie Mellon University (Spring 2012)
- For useful data sets for machine learning visit the UC Irvine Machine Learning Repository
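
To give a sense of how small a first one-hour project can be, here is a minimal sketch of a classifier written in plain Python. It is only an illustration, not taken from either book: the data points are made up, the names `TRAIN`, `TEST`, `predict` and `accuracy` are hypothetical, and the method (label a new point with the label of its nearest training point) is just one simple choice among many.

```python
import math

# Toy, made-up data: (feature_1, feature_2) -> label.
# These values are invented for illustration and come from no real data set.
TRAIN = [
    ((1.0, 1.1), "a"),
    ((1.2, 0.9), "a"),
    ((0.8, 1.0), "a"),
    ((3.0, 3.2), "b"),
    ((3.1, 2.9), "b"),
    ((2.9, 3.0), "b"),
]

# A few held-out examples, kept separate from the training data.
TEST = [
    ((1.1, 1.0), "a"),
    ((3.0, 3.0), "b"),
]


def predict(train, point):
    """Return the label of the training example closest to `point`."""
    _, nearest_label = min(
        train, key=lambda example: math.dist(example[0], point)
    )
    return nearest_label


def accuracy(train, test):
    """Fraction of held-out examples whose predicted label is correct."""
    correct = sum(predict(train, features) == label for features, label in test)
    return correct / len(test)


if __name__ == "__main__":
    print(f"hold-out accuracy: {accuracy(TRAIN, TEST):.2f}")
```

A natural next step would be to replace the toy lists with rows read from a real data set, such as one from the repository mentioned above, and see how the accuracy changes.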