Recently I received some responses from readers who asked me to write about how to get started in machine learning/data science, and which resources are the right ones to start with. For a second I was hesitant, since there are already many "beginner's guide" articles on this topic.

However, I also noticed that many of the articles out there are more like endless lists of online courses, and the fun side of learning is largely overlooked. In addition, coming from a non-tech background (I studied Economics for my first degree), I know how easy it is to give up. The learning path requires a lot of effort, dedication and curiosity. There are no shortcuts, but you can certainly make it more enjoyable and exciting. In this post I try to give you the clearest picture I can of machine learning (ML), and to show you how to start without getting overwhelmed. Here's our plan:

  • Understand the building blocks of machine learning;
  • Required fundamentals;
  • Learning strategy + tips on how to keep yourself entertained;
  • How to make the knowledge yours.

Let's dive right in.

1. The building blocks of machine learning

The building blocks of machine learning can be summarized as follows:

The building blocks of machine learning.

This is definitely not an exhaustive list, but you get the idea. Every model and algorithm you encounter rests on some overarching theory. This is because machine learning can be seen from many points of view: statistical, computational complexity, etc. Don't get confused when some ML books present mostly mathematical theorems and proofs, while others are more practical, with data and programming-language-specific code. They are just looking at ML from different angles.

Source: https://anacuder.com/o-que-significa-in-a-nutshell/

From these different theories come different types of learning models (linear models, decision trees, neural networks, etc.).

However, to build a model that generalizes well to real-life datasets, we need to employ some extra techniques (e.g. regularization to avoid over-fitting, ensemble methods to reduce bias/variance in the learned model). This is where all the variations of the same model come from. For example, adding regularization terms to linear regression creates new types of models (e.g. Lasso, Ridge or Elastic Net). The same goes for decision trees: by combining them with different ensemble methods we get Random Forest, AdaBoost, etc.
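To make the "base model plus extra technique" idea concrete, here is a minimal sketch. It assumes scikit-learn (the post doesn't prescribe a library) and uses synthetic data from `make_regression`/`make_classification` with arbitrary hyperparameters. It fits a plain linear model next to its regularized variants, and a single decision tree next to two tree ensembles; notice they all share the same `fit`/`predict` interface.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

scores = {}

# Linear family: the same linear model with different penalty terms on the weights.
X_reg, y_reg = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)
linear_family = {
    "LinearRegression (no penalty)": LinearRegression(),
    "Lasso (L1 penalty)": Lasso(alpha=1.0),
    "Ridge (L2 penalty)": Ridge(alpha=1.0),
    "ElasticNet (L1 + L2)": ElasticNet(alpha=1.0, l1_ratio=0.5),
}
for name, model in linear_family.items():
    scores[name] = cross_val_score(model, X_reg, y_reg, cv=5).mean()  # mean R^2

# Tree family: one tree vs. ensembles that combine many trees.
X_clf, y_clf = make_classification(n_samples=300, n_features=10, random_state=0)
tree_family = {
    "DecisionTree (single tree)": DecisionTreeClassifier(random_state=0),
    "RandomForest (bagged trees)": RandomForestClassifier(n_estimators=100, random_state=0),
    "AdaBoost (boosted stumps)": AdaBoostClassifier(n_estimators=100, random_state=0),
}
for name, model in tree_family.items():
    scores[name] = cross_val_score(model, X_clf, y_clf, cv=5).mean()  # mean accuracy

for name, score in scores.items():
    print(f"{name:30s} {score:.3f}")
```

The scores themselves mean little on toy data; the point is that each variant is the base model plus one extra technique. Lasso's L1 penalty, for instance, drives some coefficients exactly to zero, which is why it also works as a feature selector.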

Finally, models fall into different paradigms depending on how you apply them to real datasets. For example, a neural network is a supervised learning algorithm when trained on a labeled dataset. However, when it's trained to reconstruct its own input (e.g. an auto-encoder), it becomes an unsupervised algorithm. There are also mixed paradigms such as semi-supervised learning, but once you get the basics right, you'll understand them in the blink of an eye. So, know the core well and things will start falling into place.
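One way to see that the paradigm depends on how a model is trained rather than on the model itself: the sketch below (assuming scikit-learn and its bundled digits dataset) trains a small neural network with labels, then trains the same kind of network to reproduce its own input through an 8-unit bottleneck, i.e. a minimal auto-encoder. `MLPRegressor` is not a dedicated auto-encoder class; using it this way is just a convenient illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier, MLPRegressor

digits = load_digits()
X = digits.data / 16.0   # pixel values scaled to [0, 1]
y = digits.target        # digit labels 0-9

# Supervised: the network learns a mapping from images to the given labels.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X, y)
train_acc = clf.score(X, y)

# Unsupervised: no labels at all; the target is the input itself.
# The narrow hidden layer forces a compressed 8-number code per image.
# tol is lowered so training doesn't stop early on a small loss plateau.
ae = MLPRegressor(hidden_layer_sizes=(8,), activation="tanh",
                  max_iter=500, tol=1e-6, random_state=0)
ae.fit(X, X)

# Encoder half: apply only the first layer to get the learned codes.
codes = np.tanh(X @ ae.coefs_[0] + ae.intercepts_[0])

recon_mse = np.mean((ae.predict(X) - X) ** 2)
baseline_mse = np.mean((X - X.mean(axis=0)) ** 2)  # always predicting the average image
print(f"classifier training accuracy: {train_acc:.3f}")
print(f"codes shape: {codes.shape}")
print(f"reconstruction MSE {recon_mse:.4f} vs. mean-image baseline {baseline_mse:.4f}")
```

scikit-learn may print a convergence warning after 500 iterations; that's harmless for a demo like this.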

2. Required fundamentals

Source: https://www.bouvet.no/bouvet-deler/6-tips-for-getting-started-with-machine-learning

You might have already realized that we need to brush up on our math and statistics to conquer machine learning.

In fact, you also need some computer science knowledge and programming skills. But you will pick these up naturally with practice as you apply machine learning to your own problems.

3. Learning strategy

Still up? Now, let's see how we should conquer this vast field of machine learning.

First, get the basics right

It's tempting to jump from one online course to another while you still don't have a solid foundation. At their core, many online courses cover basically the same content, although they might be delivered differently or use different programming languages. So use your time wisely! What you need is to do ONE single course thoroughly: the one that gives you all the fundamentals you need. Avoid programming-language-specific courses like "Machine learning in Python", because they will make you lose focus on the fundamentals.

I learned this lesson the hard way. I found myself pondering very basic concepts such as Maximum Likelihood Estimation or Singular Value Decomposition even after finishing quite a few ML courses. These gaps are hard to close no matter how many times you have written model.fit(X_train, y_train) and model.predict(X_test) in your ML projects. So get the basics right; you can always expand your knowledge into "extensions" such as Image Processing and Natural Language Processing later. Remember, you aim to be a competent data scientist/machine learner, not someone who just knows enough to apply Sklearn to a dataset.
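If MLE and SVD feel abstract, they fit in a few lines of NumPy. This sketch (NumPy assumed, synthetic data) estimates a Gaussian's parameters by maximum likelihood, checks an SVD reconstruction, and then uses the SVD to solve least squares. Under Gaussian noise, least squares is the maximum-likelihood fit for linear regression, so this is one standard way to compute the same solution an ordinary least-squares `model.fit` returns.

```python
import numpy as np

rng = np.random.default_rng(42)

# 1) Maximum Likelihood Estimation for a Gaussian.
# Setting the derivative of the log-likelihood to zero gives closed forms:
# the sample mean, and the mean squared deviation (divide by n, not n - 1).
samples = rng.normal(loc=5.0, scale=2.0, size=10_000)
mu_hat = samples.mean()
var_hat = np.mean((samples - mu_hat) ** 2)
print(f"MLE mean {mu_hat:.2f}, MLE variance {var_hat:.2f} (true: 5.00, 4.00)")

# 2) Singular Value Decomposition: A = U @ diag(s) @ Vt.
A = rng.normal(size=(5, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print("SVD reconstructs A:", np.allclose(A, U @ np.diag(s) @ Vt))

# 3) Linear regression: with Gaussian noise, maximizing the likelihood of the
# weights is the same as minimizing squared error, and the SVD solves that
# through the pseudo-inverse: w = V @ diag(1/s) @ U.T @ y.
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=200)
Ux, sx, Vtx = np.linalg.svd(X, full_matrices=False)
w_hat = Vtx.T @ ((Ux.T @ y) / sx)
print("estimated weights:", np.round(w_hat, 2), "true:", true_w)
```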

Recommendations for Intro to ML:

  1. Coursera's Machine Learning — Andrew Ng (the best intuitive introduction to ML. The programming exercises are in Matlab, but if you are not familiar with Matlab you can skip the exercises for now).
  2. CS540 Machine Learning by Nando de Freitas (comprehensive mathematical foundation for almost all ML concepts and algorithms).
The learning curve for most of us looks like this. Source: https://medium.com/@pwalukagga/learning-curve-experience-at-slc-bootcamp-day3-9f8f34458959 (adapted)

Tips on how to keep yourself entertained:

  • YouTube is your best friend. Whenever things get too abstract or you're stuck on something, look it up on YouTube! Some useful YouTube channels (to mention a few) are:
  1. 3Blue1Brown channel
  2. Alexander Ihler channel
  3. jbstatistics
  4. Khan Academy
  • Use visual tools for learning, as described in my other post.
  • Starting to get bored? Here are some really fun books that will keep you entertained while learning Math and Statistics:
  1. How to Lie with Statistics (by Darrell Huff)
  2. How Not to Be Wrong: The Power of Mathematical Thinking (by Jordan Ellenberg)

Second, basic programming

There are many good online courses for learning R/Python on Coursera, Udemy and DataCamp, so I won't go through them here. The best way to learn, at least in my case, is to practice, and lots of it. Download a toy dataset, do some manipulations and see how far you can get.
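As a starting point, here is a sketch with pandas and the Iris dataset bundled with scikit-learn (both are assumptions; any small CSV works just as well): load it, look at it, group it, filter it.

```python
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.copy()  # four measurement columns plus a numeric "target" column
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

# First look: size, a few rows, summary statistics.
print(df.shape)
print(df.head())
print(df.describe())

# Group: average petal size per species.
petal_means = df.groupby("species")[["petal length (cm)", "petal width (cm)"]].mean()
print(petal_means)

# Filter: which species have wide petals?
wide = df[df["petal width (cm)"] > 1.5]
print(wide["species"].value_counts())
```

From here, every new question ("which measurement separates the species best?") is another small manipulation to try.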

Tips on how to keep yourself entertained:

  • You can collect your own dataset (about yourself). Keep a dataset on your daily groceries, get the bank statements on your monthly expenses, scrape your favourite website. You can then do tons of interesting analyses with them.
  • Find a community and interact with it. If you're stuck, you can post your questions on Stack Overflow. Answering other people's questions is also a very good way to learn. In addition, in many online classes you can find classmates who know more or less than you. Either way, the interaction will make you feel connected and give you little joys along the way.
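Once you have a dataset about yourself, a few lines of pandas already answer real questions. The sketch below assumes a hypothetical bank export with `date`, `category` and `amount` columns; the rows are made up for illustration, and a real statement will need its own column names.

```python
import io

import pandas as pd

# Made-up sample rows standing in for a real (hypothetical) bank export.
csv_text = """date,category,amount
2024-01-03,groceries,54.20
2024-01-10,rent,900.00
2024-01-17,groceries,61.75
2024-01-25,eating out,32.40
2024-02-02,groceries,48.10
2024-02-10,rent,900.00
2024-02-14,eating out,75.00
2024-02-21,groceries,66.30
"""

expenses = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
expenses["month"] = expenses["date"].dt.to_period("M")

# One row per month, one column per category, summed amounts.
monthly = expenses.pivot_table(index="month", columns="category",
                               values="amount", aggfunc="sum", fill_value=0)
totals = monthly.sum(axis=1)  # total spending per month
print(monthly)
print(totals)
```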

When you first start, it's fine to google every single thing you want to do. The more you program, the less you'll need to google, and you'll start becoming an efficient, clean coder. You can basically write code the way you'd write poetry (well, I suppose you're better at poetry than I am).

4. Make the knowledge yours

At some point you might feel that you're still not confident about the knowledge you have gained. Here is what you can do:

Teach others

The single most effective way to internalize your knowledge is to teach others. Write a blog post on what you've learned. Explain the concept to a friend or a colleague who doesn't know it. Give a presentation on the topic. Explain it as best you can: get past the shallow layer and trim it down to the core. Use graphics, diagrams, drawings or any other means to get your message across.

Apply programming at your work/ study

If you have a job, try to apply programming to automate some tasks at work, whether it's preparing the data for a dashboard or manipulating an Excel file! This will help you quickly gain familiarity and proficiency in programming.
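A typical first automation is merging a pile of weekly exports into one summary. Everything here is hypothetical (the folder layout, the file names, the `region` and `sales` columns, and the `combine_reports` helper); the demo writes two small CSVs to a temporary folder so it runs anywhere. For real Excel files, pandas' `read_excel` and `to_excel` do the same job but need an engine such as openpyxl installed.

```python
import tempfile
from pathlib import Path

import pandas as pd


def combine_reports(input_dir: Path, output_file: Path) -> pd.DataFrame:
    """Hypothetical helper: stack every CSV in input_dir, total sales per region,
    save the summary to output_file and return it."""
    frames = [pd.read_csv(csv_path) for csv_path in sorted(input_dir.glob("*.csv"))]
    combined = pd.concat(frames, ignore_index=True)
    summary = combined.groupby("region", as_index=False)["sales"].sum()
    summary.to_csv(output_file, index=False)
    return summary


with tempfile.TemporaryDirectory() as tmp:
    reports = Path(tmp) / "reports"
    reports.mkdir()
    # Hypothetical weekly exports; in real life these files would already exist.
    pd.DataFrame({"region": ["North", "South"], "sales": [120, 80]}).to_csv(reports / "week1.csv", index=False)
    pd.DataFrame({"region": ["North", "South"], "sales": [100, 95]}).to_csv(reports / "week2.csv", index=False)

    # Write the summary outside the input folder so a re-run doesn't pick it up.
    summary = combine_reports(reports, Path(tmp) / "summary.csv")
    print(summary)
```

Point a helper like this at a real folder and a weekly copy-paste chore becomes a single command.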

If you're learning Python, here's what you can play with:

  1. Automate the Boring Stuff with Python

Side project (with a friend)

Find an ML challenge on a topic you care about, and do it with a friend. You can always "survive" a Kaggle competition by forking the code from popular kernels, and you might learn something new, but it's not really fun. With a friend you have more motivation to stay committed and to do better than others.
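Whatever challenge you pick, a useful first step is an honest baseline: a trivial model to beat, and a simple pipeline scored with cross-validation. This sketch assumes scikit-learn and uses its bundled breast-cancer dataset as a stand-in for your challenge data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Floor: always predict the most common class.
dummy = DummyClassifier(strategy="most_frequent")
# First real model: scale features, then a regularized logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

dummy_acc = cross_val_score(dummy, X, y, cv=5).mean()
model_acc = cross_val_score(model, X, y, cv=5).mean()
print(f"dummy baseline accuracy: {dummy_acc:.3f}")
print(f"scaled logistic regression accuracy: {model_acc:.3f}")
```

Putting the scaler inside the pipeline means it is refit on each training fold, so no information leaks from the validation fold. From there, you and your friend can each try to beat the baseline.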

I wish you the best in your machine learning journey. Enjoy learning!