As one of the most popular Massive Open Online Courses (MOOCs) for data science, with over 2.6M enrolled (as of Nov 2019) and an average user rating of 4.9/5, there is no doubt that the Machine Learning certification offered by Stanford University via Coursera is a massive success.

This is undoubtedly due in part to the excellent ability of the course's creator, Andrew Ng, to simplify some of the more complex aspects of ML into intuitive, easy-to-learn concepts.

However, I had two reservations as I was going into this course:

  1. This course was produced in 2011; is it all still relevant?
  2. Will I just be going over things I already know?

Regarding the latter: in Summer 2018 I got lucky and landed a sponsorship, followed by a job as a Data Scientist researching NLP for automation on a big contract my employer had been working on (manually, with a big team of people) for almost three years. It was an amazing opportunity and is where the majority of my current knowledge stems from.

In the remainder of this article I want to address these two concerns and give an overview of what to expect from the course.

What will this course cover?

The course covers a lot; it manages to cram a surprising amount of detail into a seemingly small period. Not that it lacks depth: in fact, the depth of the material is, I believe, the strong point of this course.

You will cover Linear and Logistic Regression, Vectorisation, Regularisation, Neural Networks, Feedforward and Back Propagation (this is very good), Cost Functions, Network Initialisation, SVMs, Dimensionality Reduction, Supervised/Unsupervised Learning, Principal Component Analysis (PCA), K-Means Clustering, Anomaly Detection, Recommender Systems and much more.

There is also lots of advice for applying machine learning, such as diagnosing bias vs variance error, implementing train-validation-test splits, how to measure model performance (accuracy, precision, recall and F1 scores), which algorithms work better with lots of (or a lack of) data, and how to adjust said algorithms to better suit our needs and/or situation.
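
To make the model-performance part concrete, here is a rough NumPy sketch of those metrics; this is my own note-taking, not code from the course (which uses Octave throughout):

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for a binary problem (1 = positive class)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives
    accuracy = np.mean(y_pred == y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1

# Toy example: 5 predictions against 5 labels.
print(classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```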

My Experience, week by week

Week 1 — Introduction, Linear Regression and Linear Algebra

This week is pretty straightforward; I think it is a great introduction, especially if you have not studied or worked with mathematics for a little while. There are also some simple logic questions and an introduction to the basics of statistics and machine learning, such as classification vs regression and supervised vs unsupervised learning.

The assessment for this week is very easy, with just two quizzes. There is also an optional section on Linear Algebra. Even if you are comfortable with linear algebra, I think it is worth going over the section, as I found Andrew Ng sometimes explains things in a different light from how I would usually think of them, and I believe it is always useful to understand concepts from as many angles as possible.

Week 2 — More Linear Regression, Introduction to Octave (or MATLAB)

This week we look at linear regression with multiple variables. It is not a big step from univariate to multivariate linear regression, and I do not think many people will find this too difficult.

There is also an introduction to the normal equation, which I had never used before; again, this was not difficult, but it was fun to use! The most challenging part of the week is translating this into Octave code during the first programming assignment.
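
For the curious, the normal equation solves linear regression in closed form, theta = (XᵀX)⁻¹Xᵀy, with no gradient descent loop at all. A minimal NumPy sketch of the idea (the assignment itself does this in Octave; the toy data here is made up):

```python
import numpy as np

# Made-up training set: 5 examples, 2 features, plus a leading column of ones for the bias term.
X = np.c_[np.ones(5), np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)]
y = np.array([3.0, 3.1, 7.0, 7.2, 10.1])

# Normal equation: theta = pinv(X'X) X'y  (pinv is safer than inv if X'X is near-singular).
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta)       # fitted parameters
print(X @ theta)   # predictions on the training set
```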

Of course, prior to the assignment we are introduced to Octave (or MATLAB if you prefer). Octave is essentially open-source MATLAB, so it's pretty easy to pick up if you have used MATLAB (or Python; the syntax is very similar).

Week 3 — Logistic Regression, Regularisation

I think this is where the course starts to pick up into more complex concepts. Here you will cover a lot of the important ML concepts, such as classification, hypotheses, decision boundaries, cost functions, gradient descent (and a brief look at advanced optimisation techniques), multiclass classification, overfitting, regularisation and so on.

This week is not too hard, but I think it covers a lot of important topics, so it is certainly a crucial week. The actual focus of the week is logistic regression, which is a nice (not too hard) step up from linear regression!

The programming assignment is simple, but I did get stuck briefly due to minor errors in my code (think 1/(n*x) rather than (1/n)*x); embarrassingly, this took me much longer to solve than I'd like to admit…
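
For reference, the regularised cost and gradient the assignment builds up look roughly like this when vectorised; this is a NumPy sketch of my own (the assignment is in Octave and its variable names differ), with the parentheses that tripped me up spelled out:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_gradient(theta, X, y, lam):
    """Regularised logistic-regression cost and gradient; theta[0] (the bias) is not regularised."""
    m = len(y)
    h = sigmoid(X @ theta)                          # predicted probabilities
    reg = (lam / (2 * m)) * np.sum(theta[1:] ** 2)  # the (lam / (2*m)) factor is easy to mis-parenthesise
    J = -(1 / m) * (y @ np.log(h) + (1 - y) @ np.log(1 - h)) + reg
    grad = (1 / m) * (X.T @ (h - y))
    grad[1:] += (lam / m) * theta[1:]
    return J, grad
```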

Week 4 — Neural Networks: Representation

We start this week with two motivational videos looking at non-linear hypotheses and the analogy between neural networks and neurons in the brain.

The remainder of the week then goes into depth on how neural networks work; Andrew does a brilliant job of explaining the intuition behind neural networks this week.
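
To give a flavour of what "representation" means here: a forward pass is just a matrix multiplication and a sigmoid per layer, with a bias unit prepended each time. A hand-rolled NumPy sketch (the layer sizes below are made up for illustration, not taken from the course):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, Theta1, Theta2):
    """Forward pass of a 3-layer network: input -> hidden -> output."""
    a1 = np.concatenate(([1.0], x))                       # add bias unit to the input layer
    a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))    # hidden activations plus bias
    a3 = sigmoid(Theta2 @ a2)                             # output layer (class scores)
    return a3

# Hypothetical shapes: 4 inputs, 5 hidden units, 3 output classes.
rng = np.random.default_rng(0)
Theta1 = rng.normal(scale=0.1, size=(5, 5))   # 5 x (4 + 1)
Theta2 = rng.normal(scale=0.1, size=(3, 6))   # 3 x (5 + 1)
print(feedforward(rng.normal(size=4), Theta1, Theta2))
```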

The week finishes with another programming assignment. This one is more involved but not too difficult and I found it to be quite fun!

Week 5 — Neural Networks: Learning

During this week we touch on the cost function (again, briefly) and backpropagation. The first half covers backpropagation, and the mathematics and intuition behind it. The second half covers how to check we're implementing it correctly (gradient checking, which is super useful) and how and why we randomly initialise the network weights.
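
Gradient checking in particular is worth remembering: approximate each partial derivative numerically with a central difference and compare it against what backpropagation returns. A tiny NumPy sketch of the idea, using a toy quadratic cost instead of a real network:

```python
import numpy as np

def numerical_gradient(cost_fn, theta, eps=1e-4):
    """Approximate dJ/dtheta with central differences, one parameter at a time."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        bump = np.zeros_like(theta)
        bump[i] = eps
        grad[i] = (cost_fn(theta + bump) - cost_fn(theta - bump)) / (2 * eps)
    return grad

# Toy cost J(theta) = sum(theta^2); its analytic gradient is 2*theta.
theta = np.array([1.0, -2.0, 0.5])
analytic = 2 * theta
numeric = numerical_gradient(lambda t: np.sum(t ** 2), theta)
print(np.max(np.abs(analytic - numeric)))   # tiny difference means the gradients agree
```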

This week is hard. Even though I covered this around a year ago using NumPy, I found that I had forgotten all but a high-level comprehension of what was happening. Backpropagation intuition is difficult. It was only when I got to the programming assignment that I realised how little intuition I had for this algorithm.

I found it really helps to just keep repeating the steps you take: going through the feedforward and backpropagation passes on paper, examining how the array dimensions change with each step and trying to grasp why it works.

Week 6 — Advice for Applying Machine Learning

Surprisingly, it was at this point that my second reservation, "Will I just be going over things I already know?", came up. However, I found myself really benefiting from this week, because redoing this material really compounded my prior experience.

[Image: a relationship that explained a lot and was very simple, but one I had simply never seen displayed visually before.]

Additionally, the way that Andrew elegantly details the pros and cons behind each optimisation concept while tying this to mathematical notation was excellent. Although I knew of these concepts, I had never really looked at them in notation before. Altogether this was an excellent week.

Week 7 — Support Vector Machines

This was the first week where I didn't really know the subject already. I was aware of SVMs and I knew they were relatively simple, but I hadn't actually used them before. Despite this, as I suspected, the theory and maths behind SVMs were quite simple, and so I got through this week over a couple of workday evenings.

I enjoyed this week; I was still recovering from week 5, so the lack of intensity was a relief. It covered the intuition behind SVMs very well and also introduced Gaussian kernels, which again are something I was aware of but had unconsciously avoided. I found the parts on when and where to use SVMs, and the intuition behind this, particularly useful.

[Image: the SVM decision boundary built during the programming exercise.]

The programming exercise was maybe too simple; I would have liked to implement an SVM model manually, but instead we just define a Gaussian kernel function, which is pretty straightforward. The latter half of the exercise was interesting (but also simple), as we look at spam classification using SVMs. This reminded me a lot of the big commercial ML project I mentioned in the intro, in which I used Recurrent Neural Networks (RNNs) with word2vec and multiple other methods to classify client emails, which was cool!
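
For anyone wondering how small that kernel function really is, something along these lines in NumPy captures it (the exercise itself implements the equivalent in Octave):

```python
import numpy as np

def gaussian_kernel(x1, x2, sigma):
    """Similarity between two examples: 1 when identical, falling towards 0 as they move apart."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    return np.exp(-np.sum((x1 - x2) ** 2) / (2 * sigma ** 2))

print(gaussian_kernel([1, 2, 1], [0, 4, -1], sigma=2.0))   # ~0.3247
```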

Week 8 — Unsupervised Learning

I headed into this week pretty unexposed to unsupervised learning. I was aware of the basic concept behind K-means clustering, but nothing more. The first thing I realised is that K-means is super simple, which was a relief! The optimisation is essentially the same idea as the Gaussian kernel function (and many other optimisation functions, now I think of it). Nonetheless it was very interesting and I can see it being very useful.
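
K-means really is just two alternating steps: assign every point to its nearest centroid, then move each centroid to the mean of the points assigned to it. A bare-bones NumPy sketch of that loop (my own simplification, with no convergence check or random restarts):

```python
import numpy as np

def kmeans(X, K, iters=10, seed=0):
    """Minimal K-means: alternate the assignment and update steps a fixed number of times."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), K, replace=False)]   # initialise from random examples
    for _ in range(iters):
        # Assignment step: index of the closest centroid for every example.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        for k in range(K):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
    return centroids, labels

# Two made-up blobs around (0, 0) and (5, 5); the centroids should land near them.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=0, size=(50, 2)), rng.normal(loc=5, size=(50, 2))])
print(kmeans(X, K=2)[0])
```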

The latter half of the week looks at Principal Component Analysis (PCA). At a high level I had some knowledge of this, but very little. I had used a related dimensionality-reduction technique, t-SNE, for word-vector visualisation, but never got into the inner workings of it. For the scope of this course I think PCA is covered well, and again it is not too difficult, which I suspect is largely due to Andrew Ng's deep understanding and teaching ability.
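
The PCA recipe from the lectures, roughly as I noted it down: normalise the features, take the SVD of the covariance matrix, and project onto the top k columns of U. Again, this is a NumPy sketch of my own, not the course's Octave code:

```python
import numpy as np

def pca_project(X, k):
    """Project feature-normalised data onto its top-k principal components via SVD."""
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)     # feature normalisation first
    Sigma = (X.T @ X) / len(X)                   # covariance matrix
    U, S, _ = np.linalg.svd(Sigma)
    Z = X @ U[:, :k]                             # reduced, k-dimensional representation
    X_approx = Z @ U[:, :k].T                    # approximate reconstruction in the original space
    variance_retained = S[:k].sum() / S.sum()    # how much variance the k components keep
    return Z, X_approx, variance_retained
```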

Finally, the programming exercise was not too difficult. I found the coding logic a little trickier than in weeks 6 and 7, but again nothing excessive. I found this exercise really useful for consolidating K-means and PCA.

Week 9 — Anomaly Detection and Recommender Systems

Again, I had no real experience with what this week covers. The first half looks at anomaly detection using Gaussian and multivariate Gaussian distributions (density estimation). This is pretty straightforward but really useful; I intend to try implementing a multivariate Gaussian distribution for anomaly detection as an additional feature of a data analysis tool I often use!
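
The idea I am planning to borrow is simple enough to sketch: fit a mean and covariance to (assumed normal) data, compute the density of new points, and flag anything below a threshold epsilon chosen on a labelled validation set. A hypothetical NumPy version (the data and epsilon here are made up):

```python
import numpy as np

def fit_gaussian(X):
    """Estimate the mean vector and full covariance matrix from (assumed normal) training data."""
    return X.mean(axis=0), np.cov(X, rowvar=False, bias=True)

def multivariate_gaussian(X, mu, Sigma):
    """Density p(x) of each row of X under a multivariate normal N(mu, Sigma)."""
    n = len(mu)
    diff = X - mu
    inv = np.linalg.inv(Sigma)
    norm = (2 * np.pi) ** (-n / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm * np.exp(-0.5 * np.sum(diff @ inv * diff, axis=1))

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))      # made-up "normal" behaviour
mu, Sigma = fit_gaussian(X_train)
p = multivariate_gaussian(np.array([[0.1, -0.2], [6.0, 6.0]]), mu, Sigma)
print(p < 1e-3)   # hypothetical epsilon: only the far-out second point is flagged as anomalous
```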

The latter half of this week focuses on recommendation systems. Personally I don't think I will find this useful, but it was definitely interesting and I liked the format of learning several algorithms with increasing complexity.

Weeks 10 & 11

These two weeks are short, with no programming assignments; they are more of a wrapping up and concluding of what we have already learned. I finished both of these weeks during a very long flight from Beijing to London! Fortunately, as there are no programming assignments, you can download the videos and complete the two quizzes from your phone, which is amazing!

Although short, I still found weeks 10 and 11 useful and a very nice way to finish the course.

Conclusion

An excellent, all-round foundation to machine learning. Pros include:

  • Covers a wide range of ML methods
  • Is not afraid to tackle the mathematics and Andrew Ng is excellent in teaching the intuitions
  • Significant portion of course focuses on Neural Network fundamentals
  • Amazing coverage of how to actually apply methods, and of the typical pitfalls that catch most ML engineers

Cons

  • The later programming assignments are sometimes too abstracted; the later weeks felt like "write one calculation and let the rest of the prewritten code do the heavy lifting".

Final pro/con

  • The programming assignments are done in Octave. This is a pro, in that it allows learners to bypass many of the syntax/logic problems that would likely come with other languages, but a con in that Octave is not a particularly widely used language in industry…

Overall, I believe that more challenging and in-depth assignments could have been given for the smaller sections towards the end of the course, but other than this I really enjoyed the course and feel that I got a lot of value from it. I would highly recommend it to anyone looking either to break into ML/data science or to strengthen their foundations.

Please let me know what you think or ask questions in the responses below!

Following this course I began IBM's Advanced Data Science Specialization; if you are interested in my key takeaways from the first course, see here:

Thanks,