I have recently decided to start learning about artificial intelligence, and specifically about neural networks, because I believe it would allow me to solve new problems I am currently unable to tackle.
As a first step, I would like to write a simple neural network that can recognize handwritten digits (using the MNIST dataset), and I would like to get there within the next few weeks or months.
I have found many resources for beginners, some of which seem really good (like this one: http://neuralnetworksanddeeplearning.com/chap1.html). I have read the beginnings of 5-6 books about neural networks, but the problem I always have is that at some point I get lost in the explanations because of my lack of knowledge in math.
In the linked tutorial, for example, I have trouble with the following symbols: ∇, ∂, →. More generally, I stumble over mathematical notation: constants that authors assume the reader knows about, and special symbols.
So here comes my question: what kind of mathematics do I need to get started quickly with neural networks, so that I can read one of those great books/tutorials I found without too much trouble? I have some very basic notions of linear algebra, but that's about it.
3 Answers
I’ll just go out and say it: Have you looked for a course which is less mathematical?
While you certainly need the advanced math to fully master the topic, you could find an introductory course that takes more of an overview of the field, along with some specific, hands-on working examples. That's how I learned Python (thanks, YouTube!).
EDIT:
@sputnik
Very simply, I think you don't need any.
I can't recommend the online visual introduction to machine learning highly enough.
The main point of the primer I referred to is that it starts you off on Decision Trees. These are reminiscent of the Expert Systems employed half a century ago, and they serve as an excellent narrative for someone just getting to know the subject.
Tree-based models are famously the most common technique among Kaggle competition winners, XGB trees* in particular. Neural Nets come only after that, so why settle for second best? 😉
For example, check out this latest Kaggle interview: Airbnb New User Bookings, Winner’s Interview: 2nd place, Keiichi Kuroyanagi (@Keiku)
You can do Statistical Inference without advanced calculus, or even without formal math at all, if you're only interested in getting to know the subject.
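To show just how little math is needed to get going, here is a minimal sketch of fitting a decision tree with scikit-learn (assuming it is installed; the built-in 8x8 digits dataset stands in for MNIST, and max_depth is an arbitrary choice for illustration):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the small built-in 8x8 digits dataset (a stand-in for MNIST).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a single decision tree; no calculus involved anywhere.
clf = DecisionTreeClassifier(max_depth=10, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```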
* The 'G' in XGBT does stand for Gradient, so you'll eventually have to do some math, but why not just have fun starting out? You'll learn more, and more quickly!
Other XGBT links:
Story and Lessons Behind the Evolution of XGBT
XGBT — Let Us Learn From Its Author
You need a grounding in calculus in order to understand the math underlying basic neural network training. There's really no way around it: most neural network training is some variant of "gradient descent" optimization. The gradient is a form of derivative; in order to find the gradient, you must take the derivative.
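To make that concrete, here is a minimal sketch of gradient descent in plain Python, minimizing a made-up one-variable function f(w) = (w - 3)^2 (the function, starting point, and learning rate are all arbitrary illustrative choices, not anything from a particular library):

```python
# f(w) = (w - 3)**2 has derivative f'(w) = 2 * (w - 3); gradient descent
# repeatedly steps against the derivative to walk downhill to the minimum.
def grad(w):
    return 2 * (w - 3)

w = 0.0              # arbitrary starting point
learning_rate = 0.1  # arbitrary step size
for _ in range(50):
    w -= learning_rate * grad(w)

print(w)  # ends up very close to 3.0, the minimizer of f
```

Training a real network does exactly this, just with thousands of weights updated at once.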
If you're looking for the bare minimum, you can get by with understanding derivatives rather than pushing on to integral calculus; for example, two of the symbols in your question, ∇ (the gradient) and ∂ (partial differentiation), stand for taking a particular kind of derivative.
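The two symbols are directly related: for a function of several variables, the gradient ∇f is simply the vector collecting all of its partial derivatives ∂f/∂wᵢ:

```latex
% The gradient collects the partial derivatives of f into one vector:
\nabla f(w_1, \dots, w_n) =
  \left( \frac{\partial f}{\partial w_1}, \dots, \frac{\partial f}{\partial w_n} \right)
```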
Linear algebra is important as well, and it's good that you already have some. You don't need anything fancy to understand basic neural network stuff, but knowing what a vector is, and how and why matrix multiplication works, is essential.
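Here is a rough sketch of why, using NumPy (the shapes and the sigmoid nonlinearity are arbitrary example choices): one layer of a network is just a matrix-vector product plus a bias, passed through a nonlinearity.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

x = np.random.rand(4)     # input: a vector of 4 features
W = np.random.rand(3, 4)  # weights: 3 outputs by 4 inputs
b = np.random.rand(3)     # bias: one value per output

# One layer = matrix-vector product, plus bias, through a nonlinearity.
a = sigmoid(W @ x + b)
print(a)  # activation: a vector of 3 numbers in (0, 1)
```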
I recommend you audit the Coursera course on Machine Learning taught by Andrew Ng. It's very well structured and paced, and he presents the math in a very comprehensible manner. If you can stick with it through Week 5, the programming assignment is exactly the problem you're interested in: a neural network for handwritten digit recognition.
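If you want a preview of that end goal right now, here is a hedged sketch using scikit-learn's MLPClassifier on its small built-in 8x8 digits dataset (a stand-in for MNIST; this assumes scikit-learn is installed, and the hidden layer size and iteration count are arbitrary choices):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer of 30 units, trained by a variant of gradient descent.
net = MLPClassifier(hidden_layer_sizes=(30,), max_iter=500, random_state=0)
net.fit(X_train, y_train)
print("test accuracy:", net.score(X_test, y_test))
```

The tutorial you linked builds essentially this same network from scratch, which is where the calculus and linear algebra above come back in.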