Recently my mom asked me the following question:
I reflected on this question and decided to write an article that explains Artificial Intelligence in simple terms. The goal is that any non-technical person can understand the high-level concepts around Artificial Intelligence and Machine Learning. It isn’t an easy task but let’s go for it!
To start this explanation, we will first define some concepts and then give two real-life analogies to keep things simple. We have to define the concepts first so that we all share the same understanding of these terms and buzzwords. Do not give up on the first level, because it is on the next two levels that things get interesting: through analogies and visualizations, it is easier to learn concepts.
The article is divided into three levels:
You don’t have to read the entire article in one go. You can read one level per day.
We have all heard terms like Artificial Intelligence, Machine Learning and Deep Learning. But what do they mean and what’s the difference?
To understand the difference, it’s easier to look at the following image. From the image, we can see that Artificial Intelligence is the big field that tries to allow machines to mimic human behaviour, and a big part of being human is learning. Inside the field of Artificial Intelligence, we have Machine Learning. Machine Learning is a subfield of AI composed of techniques that enable computers to learn without being explicitly programmed to do it. Finally, inside Machine Learning, we have Traditional Algorithms and Deep Learning. These are two families of techniques that enable computers to learn.
As said, Artificial Intelligence is the big field that tries to give computers human features like intelligence, reasoning, logic and learning. The goal is to have a computer that can mimic human intellect and behaviour. This can be done with a simple rule-based approach (if sunny and no clouds then no rain: yep, this is artificial intelligence! You give a computer a set of rules to decide if it’s going to rain) or with more complex approaches like machine learning algorithms.
In one phrase, Machine Learning is the ability of a system to learn and improve from experience, without being explicitly programmed. But what the hell does “without being explicitly programmed” mean?
Well, I gave an example of a rule-based approach. In this example, we programmed a computer to tell us if it’s going to rain. We gave it a set of rules (if sunny and no clouds then no rain) and now the computer can make predictions based on new data. So, each day we give data (sun level, cloud level) to the computer and based on the data and the rules, the computer says if it’s going to rain or not. By giving the rules we explicitly programmed the computer to make predictions based on rules and new data.
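To make “explicitly programmed” concrete, here is a minimal sketch of the rain rule written by hand. The function name and the choice to answer “rain” for every other case are assumptions made for this illustration; the article only gives the sunny, cloudless rule.

```python
def will_it_rain(sunny: bool, cloudy: bool) -> bool:
    """Hand-written rule: if it is sunny and there are no clouds, no rain."""
    if sunny and not cloudy:
        return False
    # Assumption for this sketch: the article's rule only covers the sunny,
    # cloudless case, so every other situation is treated as "rain".
    return True


print(will_it_rain(sunny=True, cloudy=False))  # False
print(will_it_rain(sunny=False, cloudy=True))  # True
```

Notice that a human wrote the rule. The computer only applies it to each day’s data.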
The new paradigm that Machine Learning brought is that we don’t have to give rules to the computer, i.e., we don’t have to explicitly program it. We only have to give the computer tons of data and it will learn the rules on its own. The data that we give the computer is composed of information describing the event (X) and the answer (Y). Then the computer tries to map the information to the answer. In the example above, the information could be temperature, sun level and cloud level, and the answer is rain or no rain. The assumption here is: by seeing a lot of examples (experiences), the computer will eventually learn. Just like we do: the more days we live and the more experiences we have, the better we can tell if it’s going to rain or not. The main idea is to give huge amounts of data (experiences) and let the computer/program figure out how to map X to Y.
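As a rough sketch of the difference, the snippet below is given only examples (X is a cloud level, Y is whether it rained) and has to find the rule on its own. The data is made up, and the “try every threshold and keep the best one” approach is a deliberately tiny stand-in for a real Machine Learning algorithm, not a method taken from this article.

```python
# Made-up examples for illustration: (cloud level from 0 to 10, did it rain?)
examples = [(1, False), (2, False), (3, False), (6, True), (8, True), (9, True)]


def count_mistakes(threshold, data):
    """Count examples where the rule 'rain if cloud level > threshold' is wrong."""
    return sum((cloud > threshold) != rained for cloud, rained in data)


# Nobody writes the rule by hand: we try every candidate threshold
# and keep the one that makes the fewest mistakes on the examples.
best_threshold = min(range(0, 11), key=lambda t: count_mistakes(t, examples))


def predict_rain(cloud_level):
    """Apply the rule that was learned from the examples."""
    return cloud_level > best_threshold


print(best_threshold)   # 3
print(predict_rain(7))  # True
```

The rule “rain if the cloud level is above 3” was never typed in by a person. It came out of the data.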
But how does it learn? Keep calm, I will give an example in the next chapter.
For now just keep this in mind: before we had to think and program rules to make computers do what we wanted. But now, we only have to give data and the computers try to learn the rules on their own using Machine Learning Algorithms.
There are always multiple ways of achieving a goal. In this case, the goal is to give the computer the ability to learn from data. Traditional algorithms are a bit simpler, but that doesn’t mean they don’t get the job done! Deep learning algorithms, on the other hand, are loosely inspired by the structure of the human brain. Our brain is composed of networks of neurons that control everything from moving to learning. Deep Learning took some concepts from our brains and implemented them in computers: it uses artificial neural networks to enable a computer to learn.
So I said that we only need to give a lot of data to the computer and it figures out the rules and answers on its own. But how does it learn? Well, more or less like you and every other human: through experience.
How does a baby know whether an image represents a dog or a cat? First, the baby has to learn. Whenever you find a cat you show it to the baby and say “cat”, and whenever you find a dog you say “dog”. By doing this multiple times, the baby will eventually learn what is a dog and what is a cat. The baby gained this ability through experience: by being shown multiple examples and given the answers. And that’s what we do with a Machine Learning algorithm: we give it examples (images of dogs and cats) and say which image corresponds to a cat or a dog. By seeing all of the images and the labels, the algorithm tries to figure out ways to predict the label based on the image.
On the first tries, the algorithm will fail but as we show more images, the algorithm will learn from previous mistakes and will eventually learn how to map the image to the corresponding label. The part “learn from previous mistakes” is an important one, because mistakes give information about what the algorithm did wrong and it will correct it for the next try.
The phase in which the algorithm learns to map the images to the correct labels is called the training phase. The training phase can be divided into three main tasks:
At each iteration of the training phase, the algorithm will see how and why it made bad predictions and correct itself for future predictions. Eventually, after many iterations, the error will get smaller because the algorithm is always trying to improve and to remove all mistakes and errors.
That’s why we say that Machine Learning learns by experience and is based on trial and error. The algorithm tests multiple hypotheses, evaluates what is wrong with each hypothesis and comes up with a better hypothesis each time.
Now that you know the iterative process behind machine learning, it’s time to learn how a Machine Learning algorithm makes better predictions at each iteration based on the error of the previous iteration. Let’s go, it’s the final level!
In this section, I will try to explain how an algorithm can learn from previous mistakes and errors to improve its own predictions.
Imagine that you want to buy a house. There are a lot of houses and the prices vary. You don’t really know what a fair price for a house is. So, you implement a Machine Learning algorithm that predicts the price of a house based on its features. In order to do this, you first have to give data to your algorithm to train it, i.e., to let it learn. The algorithm will learn by seeing a lot of examples. In this case, the only feature we will give it is the size of the house. The goal is to predict the price based on the size of the house.
Imagine that you went to a real estate agency website and gathered the price and the size of the houses in various listings. Now you have the data and you plot it. It will look something like the following image.
These are all the examples that you have. They map the size of a house to its price. For example, we can see one example where the house is 400 m² and its price is 50k €. We can also infer that the bigger the house, the higher the price.
Imagine that you are the Machine Learning Algorithm. How would you figure out the price of the house given the size? Well, we can see that it’s fairly easy to fit a line to the data. Take a look at the blue line in the following image.
With that line, we can predict the price given the size of the house. For example, if we see a house of 2000 m², we can put it in our plot and, with the help of the blue line, predict that the price of the house is 250k €.
Niceeeeeee! But how does the algorithm find the best line?
A line in a 2D plot is defined by y = m*x + b. Here y and x are our data: the price and the size of the house. So, the only things that the algorithm can tweak are m and b. In this case, the algorithm has to find the m and b values that create the best line (the blue line). It finds the values by testing a lot of lines, checking the errors in the predictions and finding better lines. Just like the steps we mentioned above: predict, evaluate, improve.
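In code, a line is nothing more than this one formula. The sketch below uses hypothetical values for m and b, picked only so that the line passes through the two points quoted earlier (400 m² for 50k € and 2000 m² for 250k €). They are not the values the blue line in the image actually uses.

```python
def predict_price(size_m2, m, b):
    """Price predicted by the line y = m*x + b, in thousands of euros."""
    return m * size_m2 + b


# Hypothetical knob values, chosen to match the two points quoted in the text.
m, b = 0.125, 0.0

print(predict_price(400, m, b))   # 50.0
print(predict_price(2000, m, b))  # 250.0
```

Change m or b and you get a different line, and therefore different predictions for every house.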
The algorithm starts by randomly selecting m1 and b1 which correspond to the green line. Then, it compares the predicted values with the real values. For example, for a house with 1000 m², our algorithm would predict that the price is 50k € but the real price is 150k €. Wow, that’s an error of 100k €.
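The “evaluate” step for the green line can be sketched in a couple of lines. The function name is made up for this example, and the numbers are the ones from the paragraph above, in thousands of euros.

```python
def prediction_error(predicted_price, real_price):
    """Positive result: the prediction was too low. Negative: too high."""
    return real_price - predicted_price


# Green line example: predicted 50k €, real price 150k €.
error = prediction_error(predicted_price=50, real_price=150)
print(error)  # 100

if error > 0:
    print("Predicting too low: the next line should predict higher prices")
```

The sign of the error is what tells the algorithm in which direction to move the line.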
So we see that our algorithm is undervaluing the houses, i.e., it’s predicting lower values than the real ones. Based on the errors, the algorithm thinks “Well, I’m predicting low prices, so I must find an m and b that create a line that doesn’t predict lower prices”. Off it goes and tries the red line that we can see in the next image.
Well, it looks like our algorithm is now predicting higher prices, but the prices are much higher than the real ones. The algorithm thinks “Oops, the line predicts really big values. I must find a line that is between the green line (low predictions) and the red line (high predictions)”.
Off it goes and tests the yellow line. After analyzing the errors, it figures out that it can do better and tries the purple line. Still not satisfied, it analyzes the errors of the purple line and thinks “Well, I’m almost there. For some houses, I’m still predicting a price higher than the real price. I must lower the line just a little bit”.
Finally, the algorithm finds the blue line that fits the data almost perfectly. The algorithm is satisfied because the errors between its predictions and the real values are small.
Here we saw the algorithm testing multiple hypotheses (in this case, lines), analyzing the errors of each hypothesis and creating new and better hypotheses.
One thing to note is that Machine Learning algorithms are really good at finding patterns. In this example, the pattern is that the bigger the size of the house the higher the price. So, when the algorithm finds this pattern it tries to create lines that match that pattern.
To summarize, the algorithm will iteratively:
The algorithm's goal was to find m and b that created a line that would reduce the errors.
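For the curious, here is a small sketch of the whole predict, evaluate, improve loop. The article does not say how each new line is chosen, so this sketch swaps in one common technique, gradient descent, which nudges m and b a little in whichever direction lowers the error. The four listings are made up (loosely based on the numbers used above), sizes are expressed in thousands of m² to keep the numbers well behaved, and the learning rate and number of steps are arbitrary choices for this example.

```python
# Made-up listings: (size in thousands of m², price in thousands of €)
houses = [(0.4, 50.0), (1.0, 150.0), (1.5, 180.0), (2.0, 250.0)]


def mean_squared_error(m, b, data):
    """Average squared gap between the line's predictions and the real prices."""
    return sum((m * x + b - y) ** 2 for x, y in data) / len(data)


def fit_line(data, learning_rate=0.1, steps=5000):
    m, b = 0.0, 0.0  # starting guess (the article starts from a random line)
    n = len(data)
    for _ in range(steps):
        # Predict and evaluate: how far off is each prediction?
        errors = [(m * x + b) - y for x, y in data]
        # Improve: move m and b in the direction that lowers the error.
        grad_m = 2 / n * sum(e * x for e, (x, _y) in zip(errors, data))
        grad_b = 2 / n * sum(errors)
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b


m, b = fit_line(houses)
print(f"price = {m:.1f} * size + {b:.1f}")
print("error before:", mean_squared_error(0.0, 0.0, houses))
print("error after: ", mean_squared_error(m, b, houses))
```

Every pass through the loop is one “try a new line” step from the story above, just with much smaller adjustments each time.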
Imagine that you are trying to tune in to a radio frequency. You turn the knobs to find the perfect value that reduces the amount of noise and interference.
That’s exactly what a machine learning algorithm does: it tunes the knobs to reduce the errors. In this case, the knobs were m and b. The algorithm set out to tune m and b so that the error between the predictions and the real values was small. Different Machine Learning Algorithms have different knobs and will produce different results.
Well done, you’ve got to the end of the article! I really hope this article helped you to understand a little bit more about AI and Machine Learning.
If something wasn’t clear or if you have suggestions to improve this article please let me know in the comments :)