Models: Introduction

Processing with AI

Now that we have our dataset on Lobe.ai, we can train one of the available models and start making predictions! We saw that the two models, ResNet and MobileNet, are Deep Learning models.

In this chapter we are going to discover the different kinds of models we encounter in Machine Learning, and what makes each of them specific.

Machine Learning models

You already know that AI is a mix of Computer Science and Statistics and that Machine Learning models are mainly based on statistics and logic. We can even talk about "statistical models" to describe Machine Learning models and, by extension, Deep Learning models.

A statistical model is an approximate mathematical description of the mechanism that generated observations.

Looking at the definition above, we first have the input data, such as physical measurements or pictures. Then we have the mechanism, which is approximated by the statistical model. Finally, as the observations, we have an inference: a price, for example.

Now let's think of this definition in a real use case example:

Example

How to estimate a real estate price:

Traditionally:

  1. We have as input data the description of the real estate with all its specificities. For example, the number of rooms, the surface in square meters, etc.
  2. Then we have a real estate agent who can set a price based on the description, using their experience and analysing the current market.
  3. Finally we have the estimated price.

Using a statistical model:

  1. We have as input data the description of the real estate with all its specificities.
  2. Then we have a Machine Learning model perfectly trained on all the descriptions.
  3. Finally we have the estimated price.

Just like a child, a Machine Learning model will learn from experiences to define its behavior in new situations. The past experiences are the training dataset and the new situation is the production dataset composed of data never encountered. This is the difference between the training phase and the inference phase. The training phase corresponds to the training of the chosen model with the collected dataset. The inference phase is the use of the pre-trained model on unknown inputs.
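As an illustration only, here is a minimal sketch of these two phases on the real-estate example, assuming scikit-learn as the tool and using made-up surfaces, room counts, and prices:

```python
# Training phase vs inference phase, sketched on the real-estate example (made-up data).
from sklearn.linear_model import LinearRegression

# Past experiences: descriptions (surface in m², number of rooms) and their known prices.
X_train = [[30, 1], [45, 2], [60, 3], [80, 4]]
y_train = [90_000, 135_000, 180_000, 240_000]

model = LinearRegression()
model.fit(X_train, y_train)      # training phase: the model tunes itself on the dataset

# Inference phase: the pre-trained model estimates a price for a never-seen description.
print(model.predict([[55, 2]]))  # an estimated price around 165,000
```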

It is therefore necessary to distinguish between a model and a pre-trained model.

  • A model corresponds to a system that is capable of self-tuning in order to make predictions.
  • A pre-trained model is that same system, but this time well-tuned thanks to the dataset it has learned from.

Depending on the type of prediction we want to make and the dataset we have, we will have to choose a suitable model. Let's see the choices we have in Machine Learning.

machine learning subfields

As you can see in the scheme above, there are three subfields in Machine Learning, three ways to train models: supervised learning, unsupervised learning, and reinforcement learning.

Deep Learning is not part of this scheme because it is not really a subfield but a type of model, even if its name can be confusing.

Then there are four big categories of models in Machine Learning: classification, regression, clustering, and dimensionality reduction.

machine learning model categories

Different types of models can be used in these categories: Neural Networks, Decision Trees, etc. The difference between these types of models is their structure, the way they adjust their parameters. Machine Learning models that use Neural Network algorithms are the most common.

Which model for which use?

The model used depends on the problem we are trying to solve. Dimensionality reduction models are often used during the data-cleaning process, in order to simplify the problem. It is like removing columns (variables) in an Excel sheet. Then, let's say we are building:

However, the data at hand also defines what is possible. If our data:

These non-exhaustive data properties add constraints to what is possible.
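To make the "removing columns" analogy concrete, here is a minimal dimensionality reduction sketch, assuming scikit-learn and its small built-in digits dataset (an arbitrary choice for illustration):

```python
# Dimensionality reduction sketch: compress 64 columns into 2 with PCA (illustrative only).
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)    # 1,797 small images, each described by 64 columns
X_reduced = PCA(n_components=2).fit_transform(X)

print(X.shape)          # (1797, 64): the original "Excel sheet"
print(X_reduced.shape)  # (1797, 2): the same rows, summarized by only 2 new columns
```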

During this course, you will mostly use Deep Learning models, like in Lobe.ai and Akkio. So let's have a look at them!

Neural Networks and Deep Learning

Neural Networks

But first, what are Neural Networks? Let's start by stating two things that most people get wrong about neural networks:

  1. Artificial "neurons" are just a fancy way to describe a programming building block that processes numbers and only numbers.
  2. We use the term "neuron" only because it has some high-level similarities with a biological neuron.

Let's compare a biological neuron and an artificial neuron, also called a perceptron:

Neuron

Biological neuron schematic
A simplified schematic of a biological neuron.

A biological neuron is a cell, mainly found in our brains, that communicates with other neurons by using chemical or electrical signals. It receives inputs via multiple channels called dendrites and transmits its output (notice the singular here) through an axon that develops into synapses.

Perceptron

Perceptron schematic
A schematic representation of a perceptron,
often called an "artificial neuron".

A perceptron takes one or several numbers as input and outputs a category. Each input is given a weight, i.e. a variable that represents how much that input influences the output. To predict the category, the perceptron sums the weighted inputs and applies a predefined function (the activation function). Depending on the result of this function, the perceptron outputs one category or another.

When we say that we are "training" a model, what the algorithm is actually doing is just tweaking every weight to get the best accuracy on the training dataset, i.e. to minimize the error rate.
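As a minimal sketch of these two ideas (the weighted sum with an activation function, and training as weight tweaking), here is an illustrative perceptron written in Python with NumPy, learning the AND logic gate; the dataset and the 10-epoch loop are arbitrary choices for the example:

```python
# A minimal perceptron sketch: weighted sum + step activation, trained with the
# classic perceptron learning rule on the AND logic gate (illustrative only).
import numpy as np

def predict(inputs, weights, bias):
    # Weighted sum of the inputs, then a step activation function:
    # output category 1 if the sum is positive, category 0 otherwise.
    return 1 if np.dot(inputs, weights) + bias > 0 else 0

# Tiny training dataset: the AND gate (two numeric inputs, one category as output).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

weights, bias, learning_rate = np.zeros(2), 0.0, 0.1

# "Training" is just tweaking the weights and bias whenever a prediction is wrong.
for epoch in range(10):
    for inputs, target in zip(X, y):
        error = target - predict(inputs, weights, bias)
        weights += learning_rate * error * inputs
        bias += learning_rate * error

print([predict(x, weights, bias) for x in X])  # expected: [0, 0, 0, 1]
```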

You might now see why we tend to call a perceptron an "artificial neuron"; they sure share some similarities: each is a single entity that takes one or several inputs and outputs one signal depending on those inputs. Except perceptrons are just maths!

Perceptrons were created in 1957 by Frank Rosenblatt. His first implementation was 100% analog and used motors that turned potentiometers during the learning phase to set the weights. It could process 20x20-pixel images, what a beast!

The first implementation of a perceptron
The first perceptron.

Networking

Let's see how to combine perceptrons and create a Neural Network!

Artificial Neural Network
A neural network with three inputs, three layers, and two outputs.
A layer that is not an input or an output layer is called a hidden layer.

One of the simplest forms of Artificial Neural Networks (ANN) is the Multi-Layer Perceptron. Multiple perceptrons are simply linked:

You can play with an ANN directly in your browser on TensorFlow's Playground to see how adding layers changes the behavior of a network.
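As an illustrative sketch only (assuming scikit-learn as tooling, with an arbitrary choice of two hidden layers), here is what a small Multi-Layer Perceptron looks like in code, on a problem a single perceptron cannot solve:

```python
# A minimal Multi-Layer Perceptron sketch with scikit-learn (illustrative only).
from sklearn.neural_network import MLPClassifier

# XOR: a single perceptron cannot learn it, but a network with hidden layers can.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# Two hidden layers (8 then 4 neurons); the library tunes all the weights for us.
model = MLPClassifier(hidden_layer_sizes=(8, 4), solver="lbfgs",
                      max_iter=2000, random_state=0)
model.fit(X, y)          # training phase
print(model.predict(X))  # inference phase; typically [0 1 1 0] once trained
```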

You might be wondering how Data Scientists choose the shape of their networks? Well, that's a great question! Each neuron you add brings a new set of weights that will need to be trained, which results in longer training times (and time is money!) and in a need for more and more data to train the model. So, from a technical point of view, it involves finding the right balance between complexity and learning time. From a managerial point of view, they have to find the project scale that best optimizes performance, keeping time and budget constraints in mind.
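To see how quickly the number of weights grows with the shape of a network, here is a rough, illustrative counting sketch for fully-connected networks (the layer sizes are made up for the example):

```python
# Rough parameter count for a fully-connected network: each layer has
# (inputs x neurons) weights plus one bias per neuron.
def count_parameters(layer_sizes):
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weights + biases
    return total

print(count_parameters([3, 4, 4, 2]))         # a tiny network: 46 parameters
print(count_parameters([784, 512, 512, 10]))  # a small image classifier: ~670,000 parameters
```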

In 2019, GPT-2, one of the largest Natural Language Processing (NLP) models at the time, had 1.5 billion parameters; it was basically trained on web pages linked from Reddit posts!

Since the launch of this course, GPT-3 has been released; it contains 175 billion parameters. You can discover some projects based on GPT-3 here.

Since this course update, GPT-4 has been released, and it is even more powerful. It can even take images as input!

Deep Learning

Deep Learning models are based on Neural Network algorithms. They are complex multi-layer perceptrons with more than one hidden layer. We call them "deep" because of these multiple layers. Deep Learning is most often used to process language, speech, sound, handwriting, and images. Indeed, the multiple layers make it possible to decompose the content of complex data, such as voice or images, in a hierarchical way in order to classify it: identifying words in speech, or associating descriptive tags with images.

AlexNet network architecture
The picture above is the graphical representation of AlexNet. This network from 2012 is (partly) what started the current hype around Deep Learning.

A Deep Learning model is a Machine Learning model that:

  1. Has a complex architecture: its building blocks and the way it is trained are more complex than a Multi-Layer Perceptron's, and it includes more than one hidden layer.
  2. Needs a lot of data, processing time, and computing power to be trained.
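As a minimal, illustrative sketch of what "stacking layers" means in code (assuming TensorFlow/Keras as tooling; this is not AlexNet itself, and the layer sizes are arbitrary):

```python
# A small deep model: stacked layers that process an image hierarchically before classifying it.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),        # an RGB image as input
    tf.keras.layers.Conv2D(32, 3, activation="relu"),  # low-level patterns (edges, colors)
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),  # higher-level patterns (shapes, textures)
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),      # hidden layer
    tf.keras.layers.Dense(2, activation="softmax"),    # output: 2 categories
])
model.summary()  # prints the layers and the (large) number of parameters to train
```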

Learn how to use Runway, a website that lets anyone use state-of-the-art Deep Learning models without coding.

How to train a model correctly?

There are two frequent issues we can encounter while training a Machine Learning model. They happen when the trained model is biased, and they come from the dataset and the variance inside it.

In this module:

  • the bias measures how far off, in general, a model's predictions are from the correct values. Think of a gun that systematically shoots to the left, or a coin that mostly falls on one side: a model can similarly be biased toward a specific prediction.
  • the variance is how much the predictions change for a small change in the input data. For example, if the model predicts a price of 60k for a house of 60m², we expect it to output a close price for a similar surface (e.g. 61k for 61m², then 63k for 63m²). A model with high variance would instead predict a very variable output (e.g. 70k for 61m², 63k for 63m²): a big change in the output for a small change in the input variable. Both notions are sketched in code below.
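Here is a small, purely illustrative sketch of these two notions, using two made-up "models" for the house-price example (the numbers are invented, and bias and variance are measured in the informal sense used above):

```python
# Two made-up pricing "models" for the house-price example (illustrative only).
def stable_model(surface):
    # Low variance, but systematically 5k too low: a biased model.
    return surface * 1000 - 5000

def jumpy_model(surface):
    # Unbiased on average, but very sensitive to the input: a high-variance model.
    return surface * 1000 + (7000 if surface % 2 else -7000)

def true_price(surface):
    return surface * 1000

surfaces = [60, 61, 62, 63]
for model in (stable_model, jumpy_model):
    errors = [model(s) - true_price(s) for s in surfaces]
    bias = sum(errors) / len(errors)   # how far off the model is on average
    jump = abs(model(61) - model(60))  # output change for a 1 m² change in input
    print(model.__name__, "| average error:", bias, "| output change for +1 m²:", jump)
```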
Underfitting

The first issue is underfitting: it happens when your trained model does not perform well due to a lack of diversity in your dataset.

Example

A classification model to detect balls 🏀

Goal: we want to train our model to learn what a ball is.

Dataset: our dataset is composed of a few pictures of basketballs, soccer balls, and baseballs.

Result: it does not perform well; indeed, when you show it an apple, it detects a ball!

In this example, our model is biased because it has not been trained on diverse enough pictures and "thinks" a ball is just any round object. Correcting underfitting is simple: we just need to add more variance to our dataset. That means working on its completeness by adding more pictures of balls in more varied contexts!

Overfitting

The second issue is overfitting: it happens when our trained model does not perform well on unknown data because it has been "over-trained" on one dataset.

Example

A classification model to detect balls 🏀

Goal: we want to train our model to learn what a ball is.

Dataset: our dataset is composed of many pictures of basketballs, soccer balls, and baseballs, in a lot of different contexts.

Result: it does not perform well; if you show it an apple it does not detect a ball, but when you show it a golf ball it does not detect a ball either!

In this example, our model is not working properly because it has been trained too much on certain balls and knows them "by heart". To correct overfitting, we need to split our dataset into two parts, as sketched below. We will go deeper into overfitting in the Ethics of AI module.
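A minimal, illustrative way to see overfitting with that two-part split (assuming scikit-learn as tooling and using its small built-in digits dataset instead of ball pictures):

```python
# Hold out part of the dataset to detect overfitting (illustrative only).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)   # a small built-in image dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("score on data seen during training:", model.score(X_train, y_train))  # 1.0: learned "by heart"
print("score on unseen data:", model.score(X_test, y_test))                  # noticeably lower
```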

Bias and variance illustrated with a target diagram
The center of the target is a model that perfectly predicts the correct values. "Understanding the Bias-Variance Tradeoff"

Underfitting happens when the variance is low and the bias is high, while overfitting happens when the variance is high and the bias is low. In fact, all the difficulty lies in finding the right balance between bias and variance.

Bias-variance trade-off
We are looking for the convergence between low bias and the right variance, the "fit point".

We can represent an underfitted, an overfitted, and a regularized model with three linear regressions:

Overfitted, underfitted, and regularized models
The green line represents an overfitted model, the yellow line an undertrained model, and the black line a regularized model.
While the green line best follows the training data, it is too dependent on that data and is likely to have a higher error rate on new, unseen data, compared to the black line.
Wikipedia

Measuring a model's performance

You ensured the model has a good fit, congratulations! Is the work finished? Well, how did you measure the model's performance? Did you compute its accuracy (the proportion of correct answers)? What if that is not enough?

Let's say we are an insurer trying to build a tool to predict, for each day and for each client, whether they will have a car accident. However, a car accident is a rare event for any given person, say once every 10 years. A dummy (stupid) model always predicting that clients will have no accident would be right 365 x 10 - 1 = 3,649 times out of 3,650 over a 10-year period! That seems excellent, with 99.97% accuracy. But the model missed the single time the accident happened. You don't need AI to make the same prediction for every client every day. That is where you need a different type of performance measure, one that seriously takes into account the number of times your model correctly predicted an accident (for example, the recall and the F1-score).
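As a rough illustration (sketched with scikit-learn's metrics, which is an assumption about your tooling), here is the insurer example in numbers, with a dummy model that always predicts "no accident":

```python
# The insurer example: a dummy model that never predicts an accident.
from sklearn.metrics import accuracy_score, recall_score, f1_score

days = 365 * 10
y_true = [0] * (days - 1) + [1]   # one real accident over ten years
y_pred = [0] * days               # the dummy model always predicts "no accident"

print("accuracy:", accuracy_score(y_true, y_pred))             # ~0.9997, looks excellent
print("recall:  ", recall_score(y_true, y_pred))               # 0.0: it missed the only accident
print("F1-score:", f1_score(y_true, y_pred, zero_division=0))  # 0.0
```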

PROJECT

Let's finish what we started. Now that you know a lot more about Machine Learning models, let's train one on our dataset in Lobe!

Lobe doesn't work on recent iOS devices, but you can use Teachable Machine instead!

1. The last step of this exercise is the training part. Watch the video below and train your own model on Lobe.ai!
2. Go to the Use tab in Lobe to make your first predictions 🚀

Sum up

Before starting the Project module, we must pause and look at ethics in the next chapter. It is undoubtedly important to keep in mind the side effects AI can have on our lives.