This assignment is mandatory.

Now that you have experimented with finding a good dataset and made your first predictions, it's time to get started and make your own project!

Build your model

From choosing a dataset to making predictions

During this assignment you will have to think about and choose a use case in which introducing a prediction model would optimize, streamline or transform the way things are done. Explain your choices, why the predictions would be interesting, and who would benefit from them.

Follow the steps below:

  1. You may complete this assignment in English or in French, whichever you are more comfortable with. Start by formulating your use case, and fill in all the relevant fields on your Notion page in the language you chose. You will find instructions about Notion at the end of this page. Keep reading the instructions for now.
  2. What do we expect when we ask for a "use case"?

    Generally, it is enough to explain at least the problem you are solving, give a basic description of your users, and describe how your project will help them.

    For example:

    • We can use Akkio to predict the price of houses.
    • Using a model trained on Akkio, a real estate company would like to automatically and continuously estimate the price of properties in Paris. Using data retrieved on site, the objective would be to create a model capable of automating the valuation of a property or finding good deals. It would therefore be a tool that helps the employees of this company and gives them a better understanding in the field.
      1. The dataset will need to be updated continuously, and the model re-trained to keep up with market prices.
      2. This kind of model can lack precision; it should not replace the expertise of an agent but support it.
  3. Once you know what you want to do and why, it's time to collect some data!
    Go to Kaggle and search for a good dataset, one that would meet your requirements. Think about what we saw in the Datasets chapters and what we are looking for in a dataset.

    Before downloading your dataset, make sure to look at the usability score, at how the file is structured and at how the data were gathered. If something seems wrong about the dataset, don't use it. There are a lot of other options on the site. Take the time to find one that suits you, and everything else will be easier.

    Check the file extension: make sure you are downloading a ".csv" file.

  4. You will then have to clean your dataset to make it usable. However, if you consider it "clean", you can skip this step. In both cases, you will document your thought process.

    Select the correct option according to your choice:

    • If you think that the dataset is usable as it is and you don't need to clean it, explain why.
    • If this isn't the case, and you have to clean it in order to use it, explain what you did and why.
    • You have to go into details here!

  5. Once you have your dataset, you can move on to Akkio, where you will use it to make your first predictions! (Do not use your em-lyon email address to create an account.)

    If Akkio isn't working in your case, even without an em-lyon address, you can use another solution: Obviously.ai. Use your em-lyon address for this one!
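
The cleaning decision in step 4 can also be supported by a quick check outside of Akkio. The following is only an optional sketch in Python, not part of the required workflow: the sample data, the column names (price, surface_m2, district) and the function names are hypothetical, and your own Kaggle file will differ. It counts missing values per column and exact duplicate rows, then keeps only complete, unique rows.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded Kaggle ".csv" file.
# Replace it with open("your_file.csv", newline="") for a real dataset.
SAMPLE_CSV = """price,surface_m2,district
350000,42,11
,55,18
350000,42,11
410000,,3
275000,30,20
"""


def is_missing(value):
    """Treat absent or blank cells as missing."""
    return value is None or value.strip() == ""


def audit_rows(rows):
    """Return (missing-value count per column, number of exact duplicate rows)."""
    missing = {}
    seen = set()
    duplicates = 0
    for row in rows:
        for column, value in row.items():
            if is_missing(value):
                missing[column] = missing.get(column, 0) + 1
        key = tuple(row.items())
        if key in seen:
            duplicates += 1
        seen.add(key)
    return missing, duplicates


def clean_rows(rows):
    """Keep the first occurrence of each row that has no missing value."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(row.items())
        if key in seen or any(is_missing(v) for v in row.values()):
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing, duplicates = audit_rows(rows)
cleaned = clean_rows(rows)

print("Missing values per column:", missing)
print("Exact duplicate rows:", duplicates)
print("Rows kept after cleaning:", len(cleaned), "of", len(rows))
```

Dropping incomplete rows is only one possible choice. Whatever you decide (dropping, filling in, or keeping the rows), the counts above give you concrete numbers to cite when you explain your reasoning on your Notion page.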

Tools

For this assignment you will have to fill in this page on Notion in English or in French.

Make sure to properly duplicate the template before using it!

Then copy the shareable link to your Notion page.

Paste the copied link in the submit section below.

Before submitting, make sure you check all of the criteria below.

Evaluation criteria

  • Definition of the use case (5 pts)
  • Your use case is well defined and answers a real problem.
    The implementation of the AI described would improve the initial situation.

  • Dataset (5 pts)
  • You chose a proper dataset, and you managed to make it usable. Your choices were properly explained.

  • Prediction analysis (5 pts)
  • Good selection of data to use in your model.
    Prediction results were properly explained, with a good analysis of the impact on your use case.

  • Ethical considerations (5 pts)
  • Ethical concerns were identified and detailed.
    Future improvements are explained properly.