Intro to Computer Vision
with PoseNet

Processing with AI

For a computer, an image (or a video) is just a series of numbers, each representing the color of a pixel. Using computer vision, a computer can try to make sense of these numbers and start detecting objects, shapes, people, roads, etc. In this module, we are going to use PoseNet and Teachable Machine to create a pose detection app.

PoseNet is a computer vision model that performs an operation called pose estimation. Basically, it tries to detect whether one or more people are present in a picture and, if so, estimates where some of their features (nose, eyes, wrists, etc.) are located.

Definitions

In this video, you can see the output of PoseNet. As you can see, it is not perfect, but it can run in real time, even on a smartphone!

Before we start, watch this video on how to train a pose estimation model with Teachable Machine.

Using Teachable Machine, we can start to detect some basic poses. Thanks to the export options, and by triggering different kinds of feedback (pictures, sound, etc.) depending on the user's pose or position in a p5.js canvas, you will be able to create a lot of fun things.

From games (ever played Just Dance?) to interactive art… but also much more serious applications: in medicine, for example, pose estimation can be used for gait analysis during rehabilitation; in sports, for analyzing players' movements on a field; and in retail, for following how people move around a shop.


Quiz

Answer the quiz to make sure you understand the main notions. Some answers may require a quick Internet search!

This quiz is mandatory. You can answer it as many times as you want; only your best score will be taken into account. Simply reload the page to get a new quiz.



Let's do it!

Put your hands up in the air! 🙋‍♂️🙋‍♀️
This code tries to detect when someone's hand is above their head by comparing the height of their right wrist with the height of their nose.
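As a minimal sketch of that comparison, assuming each keypoint is an object with `part`, `score` and `position` fields (the format described in the "Going further" section), the check could look like the code below. The helper names `findKeypoint` and `isRightHandRaised`, as well as the `minScore` threshold of 0.5, are illustrative assumptions and not part of the course code.

```javascript
// Hypothetical helper: find a keypoint by its part name
// in a PoseNet-style array of keypoints.
function findKeypoint(keypoints, partName) {
  return keypoints.find((kp) => kp.part === partName);
}

// Returns true when the right wrist is higher on screen than the nose.
// minScore is an assumed confidence threshold, chosen for illustration.
function isRightHandRaised(keypoints, minScore = 0.5) {
  const wrist = findKeypoint(keypoints, "rightWrist");
  const nose = findKeypoint(keypoints, "nose");

  // If a keypoint is missing or too uncertain, do not report a raised hand.
  if (!wrist || !nose) return false;
  if (wrist.score < minScore || nose.score < minScore) return false;

  // In image and p5.js canvas coordinates, y grows downward,
  // so "higher on screen" means a smaller y value.
  return wrist.position.y < nose.position.y;
}
```

In a p5.js sketch, you could call `isRightHandRaised(pose.keypoints)` on each frame and change the background color or play a sound when it returns `true`. Here `pose` stands for whatever object holds the keypoints in your exported project.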

PRACTICE

It's your turn to play around with the pose estimation model in Teachable Machine. Create three classes, build your dataset, and train your model. If you want to go further, export your model to p5.js and try to add sound, animations, or multiple images when a specific pose is detected!

If you're short on inspiration, you could use one of these poses:

  • Arms raised (both? only the left/right one? any of the two?)
  • Standing/sitting
  • Facing the camera/turning your back to the camera
  • Hands in front of the face 🙈
  • Doing the "plane" with your arms (if you're super ambitious, using this to steer in a game)
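
For the "plane" idea, one possible approach is to turn the tilt of the arms into a steering value. The sketch below is only an illustration under a few assumptions: keypoints follow the PoseNet format (`part`, `score`, `position`), the function name `armTilt` and the 0.5 confidence threshold are made up for this example, and the tilt is measured between the two wrists.

```javascript
// Hypothetical helper: estimate how much the arms are tilted,
// as the angle (in radians) of the line joining the two wrists.
// Returns null when both wrists are not detected confidently.
function armTilt(keypoints, minScore = 0.5) {
  const wrists = keypoints.filter(
    (kp) =>
      (kp.part === "leftWrist" || kp.part === "rightWrist") &&
      kp.score >= minScore
  );
  if (wrists.length !== 2) return null;

  // Order the wrists from left to right on screen, so the result
  // does not depend on whether the webcam image is mirrored.
  const [a, b] = wrists.sort((p, q) => p.position.x - q.position.x);

  // y grows downward on the canvas: the angle is positive when the
  // wrist on the right side of the image is lower than the other one.
  return Math.atan2(b.position.y - a.position.y, b.position.x - a.position.x);
}
```

A value close to 0 means the arms are level. In a game, you could map the returned angle to the horizontal speed of a character, ignoring frames where the function returns `null`.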

Going further

As you have seen in the video, PoseNet gives you an array of 17 keypoints with their coordinates, which you can use in your project if you choose to export your Teachable Machine model to a p5.js project.

To be more precise, for each keypoint you will get a JavaScript object that looks like this:

{
    score: 0.999515175819397,
    part: "nose",
    position: {
        x: 448.71417687560796,
        y: 481.8986350924124
    }
}

  • score: A value between 0 and 1 representing the "confidence" of the model for this keypoint's location
  • part: The name of the body part this keypoint represents
  • position: Its position in the source image, x for the x-axis and y for the y-axis
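
Since every keypoint comes with a confidence score, a common first step is to keep only the keypoints the model is reasonably sure about and to look them up by name. The following is a minimal sketch of that idea; the function name `confidentKeypointsByPart` and the default threshold of 0.5 are assumptions made for this example, not values prescribed by PoseNet or Teachable Machine.

```javascript
// Hypothetical helper: build a lookup table { partName: position }
// containing only the keypoints whose score reaches minScore.
function confidentKeypointsByPart(keypoints, minScore = 0.5) {
  const byPart = {};
  for (const kp of keypoints) {
    if (kp.score >= minScore) {
      byPart[kp.part] = kp.position;
    }
  }
  return byPart;
}

// Example with two made-up keypoints in the PoseNet format.
const sampleKeypoints = [
  { score: 0.99, part: "nose", position: { x: 448.7, y: 481.9 } },
  { score: 0.12, part: "leftWrist", position: { x: 120.0, y: 640.5 } },
];

const parts = confidentKeypointsByPart(sampleKeypoints);
if (parts.nose) {
  // The nose passed the threshold, so its coordinates can be used safely.
  console.log("Nose at", parts.nose.x, parts.nose.y);
}
```

With this kind of lookup, a missing entry (here, `parts.leftWrist`) simply means the model was not confident enough about that keypoint in the current frame.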

Before we continue, a small reminder about how the p5.js canvas is oriented 😉: the origin (0, 0) is the top-left corner, x increases to the right, and y increases downward.

Tools

Resources

Visit the JavaScript and p5.js pages to help you if you haven't done so already.

Project examples