Intro to Computer Vision
with PoseNet

Processing with AI

For a computer, an image (or a video) is just a series of numbers, each representing the color of a pixel. Using computer vision, the computer can try to make sense of these numbers and start detecting objects, shapes, people, roads, etc. In this module, we are going to use PoseNet and Teachable Machine to create a pose detection app.

PoseNet is a computer vision model that performs an operation called pose estimation. Basically, it tries to detect whether one or more people are present in a picture and, if so, estimates where some of their features (nose, eyes, wrists, etc.) are located.

Definitions

Computer Vision is a subfield of computer science that focuses on giving the computer a higher level of understanding of images (photos or videos). Examples of computer vision research fields are: image segmentation, optical character recognition (OCR), face recognition, etc.
Augmented Reality (AR) is the superposition of computer-generated elements (including but not limited to text, pictures, or sounds) on a representation of the real world.

In this video, you can see the output of PoseNet. As you can see, it is not perfect, but it can run in real time, even on a smartphone!

Before we start, watch this video on how to train a pose estimation model with Teachable Machine.

Using Teachable Machine, we can start to detect some basic positions. Thanks to the export options, and by triggering different kinds of feedback (pictures, sound, etc.) in a p5.js canvas depending on the user's pose or position, you will be able to create a lot of fun things.

From games (ever played Just Dance?) to interactive art… but also much more serious applications: in medicine, for example, pose estimation can be used for gait analysis during rehabilitation; in sports, for analyzing players' movements on a field; and in retail, to follow how people move through a shop.

Quiz

Answer the quiz to make sure you understand the main notions. Some questions may need to be looked up elsewhere through a quick Internet search!

This quiz is mandatory. You can answer it as many times as you want; only your best score will be taken into account. Simply reload the page to get a new quiz.

Let's do it!

Put your hands up in the air! 🙋‍♂️🙋‍♀️
This code tries to detect when someone's hand is above their head by comparing the height of their right wrist with the height of their nose.
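
A minimal sketch of that comparison, not necessarily the exact code of the demo, could look like this. It assumes the keypoints are PoseNet-style objects with `part`, `score` and `position` fields (the format described further down this page). The `MIN_SCORE` threshold and the function name `isRightHandRaised` are hypothetical choices made for illustration.

```javascript
// Hypothetical confidence threshold: keypoints below it are ignored.
const MIN_SCORE = 0.5;

// Returns true when the right wrist is higher on screen than the nose.
// On the canvas, y grows downward, so "higher" means a smaller y value.
function isRightHandRaised(keypoints) {
  const wrist = keypoints.find((k) => k.part === "rightWrist");
  const nose = keypoints.find((k) => k.part === "nose");

  // If either keypoint is missing or unreliable, do not report a raised hand.
  if (!wrist || !nose) return false;
  if (wrist.score < MIN_SCORE || nose.score < MIN_SCORE) return false;

  return wrist.position.y < nose.position.y;
}
```

In a p5.js project, a function like this could be called on every frame with the latest keypoints, and its result used to decide what to draw or play.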

PRACTICE

It's your turn to play around with the pose estimation model in Teachable Machine. Create three classes, build your dataset and train your model. If you want to go further, export your model to p5js and try to add sound, animations, or multiple images when a specific pose is detected!

If you're short on inspiration, you could use one of these positions:

Arms raised (both? only the left/right one? any of the two?)
Standing/sitting
Facing the camera/turning your back to the camera
Hands in front of the face 🙈
Doing the "plane" with your arms (if you're super ambitious, using this to steer in a game)
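
Once your model is exported, reacting to a detected pose mostly comes down to picking the most probable class and mapping it to some feedback. The sketch below assumes the predictions arrive as an array of `{ className, probability }` objects, which is the shape used in Teachable Machine's exported code snippets as far as we know; check it against your own export. The class names, the feedback strings and the 0.8 threshold are hypothetical.

```javascript
// Hypothetical mapping from class names (as typed in Teachable Machine)
// to the feedback you want to trigger in your sketch.
const FEEDBACK = {
  "Arms raised": "play-cheer-sound",
  "Hands on face": "show-monkey-image",
};

// Returns the prediction with the highest probability, or null if the list is empty.
function topPrediction(predictions) {
  if (predictions.length === 0) return null;
  return predictions.reduce((best, p) =>
    p.probability > best.probability ? p : best
  );
}

// Returns the feedback for the most probable class, or null when the model
// is not confident enough or the class has no feedback attached.
function feedbackFor(predictions, minProbability = 0.8) {
  const top = topPrediction(predictions);
  if (!top || top.probability < minProbability) return null;
  return FEEDBACK[top.className] || null;
}
```

Using a threshold avoids triggering sounds or images when the model is hesitating between two classes.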

Going further

As you have seen in the video, PoseNet returns an array of 17 keypoints with their coordinates. If you choose to export your Teachable Machine model to a p5.js project, you can use these keypoints directly in your own code.

To be more precise, for each keypoint you will get a JavaScript object that looks like this:

{
  score: 0.999515175819397,
  part: "nose",
  position: {
    x: 448.71417687560796,
    y: 481.8986350924124
  }
}

score: A value between 0 and 1 representing the "confidence" of the model in this keypoint's location
part: A short name describing what this keypoint represents
position: Its position in the source image, x along the x-axis and y along the y-axis
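
Because the keypoints come as a flat array, it is often handy to drop the unreliable ones and index the rest by name. The helpers below are a small sketch based only on the three fields listed above; the function names and the idea of a score threshold are our own assumptions, not part of PoseNet.

```javascript
// Keeps only the keypoints whose confidence score reaches minScore.
function confidentKeypoints(keypoints, minScore) {
  return keypoints.filter((k) => k.score >= minScore);
}

// Builds an object keyed by part name, e.g. byPart.nose.position.x
function keypointsByPart(keypoints) {
  const byPart = {};
  for (const k of keypoints) {
    byPart[k.part] = k;
  }
  return byPart;
}
```

For example, `keypointsByPart(confidentKeypoints(keypoints, 0.5)).nose` gives the nose keypoint if the model is reasonably sure about it, and `undefined` otherwise.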

Before we continue, a small reminder about how the p5.js canvas is oriented: the origin (0, 0) is the top-left corner, x increases to the right, and y increases downward 😉
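
Keypoint positions are expressed in the source image, which uses the same orientation as the p5.js canvas (origin in the top-left corner, y growing downward). If your canvas does not have the same size as the source image, a simple scaling is enough to place a keypoint on it. The helper below is an illustrative sketch; the name `toCanvas` and the `{ width, height }` size objects are assumptions made for the example.

```javascript
// Converts a position in the source image into canvas coordinates.
// Both use a top-left origin, so only a per-axis scale factor is needed.
function toCanvas(position, source, canvas) {
  return {
    x: position.x * (canvas.width / source.width),
    y: position.y * (canvas.height / source.height),
  };
}
```

Note that scaling does not change the ordering of points: a keypoint with a smaller y is still higher on screen after the conversion.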

Tools

TensorFlow is an open-source framework developed by Google and mostly used from Python. Originally built for complex numerical computations, it quickly became one of the most widely used tools for machine learning.
TensorFlow.js is its JavaScript implementation. You just learned how to use PoseNet, one of its pre-trained models.
ml5.js is a JavaScript framework developed by teachers from NYU. Built on top of TensorFlow.js, it enables you to quickly use a pre-trained model in your browser.
Runway ML is a desktop app that makes it easy to try a lot of machine learning models.
Lens Studio is the official app by Snapchat for creating custom filters.

Resources

PoseNet model documentation
In-depth Medium post by the TensorFlow team, going into the details of how PoseNet works. It will tell you everything that you've always wanted to know!

Visit the JavaScript and p5.js pages to help you if you haven't done so already.

Project examples

**Body, Movement, Language**
Dance performance

**remove.bg**
Automatic image background removal