Intro to Computer Vision with PoseNet
Processing with AI
For a computer, an image (or a video) is just a series of numbers, each representing the color of a pixel. Using computer vision, computers can try to make sense of these numbers and start detecting objects, shapes, people, roads, etc. In this module, we are going to use PoseNet and Teachable Machine to create a pose detection app.
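To make "just a series of numbers" concrete, here is a small sketch that stores a tiny 2×2 image as a flat array, four numbers per pixel (red, green, blue, alpha), similar to the RGBA layout p5.js uses for its pixels array when the pixel density is 1. The helper name getPixelColor and the sample values are hypothetical, chosen only for illustration.

```javascript
// A hypothetical 2x2 image stored as a flat array of numbers.
// Each pixel uses 4 values: red, green, blue, alpha (0-255).
const width = 2;
const height = 2;
const pixels = [
  255, 0, 0, 255,     // (x=0, y=0) red
  0, 255, 0, 255,     // (x=1, y=0) green
  0, 0, 255, 255,     // (x=0, y=1) blue
  255, 255, 255, 255, // (x=1, y=1) white
];

// Returns the [r, g, b, a] values of the pixel at column x, row y.
// Rows are stored one after another, so the offset is (y * width + x) * 4.
function getPixelColor(pixels, width, x, y) {
  const index = (y * width + x) * 4;
  return pixels.slice(index, index + 4);
}

console.log(pixels.length === width * height * 4); // true
console.log(getPixelColor(pixels, width, 0, 1));   // [ 0, 0, 255, 255 ]
```

The computer never "sees" a blue square here, only the four numbers 0, 0, 255, 255; computer vision is about turning many such numbers into higher-level information.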
PoseNet is a computer vision model that performs an operation called pose estimation. Basically, it tries to detect whether one or more people are present in a picture and, if so, to estimate where some of their body parts are (nose, eyes, wrists, etc.).
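As a rough illustration of the "one or more people" part, the sketch below filters a list of detected poses by their overall confidence. It assumes each pose is an object with a score and a keypoints array, which is the shape we understand the TensorFlow.js PoseNet package to return for multi-person detection; the sample data, the helper name detectPeople and the 0.5 threshold are made up.

```javascript
// Hypothetical output for one frame: one entry per candidate person,
// each with an overall confidence score and its keypoints (shortened here).
const poses = [
  { score: 0.92, keypoints: [{ part: "nose", score: 0.99, position: { x: 120, y: 80 } }] },
  { score: 0.81, keypoints: [{ part: "nose", score: 0.95, position: { x: 410, y: 95 } }] },
  { score: 0.12, keypoints: [{ part: "nose", score: 0.10, position: { x: 300, y: 400 } }] },
];

// Keeps only the poses the model is reasonably confident about.
// The 0.5 default threshold is an arbitrary choice for this example.
function detectPeople(poses, minScore = 0.5) {
  return poses.filter((pose) => pose.score >= minScore);
}

const people = detectPeople(poses);
console.log(`${people.length} person(s) detected`); // 2 person(s) detected
```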
Definitions
- Computer Vision is a subfield of computer science that focuses on giving computers a higher-level understanding of images (photos or videos). Examples of computer vision research fields include image segmentation, optical character recognition (OCR), and face recognition.
- Augmented Reality (AR) is the superposition of computer-generated elements (including but not limited to text, pictures, or sounds) on a representation of the real world.
Before we start, watch this video on how to train a pose estimation model with Teachable Machine.
Using Teachable Machine, we can start to detect some basic positions. Thanks to its export options, and by implementing different kinds of feedback (pictures, sounds, etc.) depending on the user's pose or position in a p5.js canvas, you will be able to create a lot of fun things.
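Here is a minimal sketch of "feedback depending on the user's pose": pick the most likely class and map it to an action. It assumes each prediction is an object with className and probability, which is what we understand the Teachable Machine pose library returns (check the code generated by your own export). The class names, the feedback strings, the helper chooseFeedback and the 0.8 threshold are all hypothetical.

```javascript
// Hypothetical mapping from class names (as named in Teachable Machine)
// to the feedback we want to give.
const feedbackByClass = {
  "Arms raised": "play cheering sound",
  "Sitting": "show chair picture",
  "Hands on face": "show monkey emoji",
};

// Picks the most likely class and returns its feedback,
// or null if the model is not confident enough or the class is unknown.
function chooseFeedback(predictions, minProbability = 0.8) {
  let best = null;
  for (const prediction of predictions) {
    if (best === null || prediction.probability > best.probability) {
      best = prediction;
    }
  }
  if (best === null || best.probability < minProbability) {
    return null;
  }
  return feedbackByClass[best.className] ?? null;
}

// Example with made-up probabilities for one frame.
const predictions = [
  { className: "Arms raised", probability: 0.91 },
  { className: "Sitting", probability: 0.06 },
  { className: "Hands on face", probability: 0.03 },
];
console.log(chooseFeedback(predictions)); // play cheering sound
```

Returning null when the model hesitates is a design choice: it is usually better to show nothing than to flash the wrong picture.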
Applications range from games (ever played Just Dance?) to interactive art, but there are also much more serious ones: in medicine, for example, pose estimation can be used for gait analysis during rehabilitation; in sports, to analyze players' movements on a field; and in retail, to follow how people move through a shop.
Quiz
Answer the quiz to make sure you understand the main notions. Some questions may require a quick Internet search!
This quiz is mandatory. You can answer it as many times as you want; only your best score will be taken into account. Simply reload the page to get a new quiz.
Let's do it!
PRACTICE
It's your turn to play around with the pose estimation model in Teachable Machine. Create three classes, build your dataset and train your model. If you want to go further, export your model to p5js and try to add sound, animations, or multiple images when a specific pose is detected!
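Predictions are made frame by frame and can flicker between classes, which may make a sound or animation fire over and over. One common workaround, sketched here under our own assumptions (the class PoseStabilizer and the frame counts are hypothetical), is to accept a pose only once it has been the top class for several consecutive frames.

```javascript
// Reports a pose only after it has been the top prediction for
// `requiredFrames` consecutive frames, to avoid flickering feedback.
class PoseStabilizer {
  constructor(requiredFrames = 10) {
    this.requiredFrames = requiredFrames;
    this.candidate = null;  // class currently being counted
    this.count = 0;         // consecutive frames seen for the candidate
    this.stablePose = null; // last pose that was held long enough
  }

  // Call once per frame with the top class name; returns true
  // only on the frame where a new stable pose is confirmed.
  update(className) {
    if (className === this.candidate) {
      this.count += 1;
    } else {
      this.candidate = className;
      this.count = 1;
    }
    if (this.count === this.requiredFrames && className !== this.stablePose) {
      this.stablePose = className;
      return true;
    }
    return false;
  }
}

// Example: "Sitting" flickers once, then is held for 3 frames.
const stabilizer = new PoseStabilizer(3);
const frames = ["Standing", "Sitting", "Standing", "Sitting", "Sitting", "Sitting"];
for (const frame of frames) {
  if (stabilizer.update(frame)) {
    console.log(`New pose: ${frame}`); // New pose: Sitting
  }
}
```

At around 30 frames per second, a value of 10 would mean holding a pose for about a third of a second; tune it to your project.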
If you're lacking inspiration, you could use one of these positions:
- Arms raised (both? only the left/right one? any of the two?)
- Standing/sitting
- Facing the camera/turning your back to the camera
- Hands in front of the face 🙈
- Doing the "plane" with your arms (if you're super ambitious, use it to steer in a game)
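For the "plane" idea, one possible approach (our assumption, not something Teachable Machine does for you) is to skip classification and read the wrist keypoints directly: the tilt of the line between both wrists can act as a steering value. The sketch assumes keypoints in the part/position format PoseNet produces, with y growing downward as on a p5.js canvas; findPart and steeringAngle are hypothetical helpers.

```javascript
// Returns the keypoint for a given part name, or undefined.
function findPart(keypoints, part) {
  return keypoints.find((keypoint) => keypoint.part === part);
}

// Tilt of the line between both wrists, in degrees.
// 0 means arms level; positive means the wrist on the right side of the
// image is lower (y grows downward on the canvas), i.e. "bank right".
function steeringAngle(keypoints) {
  const a = findPart(keypoints, "leftWrist");
  const b = findPart(keypoints, "rightWrist");
  if (!a || !b) return null;
  // Order the wrists by screen position so the result does not depend on mirroring.
  const [leftOnScreen, rightOnScreen] = a.position.x <= b.position.x ? [a, b] : [b, a];
  const dx = rightOnScreen.position.x - leftOnScreen.position.x;
  const dy = rightOnScreen.position.y - leftOnScreen.position.y;
  return (Math.atan2(dy, dx) * 180) / Math.PI;
}

// Made-up example: the wrist on the right of the image is 100 px lower.
const keypoints = [
  { part: "leftWrist", score: 0.9, position: { x: 500, y: 300 } },
  { part: "rightWrist", score: 0.9, position: { x: 100, y: 200 } },
];
console.log(steeringAngle(keypoints).toFixed(1)); // 14.0
```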
Going further
As you have seen in the video, when using PoseNet, you get an array of 17 keypoints with their coordinates, which you can use in your project if you choose to export your Teachable Machine model to a p5.js project.
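For reference, these are the 17 part names PoseNet uses for its keypoints, listed in keypoint index order. The constant POSENET_PARTS and the helper partIndex are our own names, not part of any library.

```javascript
// The 17 keypoint part names used by PoseNet, in keypoint index order.
const POSENET_PARTS = [
  "nose",
  "leftEye", "rightEye",
  "leftEar", "rightEar",
  "leftShoulder", "rightShoulder",
  "leftElbow", "rightElbow",
  "leftWrist", "rightWrist",
  "leftHip", "rightHip",
  "leftKnee", "rightKnee",
  "leftAnkle", "rightAnkle",
];

// Returns the index of a part in the keypoints array, or -1 if unknown.
function partIndex(part) {
  return POSENET_PARTS.indexOf(part);
}

console.log(POSENET_PARTS.length);   // 17
console.log(partIndex("leftWrist")); // 9
```

Note that "left" and "right" refer to the person's own left and right, not to the sides of the image.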
To be more precise, for each keypoint you get a JavaScript object that looks like this:
{
  score: 0.999515175819397,
  part: "nose",
  position: {
    x: 448.71417687560796,
    y: 481.8986350924124
  }
}
- score: a value between 0 and 1 representing the model's "confidence" in this keypoint's location
- part: a short description of what this keypoint represents
- position: its position in the source image, with x for the x-axis and y for the y-axis
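A common way to use these three fields, sketched below, is to ignore keypoints whose score is too low before reading their position. The 0.5 threshold and the helper name getConfidentPosition are assumptions for illustration; the nose entry reuses the object shown above and the wrist entry is made up.

```javascript
// Returns the {x, y} position of a part if the model is confident enough,
// otherwise null. `keypoints` is an array of objects shaped like the one above.
function getConfidentPosition(keypoints, part, minScore = 0.5) {
  const keypoint = keypoints.find((k) => k.part === part);
  if (!keypoint || keypoint.score < minScore) {
    return null;
  }
  return keypoint.position;
}

const keypoints = [
  { score: 0.999515175819397, part: "nose", position: { x: 448.71417687560796, y: 481.8986350924124 } },
  { score: 0.21, part: "leftWrist", position: { x: 120.5, y: 610.2 } }, // made-up, low confidence
];

console.log(getConfidentPosition(keypoints, "nose"));      // prints the nose position object
console.log(getConfidentPosition(keypoints, "leftWrist")); // null (score below 0.5)
```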
Before we continue, a small reminder about how the p5.js canvas is oriented 😉
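In p5.js, the origin (0, 0) is the top-left corner of the canvas: x grows to the right and y grows downward. A practical consequence, sketched here with hypothetical helpers, is that a wrist raised above the nose has a smaller y value than the nose, not a larger one.

```javascript
// In p5.js, (0, 0) is the top-left corner of the canvas:
// x grows to the right and y grows DOWNWARD.

// Returns the keypoint for a given part name, or undefined.
function findPart(keypoints, part) {
  return keypoints.find((keypoint) => keypoint.part === part);
}

// A wrist is higher on screen than the nose when its y is SMALLER.
function isWristRaised(keypoints, wristPart) {
  const wrist = findPart(keypoints, wristPart);
  const nose = findPart(keypoints, "nose");
  if (!wrist || !nose) return false;
  return wrist.position.y < nose.position.y;
}

// Made-up coordinates on a 640x480 canvas.
const keypoints = [
  { part: "nose", score: 0.99, position: { x: 320, y: 200 } },
  { part: "leftWrist", score: 0.9, position: { x: 450, y: 120 } },
  { part: "rightWrist", score: 0.9, position: { x: 190, y: 390 } },
];

console.log(isWristRaised(keypoints, "leftWrist"));  // true  (120 < 200)
console.log(isWristRaised(keypoints, "rightWrist")); // false (390 > 200)
```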
Tools
- TensorFlow is an open-source framework developed by Google, most commonly used from Python. Originally developed for complex numerical computations, it quickly became one of the most widely used tools for machine learning. TensorFlow.js is its JavaScript implementation; PoseNet, which you just learned to use, is one of its pre-trained models.
- ml5.js is a JavaScript library developed by teachers at NYU. Built on top of TensorFlow.js, it lets you quickly use pre-trained models in your browser.
- Runway ML is a desktop app that makes it easy to try a lot of machine learning models.
- Lens Studio is the official app by Snapchat for creating custom filters.
Resources
- PoseNet model documentation
- An in-depth Medium post by the TensorFlow team that goes into the details of how PoseNet works. It will tell you everything you've always wanted to know!
If you haven't done so already, visit the JavaScript and p5.js pages for help.