Have you ever wondered how Snapchat filters work? Three words: facial landmark recognition. Also referred to as face meshing, this is a computer vision method that precisely identifies and maps the geometry of your face, which can then be represented by dots and segments across all of your features. Doing this means you know exactly where your eyes are in relation to your eyebrows, or where your nose is in relation to your lips.
Using very similar geometry mapping principles, pose estimation expands on this by identifying the location of every key part of your body. So hey gang, Tim here at Core Electronics and I'm demonstrating both these computer vision methods with a Raspberry Pi single board computer. Here on the table is everything you need to start.
You're going to need the most powerful Raspberry Pi you can get your hands on. Here I'm using the Raspberry Pi 4 Model B. As this is a computer vision project, you're also going to need a camera. In my case, I'm using an HQ Camera with a 6mm wide-angle lens attached. Put simply, I'm going to set up this Raspberry Pi as a desktop computer with a camera attached.
There's a link in the description if you need a hand doing this. Keep in mind, you're also going to want a microSD card that has been flashed with Raspberry Pi 'Buster' OS, that has been connected to the internet, that has a number of packages such as OpenCV and MediaPipe installed, and that has the camera enabled in the configuration. Check the written guide, also linked in the description, for the step-by-step process for each of these. And just like that, the Raspberry Pi is all set up. I've created two fully annotated Python scripts, so let's start with the face mesh one.
As a note, you can find and download these scripts by jumping to the bottom of the written article page. With the file downloaded, just unzip the contents onto your Raspberry Pi desktop or wherever you deem appropriate.
All you need to do now is right-click on facemesh.py and open it with Thonny IDE, or any other Python interpreter you'd like. Then, with it open, run the script by pressing the big green run button. As soon as you do, it will open up a live preview window of what the camera is seeing and initiate face meshing. These points that you can see on my face now, often referred to as landmarks, all have coordinate data associated with them.
The MediaPipe Face Mesh system I'm using here tracks 468 landmarks at around 10 FPS on the Raspberry Pi 4 Model B. It's quite incredible how accurately it knows where your facial features are, even when you turn your head or partially hide it from the camera. Landmark recognition exactly like this is occurring in the background whenever you run a Snapchat face filter on your phone. And with some directed coding, you too can create your own custom filters.
For some inspiration, check out the results achieved by the MediaPipe team. Allow me to quickly jump into the script to show you what's going on behind the curtains.
We start by importing all the important functionality into the script. We also create two variables up here that make drawing the dots and framework possible. After this, we set up the style of the lines that get drawn. On the next line, we decide some of the settings for our face mesh, like the maximum number of faces and the minimum confidence for detection. Next, we bump into a def section. This function is used later on in the program and does the real work: it takes each frame captured by the camera, identifies the face, and draws the mesh framework onto it.
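To make that setup concrete, here's a minimal sketch of what that first half typically looks like with MediaPipe's Python API. It's an illustration rather than the exact facemesh.py from the guide, so names like draw_mesh are my own:

```python
# A rough sketch of the setup section described above (not the exact
# facemesh.py from the guide). Uses MediaPipe's Python "solutions" API.
import cv2
import mediapipe as mp

# The two variables that make drawing the dots and framework possible
mp_drawing = mp.solutions.drawing_utils
mp_face_mesh = mp.solutions.face_mesh

# Style of the dots and lines drawn over the face
drawing_spec = mp_drawing.DrawingSpec(thickness=1, circle_radius=1)

# Face mesh settings: max number of faces and minimum detection confidence
face_mesh = mp_face_mesh.FaceMesh(
    max_num_faces=1,
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5)

def draw_mesh(frame):
    """Identify the face in a frame and draw the mesh framework onto it."""
    # MediaPipe expects RGB, while OpenCV captures frames in BGR
    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        for face_landmarks in results.multi_face_landmarks:
            mp_drawing.draw_landmarks(
                image=frame,
                landmark_list=face_landmarks,
                connections=mp_face_mesh.FACEMESH_TESSELATION,
                landmark_drawing_spec=drawing_spec,
                connection_drawing_spec=drawing_spec)
    return frame
```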
After this, we just decide on the font for the FPS tally, then choose our camera module as the video-capturing device. Next is an if statement: if our camera module isn't connected correctly, it will spit out an 'Unable to read camera feed' warning to the user via the shell. Then we set up the pixel dimensions of our preview window. 640 by 480 is a good size for this Raspberry Pi. You can use higher values for more resolution, but this will result in a lower FPS.
Then we have a while loop, which will run indefinitely. We create a simple start-time variable so we can calculate an FPS tally later on in the code, which gets overlaid onto the preview window. Next, we bump into another if statement, which takes care of what to do if no video feed is coming in, and then we call back to that def section from earlier. That function performs the heavy lifting of adding the face mesh info to each frame.
This next section calculates the FPS, which then gets added to our preview window. Then there's one last if statement, which adds the ability to stop the program by pressing the Escape key.
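Continuing the sketch from above, the camera setup and main loop might look like this (again, an illustration of the structure just described, with draw_mesh() coming from the earlier snippet):

```python
# Continuing the sketch: camera setup and main loop. draw_mesh() and the
# FaceMesh setup come from the snippet above.
import time

font = cv2.FONT_HERSHEY_SIMPLEX          # font for the FPS tally
cap = cv2.VideoCapture(0)                # the attached camera module

if not cap.isOpened():                   # camera not connected correctly
    print("Unable to read camera feed")

cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)   # preview window dimensions;
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)  # higher values lower the FPS

while True:
    s = time.time()                      # start time for the FPS tally
    success, frame = cap.read()
    if not success:                      # no video feed coming in
        continue

    frame = draw_mesh(frame)             # the heavy lifting from the def

    fps = 1.0 / (time.time() - s)
    cv2.putText(frame, f"FPS: {int(fps)}", (10, 30), font, 1, (0, 255, 0), 2)
    cv2.imshow("Face Mesh", frame)

    if cv2.waitKey(1) & 0xFF == 27:      # Esc key stops the program
        break

cap.release()                            # release OpenCV's resources so the
cv2.destroyAllWindows()                  # script can be run again cleanly
```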
The last lines release the OpenCV software resources being used and are really just there to stop errors on repeated runs of the script. So with that explored, the next script to open and run in the same way is pose.py.
With it running and the preview open, our Raspberry Pi is now successfully identifying sections of the human body. The script starts by looking for a human body. Once one is identified, it places dots over all the important joints and the face of that person, which allows the script to track their movements. The script is also constantly outputting to the shell a number between 0 and 1 that represents visibility. If only part of the body can be seen, that visibility number starts getting smaller.
Moving into the script, there are many similarities with the first one. It starts by importing the necessary functionality, then creates a variable called cap that takes the data from the attached camera module. It then sets up details on how we want our MediaPipe pose and draw methods to operate. The script then creates an endless loop that processes each frame, identifies the person in it, draws the framework over the top, and sends that frame out to the preview window. Here, it prints that visibility number and the general results to the shell. Just before the end, an if statement stops the script whenever you press Q.
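Here's a minimal sketch along the lines of pose.py, showing the structure just described. It's an illustration of MediaPipe's Pose solution rather than the exact guide script:

```python
# A minimal pose-estimation sketch (an illustration, not the exact pose.py).
import cv2
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_pose = mp.solutions.pose

pose = mp_pose.Pose(min_detection_confidence=0.5, min_tracking_confidence=0.5)
cap = cv2.VideoCapture(0)                  # data from the attached camera

while cap.isOpened():
    success, frame = cap.read()
    if not success:
        continue

    # Identify the person in the frame and draw the framework over the top
    results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks:
        mp_drawing.draw_landmarks(frame, results.pose_landmarks,
                                  mp_pose.POSE_CONNECTIONS)
        # Print a 0-to-1 visibility figure for one landmark to the shell
        nose = results.pose_landmarks.landmark[mp_pose.PoseLandmark.NOSE]
        print(f"Nose visibility: {nose.visibility:.2f}")

    cv2.imshow("Pose", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # Q stops the script
        break

cap.release()
cv2.destroyAllWindows()
```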
If you want to know the X, Y coordinates of the center of a face in real time, you can alter the script by uncommenting just a few of the lines. When the Raspberry Pi identifies a human body and can see the person's nose, it will spit out that nose's X, Y coordinates. You can also adjust which part of the body the script focuses on. For example, take a look at the landmark diagram from MediaPipe to see how the script can be altered to track any point on the body. For instance, if I wanted to focus only on my left index finger, I would replace the landmark name here with LEFT_INDEX, making sure to type it all in caps.
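As a rough sketch of that lookup (NOSE and LEFT_INDEX are real entries in MediaPipe's PoseLandmark list; the surrounding loop and results variable are assumed from the snippet above):

```python
# Inside the pose loop, once results has been computed for the frame.
# Swap the landmark name for any point on the MediaPipe diagram, in caps.
if results.pose_landmarks:
    nose = results.pose_landmarks.landmark[mp_pose.PoseLandmark.NOSE]
    print(f"Nose at x={nose.x:.2f}, y={nose.y:.2f}")   # normalised 0-1 coords

    # To track the left index finger instead:
    tip = results.pose_landmarks.landmark[mp_pose.PoseLandmark.LEFT_INDEX]
    print(f"Left index at x={tip.x:.2f}, y={tip.y:.2f}")
```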
Running the script now, as is, you can see it's telling me exactly where my finger is.
This means in a similar vein to the Raspberry Pi hand recognition and finger identification guide linked down below, you're going to be able to identify when certain body parts are above or below others.
This can then be used as a counter to track certain movements. Or perhaps certain body arrangements could be used to control software or hardware through the GPIO pins.
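As one hedged sketch of that idea, here's how you might count left-wrist raises by comparing landmark heights inside the pose loop above. The landmark names are real MediaPipe PoseLandmark entries, but the counter logic and the GPIO comment are just illustrations of where your own code could hook in:

```python
# Sketch: count left-wrist raises by comparing landmark heights.
# MediaPipe's y coordinate grows downward, so "above" means a smaller y.
count = 0
wrist_was_above = False

# ... then inside the loop, after results has been computed:
if results.pose_landmarks:
    lm = results.pose_landmarks.landmark
    wrist_above = (lm[mp_pose.PoseLandmark.LEFT_WRIST].y <
                   lm[mp_pose.PoseLandmark.NOSE].y)
    if wrist_above and not wrist_was_above:
        count += 1                       # one raise counted
        print(f"Raises: {count}")
        # A GPIO pin could be toggled here instead, e.g. with gpiozero
    wrist_was_above = wrist_above
```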
And that's it for today. Hopefully this gives you something new to sink your teeth into and gets a couple of new project ideas running around in your head. If you want any extra information, pop me a message down below or on our Core Electronics forum.
We are full-time makers and here to help. So until next time, stay cozy.