In this video, we will be setting up and using YOLO pose estimation with the Raspberry Pi AI HAT, as well as exploring how to use it alongside your own Python code so you can use pose estimation in your projects. We will take a look at how to install the required hardware and firmware, as well as how to set up and use the pose estimation pipelines. By the end of this guide, you will have an understanding of the whole setup, along with three example scripts we have prepared. With these demo scripts, we will use gestures to control media players, control servos with the angles of our arms, and play a game of Fruit Ninja with our bodies.

Transcript

In this video, we'll be setting up the AI HAT to run pose estimation in Python. We'll guide you through the process of setting up the HAT from scratch, installing the required Python pipelines, and exploring some demo scripts for exciting applications like controlling hardware or even a game like Fruit Ninja. More importantly, we'll show you how to apply pose estimation in your own Python projects. To follow along, you'll need a Pi 5; a 2GB or larger model will suffice. You'll also need an AI HAT; this guide works with both the 13 and 26 TOPS versions. A camera module is necessary; we're using the Camera Module 3, and you might need a camera cable adapter. The Pi 5 has a smaller camera connector, and your camera might not include the needed adapter, so double-check this. Links to these items and a written guide with all the commands and code are provided below. Installing the HAT is straightforward, but be aware of the header extender that lengthens your pins. After installing the HAT, these pins might not fully poke through. If you need these pins exposed for other hardware, there's a link below to longer pins that resolve this issue. To install your HAT, first attach the header extension to the Pi's GPIO pins. Then, screw on the four standoffs that come with the HAT. Lift the PCIe tab on the Pi, insert the HAT's cable so it sits squarely, and push down on the tab to secure it. Gently slide the HAT onto the pins, being careful not to bend them. Connect the camera cable to the camera using the same tab-locking system, then connect the camera to the Pi. Secure the HAT with the four screws, and you're ready to proceed.

Next, install Raspberry Pi OS onto the microSD card, insert it into your Pi, and complete the first-time setup. There's nothing special required in this process. Once you're on the desktop, open a new terminal window and update your Pi with the update and upgrade commands. Then install all the necessary drivers and software for the HAT, which might take about five minutes, so feel free to grab a beverage. After installation, restart your Pi. Now, install the basic Python pipelines from Hailo's GitHub. Clone the repository from GitHub using a simple command. Once downloaded, navigate to your Pi's home folder to find the downloaded folder, which will house all our Python scripts and activities for this video. Return to the terminal and change the directory to this folder. The blue text in the terminal indicates it's now working in this folder. Run the installation file provided by Hailo to set up the remaining components. This process might also take around five minutes. If you change the HAT, such as switching from a 13 to a 26 TOPS HAT, rerun this command so the Pi recognizes it correctly. After installation, reboot your Pi again. With everything set up, explore Hailo's examples folder, which contains two crucial folders: the resources folder, with YOLO models converted to the HEF file format used by the AI HATs, and the basic pipelines folder, with the pipeline code and demo code.

A pipeline in this context is a series of code that facilitates interaction with the HAT. While you can write custom pipelines for specific models or projects, we'll use the provided example pipeline. Run the pipeline to test pose estimation. Note that Hailo's setup requires running Python scripts from the command line, and you need to set up the terminal beforehand. Use the change directory command to set the terminal to work in the desired directory, then run a source command to activate the virtual environment set up by the install command. If you close the terminal or restart your Raspberry Pi, repeat these steps before running Python scripts again. Execute the Python script in the basic pipelines folder named pose_estimation.py. It takes a moment for the HAT to boot up, but you'll see that our setup is functioning and pose estimation is running. To stop the script, select the terminal window and press Ctrl+C. To explore different inputs, run the same command with --help at the end, revealing various options for the pose estimation pipeline. You can specify --input rpi to use one of the Pi's camera modules. Clear the terminal for better visibility, then press up to return to the last command. Replace --help with --input rpi, and after a brief boot-up, pose estimation runs on the camera. The YOLO model estimates the position of key points on the body, drawing lines between them to determine orientation. These key points are crucial for further applications. A peculiar artifact occurs when the model estimates the position of legs it can't see, but this behavior is hardcoded and unchangeable. However, it's not an issue when you're fully visible in the frame. With pose estimation running, let's delve into the code to understand its workings. In the basic pipelines folder, find the Python file pose_estimation.py, the code we're executing. This file is also where you can modify and implement your own computer vision projects. The code may seem complex with many moving parts, but understanding it is essential for customizing your applications.

We went ahead and repackaged the code to make it more user-friendly and easier to use. However, if you prefer the original code, we have a breakdown in our object recognition video; although it's for object recognition, it's similar enough to get you started. To use our repackaged code, refer to the written guide linked below, where you'll find the first set of demo code. Create a new script, paste the code in, and save it in the same basic pipelines folder. You can name it anything, but ensure it has the ".py" extension; we'll call this one "pose_simple". The code includes a regular import section for all necessary imports. The custom processing thread function is where you place the rest of your code, such as setting up a server or defining constants. The while True loop is nested inside this function, allowing the use of threading, which lets two parts of your Python code run simultaneously. There are roughly 200 additional lines of code at the end that handle the AI HAT and pose estimation; they run in the background while your custom code pulls the latest pose estimation data. The script is designed so you don't have to modify that part, and it operates independently.
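
To give a sense of how that repackaged script is laid out, here's a rough skeleton; the function names, parameters, and timing values below are illustrative rather than copied from the demo code:

    import time

    def custom_processing_thread(pose_data):
        # One-off setup goes here: constants, servers, hardware objects, etc.
        time.sleep(2)  # give the HAT a moment to boot before polling for data

        while True:
            # Pull the latest pose estimation results and act on them here,
            # e.g. using the two helper functions described next.
            time.sleep(0.1)

    # The remaining ~200 lines (not shown) set up the Hailo pipeline and run this
    # function in its own thread, feeding it fresh pose data in the background.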

Within the while True loop, we've created two functions to simplify the process: get_body_part_coordinates and calculate_body_part_angle. The first function returns the position of a specified body part on the screen, and you can find a list of body part names further down in the code. The second function calculates the angle between three positions, such as the left shoulder, left elbow, and left wrist. Note that the angle is measured clockwise from the camera's perspective, with 90, 180, 270, and 360 degrees corresponding to specific arm positions. The code starts with a two-second sleep, which is not essential but recommended, to allow the HAT to boot up and start processing data. To run the new script, use the same command as before, but change the file name from pose_estimation to pose_simple. Running the script will display pose estimation data in the shell, including angle data and the position of the left wrist. The X and Y coordinates use relative values: the X-axis ranges from 0 to 1 across the screen, and the Y-axis ranges from 0 at the top to 1 at the bottom. This information should be sufficient to implement the code in your own projects.
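
As a quick, hedged illustration of how those two helpers get used inside the loop (the exact signatures in the demo code may differ slightly), something like this prints the left wrist position and elbow angle each frame:

    # Illustrative only - assumes get_body_part_coordinates returns a relative
    # (x, y) tuple and calculate_body_part_angle returns an angle in degrees.
    # time is imported in the script's regular import section.
    while True:
        left_wrist_x, left_wrist_y = get_body_part_coordinates('left_wrist')

        # Angle at the left elbow, formed by shoulder -> elbow -> wrist,
        # measured clockwise from the camera's perspective.
        elbow_angle = calculate_body_part_angle('left_shoulder', 'left_elbow', 'left_wrist')

        print(f"Left wrist at x={left_wrist_x:.2f}, y={left_wrist_y:.2f}, "
              f"elbow angle: {elbow_angle:.0f} degrees")

        time.sleep(0.1)  # keep the shell readable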

To enhance your understanding, we created some short projects using this code. First, let's address the annoying FPS readout that clutters the shell. In the basic pipelines folder, open the RPi Commons file using Thonny. This file contains code controlling fundamental behavior, so be cautious when modifying anything here, as it can break the pipeline; changes will affect both the pose estimation and object detection code. Locate the onFPSMeasurement function and comment out the line that prints the FPS. You can also change the camera resolution here, though it won't affect YOLO's processing resolution. Stick to standard sizes, like 1920x1080, for a good balance of image size and smooth FPS. Now, let's create some custom code to solve a practical issue. In my workshop, I might need to pause a podcast or music on YouTube while on the other side of the room. We'll use pose estimation to pause and play the video by raising both hands above the head, an uncommon gesture that avoids accidental pauses. We'll use the wtype tool to simulate keystrokes from Python code. This external tool must be installed in the virtual environment; with the terminal set up, simply enter the installation command. The second demo code is available on our written page. Create a new script, paste the code in, and save it in the basic pipelines folder with the other scripts.

The code is straightforward. It imports subprocess so we can call wtype from Python. In the while True loop, it retrieves the positions of the left wrist, right wrist, and nose, storing the X and Y coordinates in variables. The code then checks whether the Y coordinate of the left wrist is smaller than that of the nose (remember, Y decreases towards the top of the frame, so a smaller value means the wrist is higher), and does the same for the right wrist. If both conditions are met, wtype simulates pressing the K key, which toggles play and pause on YouTube. To prevent continuous play-pause actions, the code includes a two-second sleep after detecting the gesture.
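
Here's a hedged sketch of that logic; the helper signature and the exact wtype invocation are assumptions based on the description above, not the demo script verbatim:

    import subprocess
    import time

    while True:
        left_wrist_x, left_wrist_y = get_body_part_coordinates('left_wrist')
        right_wrist_x, right_wrist_y = get_body_part_coordinates('right_wrist')
        nose_x, nose_y = get_body_part_coordinates('nose')

        # Y runs from 0 at the top of the frame to 1 at the bottom, so a wrist
        # raised above the head has a smaller Y value than the nose.
        if left_wrist_y < nose_y and right_wrist_y < nose_y:
            subprocess.run(['wtype', 'k'])  # type "k" to toggle play/pause on YouTube
            time.sleep(2)  # cooldown so one gesture doesn't trigger repeatedly

        time.sleep(0.05)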

That just gives us two seconds to lower our hands before it pauses or plays again. We're just gonna go ahead and run that. We need to change the name in the command to pose_keyboard.py, which is the name we saved it under. With it running, we'll use our object detection AI HAT video as the example to pause; you should check it out if you haven't yet. I should be able to raise my hands, and it pauses. If I raise my hands again, it plays again. As you can see, that gives us about a two-second window to lower our hands before it triggers again. It's quite responsive as well; it only needs to see my wrists above my head for a single frame to play or pause. We're in the two-second window there.

All right, second lot of sample code: we're gonna control some servos with the position of our body parts. I've gone ahead and Blu-Tacked some servos and Lego together. It's a little bit janky, but it demonstrates what you can do. The code's on the course page: new script, paste it in, save it. And here we are; it's really straightforward. We import our servo library up the top as you usually do. Then, in the part of the code that runs once at the top of our processing thread, we set up our elbow and shoulder servos. Very importantly, we keep our two-second sleep in there to let the HAT set up. Then we come into our while True loop, and we grab the elbow angle by, if we scroll over, getting the angle between the left shoulder, the left elbow, and the left wrist. We then grab the shoulder angle from the right shoulder, the left shoulder, and the left elbow, this kind of angle here. After that, we have a few lines that essentially ensure our angles stay between 0 and 180, because if we tried to feed 190 degrees into the servo library, it might not like it, and you can extend your arm beyond 180 degrees, like that. So we get the angle from YOLO, ensure it's between 0 and 180, print it out, and then simply send it to the servo, with a little bit of a sleep on there. It's a bit jittery: my servos are a little noisy, and the data coming out of the YOLO model is a little jittery as well, so we could definitely benefit from some filtering here. But it works, and that's the end of it.
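
As a sketch of that flow, assuming gpiozero's AngularServo as a stand-in for whatever servo library the demo actually uses, hypothetical GPIO pins 18 and 19, and the helper signature assumed earlier:

    from gpiozero import AngularServo  # stand-in; the demo's servo library may differ
    import time

    # Hypothetical wiring: elbow servo on GPIO 18, shoulder servo on GPIO 19
    elbow_servo = AngularServo(18, min_angle=0, max_angle=180)
    shoulder_servo = AngularServo(19, min_angle=0, max_angle=180)

    time.sleep(2)  # let the HAT boot before asking for pose data

    while True:
        elbow_angle = calculate_body_part_angle('left_shoulder', 'left_elbow', 'left_wrist')
        shoulder_angle = calculate_body_part_angle('right_shoulder', 'left_shoulder', 'left_elbow')

        # Clamp to the 0-180 degree range the servo library expects
        elbow_angle = max(0, min(180, elbow_angle))
        shoulder_angle = max(0, min(180, shoulder_angle))

        print(elbow_angle, shoulder_angle)
        elbow_servo.angle = elbow_angle
        shoulder_servo.angle = shoulder_angle

        time.sleep(0.1)  # small delay; some filtering here would smooth the jitter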

All right. Final code. You know the drill: new script, copy, paste. This one is pretty darn cool. We created a game of Fruit Ninja, that really old mobile game, using the wrist key points as the blade to cut the fruit. And by "we created," I mean Claude and me. Claude is a large language model like ChatGPT. I just copied and pasted that initial sample code into Claude and asked it to make a game of Fruit Ninja with the pose estimation data. We've also done some similar things in the past with pose estimation, like making games of Space Invaders and Breakout. It can be a bit of a tricky process, and it took a few attempts to get it right, but after sorting out some issues, we were able to get this working. This code is really dense and a little hard to understand, so I wouldn't expect anyone to dive into it to figure out how it works; we just included it because it's a fun demo. It really shows what you can do with this sort of stuff, where you can take it, and how you can use LLMs to create some really cool things with pose estimation. It's really disorientating because it's your wrists, not your hands or your fists, so it's not a body part you're used to orienting in 3D space. It also speeds up and gets harder the longer the game goes. Well, I think that about wraps it up. Hopefully, this guide has gotten those creative juices flowing about what you can do with this, as well as showing you how to do it. If you use this guide to make something cool, or you just need a hand with anything we covered in this video, we have a maker forum, which you'll find at the bottom of the written guide. You can ask your questions there, whatever you need. We're all makers, and we're happy to help. Till next time though, happy making.
