In this video we're going to get YOLO pose estimation running on the Raspberry Pi 5. We're going to take a look at the installation process, the Python scripts needed to run it, as well as how to optimize it to get some smoother FPS on the Pi. This is going to be a fun video, so let's get right into it. To follow along we're going to need a few things, obviously a Raspberry Pi. We're using the Pi 5 because this is a fairly processing-intensive task. You're also going to need a Pi camera module, we're using the Camera Module 3 here, and you may also need an adapter cable. The Pi 5 shipped with a smaller camera connector and your camera might not come with the required adapter cable, so it's worth checking. You'll also need all the other essentials like a microSD card to install Raspberry Pi OS onto, a power supply, monitor, keyboard, mouse, all that junk. And you'll of course need a cooling solution for the Pi; we're just using the Active Cooler here. You'll find links to all of these in the written guide below, where you'll also find the zip file with all the code that you'll need for this video. In terms of hardware assembly, there isn't much to do here: just plug the camera into the Pi with the ribbon cable, and ensure that you plug it in the correct way, because it only works in one orientation. Also try to be careful with these ribbon cables; they're not made out of paper, but they will break if you don't look after them and bend and curl them too much. And we're also going to find a way to mount our camera.
First of all, you're going to need to use another computer to install Raspberry Pi OS onto the microSD card using Raspberry Pi Imager. It's a super straightforward process, but if you need help, you'll find it in our written guide below. Once we've installed Raspberry Pi OS, go through the first-time setup and ensure that you connect it to the internet; there's nothing fancy you need to do here. Once we're on the desktop, we can get started. First things first, we need to install and set up the Ultralytics YOLO package. If you've already done this in any of our other YOLO guides, you can skip this step and head on right over to here. If not, let's get into it. To start, we need to set up a virtual environment to install all of our external packages into. I'm just going to call this one YOLO-pose and then enter the workspace like so. If you need a hand working with virtual environments on the Pi, we have a getting started guide linked in the description for that. Once we're inside of our environment, we can update our package list, and with these two lines, double-check that our version of pip is up to date. And once we've done all that, we can finally install our Ultralytics library like so. Now, there is a lot going on in this installation and it can take a good 15, 30, maybe even 45 minutes depending on your setup. So go and grab a cup of tea or whatever you need, but keep an eye on it. Sometimes the installation can run into an issue and you'll see a big wall of red text, but just run that installation line again and it will resume. Once that installation is finished, go ahead and restart your Pi. The next thing we need to do is set up Thonny. The first time that you open it, it will be in this simplified mode. Just click on the top right here and restart it to get back into the regular mode. And then we'll go ahead and get Thonny to work out of that virtual environment that we just created. Again, if you need a hand with this, there is a guide linked below.
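If it helps to see those steps written out, here's a rough sketch of the terminal commands being described, assuming a virtual environment named yolo-pose (the exact names and flags used in the written guide may differ):

```bash
# Update the package list and installed packages
sudo apt update && sudo apt full-upgrade -y

# Create a virtual environment that can still see system packages like picamera2, then enter it
python3 -m venv --system-site-packages yolo-pose
source yolo-pose/bin/activate

# Make sure pip itself is up to date, then install the Ultralytics YOLO package
pip install --upgrade pip
pip install ultralytics
```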
Now we're going to go ahead and extract that zip folder to our desktop, just as a convenient location that we can find it in, but you could extract it anywhere. In that folder is our first script, pose demo. Go ahead and open that up in Thonny. And if we hit run, it should download a few needed libraries, tools and the model itself. And that's one of the beautiful things about the Ultralytics library: it sets up and downloads everything you need automatically. And after a few seconds, we are up and running with our pose estimation. How cool is that? As you can see, our Raspberry Pi is calculating the location of these dots on my body, which are called the key points, and it's drawing lines between some of them to figure out the pose of my body. And if you ever want to close this down, just hit Q. And that is how easy it is to get going with pose estimation. But we can make it better. It's a little bit slow, so let's get into that. First things first, we can change the model that we are running. YOLO pose comes in five different model sizes. We are using the nano size, denoted by this N here, but you can change this letter to change the model size, as shown by these options here. The larger the model, the more powerful it is, and it can handle more complex posing with greater accuracy. But it will run slower, and I mean a lot slower, as in 0.1-ish FPS. For 99% of cases, the nano model will be good enough, and it runs the fastest, which is a nice bonus. Change that back to nano before we forget. You can also change the model version here. We're currently using YOLO11, which is the newest and best model at the time of making this video. But if in the future a new YOLO version is released, say YOLO12, you should be able to just come to this line, modify it, and it will download the new model, and you're off and away. And you can also go back and use an older model if you want. Just be aware that some models need the V in front of them. For example, YOLO 8 is called YOLOv8, but YOLO 11 is just YOLO11, like so. So change the model if you need. Play around with the size. But if you're unsure, just use YOLO11 nano like we are using in this code.
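For reference, the core of a pose demo like this is only a few lines. The script in the zip will differ in its details, but a minimal sketch looks something like this (the camera configuration and window name here are assumptions):

```python
import cv2
from picamera2 import Picamera2
from ultralytics import YOLO

# Set up the Pi camera to deliver frames that OpenCV can display
picam2 = Picamera2()
picam2.configure(picam2.create_video_configuration(main={"format": "RGB888", "size": (640, 480)}))
picam2.start()

# The first run downloads the model weights automatically
model = YOLO("yolo11n-pose.pt")

while True:
    frame = picam2.capture_array()
    results = model(frame)           # run pose estimation on the frame
    annotated = results[0].plot()    # draw the keypoints and limb lines
    cv2.imshow("YOLO Pose", annotated)
    if cv2.waitKey(1) == ord("q"):   # press Q to close the preview
        break

cv2.destroyAllWindows()
```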
We can also optimize the model to run better on the Pi and increase our FPS. The first step is to convert it to something called NCNN, a format that is more optimized for Arm processors like the Raspberry Pi's. To do so, go ahead and open up the script called NCNN conversion in Thonny, like so. As you can see, not much going on here. At the top, we can specify which model we want to convert, and we can set this exactly the same as we just talked about; we can specify pretty much any model. And down here, we're saying we're going to convert it to NCNN and at what resolution. We'll touch on resolution in just a bit, but for now, leave it at 640. Run the script, and it should download a few more tools it needs, but after that, it should only take a few seconds to convert the model over to the NCNN format. If we look in the folder with our scripts, first of all, you should see the models that we've been downloading. But you should also see a new folder with a name ending in underscore NCNN underscore model. Go ahead and copy the name of this folder. And if we return to our normal pose estimation demo, we should be able to punch that right into there, like so. Hit run. And now we're using that model instead of the PyTorch model that we initially downloaded. And look at that, we have a nice little FPS increase, about a four-times increase in speed. We are finally starting to get somewhere. There's one more big thing that we can do to increase FPS, and that is to lower the processing resolution. If we give our pose estimation model fewer pixels to analyze, it's going to be able to analyze them quicker. Now, converting to NCNN was pretty much a free performance boost, but lowering the resolution does come at a cost. A lower resolution will decrease the accuracy a tiny bit, but the main thing is that it decreases the distance at which it can accurately estimate your pose.
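The conversion script itself boils down to a single export call. Here's a minimal sketch (the bundled script's variable names may differ):

```python
from ultralytics import YOLO

# Load the PyTorch pose model (downloaded automatically if it isn't already present)
model = YOLO("yolo11n-pose.pt")

# Export to NCNN. imgsz is the processing resolution baked into the export,
# so it needs to match the resolution used in the demo script.
model.export(format="ncnn", imgsz=640)
```

This produces a folder named after the model, ending in _ncnn_model, which is the name you then punch into the demo script in place of the .pt file.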
But at the default resolution of 640, that is a pretty far distance; I would expect it to be able to do up to 15, maybe 20 meters. So we can afford to lower it a bit, as doing so really increases our FPS. Also, a big shout out to Philip from our community forums for helping us get this resolution and NCNN conversion working together. So head on back to our NCNN conversion script. All we're going to do is specify the new resolution that we want to export the model at here; we're just going to do 320, for example. If we click run, it will overwrite the old model like so. And because it overwrote the old folder, we can just use the same name in the demo script. There's one thing we need to change, though: we need to go down here and ensure that this resolution matches the NCNN conversion resolution, or you're going to run into some issues.
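In other words, the export resolution and the resolution used at inference time need to agree. A sketch of the two lines that have to match, one from each script (the folder name here is assumed from the earlier export):

```python
# In the NCNN conversion script: export at the new, lower resolution
model.export(format="ncnn", imgsz=320)

# In the pose demo script: load the exported model and run it at that same resolution
model = YOLO("yolo11n-pose_ncnn_model")
results = model(frame, imgsz=320)
```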
And if we hit run, whoa, look at that. We've nearly quadrupled our FPS yet again just by halving the resolution. And dropping it from 640 to 320 doesn't cost us that much: realistically, in most applications, you're not going to notice a difference in accuracy or distance unless you're really, really far away from the camera. I can very happily stand at the back of the room, which is about six meters away from the camera. And we can drop it yet again. Let's go down to 160. And you can see, oh, look how smooth that is. We're hitting 100 FPS sometimes, but it's probably averaging out at about 30, maybe 40 FPS, which is really, really quite smooth. Now for us, 320 is probably good enough, but feel free to go backwards and forwards and lower that resolution as much as you need to get your desired FPS. Just be aware that you may start to lose some accuracy and distance that you can detect things at, so tune it to your needs.
Also note that the resolution needs to be a multiple of 32, so you can't set it to 100, but you could set it to 96 or 128. Also, just remember to change that resolution in the demo script to match, or you're going to run into some detection issues. Now we have pose estimation running and we can get it to a decent FPS, so let's actually do something with those key points. If you run the code for a few frames and then stop it with Q, we can punch this variable called results into the shell. Results zero will get the result of the last frame, and in there is all the information coming out of the YOLO model. There is a whole bunch of information in there, but the part we're really interested in is these key points. Now you can play around with that if you want, but we wrote another script that lets you easily get this data.
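If you'd like to poke around yourself, a couple of lines in the Thonny shell (after stopping the loop with Q) will show what's in there; the attribute names below come from the Ultralytics results API:

```python
results[0]                 # the full result object for the last frame
results[0].keypoints       # the detected keypoints for each person in that frame
results[0].keypoints.xyn   # keypoint x, y coordinates normalised to the range 0-1
```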
So go ahead and open up key point acquisition. This code is pretty much identical to the last one, but with the addition of this function here: you input the key point number that you want, and here is a big list of which number corresponds to which body point. Then you say whether you want the X or Y coordinate, and it spits that out. If we go down here, we can see it in use. We call the function get key point position, we ask for zero, which is the nose, and then we call for the X and then the Y coordinate, and we simply print both of them out. If we run this code, you should see the preview window pop up in a bit and those numbers being printed out. These coordinates are normalised, so if I move my nose to the bottom of the screen, the Y value should approach one, and as I move it back up, it should approach zero. And the X value runs from zero on one side of the frame all the way to one on the other side.
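As a rough sketch of what such a helper looks like, here is one way to pull the normalised coordinates for the first detected person (the exact name and signature in the bundled script may differ):

```python
# COCO keypoint layout: 0 = nose, 5/6 = shoulders, 7/8 = elbows, 9/10 = wrists, etc.
def get_keypoint_position(results, keypoint_no, axis="x"):
    """Return the normalised (0-1) x or y position of a keypoint, or None if nobody is detected."""
    keypoints = results[0].keypoints.xyn
    if len(keypoints) == 0:
        return None
    x, y = keypoints[0][keypoint_no]   # first detected person
    return float(x) if axis == "x" else float(y)

# Example: track the nose and print where it sits in the frame
nose_x = get_keypoint_position(results, 0, "x")
nose_y = get_keypoint_position(results, 0, "y")
print(nose_x, nose_y)
```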
And this is the connection between running pose estimation and your project, as we can get where in the frame any of those key-pointed body parts are. From here, you could do, well, pretty much anything you want with that. You could take the position of your arm and control hardware with it, like a servo or something, or track the position of your shoulder points and make a push-up counter. Or you could use it to control software-based things like video games. If you go ahead and open up SpaceInvaders.py, you'll find one that we prepared earlier. Now, this code is quite long, but all it's doing is taking the X position of your nose and using it to calculate where to move the spaceship. It uses something called Pygame, and this video isn't a tutorial on how to use that, and I'm not an expert in Pygame, but luckily you and I can get access to one.
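The core idea inside SpaceInvaders.py is just a mapping from the normalised nose position to a screen coordinate. A hypothetical fragment of that logic, building on the helper sketched above (SCREEN_WIDTH and ship_x are placeholders, not the script's actual variables), might look like:

```python
SCREEN_WIDTH = 800                                # hypothetical game window width in pixels

nose_x = get_keypoint_position(results, 0, "x")   # 0 = nose, normalised 0-1 across the frame
if nose_x is not None:
    ship_x = int(nose_x * SCREEN_WIDTH)           # scale up to a pixel position for the ship
```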
Large language models like ChatGPT and Claude are really good at coding in Python, and that's how we actually made this game of Space Invaders. All we did was copy and paste the key point acquisition code into it and ask it to make the game for us. So you could very well go online, sign into a large language model like Claude for free, paste in the key points code and tell it to make a game like Breakout, and it should be able to do so. Let's give that code a go. I'm just going to paste it in, and make sure you save it to the same folder before you run it. We're just going to call this Breakout.py. Give it a quick run and see what happens. Oh, look at that. We have Breakout being controlled by my nose in 3D space. Now, chances are it won't be perfect the first time, but you can tell it, hey, I want this to be this color, or the ball is too slow, or I don't want the camera preview to pop up, or whatever you want, and it will be able to modify and change it as you need.
It's also really good at breaking down and explaining the code if you want, and if the code doesn't work, just copy and paste the error into it and it will try to fix it. Regardless of what your project or use case is, you now have a Raspberry Pi that can analyze a camera feed and get information about the position of different body parts. And that is pretty mind-blowing, especially the fact that we can run it at a decent FPS on a Raspberry Pi. Well, that about wraps it up. If you need a hand with anything in this video, or you used it to make something cool and you want to share it, feel free to post about it on our community forums. We're all makers over there and we're happy to help. Also, feel free to check out some of our other computer vision guides on the Raspberry Pi linked below; we do some other cool and wacky stuff like object detection. And as always, until next time, happy making.
Makers love reviews as much as you do; please follow this link to review the products you have purchased.