From an arm's length away, the new AI HAT+ 2 and the original AI HAT+ 1 look pretty similar, but pop the hood, and you'll find there are quite a few differences to get excited about. In this guide, we are going to take a look at what these boards can actually do, compare the technical specs, and most importantly, help you figure out which one is the right fit for your next project. Let's get into it!
YOLO Computer Vision Performance Comparison
To understand the new AI HAT 2, we first need to look at the original AI HAT 1 and explain a bit of technical gobbledegook.
The original AI HAT launched in late 2024, coming in two flavours: a 13 TOPS version and a 26 TOPS version. So, what are TOPS? It stands for Tera Operations Per Second, which is just a fancy measurement of number-crunching power. Basically, more TOPS means more power, and the 26 TOPS version should be about twice as fast as the 13 TOPS version. Both of these HATs run off the Hailo-8 chip, an AI accelerator designed to handle heavy AI computations for only a few watts of power - about 2.5 watts in this case.
The Hailo-8 chip is optimised for one specific thing: Convolutional Neural Networks, or CNNs. This is just a technical term for a specific type of AI architecture used in computer vision. Think of tasks like YOLO object detection, pose estimation, license plate reading, or image segmentation—these are all CNNs. Anything to do with taking an image and analysing the pixels for patterns is likely a CNN. This is where the AI HAT shines, as it is able to take a video feed and identify people, cups, or keyboards in real time, all with only a few watts of power.
Now, let’s bring in the AI HAT 2. This new board features the upgraded Hailo-10H chip. On the box, you’ll see it boasts 40 TOPS of compute power. While this is technically true, it's not quite that straightforward. That 40 TOPS figure applies when processing generative models (which we will get to later). In the world of CNNs like YOLO, it actually runs at half that speed—around 20 TOPS. For those looking for the technical qualifier, it has 40 TOPS of power at Int4 and 20 TOPS at Int8, which is what CNNs like YOLO often run at.
So, we have the AI HAT 2 at 20 TOPS (for vision), and the older AI HAT 1 at either 13 or 26 TOPS. What does this actually mean for your projects?
Let’s look at an example using the YOLOv8 Medium model—a decently powerful object detection model you’ll commonly encounter in the wild.
If you try to run this model on the Raspberry Pi 5’s CPU alone, the best you can hope for is 1 FPS. You are essentially in slideshow territory, and your CPU is maxed out at 100%, meaning you can't do anything else.
Now, let's look at the HATs (and why they are called AI accelerators):
- AI HAT 1 (13 TOPS): ~20–25 FPS
- AI HAT 1 (26 TOPS): ~50 FPS
- AI HAT 2 (20 TOPS): ~45 FPS
If you need something speedier, you could switch to the faster, but slightly less powerful YOLOv8 Small model:
- AI HAT 1 (13 TOPS): ~80 FPS
- AI HAT 1 (26 TOPS): ~200 FPS
- AI HAT 2 (20 TOPS): ~170 FPS
Just a quick note on how we got these numbers and what you can expect in the real world. We did not explicitly measure 170 and 200 FPS with YOLOv8 Small in a real-world setup - they are estimations. Once you get above 80 to 100 FPS, it becomes increasingly difficult to benchmark performance. Most webcams can only run at 60 FPS, and even a Raspberry Pi Camera tops out at about 120 FPS at 480p. We arrived at these estimates by observing the performance differences between the HATs on synthetic benchmarks and more intensive models (like YOLOv8 Extra Large), as well as by drawing on our years of experience working with YOLO.
These numbers should give you an idea of comparative performance, but your mileage may vary, and getting these FPS out of the HATs at these model sizes will be very tricky, as there is a great deal of software work required. In practice, as long as you can run the model at 60 to 80 FPS, you likely won't see any real-world differences. Where the extra power will make a difference is in running very large and capable models like YOLOv8 Extra Large, where the AI HAT 2 ran at about 25 FPS and the AI HAT 1 (26 TOPS) ran at about 28 FPS.
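If you want to sanity-check FPS on your own setup, a simple steady-state timing loop goes a long way. The sketch below is our own illustrative helper (`measure_fps` is not part of any SDK); the lambda stands in for a real inference call, which you would swap for your actual detection pipeline.

```python
import time

def measure_fps(infer, frames, warmup=3):
    """Time steady-state inference over a list of frames, skipping warm-up runs."""
    for frame in frames[:warmup]:
        infer(frame)                      # warm-up: model load, caches, etc.
    start = time.perf_counter()
    for frame in frames[warmup:]:
        infer(frame)
    elapsed = time.perf_counter() - start
    return (len(frames) - warmup) / elapsed

# Example with a stand-in "model" that just sleeps ~10 ms per frame:
frames = [None] * 53
fps = measure_fps(lambda f: time.sleep(0.01), frames)
print(round(fps), "FPS")  # roughly 100 FPS minus sleep overhead
```

Skipping the first few frames matters: the initial inferences are always slower while everything warms up, and including them drags the average down.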
LLMs on the AI HAT+ 2
Wait a minute... the AI HAT 2 is a bit more expensive than the 26 TOPS AI HAT 1, yet it gets fewer FPS in computer vision tasks? Why on earth would anyone buy that? Well... it’s got a few tricks up its sleeve.
The single most common question we’ve had since the original AI HAT came out is: "How do I run an LLM on this?"
An LLM, or Large Language Model, is the technology behind things like ChatGPT or Gemini. Up until now, we’ve had to tell makers that the AI HAT simply couldn't do it. But with the AI HAT 2, that has changed. It can run LLMs locally and offline on your Pi, and the reason why is that it is literally "built different."
To understand why the old HAT couldn't do it, we have to look at how data moves. The original AI HAT 1 stores the AI model in the Raspberry Pi’s system RAM and sends data back and forth to the HAT through the PCIe connection.
For a computer vision model (like YOLO), which is relatively small, this works perfectly fine. But LLMs are massive memory-hogs. They require huge amounts of data to be "fed" to the processor constantly. The PCIe connection on the Pi is like a thin straw—it just can't jam enough data through fast enough to keep an LLM running, and so it bottlenecks the whole system.
The AI HAT 2 fixes this by having 8GB of dedicated LPDDR4X RAM soldered directly onto the HAT itself. Because the memory is right next to the Hailo-10H chip, it isn't bottlenecked by the "straw" of the PCIe connection. It can just run. This is also where that 40 TOPS number comes from—it’s the compute power available specifically for these generative models (which are Int 4 for the technical crowd).
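To put rough numbers on that "straw": generating one token requires reading essentially all of the model's weights, so decode speed is capped at roughly memory bandwidth divided by model size. The bandwidth and model-size figures below are our own ballpark assumptions, not measurements:

```python
def max_tokens_per_sec(bandwidth_gb_s, model_size_gb):
    """Upper bound on LLM decode speed when weight streaming is the bottleneck."""
    return bandwidth_gb_s / model_size_gb

model_gb = 2.0        # assumed size of a small quantised LLM
pcie_gen3_x1 = 0.985  # GB/s, approx. for the Pi 5's single PCIe lane at Gen3
on_hat_ram = 30.0     # GB/s, ballpark for memory sitting next to the chip

print(round(max_tokens_per_sec(pcie_gen3_x1, model_gb), 1))  # ~0.5 tokens/s over PCIe
print(round(max_tokens_per_sec(on_hat_ram, model_gb), 1))    # ~15 tokens/s from local RAM
```

Half a token per second is unusable; fifteen is conversational. That's the whole argument for putting the RAM on the HAT.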
Now, we have to be realistic here. These local models are not going to be as powerful as the latest and greatest ChatGPT or Gemini models running on massive server farms. Even with a high-end gaming GPU, local models can still struggle with complex facts.
If you ask a model running on the AI HAT 2 to name the capital of every country on earth in the year 1600, or name all the players on the 1962 World Cup-winning team, it’s probably going to "hallucinate" and make something up. It isn't a replacement for a search engine. However, where it is useful is in processing specific data you give it. We were instead able to feed it a paragraph containing the names of all the World Cup winners and get it to extract the players' names and put them into a CSV list.
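Turning that kind of reply into a CSV file is then a few lines of Python. This is a sketch, assuming the model was prompted to return one name per line (`names_to_csv` and the sample reply are our own illustration, not Hailo tooling):

```python
import csv
import io

def names_to_csv(llm_reply):
    """Convert an LLM reply listing names (one per line) into one CSV row."""
    names = [line.strip(" -*") for line in llm_reply.splitlines() if line.strip(" -*")]
    buf = io.StringIO()
    csv.writer(buf).writerow(names)  # csv module handles quoting/commas for us
    return buf.getvalue().strip()

# Stand-in for the model's reply to "extract the player names from this paragraph":
reply = "Gilmar\nDjalma Santos\nMauro Ramos"
print(names_to_csv(reply))  # Gilmar,Djalma Santos,Mauro Ramos
```

Using the `csv` module rather than joining with commas by hand means names containing commas or quotes won't break the output.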
The uses for these low-power LLMs might seem a bit niche, but they are incredibly powerful for things like home automation. Instead of writing 50 different "if" statements to catch every possible way someone might ask to turn on a light, you can use the LLM to understand the context.
You could tell your house, "It’s getting a bit dark in here," and the LLM can interpret that "dark" means "raise the lights." It turns messy human language into logic that your Python script can actually use.
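One way to wire that up is to constrain the model to a fixed set of action labels and map its reply onto your own logic. Everything here is illustrative: `query_llm` stands in for whatever call your local runtime exposes, and the stub lambda fakes a model reply so the sketch runs on its own.

```python
def interpret(utterance, query_llm, actions):
    """Ask the LLM to pick exactly one action label for a free-form request."""
    labels = ", ".join(actions)
    prompt = (
        f"Pick exactly one action for this request: '{utterance}'. "
        f"Valid actions: {labels}. Reply with the action name only."
    )
    reply = query_llm(prompt).strip().lower()
    return reply if reply in actions else None  # None = model went off-script

actions = {"lights_up": "raise the lights", "lights_down": "dim the lights"}

# Stub the model so the example is runnable; swap for a real call on the HAT:
choice = interpret("It's getting a bit dark in here", lambda p: "lights_up", actions)
print(choice)  # lights_up
```

The key design choice is the whitelist check at the end: if the model replies with anything outside your known actions, you get `None` back and can ask again, instead of feeding free-form text into your automation.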
VLMs on the AI HAT+ 2?
While local LLMs are cool, we think the absolute standout feature of the AI HAT 2 is its ability to run VLMs, or Visual Language Models, as these are probably more applicable to maker projects.
Think of a VLM as an LLM that has eyes. It can look at an image and understand it much like a human does. Now, this is a bit spooky to see in action, but again it is incredibly practical for maker projects.
With traditional models like YOLO, you have to go through a lengthy training process. If you wanted to detect how many bins you have left on the curb, you’d need to take hundreds of photos of bins, label them, and train the model specifically to identify what the pixel pattern of a bin looks like. With a VLM like Qwen2 VL, you skip all of that. You can just ask it: "How many bins are on the curb?" and it will tell you.
You can get even more specific without any extra work. All of the following questions were accurately answered with the image on the right without any retraining:
- "Is there a bin with a red lid?"
- "Is there a garbage truck in the image?"
- "How many bins are there?"
- "List the colours of the bins on the curb."
- "Do any of the bins have rubbish in them?"
Because the model "understands" the visual world, it can analyse scenes far better than a standard object detector. This is a game-changer for robotics and home automation. You could have a Python script that takes a photo, sends it to the HAT, and asks, "Is there a car parked in the driveway? Answer only yes or no." You can then use that simple text output to trigger your gate or send a notification.
Now, integrating the text output that a VLM generates into a series of if and else statements can be a bit tricky. This is where you'll need to experiment with prompts - both to find a way to integrate something like this into a system, and to figure out which prompts the VLM can reliably answer. To give an example, we love to test with this clothes line monitor. The idea is that your home automation system checks an API and sees that it is going to rain soon. It then snaps an image of your clothes line through a camera in your backyard and sends it to the AI HAT and VLM with the following prompt:
You are a binary classifier. Your sole task is to determine if the clothes line in the image has clothes on it.
Reply with exactly one word: "yes" or "no". Do not output punctuation, explanations, or any other text.
And with variations of the image on the right, we get an output of:
yes
With this, you now have a way to check the output of a VLM with coding logic to perform an action (such as alerting that the clothes are about to get wet!).
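Even with a strict prompt, small models occasionally sneak in stray punctuation or capitalisation, so it pays to normalise the reply before branching on it. Here's a small helper we use for that kind of check (our own code, not part of any Hailo SDK):

```python
def parse_binary(reply):
    """Reduce a VLM reply to True/False, or None if it ignored the prompt."""
    word = reply.strip().strip('."!').lower()
    if word == "yes":
        return True
    if word == "no":
        return False
    return None  # model went off-script; log it, then retry or skip this frame

# A reply of 'Yes.' still parses cleanly despite the capital and full stop:
if parse_binary("Yes."):
    print("Clothes on the line - send the rain warning!")
```

Returning `None` for anything that isn't a clean yes/no is deliberate: you never want a chatty off-script answer to silently count as a "no" when rain is on the way.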
So Which HAT Should You Use?
Alrighty, let’s wrap this up. Which AI HAT is actually going to sit on your Pi 5?
- If you are on a budget and just want to run some standard computer vision tasks—like identifying a person walking into a room or counting how many coffee cups are on your desk—the 13 TOPS AI HAT 1 is probably going to be more than enough. It's the cheapest entry point, and getting 25 FPS on a medium-sized model is plenty smooth for most projects.
- If you know for 100% certain that the only thing you care about is computer vision (YOLO, pose estimation, etc.) and you want the absolute best framerate for your dollar, get the 26 TOPS AI HAT 1. You’ll save a bit of money compared to the new version and get that extra bit of performance in vision-specific tasks. But I would only recommend this if you have zero interest in playing with language models.
- However, if you think there is even a slight chance you’ll want to run an LLM or use a VLM to "see" the world more like a human, the AI HAT 2 is the way to go. Yes, it’s a bit more expensive, and yes, it’s a tiny bit slower at YOLO than the 26 TOPS version, but it is the only one that can actually handle these heavier models because it has that dedicated 8GB of RAM on the board.
The best part about all of this is that the models the HAT can run right now are the "worst" they will ever be. The software side of things is moving at breakneck speed. Every few months, someone releases a new model that is able to think harder and run quicker while using the same amount of resources.
You aren't just stuck with what's available today. You can take these new models and put them through Hailo’s conversion process to get them running on your HAT. Now, fair warning: that conversion process is a bit of a headache and definitely not a "fun" afternoon, but it is entirely possible. The vision and language models you'll be running a few years from now are likely going to be way more capable than the ones I’ve shown you today.
If you need a hand with the technical side or you just want to show off a project you’ve managed to get running, feel free to drop a post in the forum below. We are all makers and happy to help. Until next time, happy making!