Learn how to use the Ultralytics YOLO11 model for accurate pose estimation. We'll cover real-time inference and custom model training for various applications.
Research related to computer vision, a branch of artificial intelligence (AI), can be traced back to the 1960s. However, it wasn’t until the 2010s, with the rise of deep learning, that we saw major breakthroughs in how machines understand images. One of the latest advancements in computer vision is the family of cutting-edge Ultralytics YOLO11 models. The YOLO11 models, first introduced at Ultralytics’ annual hybrid event, YOLO Vision 2024 (YV24), support a range of computer vision tasks, including pose estimation.
Pose estimation can be used to detect key points on a person or object in an image or video to understand their position, posture, or movement. It’s widely used in applications like sports analytics, animal behavior monitoring, and robotics to help machines interpret physical actions in real time. Thanks to its improved accuracy, efficiency, and speed over earlier models in the YOLO (You Only Look Once) series, YOLO11 is well-suited for real-time pose estimation tasks.
In this article, we’ll explore what pose estimation is, discuss some of its applications, and walk through how you can use YOLO11 with the Ultralytics Python package for pose estimation. We’ll also take a look at how you can use Ultralytics HUB to try out YOLO11 and pose estimation in a few simple clicks. Let’s get started!
Before we dive into how to use the new Ultralytics YOLO11 model for pose estimation, let’s get a better understanding of what it involves.
Pose estimation is a computer vision technique used to analyze the pose of a person or object in an image or video. Deep learning models like YOLO11 can identify, locate, and track key points on a given object or person. For objects, these key points might include corners, edges, or distinct surface markings, whereas for humans, these key points represent major joints like the elbow, knee, or shoulder.
Pose estimation is more complex than other computer vision tasks like object detection. While object detection locates objects in an image by drawing a box around them, pose estimation goes further by predicting the exact positions of key points on the object.
When it comes to pose estimation, there are two main ways it works: bottom-up and top-down. The bottom-up approach detects individual key points and groups them into skeletons, while the top-down approach focuses on first detecting objects and then estimating key points within them.
YOLO11 combines the strengths of both top-down and bottom-up methods. Like the bottom-up approach, it keeps things simple and fast without needing to group key points manually. At the same time, it uses the accuracy of the top-down method by detecting people and estimating their poses in a single step.
The versatile capabilities of YOLO11 for pose estimation open up a wide range of possible applications in many industries. Let’s take a closer look at some pose estimation use cases of YOLO11.
Safety is a critical aspect of any construction project, especially since construction sites statistically see a higher number of work-related injuries than most workplaces. In 2021, about 20% of all work-related fatal injuries occurred on or near construction sites. With daily risks like heavy equipment and electrical systems, strong safety measures are essential to keeping workers safe. Traditional methods like signs, barricades, and manual monitoring aren’t always effective and often take supervisors away from more critical tasks.
AI can step in to improve safety: a pose estimation-based worker monitoring system can reduce the risk of accidents. Ultralytics YOLO11 models can be used to track workers' movements and postures, so potential hazards, like workers standing too close to dangerous equipment or performing tasks incorrectly, can be quickly spotted. If a risk is detected, supervisors can be notified, or an alarm can alert the worker. A continuous monitoring system can make construction sites safer by always being on the lookout for hazards and protecting workers.
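As a rough sketch of how such a check might work, the snippet below tests whether any detected keypoint falls inside a predefined danger zone. The zone coordinates and video source are hypothetical placeholders, and a real system would need calibrated zones and more robust alerting:

```python
from ultralytics import YOLO

# Hypothetical hazard zone as a pixel-space box (x1, y1, x2, y2)
HAZARD_ZONE = (800, 400, 1100, 720)

# Load a pretrained YOLO11 pose estimation model
model = YOLO("yolo11n-pose.pt")

# Stream results frame by frame; "site_cam.mp4" is a placeholder video source
for result in model("site_cam.mp4", stream=True):
    for person in result.keypoints.xy:  # one (17, 2) set of keypoints per person
        x1, y1, x2, y2 = HAZARD_ZONE
        # Undetected keypoints come back as (0, 0), so skip those
        if any(
            x1 <= x <= x2 and y1 <= y <= y2
            for x, y in person.tolist()
            if x > 0 and y > 0
        ):
            print("Alert: worker detected inside hazard zone")
```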
Farmers and researchers can use YOLO11 to study the movement and behavior of farm animals, like cattle, to detect early signs of diseases such as lameness. Lameness is a condition where an animal struggles to move properly due to pain in its legs or feet. In cattle, illnesses like lameness not only affect their health and welfare but also lead to production issues on dairy farms. Studies show that lameness affects around 8% of cattle in pasture-based systems and 15% to 30% of cattle in confined systems across the global dairy industry. Detecting and addressing lameness early can help improve animal welfare and reduce the production losses associated with this condition.
YOLO11’s pose estimation features can help farmers track animals’ gait patterns and quickly identify any abnormalities that might signal health problems, such as joint issues or infections. Catching these problems early allows for faster treatment, reducing the animals’ discomfort and helping farmers avoid economic losses.
Vision-AI-enabled monitoring systems can also help analyze resting behavior, social interactions, and feeding patterns. Farmers can also use pose estimation to spot signs of stress or aggression. These insights can be used to create better living conditions for animals and improve their well-being.
Pose estimation can also help people improve their posture in real time while working out. With YOLO11, gym and yoga instructors can monitor and track the body movements of people working out, focusing on key points like joints and limbs to assess their posture. The data collected can be compared to ideal poses and workout techniques, and instructors can receive alerts if someone is performing a move incorrectly, helping to prevent injuries.
For example, during a yoga class, pose estimation can help monitor whether all students are maintaining proper balance and alignment. Mobile applications integrated with computer vision and pose estimation can make fitness more accessible for people working out at home or those without access to personal trainers. This continuous real-time feedback helps users improve their technique and achieve their fitness goals while reducing the risk of injury.
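As a minimal sketch of how this kind of posture analysis could work, the snippet below measures a joint angle from YOLO11's predicted keypoints. The image name is a placeholder, and the comparison against an ideal pose is left out:

```python
import numpy as np
from ultralytics import YOLO

def joint_angle(a, b, c):
    """Return the angle at point b (in degrees) formed by keypoints a-b-c."""
    ba, bc = a - b, c - b
    cosine = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc) + 1e-6)
    return np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0)))

model = YOLO("yolo11n-pose.pt")
results = model("workout.jpg")  # placeholder image of a person exercising

for person in results[0].keypoints.xy:
    kpts = person.cpu().numpy()  # (17, 2) COCO keypoints
    # COCO keypoint indices: 5 = left shoulder, 7 = left elbow, 9 = left wrist
    angle = joint_angle(kpts[5], kpts[7], kpts[9])
    print(f"Left elbow angle: {angle:.1f} degrees")
```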
Now that we’ve explored what pose estimation is and discussed some of its applications, let’s take a look at how you can try out pose estimation with the new YOLO11 model. There are two convenient ways to get started: using the Ultralytics Python package or through Ultralytics HUB. Let’s take a look at both options.
Running an inference means the YOLO11 model processes new data outside of its training set and uses the patterns it learned to make predictions on that data. You can run inferences through code with the Ultralytics Python package. All you need to do to get started is install the Ultralytics package using pip, conda, or Docker. If you face any challenges during installation, our Common Issues Guide offers helpful troubleshooting tips.
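For example, installing with pip:

```bash
pip install ultralytics
```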
Once you’ve installed the package successfully, the following code outlines how to load a model and use it to predict poses of objects in an image.
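Here’s a minimal example; the image path is a placeholder you’d replace with your own file:

```python
from ultralytics import YOLO

# Load a pretrained YOLO11 pose estimation model
model = YOLO("yolo11n-pose.pt")

# Run inference on an image
results = model("path/to/image.jpg")

# Visualize the predicted keypoints
results[0].show()

# Keypoint coordinates are available as a (num_people, 17, 2) tensor
print(results[0].keypoints.xy)
```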
Let's say you're working on a computer vision project and have a specific dataset for a particular pose estimation application. You can fine-tune and train a custom YOLO11 model to suit that application. For example, you can use a dataset of keypoints to analyze and understand the pose of a tiger in images by identifying key features such as the position of its limbs, head, and tail.
You can use the following code snippet to load and train a YOLO11 pose estimation model. The model can be built from a YAML configuration, or you can load a pre-trained model for training. This script also lets you transfer weights and start training the model using a specified dataset, such as the COCO dataset for pose estimation.
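A sketch along those lines, using the small COCO8-pose sample dataset that ships with the package (swap in your own dataset YAML for a custom application):

```python
from ultralytics import YOLO

# Option 1: build a new model from a YAML configuration
model = YOLO("yolo11n-pose.yaml")

# Option 2: load a pretrained model (recommended for training)
model = YOLO("yolo11n-pose.pt")

# Option 3: build from YAML and transfer pretrained weights
model = YOLO("yolo11n-pose.yaml").load("yolo11n-pose.pt")

# Train on a pose dataset; coco8-pose.yaml is a small sample dataset
results = model.train(data="coco8-pose.yaml", epochs=100, imgsz=640)
```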
Using the newly trained custom model, you can run inferences on unseen images related to your computer vision solution. The trained model can also be converted to other formats using the export mode.
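For instance, assuming the default location where training runs save their weights:

```python
from ultralytics import YOLO

# Load the custom-trained weights (default save path for a training run)
model = YOLO("runs/pose/train/weights/best.pt")

# Run inference on an unseen image; the path is a placeholder
results = model("path/to/new_image.jpg")
results[0].show()

# Export the trained model to another format, such as ONNX
model.export(format="onnx")
```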
So far, we’ve looked at methods of using YOLO11 that require some basic coding knowledge. If that’s not what you’re looking for, or you’re not familiar with coding, there’s another option: Ultralytics HUB. Ultralytics HUB is a user-friendly platform designed to simplify the process of training and deploying YOLO models. HUB lets you easily manage datasets, train models, and deploy them without the need for technical expertise.
To run inferences on images, you can create an account, navigate to the ‘Models’ section, and choose the YOLO11 pose estimation model you’re interested in. In the preview section, you can upload an image and view the prediction results as shown below.
Ultralytics YOLO11 offers accurate and flexible solutions for tasks like pose estimation across a wide range of applications. From improving the safety of workers on construction sites to monitoring livestock health and assisting with posture correction in fitness routines, YOLO11 brings precision and real-time feedback through advanced computer vision technology.
Its versatility, with multiple model variants and the ability to custom train for specific use cases, makes it a very valuable tool for developers and businesses alike. Whether through coding with the Ultralytics Python package or using the Ultralytics HUB for easier implementation, YOLO11 makes pose estimation accessible and impactful.
To explore more, visit our GitHub repository and engage with our community. Explore AI applications in manufacturing and agriculture on our solutions pages. 🚀