
Enhancing hand keypoints estimation with Ultralytics YOLO11

Explore AI-driven hand keypoints estimation with Ultralytics YOLO11's support for pose estimation in applications like real-time gesture recognition.

Recently, the sign language interpreters at the Super Bowl have gained a lot of attention. When you watch them sign your favorite artist’s song on TV, you can understand them if you know sign language because your brain processes their hand movements. But what if a computer could do the same? Thanks to AI-driven hand-tracking solutions, machines can track and interpret hand movements with impressive accuracy.

At the core of these solutions is computer vision, a subfield of AI that enables machines to process and understand visual information. By analyzing images and videos, Vision AI helps machines detect objects, track movements, and recognize complex gestures with remarkable accuracy.

For example, computer vision models like Ultralytics YOLO11 can be trained to detect and analyze hand keypoints in real time using pose estimation. By doing so, these models can be used for applications like gesture recognition, sign language translation, and AR/VR interactions. 

In this article, we’ll explore how YOLO11 enables AI-based hand tracking, the datasets used for training, and how to custom-train a model for hand pose estimation. We’ll also look at real-world applications. Let’s get started!

Understanding AI-based hand keypoints detection

AI can be used to recognize and track hand movements in visual data by identifying keypoints like the wrist, fingertips, and finger joints. One approach, known as pose estimation, helps computers understand human movement by mapping keypoints and analyzing how they change over time. This allows AI systems to interpret body posture, gestures, and motion patterns with high accuracy.

Computer vision models make this possible by analyzing images or videos to identify keypoints on the hand and track their movement. Once these points are mapped, AI can recognize gestures by analyzing the spatial relationships between keypoints and how they change over time. 

For example, if the distance between a thumb and index finger decreases, AI can interpret it as a pinching motion. Similarly, tracking how keypoints move in sequences helps identify complex hand gestures and even predict future movements.
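To make this more concrete, here’s a minimal, hypothetical sketch of how a pinch could be detected from two tracked keypoints. The coordinates and the distance threshold below are illustrative assumptions rather than values from a real model:

```python
import math

# Hypothetical (x, y) pixel coordinates for two tracked hand keypoints
thumb_tip = (412, 305)
index_tip = (420, 312)

# Assumed threshold in pixels; a real system would tune this per camera setup
PINCH_THRESHOLD = 30

# Euclidean distance between the thumb tip and the index fingertip
distance = math.dist(thumb_tip, index_tip)

if distance < PINCH_THRESHOLD:
    print("Pinch gesture detected")
```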

Fig 1. An example of recognizing the keypoints in a hand using computer vision.

Interestingly, pose estimation for hand tracking has opened up exciting possibilities, from hands-free control of smart devices to improved robotic precision and assistance in healthcare applications. As AI and computer vision continue to evolve, hand tracking will likely play a bigger role in making technology more interactive, accessible, and intuitive in everyday life.

Exploring YOLO11 for pose estimation

Before we dive into how to create a solution for AI-based hand tracking, let's take a closer look at pose estimation and how YOLO11 supports this computer vision task. Unlike standard object detection, which identifies entire objects, pose estimation focuses on detecting key landmarks, such as joints, limbs, or edges, to analyze movement and posture.

Specifically, Ultralytics YOLO11 is designed for real-time pose estimation. By leveraging both top-down and bottom-up methods, it efficiently detects people and estimates keypoints in one step, outperforming previous models in speed and accuracy.

Out of the box, YOLO11 comes pre-trained on the COCO-Pose dataset and can recognize keypoints on the human body, including the head, shoulders, elbows, wrists, hips, knees, and ankles. 

Fig 2. Using YOLO11 for human pose estimation.
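As a quick illustration, here’s a minimal sketch of running a pre-trained YOLO11 pose model with the Ultralytics Python package. The model name follows the package’s naming convention for pose checkpoints, and the image path is a placeholder:

```python
from ultralytics import YOLO

# Load a YOLO11 pose model pre-trained on COCO-Pose
model = YOLO("yolo11n-pose.pt")

# Run pose estimation on an image (placeholder path)
results = model("path/to/image.jpg")

# Visualize the detected people and their body keypoints
results[0].show()
```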

Beyond human pose estimation, YOLO11 can be custom-trained to detect keypoints on a variety of objects, both animate and inanimate. This flexibility makes YOLO11 a great option for a wide range of applications.

An overview of the Hand Keypoints dataset

The first step in custom-training a model is gathering data and annotating it or finding an existing dataset that fits the project’s needs. For example, the Hand Keypoints dataset is a good starting point for training Vision AI models for hand tracking and pose estimation. With 26,768 annotated images, it eliminates the need for manual labeling. 

It can be used to train models like Ultralytics YOLO11 to quickly learn how to detect and track hand movements. The dataset includes 21 keypoints per hand, covering the wrist, fingers, and joints. Also, the dataset’s annotations were generated with Google MediaPipe, a tool for developing AI-powered solutions for real-time media processing, ensuring precise and reliable keypoint detection. 

Fig 3. The 21 keypoints included in the Hand Keypoints dataset.

Using a structured dataset like this saves time and lets developers focus on training and fine-tuning their models instead of collecting and labeling data. In fact, the dataset is already divided into training (18,776 images) and validation (7,992 images) subsets, making it easy to evaluate model performance. 

How to train YOLO11 for hand pose estimation

Training YOLO11 for hand pose estimation is a straightforward process, especially with the Ultralytics Python package, which makes setting up and training the model easier. Since the Hand Keypoints dataset is already supported in the training pipeline, it can be used right away without extra formatting, saving time and effort.

Here’s how the training process works, with a short code sketch after the list:

  • Set up the environment: The first step is to install the Ultralytics Python package.
  • Load the Hand Keypoints dataset: YOLO11 supports this dataset natively, so it can be downloaded and prepared automatically.
  • Use a pre-trained model: You can start with a pre-trained YOLO11 pose estimation model, which helps improve accuracy and speeds up the training process.
  • Train the model: The model learns to detect and track hand keypoints by going through multiple training cycles.
  • Monitor performance: The Ultralytics package also provides built-in tools to track key metrics like accuracy and loss, helping ensure the model improves over time.
  • Save and deploy: Once trained, the model can be exported and used for real-time hand tracking applications.
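
Putting these steps together, here’s a minimal training sketch, assuming the Ultralytics Python package is installed. The dataset configuration name matches the Hand Keypoints dataset supported by the package, while the number of epochs and image size are typical values you would adjust for your own project:

```python
from ultralytics import YOLO

# Start from a pre-trained YOLO11 pose estimation model
model = YOLO("yolo11n-pose.pt")

# Train on the Hand Keypoints dataset; the package downloads and prepares
# the dataset automatically from its configuration file
results = model.train(
    data="hand-keypoints.yaml",  # dataset config supported by the package
    epochs=100,                  # assumed number of training cycles
    imgsz=640,                   # assumed input image size
)
```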

Evaluating your custom-trained model

As you go through the steps of creating a custom model, you’ll notice that monitoring performance is essential. Along with tracking progress during training, evaluating the model afterward is crucial to make sure it accurately detects and tracks hand keypoints.

Key performance metrics like accuracy, loss values, and mean average precision (mAP) help assess how well the model performs. The Ultralytics Python package provides built-in tools to visualize results and compare predictions with real annotations, making it easier to spot areas for improvement.
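For example, a short validation sketch like the one below computes these metrics on the validation split. The weights path is a placeholder for your custom-trained model, and the printed attribute follows the package’s pose metrics naming:

```python
from ultralytics import YOLO

# Load the custom-trained hand keypoints model (placeholder path)
model = YOLO("path/to/best.pt")

# Evaluate on the Hand Keypoints validation split
metrics = model.val(data="hand-keypoints.yaml")

# Mean average precision (mAP50-95) for the predicted keypoints
print(metrics.pose.map)
```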

To better understand the model's performance, you can check evaluation graphs such as loss curves, precision-recall plots, and confusion matrices, which are automatically generated in the training logs. 

These graphs help identify issues like overfitting (when the model memorizes training data but struggles with new data) or underfitting (when the model fails to learn patterns well enough to perform accurately) and guide adjustments to improve accuracy. Also, testing the model on new images or videos is important to see how well it works in real-world scenarios.

Applications of AI-driven hand-tracking solutions

Next, let’s walk through some of the most impactful applications of hand keypoints estimation with Ultralytics YOLO11.

Real-time gesture recognition with YOLO11

Imagine adjusting the volume on your TV by simply waving your hand, or navigating a smart home system with a swipe in the air. Real-time gesture recognition powered by YOLO11 makes these touch-free interactions possible by accurately detecting hand movements in real time.

This works by using AI cameras to track key points on your hand and interpret gestures as commands. Depth-sensing cameras, infrared sensors, or even regular webcams capture hand movements, while YOLO11 can process the data to recognize different gestures. For example, such a system can tell the difference between a swipe to change a song, a pinch to zoom in, or a circular motion to adjust volume.
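A minimal sketch of such a pipeline with a regular webcam might look like the following, assuming OpenCV is installed and the weights path points to a custom-trained hand keypoints model:

```python
import cv2
from ultralytics import YOLO

# Placeholder path to a custom-trained YOLO11 hand keypoints model
model = YOLO("path/to/hand_keypoints_best.pt")

cap = cv2.VideoCapture(0)  # default webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break

    # Detect hands and estimate their keypoints in the current frame
    results = model(frame)

    # Draw the detections and keypoints on the frame and display it
    cv2.imshow("Hand keypoints", results[0].plot())

    # Press 'q' to quit
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```

From there, gesture logic like the pinch check sketched earlier can be layered on top of the per-frame keypoints.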

AI-based hand keypoints detection for sign language recognition

AI solutions for hand-tracking can support seamless communication between a deaf person and someone who doesn’t know sign language. For example, smart devices integrated with cameras and YOLO11 can be used to instantly translate sign language into text or speech. 

Thanks to advancements like YOLO11, sign language translation tools are becoming more accurate and accessible. This impacts applications like assistive technology, live translation services, and educational platforms. AI can help bridge communication gaps and promote inclusivity in workplaces, schools, and public spaces.

Computer vision for hand tracking: Improving AR and VR experiences

Have you ever played a virtual reality (VR) game where you could grab objects without using a controller? Hand tracking powered by computer vision makes this possible by allowing users to interact naturally in augmented reality (AR) and VR environments. 

Fig 4. Hand tracking is a key part of AR and VR applications.

With hand keypoints estimation using models like Ultralytics YOLO11, AI tracks movements in real time, enabling gestures like pinching, grabbing, and swiping. This enhances gaming, virtual training, and remote collaboration, making interactions more intuitive. As hand-tracking technology improves, AR and VR will feel even more immersive and lifelike.

Key takeaways

Hand keypoints estimation with Ultralytics YOLO11 is making AI-driven hand-tracking solutions more accessible and reliable. From real-time gesture recognition to sign language interpretation and AR/VR applications, computer vision is opening up new possibilities in human-computer interaction.

Also, streamlined custom training and fine-tuning processes are helping developers build efficient models for various real-world uses. As computer vision technology evolves, we can expect even more innovations in areas like healthcare, robotics, gaming, and security.

Engage with our community and explore AI advancements on our GitHub repository. Discover the impact of AI in manufacturing and computer vision in healthcare through our solutions pages. Explore our licensing plans and begin your AI journey today!
