
A guide on tracking moving objects in videos with Ultralytics YOLO models

Learn how computer vision tracking systems work, explore popular models that support object tracking like YOLO11, and discover their real-world applications.

Robots that can assemble electrical parts, systems that catch speeding cars, and smart retail solutions that track how customers shop - all of these innovations rely on computer vision. It's a branch of artificial intelligence (AI) that helps machines analyze and understand images and videos.

For example, a robot needs to recognize and follow different parts to put them together correctly. Similarly, a traffic system can use computer vision to spot cars, read license plates, and figure out when someone is speeding. Meanwhile, in stores, Vision AI can help track what customers are looking at or picking up and can even keep an eye on inventory.

Such applications are powered by computer vision models like Ultralytics YOLO11, which support a wide range of visual tasks. Many of these tasks focus on gathering insights from a single image, but one particularly interesting task, object tracking, can be used to follow the movement of objects across a series of images or video frames.

Fig 1. An example of detecting and tracking cars.

In this guide, we’ll take a closer look at how object tracking works and explore real-world examples of how it’s used. We’ll also discuss how Vision AI models like Ultralytics YOLO11 support object tracking. Let’s get started!

A closer look at computer vision tracking systems

Object tracking is a computer vision task used to follow the movement of objects across video frames, helping systems monitor and understand how things change over time. This is very similar to how humans can naturally follow a moving person or object with their eyes, like when you're watching a tennis match and your eyes track the ball as it moves back and forth across the court.

In the same way, object tracking involves using cameras and AI to follow the ball’s movement in real time. This technology can give viewers at home a better understanding of the game's flow, especially through analytics like speed, trajectory, and player positioning.

While this kind of visual tracking might seem effortless for humans, machine vision relies on a series of steps powered by Vision AI models. Here's a simple breakdown of how object tracking works:

  • Capturing video: Cameras record video footage, capturing how objects move through a scene over time.
  • Detecting objects: AI-powered computer vision models like YOLO11 can analyze each frame to identify and locate specific objects, such as people, vehicles, or products.
  • Assigning identity: Once an object is detected, tracking algorithms assign it a unique ID to follow it across multiple frames, ensuring the system knows it’s the same object even as it moves.
  • Monitoring movement: The system follows each object over time, producing data such as speed, direction, and interactions with other objects.
  • Generating insights: This information can be used in real time to provide analytics, assist decision-making, or power visual overlays - depending on the specific use case.
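
To make these steps concrete, here is a minimal sketch using the Ultralytics Python package together with OpenCV. The video path and the printed output are placeholders for illustration; in a real system, the per-frame positions would feed into analytics or visual overlays.

```python
import cv2
from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # pretrained YOLO11 detection model
cap = cv2.VideoCapture("traffic.mp4")  # step 1: capture video (placeholder path)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break

    # steps 2-3: detect objects and let the built-in tracker assign persistent IDs
    result = model.track(frame, persist=True, verbose=False)[0]

    # step 4: monitor movement - each box keeps the same ID from frame to frame
    if result.boxes.id is not None:
        for box, track_id in zip(result.boxes.xyxy, result.boxes.id.int().tolist()):
            x1, y1, x2, y2 = box.tolist()
            print(f"object {track_id}: box ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")

cap.release()
```

Because `persist=True` carries the tracker's state from one call to the next, the same physical object keeps the same ID across frames.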

Comparing object detection and tracking with YOLO

Another computer vision task supported by YOLO11 that is closely related to object tracking is object detection. Let’s explore the difference between these two tasks. 

Object detection involves identifying and locating objects of interest within a single image or video frame. For example, a self-driving car uses object detection to recognize a stop sign or a pedestrian in a single frame captured by cameras on board. It answers the question: “What is in this image, and where is it?” However, it doesn’t provide any information about where the object goes next.

Object tracking builds on object detection by adding an understanding of movement over time. The key difference between the two is how they handle time and motion. Object detection treats each frame as an independent snapshot, while object tracking connects the dots between frames, using past data to predict an object’s future position.
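
The difference also shows up directly in the Ultralytics API. Here is a brief sketch of the two calls, using a hypothetical image and video file as placeholders:

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")

# Detection: each image (or frame) is an independent snapshot - boxes only, no identity.
detections = model.predict("street.jpg")
print(detections[0].boxes.xyxy)

# Tracking: the same detections, plus a persistent ID that links an object across frames.
tracks = model.track("street.mp4")
print(tracks[0].boxes.id)  # may be None until the tracker confirms a track
```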

By combining both, we can build powerful vision AI systems capable of real-time tracking in dynamic environments. For instance, an automated security system can detect people entering a space and continuously track their movement across the frame.

Real-time tracking using Ultralytics YOLO models

Now that we’ve covered the difference between object detection and tracking, let’s take a look at how Ultralytics YOLO models, like YOLO11, support real-time object tracking.

While YOLO models aren't tracking algorithms themselves, they play an essential role by detecting objects in each video frame. Once objects are detected, tracking algorithms are needed to assign unique IDs to them, allowing the system to follow their movement from frame to frame. 

To address this need, the Ultralytics Python package seamlessly integrates object detection with popular tracking algorithms like BoT-SORT and ByteTrack. This integration enables users to run detection and tracking together with minimal setup.

When using YOLO models for object tracking, you can choose which tracking algorithm to apply based on the requirements of your application. For example, BoT-SORT is a good option for following objects that move unpredictably, thanks to its use of motion prediction and deep learning. ByteTrack, on the other hand, performs especially well in crowded scenes, maintaining reliable tracking even when objects are blurry or partially hidden.
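
In the Ultralytics Python package, the tracker is selected with the `tracker` argument of `model.track()`, with BoT-SORT used by default. A brief sketch with placeholder video paths:

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")

# ByteTrack: robust in crowded scenes, even with blur or partial occlusion.
model.track(source="crowd.mp4", tracker="bytetrack.yaml")

# BoT-SORT (the default): motion prediction plus appearance cues,
# useful when objects move less predictably.
model.track(source="traffic.mp4", tracker="botsort.yaml")
```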

Fig 2.  The Ultralytics Python package seamlessly integrates BoT-SORT and ByteTrack.

Applications of object tracking with Ultralytics YOLO

Now that we have a better understanding of what object tracking is and how it works, let’s explore some real-world applications where this technology is making an impact.

Real-time tracking using Ultralytics YOLO for speed estimation

Speed estimation systems enabled by computer vision depend on tasks like object detection and tracking. These systems are designed to calculate how fast an object is moving - whether it’s a vehicle, a cyclist, or even a person. This information is crucial for a variety of applications, from traffic management to safety monitoring and industrial automation.

Using a model like Ultralytics YOLO11, objects can be detected and tracked across video frames. By analyzing how far an object moves over a specific period of time, the system can estimate its speed. 
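
Below is a simplified sketch of that idea built on YOLO11 tracking output. The pixels-per-meter calibration, frame rate, and video path are assumptions for illustration; a production system would calibrate these against the real camera geometry.

```python
import math
from collections import defaultdict

from ultralytics import YOLO

PIXELS_PER_METER = 8.0  # assumed calibration for this camera view
FPS = 30.0              # assumed video frame rate

model = YOLO("yolo11n.pt")
last_position = defaultdict(lambda: None)  # track_id -> (x, y) center in the previous frame

for result in model.track(source="highway.mp4", stream=True, verbose=False):
    if result.boxes.id is None:
        continue
    for box, track_id in zip(result.boxes.xywh, result.boxes.id.int().tolist()):
        cx, cy = float(box[0]), float(box[1])  # box center in pixels
        prev = last_position[track_id]
        if prev is not None:
            # distance moved since the last frame, converted to km/h
            dist_m = math.hypot(cx - prev[0], cy - prev[1]) / PIXELS_PER_METER
            speed_kmh = dist_m * FPS * 3.6
            print(f"object {track_id}: ~{speed_kmh:.1f} km/h")
        last_position[track_id] = (cx, cy)
```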

Fig 3. Using YOLO11’s support for object tracking for speed estimation.

Exploring object tracking in manufacturing

Manufacturing processes can be fast-paced and highly complex, making it difficult to manually keep track of every item being produced. Object tracking offers a good solution for automating the monitoring of products as they move through each stage of production. It can help factories maintain high levels of accuracy and efficiency without slowing things down.

From counting products on a conveyor belt to spotting defects or verifying proper assembly, object tracking brings visibility and control to tasks that would otherwise be time-consuming or error-prone. This technology is especially impactful in high-volume industries like food processing, electronics, and packaging, where speed and precision are critical.
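
As a rough illustration, a conveyor-belt counter can be built directly on track IDs: an item is counted the first time its ID crosses a virtual line. The line position, video path, and left-to-right direction below are assumptions for the sketch.

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")

LINE_X = 640           # vertical counting line, in pixels (assumed)
counted_ids = set()    # track IDs that have already been counted
last_cx = {}           # track_id -> previous center x

for result in model.track(source="conveyor.mp4", stream=True, verbose=False):
    if result.boxes.id is None:
        continue
    for box, track_id in zip(result.boxes.xywh, result.boxes.id.int().tolist()):
        cx = float(box[0])
        prev = last_cx.get(track_id)
        # count an item the first time its center crosses the line left to right
        if prev is not None and prev < LINE_X <= cx and track_id not in counted_ids:
            counted_ids.add(track_id)
            print(f"items counted: {len(counted_ids)}")
        last_cx[track_id] = cx
```

Keeping a set of already-counted IDs prevents double counting when an item lingers near the line.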

Fig 4. An example of tracking and counting food products on an assembly line using YOLO11.

An overview of object tracking in retail analytics

Countless customers walk in and out of retail stores every day, and understanding their behavior is key to improving both the customer experience and business performance. Object tracking makes it possible for retailers to monitor foot traffic, measure dwell time, and analyze movement patterns - all without needing invasive or manual methods.

By tracking individuals as they enter, exit, and move throughout the store, businesses can gain insights into peak hours, popular areas, and even queue lengths. These insights can inform decisions around staffing, store layout, and inventory placement, ultimately leading to more efficient operations and increased sales.
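
One way to measure dwell time from tracking output is to count how many frames each person's ID stays inside a region of interest. The region coordinates, frame rate, and video path below are illustrative assumptions; class 0 is the "person" class for pretrained COCO models.

```python
from collections import defaultdict

from ultralytics import YOLO

REGION = (200, 100, 900, 600)  # (x1, y1, x2, y2) of an area of interest, in pixels (assumed)
FPS = 25.0                     # assumed video frame rate

model = YOLO("yolo11n.pt")
frames_inside = defaultdict(int)  # track_id -> frames spent inside the region

for result in model.track(source="store.mp4", stream=True, classes=[0], verbose=False):
    if result.boxes.id is None:
        continue
    for box, track_id in zip(result.boxes.xywh, result.boxes.id.int().tolist()):
        cx, cy = float(box[0]), float(box[1])
        if REGION[0] <= cx <= REGION[2] and REGION[1] <= cy <= REGION[3]:
            frames_inside[track_id] += 1

for track_id, frames in frames_inside.items():
    print(f"person {track_id} dwelled ~{frames / FPS:.1f} s in the region")
```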

Fig 5. Using YOLO11’s object-tracking abilities to monitor people entering and exiting a store.

Pros and cons of object tracking

From retail stores to factory floors, object tracking is being used in all kinds of industries to improve factors like efficiency, safety, and the overall experience. Here are some of the key benefits that object tracking can bring to various industries:

  • Enables real-time alerts: Systems integrated with object tracking can be configured to trigger alerts automatically when something unusual is detected, such as a person entering a restricted area or a delivery being left too long in one place.
  • Integrates with other systems: Object-tracking data can be combined with other technologies, like facial recognition, thermal cameras, or inventory systems, for even more powerful insights.
  • Cost-effective in the long run: While initial setup may require investment, automated tracking reduces the need for manual labor, lowers error rates, and cuts down operational costs over time.

While these benefits highlight how object tracking positively impacts different use cases, it's also important to consider the challenges involved in its implementation. Let’s take a closer look at some limitations of object tracking:

  • Difficulty in crowded environments: In busy settings like concerts, shopping centers, or city streets, tracking systems may struggle to distinguish between people or objects that are close together, leading to confusion or inaccurate results.
  • Sensitive to environmental conditions: Poor lighting, fog, fast motion, or camera shake can affect the system’s ability to track objects accurately, especially in outdoor or uncontrolled environments.
  • Privacy and legal concerns: Improper handling of personal data, lack of user consent, or surveillance in public spaces may raise ethical issues and lead to non-compliance with privacy laws.

Key takeaways

Object tracking is a computer vision task that lets machines follow the movement of objects over time. It’s used in a wide range of real-world scenarios - from estimating vehicle speed and counting products on an assembly line to analyzing player movements in sports.

With Vision AI models like YOLO11 and tracking algorithms such as BoT-SORT and ByteTrack, object tracking has become faster, smarter, and more accessible across different industries. As object-tracking technology evolves, it’s helping systems become more intelligent, efficient, and responsive, one frame at a time.

Want to learn more about computer vision and AI? Explore our GitHub repository, connect with our community, and check out our licensing options to jumpstart your computer vision project. If you're exploring innovations like AI in manufacturing and computer vision in the automotive industry, visit our solutions pages to discover more. 
