
YOLO12 explained: Real-world applications and use cases

Discover YOLO12, the latest computer vision model! Learn how its attention-centric architecture and FlashAttention technology enhance object detection tasks across industries

Computer vision is a branch of artificial intelligence (AI) that helps machines understand images and videos. It is a field that is advancing at an incredible pace because AI researchers and developers are constantly pushing the limits. The AI community is always aiming to make models faster, smarter, and more efficient. One of the latest breakthroughs is YOLO12, the newest addition to the YOLO (You Only Look Once) model series, released on February 18th, 2025.

YOLO12 was developed by researchers from the University at Buffalo, SUNY (State University of New York), and the University of Chinese Academy of Sciences. Taking a new approach, YOLO12 introduces attention mechanisms, allowing the model to focus on the most essential parts of an image rather than processing everything equally.

It also features FlashAttention, a technique that speeds up processing while using less memory, and an area attention mechanism, designed to mimic the way humans naturally focus on central objects.

These improvements make YOLO12n 2.1% more accurate than YOLOv10n and YOLO12m 1.0% more accurate than YOLO11m. However, this comes with a tradeoff: YOLO12n is 9% slower than YOLOv10n, and YOLO12m is 3% slower than YOLO11m.

Fig 1. An example of YOLO12 being used to detect objects.

In this article, we’ll explore what makes YOLO12 different, how it compares to previous versions, and where it can be applied.

The road to the release of YOLO12

The YOLO model series is a collection of computer vision models designed for real-time object detection, meaning they can quickly identify and locate objects in images and videos. Over time, each version has improved in terms of speed, accuracy, and efficiency.

For instance, Ultralytics YOLOv5, released in 2020, became widely used because it was fast and easy to custom-train and deploy. Later, Ultralytics YOLOv8 improved on this by offering additional support for computer vision tasks like instance segmentation and object tracking. 

More recently, Ultralytics YOLO11 focused on improving real-time processing while maintaining a balance between speed and accuracy. For example, YOLO11m had 22% fewer parameters than YOLOv8m, yet still delivered better detection performance on the COCO dataset, a widely used benchmark for evaluating object detection models.

Building on these advancements, YOLO12 introduces a shift in how it processes visual information. Rather than treating all parts of an image equally, it prioritizes the most relevant areas, improving detection accuracy. Simply put, YOLO12 builds on previous improvements while aiming to be more precise.

Key features of YOLO12

YOLO12 introduces several improvements that enhance computer vision tasks while keeping real-time processing speeds intact. Here's an overview of YOLO12’s key features:

  • Attention-centric architecture: Instead of treating every part of an image equally, YOLO12 focuses on the most important areas. This improves accuracy and cuts down on unnecessary processing, making detection sharper and more efficient, even in cluttered images.
  • FlashAttention: YOLO12 speeds up image analysis while using less memory. With FlashAttention (a memory-efficient algorithm), it optimizes data handling, reducing hardware strain and making real-time tasks smoother and more reliable.
  • Residual Efficient Layer Aggregation Networks (R-ELAN): YOLO12 organizes its layers more efficiently using R-ELAN, which improves how the model processes and learns from data. This makes training more stable, object recognition sharper, and computing requirements lower, so it runs efficiently across different environments.

To understand how these features work in real life, consider a shopping mall. YOLO12 can help track shoppers, identify store decorations like potted plants or promotional signs, and spot misplaced or abandoned items. 

Its attention-centric architecture helps it focus on the most important details, while FlashAttention ensures it processes everything quickly without overloading the system. This makes it easier for mall operators to improve security, organize store layouts, and enhance the overall shopping experience.

Fig 2. Detecting objects in a shopping mall using YOLO12.

However, YOLO12 also comes with some limitations to consider:

  • Slower training times: Due to its architecture, YOLO12 requires more training time compared to YOLO11.
  • Export challenges: Some users may encounter difficulties when exporting YOLO12 models, particularly when integrating them into specific deployment environments.

Understanding YOLO12’s performance benchmarks

YOLO12 comes in multiple variants, each optimized for different needs. Smaller versions (nano and small) prioritize speed and efficiency, making them ideal for mobile devices and edge computing. The medium and large versions strike a balance between speed and accuracy, while YOLO12x (extra large) is designed for high-precision applications, such as industrial automation, medical imaging, and advanced surveillance systems.

With these variations, YOLO12 delivers different levels of performance depending on the model size. Benchmark tests show that certain variants of YOLO12 outperform YOLOv10 and YOLO11 in accuracy, achieving higher mean average precision (mAP). 

However, some models, like YOLO12m, YOLO12l, and YOLO12x, process images more slowly than their YOLO11 counterparts, showing a tradeoff between detection accuracy and speed. Despite this, YOLO12 remains efficient, requiring fewer parameters than many other models, though still more than YOLO11. This makes it a great choice for applications where accuracy matters more than raw speed.

Fig 3. Comparing Ultralytics YOLO11 and YOLO12.

Using YOLO12 through the Ultralytics Python package

YOLO12 is supported by the Ultralytics Python package and is easy to use, making it accessible for both beginners and professionals. With just a few lines of code, users can load pre-trained models, run various computer vision tasks on images and videos, and also train YOLO12 on custom datasets. The Ultralytics Python package streamlines the process, eliminating the need for complex setup steps.

For example, here are the steps you would go through to use YOLO12 for object detection:

  • Install the Ultralytics package: First, install the Ultralytics Python package, which provides the tools needed to run YOLO12 efficiently. This ensures that all dependencies are set up correctly.
  • Load a pre-trained YOLO12 model: Choose the appropriate YOLO12 variant (nano, small, medium, large, or extra large) based on the level of accuracy and speed required for your task.
  • Provide an image or video: Input an image or video file that you want to analyze. YOLO12 can also process live video feeds for real-time detection.
  • Run the detection process: The model scans the visual data, identifies objects, and places bounding boxes around them. It labels each detected object with its predicted class and confidence score.
  • Adjust detection settings: You can also modify parameters such as confidence thresholds to fine-tune detection accuracy and performance.
  • Save or use the output: The processed image or video, now containing detected objects, can be saved or integrated into an application for further analysis, automation, or decision-making.

These steps make YOLO12 easy to use for a variety of applications, from surveillance and retail tracking to medical imaging and autonomous vehicles.

Practical YOLO12 applications

YOLO12 can be used in a variety of real-world applications thanks to its support for object detection, instance segmentation, image classification, pose estimation, and oriented object detection (OBB). 

Fig 4. YOLO12 supports tasks like object detection and instance segmentation.

However, as we discussed earlier, the YOLO12 models prioritize accuracy over speed, meaning they take slightly longer to process images compared to earlier versions. This tradeoff makes YOLO12 ideal for applications where precision is more important than real-time speed, such as:

  • Medical imaging: YOLO12 can be custom-trained to detect tumors or abnormalities in X-rays and MRIs with high accuracy, making it a useful tool for doctors and radiologists who need precise image analysis for diagnosis.
  • Quality control in manufacturing: It can help identify product defects during the production process, ensuring that only high-quality items make it to market while reducing waste and improving efficiency.
  • Forensic analysis: Law enforcement agencies can fine-tune YOLO12 to analyze surveillance footage and gather evidence. In criminal investigations, precision is vital for identifying key details.
  • Precision agriculture: Farmers can use YOLO12 to analyze crop health, detect disease or pest infestations, and monitor soil conditions. Accurate assessments help optimize farming strategies, leading to better yield and resource management.

Getting started with YOLO12

Before running YOLO12, it's important to make sure your system meets the necessary requirements.

Technically, YOLO12 can run on any dedicated GPU (Graphics Processing Unit). By default, it does not require FlashAttention, so it can work on most GPU systems without it. However, enabling FlashAttention can be especially useful when working with large datasets or high-resolution images, as it helps prevent slowdowns, reduce memory usage, and improve processing efficiency. 

To use FlashAttention, you’ll need an NVIDIA GPU from one of these series: Turing (T4, Quadro RTX), Ampere (RTX 30 series, A30, A40, A100), Ada Lovelace (RTX 40 series), or Hopper (H100, H200).
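A quick way to check whether a GPU falls into one of these series is to compare its CUDA compute capability against Turing's (7.5), the oldest architecture in the list. The helper below encodes that rule; with PyTorch installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` pair to feed into it:

```python
def supports_flash_attention(major: int, minor: int) -> bool:
    """Return True if a GPU's CUDA compute capability is new enough for FlashAttention.

    Turing is 7.5, Ampere 8.0/8.6, Ada Lovelace 8.9, Hopper 9.0;
    anything at or above (7, 5) is in the supported range.
    """
    return (major, minor) >= (7, 5)


# Examples: Turing T4 -> (7, 5), Volta V100 -> (7, 0), Hopper H100 -> (9, 0)
print(supports_flash_attention(7, 5))  # True
print(supports_flash_attention(7, 0))  # False: Volta predates Turing
```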

Keeping usability and accessibility in mind, the Ultralytics Python package does not yet support FlashAttention inference, because installing FlashAttention can be technically complex. To learn more about getting started with YOLO12 and optimizing its performance, check out the official Ultralytics documentation.

Key takeaways

As computer vision advances, models are becoming more precise and efficient. YOLO12 improves computer vision tasks like object detection, instance segmentation, and image classification with attention-centric processing and FlashAttention, enhancing accuracy while optimizing memory use.

At the same time, computer vision is more accessible than ever. YOLO12 is easy to use through the Ultralytics Python package and, with its focus on accuracy over speed, is well-suited for applications where precision is key, such as medical imaging, industrial inspections, and robotics.

Curious about AI? Visit our GitHub repository and engage with our community. Explore innovations in sectors like AI in self-driving cars and computer vision in agriculture on our solutions pages. Check out our licensing options and bring your Vision AI projects to life. 🚀

