
What is instance segmentation? A quick guide

Join us as we take a closer look at what instance segmentation is, how it works, its use in various computer vision applications, and the impact it can have.

Computer vision applications are becoming more common in our daily lives, from traffic cameras monitoring road conditions to self-checkout systems in stores. By enabling machines to understand visual data in a manner similar to humans, Vision AI is making an impact in a range of industries.

Many of these applications rely on object detection, a computer vision task that places bounding boxes around key objects in images. While this approach often works well, some image analysis solutions need even greater precision.

For example, medical imaging requires more than just detecting a tumor - it’s crucial to outline its exact shape. Similarly, in robotics, machines need to recognize an object’s exact contours to grasp it correctly. To address these challenges, instance segmentation offers a more precise solution.

Instance segmentation is a computer vision task designed to support use cases where detecting objects isn’t enough - it provides pixel-level accuracy. Computer vision models like Ultralytics YOLO11 can be used to apply instance segmentation to images and videos easily. 

Fig 1. Example of using YOLO11 for instance segmentation.

In this guide, we’ll break down how instance segmentation works, its applications, and how Ultralytics YOLO11 can be custom-trained for specific segmentation tasks.

What is instance segmentation?

Let's say there's a group photo of people standing close together. Object detection can help draw boxes around each person, but that doesn’t tell you their exact shape. 

Instance segmentation, on the other hand, is similar to carefully tracing around each person so you can see their full outline, even if they overlap. Instead of just marking where something is with a box, it identifies the exact shape of each object at the pixel level, making it easier to understand complex images.

The result is a detailed mask that fills in the shape of an object, pinpointing exactly which pixels belong to it. This level of precision is useful in many real-world applications where understanding the exact shape and boundaries of objects is important.
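To make the difference concrete, here is a toy sketch (not Ultralytics code) of what a pixel-level mask adds over a bounding box: from a binary mask we can recover both the exact set of object pixels and the tight box around them.

```python
# Toy illustration: a binary mask pinpoints exactly which pixels belong to
# an object, while a bounding box only brackets it.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

def mask_area(mask):
    """Number of pixels that belong to the object."""
    return sum(sum(row) for row in mask)

def mask_to_bbox(mask):
    """Tight bounding box (x_min, y_min, x_max, y_max) around the mask."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) for v in row if v]
    return min(xs), min(ys), max(xs), max(ys)

print(mask_area(mask))      # 7 object pixels
print(mask_to_bbox(mask))   # (1, 1, 3, 3) -> the box spans 9 pixels, the mask only 7
```

The two extra pixels inside the box but outside the mask are exactly the precision gap between object detection and instance segmentation.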

Fig 2. Showcasing YOLO11’s support for instance segmentation.

Instance segmentation vs semantic segmentation

While exploring instance segmentation, you might come across the concept of semantic segmentation.

Both techniques help computers understand images at the pixel level, but they serve different purposes. Semantic segmentation labels every pixel based on its category, grouping all objects of the same type together. For example, in an image with multiple cars, semantic segmentation would mark all of them as "car" without distinguishing between individual vehicles.

Instance segmentation, on the other hand, takes it a step further by identifying each object separately. It assigns unique labels to individual instances and creates precise masks around their shapes. So in the same image, instance segmentation wouldn't just label everything as "car" but would recognize and outline each car individually.

The main difference between the two is that semantic segmentation groups objects by category, while instance segmentation distinguishes each object as a unique entity with clear boundaries. Choosing which task to use depends on the specific application - whether it's enough to know what’s in an image or if it's important to differentiate between individual objects.
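One way to see the gap between the two tasks is that instance segmentation must split a semantic label map into separate objects. The toy sketch below (plain Python, not how real models work internally) does this with a flood fill over connected components.

```python
# Toy sketch: a semantic map marks every car pixel with the same label.
# Splitting it into connected components recovers per-instance masks,
# which is the extra distinction instance segmentation provides.
semantic = [
    ["car", "car", None, "car"],
    ["car", None,  None, "car"],
    [None,  None,  None, None ],
]

def to_instances(semantic):
    """Assign a unique id to each 4-connected component of same-label pixels."""
    h, w = len(semantic), len(semantic[0])
    ids = [[0] * w for _ in range(h)]
    next_id = 0
    for y in range(h):
        for x in range(w):
            if semantic[y][x] is not None and ids[y][x] == 0:
                next_id += 1
                stack = [(y, x)]
                while stack:  # simple flood fill
                    cy, cx = stack.pop()
                    if 0 <= cy < h and 0 <= cx < w and ids[cy][cx] == 0 \
                            and semantic[cy][cx] == semantic[y][x]:
                        ids[cy][cx] = next_id
                        stack += [(cy + 1, cx), (cy - 1, cx),
                                  (cy, cx + 1), (cy, cx - 1)]
    return ids, next_id

ids, count = to_instances(semantic)
print(count)  # 2 -> two separate cars, even though both are labeled "car"
```

A semantic map alone would report only "car pixels"; the instance view reports two distinct cars with their own masks.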

Fig 3. Instance segmentation vs semantic segmentation (right and left, respectively).

Popular instance segmentation models

There are various instance segmentation models available to the Vision AI community today. Some are faster, some are more accurate, and some are easier to use. 

With so many options, a natural question is which one fits a specific task. Among them, Ultralytics YOLO models are quite popular because they focus on speed and accuracy. 

Also, these models have evolved significantly over the years. For example, Ultralytics YOLOv5 simplified deployment using frameworks like PyTorch, making advanced Vision AI accessible to a broader audience without requiring deep technical expertise.

Building on that success, Ultralytics YOLOv8 introduced enhanced support for computer vision tasks such as instance segmentation, pose estimation, and image classification. 

Now, YOLO11 takes performance to a new level. It achieves a higher mean average precision (mAP) on the COCO dataset with 22% fewer parameters than YOLOv8m, meaning it can recognize objects more precisely while using fewer resources.

Fig 4. Benchmarking YOLO11.

Simply put, YOLO11 delivers state-of-the-art accuracy without compromising on efficiency, making it a game-changer in the field.

Understanding how instance segmentation works

Next, let’s explore how instance segmentation typically works. Older computer vision models use a two-step approach. 

First, they detect objects by drawing bounding boxes around them. Then, they generate a pixel-level mask to outline each object’s exact shape. A well-known example is Mask R-CNN, which builds on object detection models by adding a mask prediction step. While this method is effective, it can be slow because it processes the image in multiple stages, making real-time applications more challenging.

Meanwhile, models like YOLO11 process images in one go, simultaneously predicting object bounding boxes and instance segmentation masks. This streamlined approach makes it much faster while still maintaining high accuracy. As a result, it is particularly useful for real-time applications like autonomous driving, video analysis, and robotics, where both speed and precision are crucial.
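This single-pass workflow can be sketched with the Ultralytics Python package, assuming it is installed; `yolo11n-seg.pt` is the smallest pretrained YOLO11 segmentation checkpoint, and the function below simply wraps one inference call.

```python
def segment(image_path):
    """Run one forward pass that yields boxes and masks together.

    Requires the `ultralytics` package; `yolo11n-seg.pt` is downloaded
    automatically on first use.
    """
    from ultralytics import YOLO  # deferred so the sketch loads without the package

    model = YOLO("yolo11n-seg.pt")
    results = model(image_path)   # one pass: detection + segmentation
    r = results[0]
    return r.boxes, r.masks       # bounding boxes and per-instance masks
```

Note that the boxes and masks come from the same forward pass, which is what makes this approach fast enough for real-time use, unlike the two-stage pipeline described above.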

Custom training YOLO11 for instance segmentation

Out of the box, YOLO11 comes as a pre-trained model. It has been trained on the COCO-Seg dataset, which covers everyday objects for instance segmentation. However, the Ultralytics Python package supports custom training, which is essential for specialized applications where unique objects need to be segmented.

Why is custom training or fine-tuning a model important? Custom training leverages transfer learning by building on the knowledge already embedded in pre-trained models. Rather than starting from scratch, it adapts an existing model to new tasks using smaller datasets and fewer computing resources, all while maintaining high accuracy.

How to custom train YOLO11

Here’s a closer look at the steps involved in fine-tuning YOLO11 for instance segmentation: 

  • Data preparation: Collect and annotate images based on your specific application. Ultralytics provides support for multiple image datasets, but you can also train using your own dataset by preparing images and annotations in the required YOLO format.
  • Using a pre-trained model: Instead of starting from scratch, use a pre-trained Ultralytics YOLO11 model. 
  • Model training: Adjust vital training settings like batch size (images processed per iteration), image size (target input resolution), and epochs (total training cycles) and train the model. 
  • Performance evaluation: After model training is complete, you can test the model's accuracy using performance metrics like mAP. The Ultralytics Python package also provides built-in functions for model evaluation.
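The steps above can be sketched with the Ultralytics Python package, assuming it is installed. The dataset file name `my_dataset.yaml` is a placeholder for your own dataset config in the YOLO format, and the training settings shown are illustrative defaults, not recommendations.

```python
def train_custom_segmenter(data_yaml="my_dataset.yaml"):
    """Fine-tune a pretrained YOLO11 segmentation model on a custom dataset.

    `my_dataset.yaml` is a placeholder for a dataset config in YOLO format;
    requires the `ultralytics` package.
    """
    from ultralytics import YOLO  # deferred so the sketch loads without the package

    model = YOLO("yolo11n-seg.pt")  # start from pretrained weights (transfer learning)
    model.train(
        data=data_yaml,  # dataset config: image paths + class names
        epochs=100,      # total training cycles
        imgsz=640,       # target input resolution
        batch=16,        # images processed per iteration
    )
    metrics = model.val()  # evaluate on the validation split (reports mAP)
    return metrics
```

Starting from the pretrained checkpoint rather than random weights is what lets a relatively small annotated dataset reach good accuracy.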

Instance segmentation applications enabled by YOLO11

Instance segmentation can be used to solve real-world challenges by helping machines see and understand objects more accurately. From improving automation to protecting the environment, it plays a key role in many fields. Let's walk through some examples of where it is making an impact.

Construction site safety and monitoring using YOLO11

Instance segmentation can be a critical part of ensuring safety and efficiency at construction sites. For example, it can be used to monitor heavy machinery. 

YOLO11 can be fine-tuned to accurately segment and identify different types of equipment, such as cranes, excavators, and bulldozers, and track their positions in real time. This allows site managers to make sure that machinery operates strictly within designated areas and does not encroach upon zones where workers are present or hazards exist. 

Also, integrating such solutions with real-time alert systems enables swift corrective actions to be taken. Beyond this, the collected insights can help optimize site layout and workflow, further reducing risks and boosting productivity.
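A hypothetical piece of downstream alert logic makes this concrete: each machine's mask centroid is checked against its designated operating zone. The zones are axis-aligned rectangles here for simplicity; a real site would likely use arbitrary polygons.

```python
# Hypothetical geofencing logic downstream of a segmentation model: the
# centroid of each machine's mask is checked against designated zones
# (axis-aligned rectangles for simplicity).
def centroid(pixels):
    """Mean position of a mask's pixels, given as (x, y) tuples."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def in_zone(point, zone):
    """zone = (x_min, y_min, x_max, y_max)."""
    x, y = point
    x0, y0, x1, y1 = zone
    return x0 <= x <= x1 and y0 <= y <= y1

excavator_mask = [(12, 30), (13, 30), (12, 31), (13, 31)]  # toy mask pixels
allowed_zone = (0, 0, 50, 50)
worker_zone = (10, 25, 20, 40)

c = centroid(excavator_mask)
print(in_zone(c, allowed_zone))  # True -> machine is inside its designated area
print(in_zone(c, worker_zone))   # True -> it also overlaps a worker zone: raise an alert
```

Because the centroid comes from the full mask rather than a loose bounding box, the zone check is less likely to fire false alerts for machinery that merely extends an arm over a boundary.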

Fig 5. Monitoring heavy machinery using YOLO11.

Animal monitoring with segmentation and YOLO11

Animal behavior monitoring helps researchers, farmers, and conservationists take better care of animals in different environments. Instance segmentation plays a helpful role in these systems by identifying and segmenting individual animals in farms, zoos, and natural habitats. Unlike traditional object detection that uses bounding boxes, instance segmentation provides a pixel-level delineation of each animal, which is particularly useful when animals are in close proximity.

Detailed segmentation facilitates more accurate tracking of movements and behaviors. Overlapping or closely clustered animals can be distinctly recognized, providing a more precise analysis of interactions, health assessments, and activity patterns. Overall, deeper insights into animal behavior enhance animal care and management practices.
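A toy sketch shows how per-instance masks support tracking: comparing mask overlap (intersection-over-union) across frames lets a tracker decide whether two masks belong to the same animal, even when animals stand close together. Masks are represented as sets of pixel coordinates, and the variable names are illustrative.

```python
# Toy sketch: per-instance masks let a tracker match the same animal across
# video frames via mask overlap (IoU), even in a tight cluster.
def mask_iou(a, b):
    """Intersection-over-union of two masks given as sets of (x, y) pixels."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

cow_frame1 = {(x, y) for x in range(0, 10) for y in range(0, 10)}
cow_frame2 = {(x, y) for x in range(1, 11) for y in range(0, 10)}  # moved 1 px right
other_cow  = {(x, y) for x in range(20, 30) for y in range(0, 10)}

print(round(mask_iou(cow_frame1, cow_frame2), 3))  # high overlap -> same animal
print(mask_iou(cow_frame1, other_cow))             # 0.0 -> a different animal
```

Bounding boxes of two animals standing side by side can overlap heavily even when the animals themselves do not, which is why mask-level IoU gives a cleaner matching signal.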

Fig 6. Monitoring cattle using instance segmentation.

YOLO11 in sports analytics and player tracking

Precise player and event tracking is a huge part of sports analysis. Traditional tracking methods rely on manual tagging, which may not capture detailed interactions. Computer vision can segment each player, the ball, and key events at the pixel level to extract detailed insights.

For example, instance segmentation can help detect events like fouls or off-ball incidents by clearly separating each player and object. This granular monitoring enabled by models like YOLO11 offers analysts clearer information to study movement patterns, spatial positioning, and interactions with high accuracy. A key benefit of these insights is that they help teams refine their strategies and boost overall performance.

Pros and cons of instance segmentation

Here are some of the key benefits that instance segmentation can bring to various industries:

  • Improved automation: By automating tasks such as quality control and safety monitoring, instance segmentation reduces the need for manual intervention and minimizes human error.
  • Better scene understanding: By accurately outlining each object, instance segmentation contributes to a deeper understanding of complex scenes, supporting more informed decision-making.
  • Efficient post-processing: The pixel-level output simplifies tasks like background removal, object counting, and spatial analysis, reducing the need for additional processing steps.
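The post-processing point can be made concrete with a toy sketch: once per-instance masks exist, object counting and background removal need no extra model passes. The grayscale "image" below is just a nested list of pixel values.

```python
# Toy sketch of mask-based post-processing: with per-instance masks in hand,
# counting objects and removing the background are simple array operations.
image = [
    [50, 60, 70],
    [80, 90, 40],
]
masks = [                        # one binary mask per detected instance
    [[1, 0, 0], [1, 0, 0]],
    [[0, 0, 1], [0, 0, 0]],
]

object_count = len(masks)        # counting = number of instance masks

def remove_background(image, masks):
    """Zero out every pixel not covered by any instance mask."""
    keep = [[any(m[y][x] for m in masks) for x in range(len(image[0]))]
            for y in range(len(image))]
    return [[px if keep[y][x] else 0 for x, px in enumerate(row)]
            for y, row in enumerate(image)]

print(object_count)                      # 2
print(remove_background(image, masks))   # [[50, 0, 70], [80, 0, 0]]
```

With bounding boxes alone, the same operations would either over-count overlapping regions or leave background pixels inside each box.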

While these benefits highlight how instance segmentation impacts different use cases, it's also essential to consider the challenges involved in its implementation. 

Here are some of the key limitations of instance segmentation:

  • Challenges with transparency: Segmenting transparent or reflective objects like glass and water is difficult, leading to inaccurate boundaries.
  • Maintenance overhead: To keep models accurate and relevant, continuous updates and fine-tuning are necessary as environmental conditions and datasets change.
  • High annotation effort: Training instance segmentation models requires detailed pixel-level annotations, which significantly increases the time and cost involved in data preparation.

Key takeaways

Instance segmentation makes it possible to distinguish individual objects with precision, even when they overlap. By capturing object boundaries at the pixel level, it provides a deeper understanding of visual data compared to traditional computer vision tasks like object detection.

Recent advancements in computer vision have made instance segmentation faster and easier to use. In particular, computer vision models like Ultralytics YOLO11 simplify the process, enabling real-time segmentation with minimal setup, making it more accessible for various industries and applications.

Curious about AI? Visit our GitHub repository and connect with our community to keep exploring. Learn about innovations like AI in self-driving cars and Vision AI in agriculture on our solutions pages. Check out our licensing options and get started on a computer vision project!

