
Semantic Segmentation

Semantic segmentation is a fundamental task in computer vision (CV) that involves assigning a specific class label to every single pixel within an image. Unlike other vision tasks that might identify objects or classify the whole image, semantic segmentation provides a dense, pixel-level understanding of the scene content. This means it doesn't just detect that there is a car, but precisely outlines which pixels belong to the car category, differentiating them from pixels belonging to the road, sky, or pedestrians. It aims to partition an image into meaningful regions corresponding to different object categories, providing a comprehensive understanding of the visual environment.

How Semantic Segmentation Works

The primary goal of semantic segmentation is to classify each pixel in an image into a predefined set of categories. For instance, in an image containing multiple cars, pedestrians, and trees, a semantic segmentation model would label all pixels making up any car as 'car', all pixels for any pedestrian as 'pedestrian', and all pixels for any tree as 'tree'. It treats all instances of the same object class identically.

Modern semantic segmentation heavily relies on deep learning, particularly Convolutional Neural Networks (CNNs). These models are typically trained using supervised learning techniques, requiring large datasets with detailed pixel-level annotations. The process involves feeding an image into the network, which then outputs a segmentation map. This map is essentially an image where each pixel's value (often represented by color) corresponds to its predicted class label, visually separating different categories like 'road', 'building', 'person', etc. The quality of data labeling is crucial for training accurate models.
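A minimal sketch of this inference step is shown below, assuming PyTorch and torchvision are installed; the pretrained FCN-ResNet50 model and the image filename are illustrative choices only, not a specific recommendation. The model produces per-pixel class scores, and an argmax over the class dimension yields the segmentation map.

```python
# Sketch: pixel-wise prediction with a pretrained semantic segmentation model.
# Assumes PyTorch and torchvision are installed; the model choice and the
# image path are illustrative.
import torch
from torchvision.io import read_image
from torchvision.models.segmentation import fcn_resnet50, FCN_ResNet50_Weights

weights = FCN_ResNet50_Weights.DEFAULT
model = fcn_resnet50(weights=weights).eval()
preprocess = weights.transforms()  # resizing + normalization expected by the model

image = read_image("street_scene.jpg")    # [3, H, W] uint8 tensor (hypothetical file)
batch = preprocess(image).unsqueeze(0)    # [1, 3, H', W'] normalized float tensor

with torch.no_grad():
    logits = model(batch)["out"]          # [1, num_classes, H', W'] per-pixel scores

segmentation_map = logits.argmax(dim=1)   # [1, H', W'] one class index per pixel
print(segmentation_map.shape, segmentation_map.unique())  # e.g. indices for person, car, bus
```

Visualizing the result usually means mapping each class index to a color, which produces the familiar color-coded segmentation overlay.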

Key Differences from Other Segmentation Tasks

It's important to distinguish semantic segmentation from related computer vision tasks:

  • Image Classification: Assigns a single label to the entire image (e.g., "this image contains a cat"). It doesn't locate or outline objects.
  • Object Detection: Identifies and locates objects using bounding boxes. It tells you where objects are but doesn't provide their exact shape at the pixel level.
  • Instance Segmentation: Goes a step further than semantic segmentation by not only classifying each pixel but also distinguishing between different instances of the same object class. For example, it would assign a unique ID and mask to each individual car in the scene. See this guide comparing instance and semantic segmentation for more details.
  • Panoptic Segmentation: Combines semantic and instance segmentation, providing both a category label for every pixel and unique instance IDs for countable objects ('things') while grouping uncountable background regions ('stuff') like sky or road.
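The sketch below contrasts the output structures of these tasks using dummy NumPy data; the shapes, coordinates, and class names are purely illustrative and do not correspond to any particular library's API.

```python
# Sketch of how the outputs of related vision tasks differ in structure.
# All values are dummy data for illustration.
import numpy as np

H, W = 4, 6  # tiny "image" for readability

# Image classification: one label for the whole image.
classification = "street scene"

# Object detection: one (x1, y1, x2, y2, class) box per object.
detections = [(0, 1, 2, 3, "car"), (3, 0, 5, 2, "car"), (1, 2, 4, 3, "person")]

# Semantic segmentation: one class index per pixel; both cars share the label "car".
semantic_map = np.zeros((H, W), dtype=np.int64)  # 0 = background
semantic_map[1:3, 0:2] = 1                       # 1 = car (first car)
semantic_map[0:2, 3:5] = 1                       # 1 = car (second car, same label)

# Instance segmentation: a separate binary mask (plus class) per object instance.
car1 = np.zeros((H, W), dtype=bool); car1[1:3, 0:2] = True
car2 = np.zeros((H, W), dtype=bool); car2[0:2, 3:5] = True
instance_masks = [("car", car1), ("car", car2)]

print(semantic_map)
for label, mask in instance_masks:
    print(label, mask.astype(int), sep="\n")
```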

Real-World Applications

The detailed scene understanding provided by semantic segmentation is crucial for many real-world applications, such as autonomous driving (labeling drivable road surface, pedestrians, and obstacles pixel by pixel), medical image analysis (delineating organs or tumors in scans), and satellite or aerial imagery analysis (mapping land use).

Models and Tools

Semantic segmentation typically relies on deep learning models, particularly CNN-based architectures such as Fully Convolutional Networks (FCN), U-Net, and the DeepLab family, alongside more recent transformer-based models.
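As a rough sketch of the training objective such models commonly use, assuming PyTorch, the network's per-pixel class logits are compared against a ground-truth label map with a pixel-wise cross-entropy loss; the class count and tensor shapes below are illustrative.

```python
# Sketch of the pixel-wise cross-entropy loss commonly used to train
# semantic segmentation networks. Shapes and class count are illustrative.
import torch
import torch.nn as nn

num_classes, H, W = 21, 64, 64

# Model output: one score per class for every pixel.
logits = torch.randn(1, num_classes, H, W, requires_grad=True)

# Ground truth: one class index per pixel, from pixel-level annotations.
target = torch.randint(0, num_classes, (1, H, W))

# CrossEntropyLoss averages the per-pixel classification loss over the image.
criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)
loss.backward()  # gradients flow back through every pixel's prediction

print(loss.item())
```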
