Residual Networks (ResNet)

Discover how ResNets revolutionize deep learning by solving vanishing gradients, enabling ultra-deep networks for image analysis, NLP, and more.

Residual Networks, commonly known as ResNet, represent a groundbreaking convolutional neural network (CNN) architecture developed by Kaiming He and colleagues at Microsoft Research. Introduced in their 2015 paper, "Deep Residual Learning for Image Recognition", ResNet addressed a major challenge in deep learning (DL): the degradation problem. This problem occurs when adding more layers to a very deep network leads to higher training error, contrary to the expectation that deeper models should perform better. ResNet's innovation allowed for the successful training of networks substantially deeper than previously feasible, significantly advancing the state-of-the-art in various computer vision (CV) tasks.

How ResNets Work: Skip Connections

The core idea behind ResNet is the introduction of "skip connections," also called "shortcut connections." In a traditional deep network, each layer feeds sequentially into the next. ResNet modifies this by adding the input of a block of layers directly to that block's output. This creates a "residual block": rather than learning a desired underlying mapping H(x) directly, the stacked layers learn the residual F(x) = H(x) − x, and the block outputs F(x) + x. If the optimal mapping is close to an identity mapping (where the output should equal the input), it is easier for the network to drive the weights of the stacked layers toward zero, making the residual vanish, than to approximate the identity through a stack of non-linear layers.
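This structure translates directly into code. Below is a minimal sketch of a basic residual block in PyTorch, assuming equal input and output channel counts so the shortcut needs no projection; the names BasicBlock and channels are illustrative here, not torchvision's actual implementation:

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Minimal residual block: two 3x3 convolutions plus an identity skip connection."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))  # first half of F(x)
        out = self.bn2(self.conv2(out))           # F(x), the residual mapping
        return self.relu(out + x)                 # F(x) + x: the skip connection

block = BasicBlock(64)
y = block(torch.randn(1, 64, 56, 56))  # output shape matches input: [1, 64, 56, 56]
```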

These skip connections facilitate gradient flow during backpropagation, mitigating the vanishing gradient problem that often plagues very deep networks. This allows for the construction and effective training of networks with hundreds or even thousands of layers, achieving remarkable accuracy improvements on challenging benchmark datasets like ImageNet.
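A one-line calculation shows why the skip connection helps gradients. For a residual block y = x + F(x), the chain rule yields an identity term that gives the upstream gradient a direct path back to x, no matter how small the gradient through the stacked layers becomes (a standard analysis, elaborated in the follow-up paper "Identity Mappings in Deep Residual Networks"):

```latex
% Backpropagation through a residual block y = x + F(x):
% the identity term I preserves a direct gradient path around F.
\[
  \frac{\partial \mathcal{L}}{\partial x}
    = \frac{\partial \mathcal{L}}{\partial y}
      \left( I + \frac{\partial F}{\partial x} \right)
\]
```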

Key Concepts

  • Residual Block: The fundamental building unit of a ResNet, consisting of a few convolutional layers and a skip connection that adds the block's input to its output.
  • Skip Connection (Shortcut): A direct connection that bypasses one or more layers, enabling easier gradient flow and identity mapping learning.
  • Identity Mapping: When a layer or block simply passes its input through unchanged. Skip connections make it easier for residual blocks to approximate identity mappings if needed.
  • Degradation Problem: The phenomenon where deeper networks perform worse (higher training and test error) than shallower counterparts, addressed by ResNet's residual learning.

Relevance In Computer Vision

ResNet architectures quickly became a standard backbone for many computer vision tasks beyond image classification, including:

  • Object Detection: Many detection models, such as Faster R-CNN and RT-DETR (often compared against Ultralytics YOLO models), use ResNet backbones for feature extraction (Object Detection glossary).
  • Image Segmentation: Architectures like Mask R-CNN often employ ResNet for extracting rich spatial features necessary for pixel-level classification (Image Segmentation glossary).

Its ability to extract powerful, general-purpose features from images has made ResNet a highly versatile and widely adopted backbone.
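As a concrete illustration of this backbone role, here is a minimal sketch using torchvision (assuming torchvision ≥ 0.13 for the weights enum): the classification head is dropped, and the remaining convolutional stages produce a spatial feature map that detection or segmentation heads can consume.

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

# Load an ImageNet-pretrained ResNet-50 and drop the classification head
# (average pooling + fully connected layer), keeping the convolutional
# stages as a feature-extraction backbone.
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
backbone = torch.nn.Sequential(*list(model.children())[:-2])
backbone.eval()

with torch.no_grad():
    features = backbone(torch.randn(1, 3, 224, 224))

print(features.shape)  # torch.Size([1, 2048, 7, 7]), a spatial feature map
```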

Real-World Applications

  1. Medical Image Analysis: ResNets are extensively used to analyze medical scans (X-rays, CT, MRI) and detect anomalies such as tumors or signs of diabetic retinopathy. The depth enabled by ResNet allows the model to learn the intricate patterns indicative of disease, aiding radiologists in diagnosis. You can explore related applications in AI in Radiology and learn more about the field in medical image analysis. Initiatives like the NIH's Bridge2AI program often leverage such advanced models.
  2. Autonomous Driving: Perception systems in self-driving cars often rely on ResNet-based architectures for real-time object detection and recognition of pedestrians, vehicles, traffic lights, and road signs. The robustness and accuracy of deep ResNet models are crucial for safety in complex driving scenarios (AI in Automotive solutions). Companies like Waymo detail the importance of robust perception systems.

Comparison To Other Architectures

  • VGGNet: While VGGNet demonstrated the benefit of depth using simple 3x3 convolutions, it struggled with convergence for very deep networks due to vanishing gradients. ResNet directly addressed this limitation (Vision AI History blog, VGG paper).
  • DenseNet: DenseNets connect each layer to every other layer in a feed-forward fashion, promoting feature reuse. This differs from ResNet's additive skip connections. Both aim to improve information flow but use different mechanisms (DenseNet paper).
  • Vision Transformers (ViT): More recent architectures like ViT use attention mechanisms, diverging from the convolutional approach of ResNet, and have shown competitive or superior performance on many benchmarks, though ResNets remain influential and widely used.

Tools And Implementation

ResNet architectures are readily available in major deep learning frameworks like PyTorch (PyTorch official site) and TensorFlow (TensorFlow official site). Pre-trained models, often trained on ImageNet, are accessible through libraries like torchvision, enabling effective transfer learning. Platforms like Ultralytics HUB allow users to leverage various architectures, including ResNet-based ones, for training custom models and deploying them (Ultralytics HUB documentation). You can find further educational resources on CNNs at Stanford CS231n or through courses like those offered by DeepLearning.AI.
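As a minimal sketch of that transfer-learning workflow (again assuming torchvision ≥ 0.13, and a hypothetical 5-class target task), one can load ImageNet weights, freeze the backbone, and replace only the classification head:

```python
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

# Start from ImageNet-pretrained weights.
model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)

# Freeze the pretrained backbone so only the new head is trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classifier with a new head for the (hypothetical) 5-class task.
model.fc = nn.Linear(model.fc.in_features, 5)
```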
