
Batch Normalization

Boost deep learning performance with Batch Normalization! Learn how this technique improves the training speed, stability, and accuracy of AI models.


Batch Normalization is a technique used widely in deep learning to stabilize the learning process and significantly speed up the training of deep neural networks. Introduced by Sergey Ioffe and Christian Szegedy in their 2015 paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", it addresses the problem where the distribution of inputs to layers deep in the network changes during training, known as internal covariate shift. By normalizing the inputs to each layer for each mini-batch, Batch Normalization helps maintain a more stable distribution of activation values, leading to smoother and faster convergence.

How Batch Normalization Works

During training, Batch Normalization standardizes the inputs to a layer for each mini-batch. This involves calculating the mean and variance of the activations across the mini-batch and then normalizing these activations. Crucially, the technique also introduces two learnable parameters per activation channel – a scale (gamma) and a shift (beta) parameter. These parameters allow the network to learn the optimal scale and mean of the normalized inputs, essentially giving it the flexibility to undo the normalization if that proves beneficial for learning. This process helps combat issues like vanishing gradients and exploding gradients by keeping activations within a reasonable range. During inference, the mean and variance are fixed, typically using population statistics estimated during training.
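
In symbols, each activation is standardized as x̂ = (x − μ_B) / √(σ_B² + ε) using the mini-batch mean μ_B and variance σ_B², then scaled and shifted as y = γ·x̂ + β. The snippet below is a minimal PyTorch-style sketch of this logic for a (batch, channels) input; the function name batch_norm and its signature are illustrative, not a library API.

```python
import torch

def batch_norm(x, gamma, beta, running_mean, running_var,
               training=True, momentum=0.1, eps=1e-5):
    """Normalize a (batch, channels) tensor per channel, as described above."""
    if training:
        # Statistics of the current mini-batch
        mean = x.mean(dim=0)
        var = x.var(dim=0, unbiased=False)
        # Maintain running estimates of the population statistics for inference
        running_mean.mul_(1 - momentum).add_(momentum * mean)
        running_var.mul_(1 - momentum).add_(momentum * var)
    else:
        # At inference time, the fixed population estimates are used instead
        mean, var = running_mean, running_var
    x_hat = (x - mean) / torch.sqrt(var + eps)  # standardize
    return gamma * x_hat + beta                 # learnable scale (gamma) and shift (beta)

# Example usage with arbitrary sizes
x = torch.randn(32, 8)                       # mini-batch of 32 samples, 8 features
gamma, beta = torch.ones(8), torch.zeros(8)
rm, rv = torch.zeros(8), torch.ones(8)
y = batch_norm(x, gamma, beta, rm, rv)       # ~zero mean, ~unit variance per feature
```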

Benefits of Using Batch Normalization

Applying Batch Normalization in neural networks offers several key advantages:

  • Faster Training: It often allows for significantly higher learning rates, which accelerates the convergence of the training process. See Tips for Model Training for more optimization strategies.
  • Improved Gradient Flow: By stabilizing activation distributions, it mitigates the problems of vanishing and exploding gradients, leading to more stable training, especially in very deep networks.
  • Regularization Effect: Batch Normalization adds a slight noise component to the layer inputs due to the mini-batch statistics. This acts as a form of regularization, potentially reducing the need for other techniques like Dropout.
  • Reduced Sensitivity to Initialization: Networks with Batch Normalization are often less sensitive to the initial weights chosen before training begins.
  • Enables Deeper Networks: By addressing issues related to training deep architectures, it facilitates the successful training of much deeper models.

Applications and Examples

Batch Normalization is a staple component in many state-of-the-art deep learning models, particularly in computer vision.

  1. Image Recognition and Object Detection: In Convolutional Neural Networks (CNNs), Batch Normalization is typically applied after convolutional layers and before the activation function (like ReLU); a minimal example of this ordering is sketched after this list. Models like ResNet heavily rely on it. In object detection models, such as Ultralytics YOLO, Batch Normalization helps stabilize training, improve accuracy, and speed up convergence, enabling effective detection on complex datasets like COCO. Variations like Cross mini-Batch Normalization (CmBN) were used in models like YOLOv4 to further enhance performance.
  2. Generative Adversarial Networks (GANs): Batch Normalization is often used in the generator and discriminator networks of GANs to stabilize the adversarial training process, although careful implementation is needed to avoid artifacts. It helps prevent mode collapse and ensures smoother training dynamics.
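
As a concrete illustration of the Conv → BatchNorm → ReLU ordering mentioned in the first example, the sketch below builds such a block with PyTorch's built-in layers; the channel counts and input size are arbitrary placeholders, not values from any particular model.

```python
import torch
import torch.nn as nn

# Conv -> BatchNorm -> ReLU, the ordering typically used inside CNN backbones
block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),  # bias is redundant before BN
    nn.BatchNorm2d(64),    # normalizes each of the 64 channels over the mini-batch
    nn.ReLU(inplace=True),
)

x = torch.randn(8, 3, 224, 224)  # mini-batch of 8 RGB images
y = block(x)                     # output shape: (8, 64, 224, 224)
```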

Considerations and Implementations

A key consideration for Batch Normalization is its dependence on the mini-batch size during training. Performance can degrade if the batch size is too small (e.g., 1 or 2), as the batch statistics become noisy estimates of the population statistics. Furthermore, the behavior differs between training (using batch statistics) and inference (using estimated population statistics). Standard deep learning frameworks like PyTorch (torch.nn.BatchNorm2d) and TensorFlow (tf.keras.layers.BatchNormalization) provide robust implementations. Despite the existence of alternatives, Batch Normalization remains a fundamental technique for training many modern deep learning models effectively.
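
To illustrate the training-versus-inference difference noted above, the short PyTorch snippet below switches a BatchNorm2d layer between train() and eval() modes; it is a demonstration sketch with arbitrary tensor sizes, not a complete training loop.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(16)
x = torch.randn(4, 16, 32, 32)   # small mini-batch: its statistics are noisy estimates

bn.train()                        # training mode: normalizes with mini-batch statistics
_ = bn(x)                         # also updates bn.running_mean / bn.running_var

bn.eval()                         # inference mode: uses the stored running estimates
y = bn(x)                         # each sample is normalized independently of the batch
```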
