Glossary

SiLU (Sigmoid Linear Unit)

Learn how the SiLU (Swish) activation function improves deep learning performance in AI tasks such as object detection and NLP.

SiLU (Sigmoid Linear Unit), also known as the Swish function, is an activation function used in deep learning (DL) models, particularly in neural networks (NN). It was proposed by researchers at Google and has gained popularity due to its effectiveness in improving model performance compared to traditional activation functions like ReLU and Sigmoid. SiLU is valued for its smoothness and non-monotonic properties, which can help with gradient flow and model optimization. For a broader understanding, see a general activation function overview.

How SiLU Works

SiLU is defined as the product of the input and the sigmoid function applied to the input. Essentially, SiLU(x) = x * sigmoid(x). This formulation allows SiLU to act as a self-gating mechanism, where the sigmoid component determines the extent to which the linear input x is passed through. When the sigmoid output is close to 1, the input passes through almost unchanged (similar to ReLU for positive values), and when it is close to 0, the output is suppressed towards zero. Unlike ReLU, SiLU is smooth and non-monotonic (it can decrease even when the input increases), properties derived from its sigmoid component. The concept was detailed in the original Swish paper.
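
As a minimal sketch of this formulation (written in plain NumPy rather than relying on any particular framework's built-in), the self-gating behavior can be expressed directly:

```python
import numpy as np

def sigmoid(x):
    # Logistic function: squashes inputs into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def silu(x):
    # SiLU(x) = x * sigmoid(x): the sigmoid term gates how much of x passes through.
    return x * sigmoid(x)

x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(silu(x))  # large negatives are suppressed toward 0; large positives pass nearly unchanged
```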

Advantages of SiLU

SiLU offers several advantages that contribute to its effectiveness in deep learning models (illustrated numerically in the sketch after this list):

  • Smoothness: Unlike ReLU, SiLU is a smooth function, meaning its derivative is continuous. This smoothness can be beneficial for gradient-based optimization algorithms during backpropagation, leading to more stable training.
  • Non-Monotonicity: The function's shape, which dips slightly for negative inputs before rising towards zero, might help the network represent more complex patterns.
  • Avoiding Vanishing Gradients: While Sigmoid functions can suffer significantly from the vanishing gradient problem in deep networks, SiLU mitigates this issue, especially for positive inputs where it behaves linearly, similar to ReLU.
  • Improved Performance: Empirical studies have shown that replacing ReLU with SiLU can lead to improvements in model accuracy across various tasks and datasets, particularly in deeper architectures.
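
As a short illustration of the smoothness and non-monotonicity points above, the sketch below (assuming PyTorch is installed) evaluates SiLU and its gradient via autograd; note the dip below zero for negative inputs and the absence of any jump in the derivative at x = 0:

```python
import torch

# Evaluate SiLU and its derivative over a range of inputs.
x = torch.linspace(-6.0, 6.0, steps=13, requires_grad=True)
y = torch.nn.functional.silu(x)

# Summing lets a single backward pass populate dSiLU/dx for every element.
y.sum().backward()

for xi, yi, gi in zip(x.tolist(), y.tolist(), x.grad.tolist()):
    print(f"x={xi:+.1f}  silu(x)={yi:+.4f}  dsilu/dx={gi:+.4f}")
# The output dips slightly below zero around x ≈ -1.3 (non-monotonicity),
# and the gradient varies smoothly with no discontinuity at x = 0.
```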

Comparison with Other Activation Functions

SiLU distinguishes itself from other common activation functions (compared side by side in the sketch after this list):

  • ReLU: ReLU is computationally simpler (max(0, x)) and linear for positive values but suffers from the "dying ReLU" problem where neurons can become inactive for negative inputs. See a ReLU explanation. SiLU is smooth and avoids this issue due to its non-zero output for negative values.
  • Sigmoid: Sigmoid maps inputs to a range between 0 and 1 but suffers from saturation and vanishing gradients, making it less suitable for hidden layers in deep networks compared to SiLU.
  • Leaky ReLU: Leaky ReLU addresses the dying ReLU problem by allowing a small, non-zero gradient for negative inputs. SiLU offers a different, smoother profile.
  • GELU: GELU (Gaussian Error Linear Unit) is another smooth activation function that often performs similarly to SiLU. SiLU is generally considered computationally slightly simpler than GELU.
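
To make these differences concrete, here is a minimal sketch (assuming PyTorch) that prints the activations side by side for a few inputs; the contrast in how each handles negative values is immediately visible:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-3.0, -1.0, -0.1, 0.0, 0.1, 1.0, 3.0])

# Outputs of the activation functions discussed above, for the same inputs.
activations = {
    "relu": F.relu(x),
    "leaky_relu": F.leaky_relu(x, negative_slope=0.01),
    "silu": F.silu(x),
    "gelu": F.gelu(x),
    "sigmoid": torch.sigmoid(x),
}
for name, out in activations.items():
    print(f"{name:>10}: {[round(v, 3) for v in out.tolist()]}")
```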

Applications of SiLU

SiLU is versatile and has been applied successfully across many of the domains where deep learning models are used, such as object detection and natural language processing (NLP).

Implementation

SiLU is readily available in major deep learning frameworks.
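
For example, PyTorch exposes it both as the torch.nn.SiLU module and as torch.nn.functional.silu, and TensorFlow/Keras provide an equivalent activation under the names silu/swish. A minimal PyTorch sketch:

```python
import torch
import torch.nn as nn

# SiLU as a layer inside a small model, and as a functional call.
model = nn.Sequential(nn.Linear(16, 32), nn.SiLU(), nn.Linear(32, 10))

x = torch.randn(4, 16)
print(model(x).shape)                     # torch.Size([4, 10])
print(torch.nn.functional.silu(x).shape)  # torch.Size([4, 16])
```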

Platforms like Ultralytics HUB support training models and exploring various deployment options for models utilizing advanced components like SiLU. Continued research and resources from organizations like DeepLearning.AI help practitioners leverage such functions effectively.
