Glossary

Leaky ReLU

Discover the power of Leaky ReLU activation for AI and ML. Solve the dying ReLU problem and boost model performance in CV, NLP, GANs, and more!

Leaky Rectified Linear Unit, commonly known as Leaky ReLU, is an activation function used in Neural Networks (NN), particularly within Deep Learning (DL) models. It is a modified version of the standard Rectified Linear Unit (ReLU) activation function, designed specifically to address the "dying ReLU" problem. This issue occurs when neurons become inactive and output zero for every input, so the gradient flowing back through them during backpropagation is zero and they effectively stop learning.

How Leaky ReLU Works

Like ReLU, Leaky ReLU outputs the input directly if it is positive. However, unlike ReLU, which outputs zero for any negative input, Leaky ReLU multiplies negative inputs by a small constant slope, producing a small but non-zero output and gradient. This "leak" ensures that neurons remain active even when their input is negative, allowing gradients to flow backwards through the network and enabling continued learning. The slope is typically a fixed small value (e.g., 0.01), but variations like Parametric ReLU (PReLU) allow this slope to be learned during training.
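As a quick illustration, the sketch below applies PyTorch's built-in LeakyReLU (and, for comparison, PReLU) to a handful of values. The 0.01 slope is simply the common default, and the input values are arbitrary.

```python
import torch
import torch.nn as nn

# Leaky ReLU: f(x) = x for x > 0, otherwise f(x) = negative_slope * x.
leaky_relu = nn.LeakyReLU(negative_slope=0.01)

x = torch.tensor([-3.0, -0.5, 0.0, 0.5, 3.0])
print(leaky_relu(x))  # tensor([-0.0300, -0.0050,  0.0000,  0.5000,  3.0000])

# Parametric ReLU (PReLU) instead treats the slope as a learnable parameter.
prelu = nn.PReLU(init=0.25)
print(prelu(x))
```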

Addressing the Dying ReLU Problem

The primary motivation behind Leaky ReLU is to mitigate the dying ReLU problem. When a standard ReLU neuron receives a negative input, its output is zero, and so is the gradient it passes back during training; the neuron's weights therefore receive no update, and if its input stays negative for every example it can remain permanently inactive. Leaky ReLU prevents this by ensuring a small, non-zero gradient always exists, even for negative inputs, thus preventing neurons from completely dying and improving the robustness of the training process, especially in very deep networks where the vanishing gradient problem can also be a concern.
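The difference in gradient flow is easy to verify with a few lines of PyTorch autograd. The sketch below is purely illustrative; the input value of -2.0 is an arbitrary choice.

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0], requires_grad=True)

# Standard ReLU: zero output and zero gradient for a negative input.
y = nn.functional.relu(x)
y.backward()
print(x.grad)  # tensor([0.])

x.grad = None  # reset the accumulated gradient

# Leaky ReLU: small but non-zero gradient, so upstream weights can still update.
y = nn.functional.leaky_relu(x, negative_slope=0.01)
y.backward()
print(x.grad)  # tensor([0.0100])
```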

Relevance and Applications in AI and ML

Leaky ReLU is a valuable tool in scenarios where maintaining active neurons throughout training is critical. Its computational efficiency, similar to standard ReLU, makes it suitable for large-scale models. Key applications include computer vision, natural language processing, and generative adversarial networks (GANs), where keeping gradients flowing through every layer is especially important; a GAN discriminator sketch follows below.
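As one concrete example, Leaky ReLU is a popular choice inside GAN discriminators. The sketch below shows a minimal DCGAN-style discriminator block in PyTorch; the layer sizes, the 64x64 RGB input assumption, and the 0.2 slope (the value popularized by the DCGAN paper) are illustrative choices rather than a reference implementation.

```python
import torch
import torch.nn as nn

# Minimal DCGAN-style discriminator block with Leaky ReLU activations.
discriminator = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(128),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Flatten(),
    nn.Linear(128 * 16 * 16, 1),  # assumes 64x64 RGB input images
)

fake_batch = torch.randn(8, 3, 64, 64)  # batch of 8 random "images"
print(discriminator(fake_batch).shape)  # torch.Size([8, 1])
```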

Leaky ReLU vs. Other Activation Functions

Compared to standard ReLU, Leaky ReLU's main advantage is avoiding the dying neuron problem. Other activation functions such as ELU (Exponential Linear Unit) and SiLU (Sigmoid Linear Unit) also address this issue and sometimes offer additional benefits like smoother gradients, as seen in models like Ultralytics YOLOv8. However, alternatives such as ELU can be computationally more expensive than Leaky ReLU (see activation function comparisons). The optimal choice often depends on the specific neural network architecture, the dataset (like those found on Ultralytics Datasets), and empirical results obtained through processes like hyperparameter tuning. Frameworks like PyTorch (PyTorch Docs) and TensorFlow (TensorFlow Docs) provide ready-made implementations of these activation functions, making it easy to experiment within platforms like Ultralytics HUB.
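Because these activations share the same interface in PyTorch, swapping one for another is usually a one-line change, which is what makes empirical comparison practical. The sketch below builds the same small model with each candidate activation; the layer sizes and candidate list are illustrative assumptions.

```python
import torch
import torch.nn as nn

def make_mlp(activation: nn.Module) -> nn.Sequential:
    # The activation is the only thing that varies between candidate models.
    return nn.Sequential(nn.Linear(128, 64), activation, nn.Linear(64, 10))

candidates = {
    "relu": nn.ReLU(),
    "leaky_relu": nn.LeakyReLU(0.01),
    "elu": nn.ELU(),
    "silu": nn.SiLU(),
}

x = torch.randn(4, 128)
for name, act in candidates.items():
    model = make_mlp(act)
    print(name, model(x).shape)  # each variant produces torch.Size([4, 10])
```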
