Glossary

Tanh (Hyperbolic Tangent)

Discover the power of the Tanh activation function in neural networks. Learn how its zero-centered outputs help AI models learn complex, non-linear patterns in data.

Tanh, or Hyperbolic Tangent, is a widely recognized activation function used in artificial intelligence (AI) and machine learning (ML), particularly within neural networks (NNs). Similar to the Sigmoid function, Tanh is S-shaped (sigmoidal) but maps input values to a range between -1 and 1. This characteristic makes it zero-centered, meaning its outputs are distributed around zero. Like other activation functions, Tanh introduces non-linearity into the network, enabling deep learning (DL) models to learn complex patterns and relationships in data that linear models cannot capture. It's derived from the mathematical hyperbolic tangent function.
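In mathematical terms, the function and its derivative (which governs the gradients discussed below) are:

```
tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)) = 2 · sigmoid(2x) − 1
d/dx tanh(x) = 1 − tanh(x)^2
```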

How Tanh Works

The Tanh function takes any real-valued input and squashes it into the range (-1, 1). Inputs close to zero produce outputs close to zero. Large positive inputs result in outputs approaching 1, while large negative inputs yield outputs approaching -1. Its zero-centered nature is often considered an advantage over the Sigmoid function (which outputs between 0 and 1) because it can help the optimization algorithm, such as gradient descent, converge faster during model training. This is because the gradients passed back during backpropagation are more likely to have balanced positive and negative values, potentially leading to more stable updates of model weights.
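As a minimal sketch of this behavior (using PyTorch here, though any tensor library behaves the same way), the following snippet evaluates Tanh on a few inputs; the expected outputs are shown as comments:

```python
import torch

# Inputs covering small, moderate, and large magnitudes
x = torch.tensor([-10.0, -2.0, -0.5, 0.0, 0.5, 2.0, 10.0])
y = torch.tanh(x)

print(y)
# tensor([-1.0000, -0.9640, -0.4621,  0.0000,  0.4621,  0.9640,  1.0000])
# Outputs stay within (-1, 1), are symmetric around zero (zero-centered),
# and saturate toward -1 or 1 as the input magnitude grows.
```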

Advantages And Disadvantages

Advantages:

  • Zero-Centered Output: Outputs ranging from -1 to 1 help center the data passed to subsequent layers, which can improve training dynamics compared to non-zero-centered functions like Sigmoid.
  • Stronger Gradients: Compared to Sigmoid, Tanh has steeper gradients around zero, which can mitigate the vanishing gradient problem to some extent during training, allowing for potentially faster learning.

Disadvantages:

  • Vanishing Gradients: Like Sigmoid, Tanh still suffers from the vanishing gradient problem. For very large positive or negative inputs, the function saturates (its output becomes very close to 1 or -1), and the gradient becomes extremely small, hindering weight updates in deeper layers (this is illustrated numerically after this list).
  • Computational Cost: Tanh involves hyperbolic calculations, making it slightly more computationally expensive than simpler functions like ReLU (Rectified Linear Unit).
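The points above about gradient strength and saturation can be checked numerically. In this sketch, autograd evaluates the derivatives of Tanh and Sigmoid at a few inputs: Tanh's gradient at zero is 1.0 (versus 0.25 for Sigmoid), but both collapse toward zero once the functions saturate (values below are approximate):

```python
import torch

for name, fn in [("tanh", torch.tanh), ("sigmoid", torch.sigmoid)]:
    x = torch.tensor([0.0, 2.0, 5.0], requires_grad=True)
    fn(x).sum().backward()  # gradient of each element w.r.t. its input
    print(name, x.grad)

# tanh    tensor([1.0000, 0.0707, 0.0002])
# sigmoid tensor([0.2500, 0.1050, 0.0066])
```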

Tanh Vs. Other Activation Functions

  • Tanh vs. Sigmoid: Both are sigmoidal, but Tanh's output range is (-1, 1) while Sigmoid's is (0, 1). Tanh's zero-centered property is often preferred for hidden layers, while Sigmoid is commonly used in output layers for binary classification tasks where a probability is needed.
  • Tanh vs. ReLU: ReLU outputs range from 0 to infinity and is computationally very efficient. ReLU avoids saturation for positive inputs but can suffer from the "dying ReLU" problem (neurons becoming inactive). While Tanh saturates at both ends, its zero-centered nature can be advantageous. However, ReLU and its variants (Leaky ReLU, GELU, SiLU) have largely replaced Tanh in many modern deep learning architectures, especially in computer vision (CV), due to better gradient flow and efficiency. You can explore various activation functions in deep learning; a quick comparison of the three output ranges follows this list.
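For a concrete view of the different output ranges, assuming PyTorch:

```python
import torch

x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])
print(torch.tanh(x))     # in (-1, 1), zero-centered
print(torch.sigmoid(x))  # in (0, 1), always positive
print(torch.relu(x))     # in [0, inf), exactly zero for negative inputs
```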

Applications In AI And Machine Learning

Tanh has historically been a popular choice, particularly in:

  • Recurrent Neural Networks (RNNs): Tanh was commonly used in the hidden states of RNNs and variants like Long Short-Term Memory (LSTM) networks, especially for tasks in Natural Language Processing (NLP). Its bounded range helps regulate the information flow within the recurrent connections. See Understanding LSTMs for more details.
  • Hidden Layers: It can be used in the hidden layers of feedforward networks, although ReLU variants are now more common. It might be chosen when the zero-centered property is particularly beneficial for the specific problem or architecture.
  • Sentiment Analysis: In older NLP models, Tanh helped map features extracted from text (e.g., word embeddings processed by an RNN) to a continuous range, representing sentiment polarity from negative (-1) to positive (+1). Resources from organizations like the Stanford NLP Group provide background on such techniques.
  • Control Systems and Robotics: In Reinforcement Learning (RL), Tanh is sometimes used as the final activation function for policies that output continuous actions bounded within a specific range (e.g., controlling motor torque between -1 and +1); the sketch after this list shows this pattern. Frameworks like OpenAI Gym are often used in RL research.
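As an example of the last point, the sketch below defines a hypothetical continuous-control policy head (the layer sizes and observation/action dimensions are illustrative, not taken from any particular framework); the final Tanh keeps every action component within (-1, 1):

```python
import torch
import torch.nn as nn

# Hypothetical policy network: the closing Tanh bounds each action
# component to (-1, 1), e.g. a normalized motor torque command.
policy = nn.Sequential(
    nn.Linear(8, 64),   # 8 observation features (illustrative)
    nn.ReLU(),
    nn.Linear(64, 2),   # 2 continuous action dimensions (illustrative)
    nn.Tanh(),
)

obs = torch.randn(1, 8)  # a dummy observation
action = policy(obs)
print(action)            # values are guaranteed to lie in (-1, 1)
```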

While modern architectures like Ultralytics YOLO often utilize functions like SiLU for tasks such as object detection, understanding Tanh remains valuable. It provides context for the evolution of activation functions and might still appear in specific network designs or legacy systems. Frameworks like PyTorch and TensorFlow provide implementations of Tanh. You can train and experiment with different activation functions using platforms like Ultralytics HUB.
