Long Short-Term Memory (LSTM)

Discover how Long Short-Term Memory (LSTM) networks excel in handling sequential data, overcoming RNN limitations, and powering AI tasks like NLP and forecasting.

Long Short-Term Memory (LSTM) is a specialized type of Recurrent Neural Network (RNN) architecture designed to overcome the limitations of traditional RNNs in learning long-range dependencies. Introduced by Sepp Hochreiter and Jürgen Schmidhuber in 1997, LSTMs are particularly effective at processing sequences of data, such as text, speech, and time series, where context from earlier parts of the sequence is crucial for understanding later parts. This capability makes them a cornerstone technology in various Deep Learning (DL) applications.

How LSTMs Work

Traditional RNNs struggle with the vanishing gradient problem, where information from early steps in a sequence fades away as it propagates through the network, making it difficult to learn dependencies over long intervals. LSTMs address this using a unique structure involving memory cells and gates.

The core component is the memory cell, which acts like a conveyor belt, allowing information to flow through the network relatively unchanged. LSTMs use three main "gates" to regulate the information stored in the memory cell:

  1. Forget Gate: Decides which information to throw away from the cell state.
  2. Input Gate: Decides which new information to store in the cell state.
  3. Output Gate: Decides what part of the cell state to output.

Each gate uses a sigmoid activation to produce values between 0 and 1 that act as filters, while a tanh activation creates the candidate values added to the cell state. Together, the gates learn which information to keep or discard at each time step, enabling the network to maintain relevant context over extended sequences.
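To make the gating mechanism concrete, here is a minimal sketch of a single LSTM cell step in plain NumPy. The weight and bias names (W_f, b_f, and so on) are illustrative placeholders rather than the API of any particular library.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x_t, h_prev, c_prev, params):
    """One LSTM time step: update the cell state c and hidden state h."""
    z = np.concatenate([h_prev, x_t])                      # previous hidden state + current input

    f = sigmoid(params["W_f"] @ z + params["b_f"])         # forget gate: what to discard from c_prev
    i = sigmoid(params["W_i"] @ z + params["b_i"])         # input gate: what new information to store
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])   # candidate values for the cell state
    c = f * c_prev + i * c_tilde                           # updated cell state (the "conveyor belt")
    o = sigmoid(params["W_o"] @ z + params["b_o"])         # output gate: what part of the cell state to expose
    h = o * np.tanh(c)                                     # new hidden state
    return h, c

# Tiny demo with random weights (hidden size 4, input size 3); the values are arbitrary.
rng = np.random.default_rng(0)
H, X = 4, 3
params = {f"W_{g}": rng.standard_normal((H, H + X)) * 0.1 for g in "fico"}
params.update({f"b_{g}": np.zeros(H) for g in "fico"})
h, c = lstm_cell_step(rng.standard_normal(X), np.zeros(H), np.zeros(H), params)
print(h.shape, c.shape)  # (4,) (4,)
```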

Real-World Applications

LSTMs have been successfully applied in numerous domains requiring sequence modeling:

  • Natural Language Processing (NLP): LSTMs excel at tasks like machine translation (e.g., translating long sentences while preserving meaning), sentiment analysis (understanding opinions expressed in text), and language modeling. For example, an LSTM can process a paragraph of text to understand the overall sentiment, remembering key phrases from the beginning that influence the meaning at the end.
  • Speech Recognition: They are used to convert spoken language into text by modeling the temporal dependencies in audio signals. An LSTM-based system can recognize words and phrases by considering the sequence of sounds over time, improving accuracy compared to models that don't capture long-range context. Google's speech recognition systems have historically utilized LSTMs.
  • Time Series Analysis: LSTMs are applied to forecast future values based on historical data, such as stock prices, weather patterns, or energy consumption. Their ability to remember long-term trends makes them suitable for complex predictive modeling (see the forecasting sketch after this list).
  • Video Analysis: LSTMs can process sequences of video frames to understand actions or events occurring over time, contributing to applications like activity recognition.
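As an illustration of the time-series use case above, the following is a minimal PyTorch sketch that trains an LSTM to predict the next value of a synthetic sine wave from a window of 30 past values. The model size, window length, and training loop are arbitrary choices for demonstration, not a recommended configuration.

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    """Minimal one-step-ahead forecaster: reads a window of past values, predicts the next one."""
    def __init__(self, hidden_size: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                # x: (batch, window_length, 1)
        output, _ = self.lstm(x)         # output: (batch, window_length, hidden_size)
        return self.head(output[:, -1])  # use the last time step's hidden state to predict the next value

# Build sliding windows over a synthetic sine wave: 30 past values -> next value.
series = torch.sin(torch.linspace(0, 50, 1000))
windows = series.unfold(0, 31, 1)                      # (num_windows, 31)
x, y = windows[:, :30].unsqueeze(-1), windows[:, 30:]  # inputs and targets

model = LSTMForecaster()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```

In practice the series would be normalized, split into training and validation sets, and the window length tuned to the data, but the same pattern of sliding windows feeding an LSTM followed by a linear head carries over.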

Implementation and Tools

LSTMs can be readily implemented using popular deep learning frameworks such as PyTorch (see the PyTorch LSTM documentation) and TensorFlow (see the TensorFlow LSTM documentation).

While Ultralytics primarily focuses on Computer Vision (CV) models like Ultralytics YOLO for tasks such as object detection and instance segmentation, understanding sequence models is valuable, especially as research explores bridging NLP and CV for tasks like video understanding or image captioning. You can explore various ML models and concepts further in the Ultralytics documentation, and platforms like Ultralytics HUB can streamline the training and deployment of models. The foundational LSTM paper by Hochreiter and Schmidhuber provides the original technical details, and resources such as DeepLearning.AI offer courses covering sequence models, including LSTMs.
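For reference, the built-in nn.LSTM module in PyTorch mentioned above can be dropped into a model with just a few lines; the dimensions below are arbitrary and only meant to show the expected input and output shapes.

```python
import torch
import torch.nn as nn

# Two stacked LSTM layers; batch_first=True expects inputs shaped (batch, sequence, features).
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True)

x = torch.randn(4, 15, 10)         # batch of 4 sequences, 15 time steps, 10 features each
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([4, 15, 20]) - top-layer hidden state at every time step
print(h_n.shape)     # torch.Size([2, 4, 20])  - final hidden state for each layer
print(c_n.shape)     # torch.Size([2, 4, 20])  - final cell state for each layer
```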
