Glossary

Reinforcement Learning

Discover reinforcement learning, where agents optimize actions through trial and error to maximize rewards. Explore its core concepts, applications, and benefits.


Reinforcement Learning (RL) is a type of Machine Learning (ML) where an intelligent agent learns to make a sequence of decisions by trying to maximize a reward it receives for its actions. Unlike supervised learning, which learns from labeled examples, or unsupervised learning, which finds patterns in unlabeled data, RL learns through trial and error by interacting with an environment. The agent receives feedback in the form of rewards or penalties based on the actions it takes, guiding its learning process towards achieving a specific goal.

Core Concepts

Several key components define a Reinforcement Learning system:

  • Agent: The learner or decision-making entity that interacts with the environment.
  • Environment: The external system or world within which the agent operates.
  • State: A representation of the current situation or configuration of the environment perceived by the agent.
  • Action: A decision or move made by the agent within the environment.
  • Reward: A numerical signal received from the environment after performing an action, indicating how good or bad that action was in a particular state. The agent's objective is typically to maximize the cumulative reward over time.
  • Policy: The strategy or mapping the agent uses to determine the next action based on the current state. This is essentially what the agent learns.
  • Value Function: A prediction of the expected future rewards achievable from a given state or by taking a specific action in a given state, following a particular policy.
  • Markov Decision Process (MDP): A mathematical framework commonly used to model RL problems, defining the interactions between the agent and the environment.
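
The components above can be made concrete with a small sketch. The one-dimensional "LineWorld" environment, its move set, and its reward values below are illustrative assumptions invented for this example, not part of any standard library:

```python
# A toy environment: the agent walks along positions 0..4 and earns a
# reward of +1 only when it reaches the goal at position 4.
class LineWorld:
    def __init__(self):
        self.state = 0  # state: the agent's current position

    def step(self, action):
        # action is -1 (move left) or +1 (move right)
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done

# A policy maps each state to an action; here it is a simple lookup table.
policy = {0: +1, 1: +1, 2: +1, 3: +1, 4: +1}

env = LineWorld()
state, done, total_reward = env.state, False, 0.0
while not done:
    action = policy[state]                    # the agent acts according to its policy
    state, reward, done = env.step(action)    # the environment returns the next state and a reward
    total_reward += reward
print("cumulative reward:", total_reward)
```

In a real problem the policy is learned rather than hand-written, but the loop of observing a state, acting, and receiving a reward is the same.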

A fundamental challenge in RL is the exploration-exploitation tradeoff: the agent must balance exploring new actions to discover potentially higher rewards (exploration) against choosing actions known to yield good rewards (exploitation).
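
A common way to manage this tradeoff is an epsilon-greedy rule: with a small probability the agent explores a random action, and otherwise it exploits the action its current value estimates rate highest. A minimal sketch follows; the value estimates used here are hypothetical placeholders:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick an action index: explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                        # exploration: random action
    return max(range(len(q_values)), key=lambda a: q_values[a])       # exploitation: best-known action

# Hypothetical action-value estimates for one state (three possible actions).
q_for_state = [0.2, 0.8, 0.5]
print(epsilon_greedy(q_for_state, epsilon=0.1))
```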

How Reinforcement Learning Works

The RL process is typically iterative. The agent observes the current state of the environment, selects an action based on its current policy, performs the action, and receives a reward (or penalty) and the next state from the environment. This feedback is used to update the agent's policy or value function, improving its decision-making over time. Common RL algorithms include Q-learning, SARSA, and Policy Gradient methods, each employing different strategies for learning and updating the policy. Deep Reinforcement Learning (DRL) combines RL with deep learning techniques, using neural networks (NN) to approximate policies or value functions, enabling RL to tackle problems with complex, high-dimensional state spaces like images or sensor data.
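
As a rough sketch of this loop, the snippet below runs tabular Q-learning on Gymnasium's FrozenLake-v1 environment. The hyperparameter values and episode count are arbitrary choices for illustration, and the code assumes a recent Gymnasium installation:

```python
import gymnasium as gym
import numpy as np

env = gym.make("FrozenLake-v1", is_slippery=False)
q_table = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount factor, exploration rate

for episode in range(2000):
    state, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection (explore vs. exploit).
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))

        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated

        # Q-learning update: move the estimate toward reward + discounted best future value.
        best_next = np.max(q_table[next_state])
        q_table[state, action] += alpha * (reward + gamma * best_next - q_table[state, action])
        state = next_state

print("Learned values for the start state:", q_table[0])
```

Policy Gradient methods and DRL replace the table with a neural network, but the observe-act-reward-update cycle is unchanged.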

Comparison With Other Learning Paradigms

RL differs significantly from other ML paradigms:

  • Supervised Learning: Learns from a dataset containing labeled examples (input-output pairs). The goal is to learn a mapping function that predicts outputs for new inputs. Examples include image classification and regression. RL learns from interaction and feedback (rewards), not predefined correct answers.
  • Unsupervised Learning: Learns patterns and structures from unlabeled data. Examples include clustering and dimensionality reduction. RL is goal-oriented, learning a policy to maximize rewards, whereas unsupervised learning focuses on data structure discovery.

Real-World Applications

RL has enabled breakthroughs in various domains:

  • Game Playing: Agents such as DeepMind's AlphaGo and AlphaZero learned to defeat world champions in Go and chess through self-play.
  • Robotics: RL is used to train robots for manipulation, locomotion, and navigation, often first in simulation before deployment on hardware.
  • Autonomous Vehicles: RL supports decision-making tasks such as lane changing, merging, and path planning.
  • Resource Management: RL has been applied to problems like optimizing data center cooling and controlling traffic signals.

Relevance In The AI Ecosystem

Reinforcement Learning is a crucial component of the broader Artificial Intelligence (AI) landscape, particularly for creating autonomous systems capable of complex decision-making. While companies like Ultralytics specialize in vision AI models like Ultralytics YOLO for tasks such as object detection and instance segmentation using supervised learning, the perception capabilities provided by these models are often essential inputs (states) for RL agents. For instance, a robot might use an object detection model deployed via Ultralytics HUB to understand its surroundings before an RL policy decides its next move. Understanding RL provides context for how advanced perception fits into building intelligent, autonomous systems, often developed using frameworks like PyTorch and tested in simulation environments like Gymnasium (formerly OpenAI Gym). Many real-world applications involve integrating perception (Computer Vision) with decision-making (RL).
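
As a hedged illustration of that hand-off, the sketch below turns Ultralytics YOLO detections into a fixed-length state vector and feeds it to a small PyTorch policy network. The state encoding, network shape, action set, and image path are assumptions chosen for the example, not an established recipe:

```python
import torch
import torch.nn as nn
from ultralytics import YOLO

detector = YOLO("yolov8n.pt")  # supervised perception model (pretrained weights assumed available)

# Hypothetical policy: maps a 12-dim state (up to 3 boxes x 4 coords) to 3 discrete actions.
policy = nn.Sequential(nn.Linear(12, 64), nn.ReLU(), nn.Linear(64, 3))

def detections_to_state(results, max_boxes=3):
    """Flatten normalized box coordinates into a fixed-length state vector."""
    boxes = results[0].boxes.xywhn[:max_boxes]   # normalized (x, y, w, h) per detection
    state = torch.zeros(max_boxes * 4)
    state[: boxes.numel()] = boxes.flatten()
    return state

results = detector("image.jpg")                  # perception: detect objects in the scene (placeholder path)
state = detections_to_state(results)             # state: what the RL agent observes
action = torch.argmax(policy(state)).item()      # decision: e.g. 0 = turn left, 1 = go forward, 2 = turn right
print("chosen action:", action)
```

In practice the policy network would be trained with an RL algorithm such as a policy gradient method, while the detector remains a fixed, supervised perception module.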
