Learn how tokens, the building blocks of AI models, power NLP, computer vision, and tasks like sentiment analysis and object detection.
In the realm of Artificial Intelligence (AI) and Machine Learning (ML), particularly in Natural Language Processing (NLP) and increasingly in computer vision, a 'token' represents the smallest unit of data that a model processes. Think of tokens as the fundamental building blocks that AI models use to understand and analyze information, whether it's text, images, or other forms of data. They are essential for converting raw input into a format that algorithms can interpret and learn from, forming the basis for many complex AI tasks.
Tokens are the discrete outputs of a process called tokenization. In NLP, for example, a sentence like "Ultralytics YOLO is fast and accurate" can be tokenized into individual words: ["Ultralytics", "YOLO", "is", "fast", "and", "accurate"]. Depending on the specific tokenization strategy, tokens could also be sub-word units (e.g., "Ultra", "lytics") or even individual characters. This breakdown transforms continuous text or complex data into manageable pieces.
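To make this concrete, here is a minimal sketch of word-level and character-level tokenization using only plain Python string operations; production systems typically use trained sub-word tokenizers instead.

```python
# Word-level and character-level tokenization with plain Python string operations.
sentence = "Ultralytics YOLO is fast and accurate"

# Word-level tokenization: split on whitespace
word_tokens = sentence.split()
print(word_tokens)  # ['Ultralytics', 'YOLO', 'is', 'fast', 'and', 'accurate']

# Character-level tokenization: every character becomes its own token
char_tokens = list("YOLO")
print(char_tokens)  # ['Y', 'O', 'L', 'O']
```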
The reason tokens are crucial is that most deep learning models, including powerful architectures like Transformers used in many modern AI systems, cannot process raw, unstructured data directly. They require input in a structured, often numerical, format. Tokenization provides this bridge. Once data is tokenized, each token is typically mapped to a numerical representation, such as an ID in a vocabulary or, more commonly, dense vector representations called embeddings. These embeddings capture semantic relationships between tokens, which models learn during training.
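As an illustration of this mapping, the sketch below builds a toy vocabulary and looks up embeddings with PyTorch's `nn.Embedding`; the vocabulary and the 8-dimensional embedding size are invented for the example and not tied to any particular model.

```python
import torch
import torch.nn as nn

tokens = ["Ultralytics", "YOLO", "is", "fast", "and", "accurate"]

# Toy vocabulary: each unique token gets an integer ID
vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
token_ids = torch.tensor([vocab[tok] for tok in tokens])

# An embedding table maps each ID to a learnable dense vector
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)
vectors = embedding(token_ids)
print(vectors.shape)  # torch.Size([6, 8]) -> one 8-dimensional vector per token
```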
Different methods exist for breaking down data into tokens, ranging from simple word-level and character-level splitting to sub-word algorithms such as Byte Pair Encoding (BPE) and WordPiece, which balance vocabulary size against coverage of rare words.
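For example, sub-word tokenization can be explored with an off-the-shelf tokenizer from the Hugging Face `transformers` library. The sketch below assumes that library is installed; the `bert-base-uncased` checkpoint is just one illustrative choice, so the exact sub-word splits shown in the comments are indicative rather than guaranteed.

```python
from transformers import AutoTokenizer

# Load a pretrained sub-word (WordPiece) tokenizer; any checkpoint would do here
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Words outside the vocabulary are split into sub-word pieces marked with '##'
print(tokenizer.tokenize("Ultralytics YOLO is fast and accurate"))
# e.g. ['ultra', '##ly', '##tics', 'yo', '##lo', 'is', 'fast', 'and', 'accurate']

# The model itself consumes integer IDs rather than strings
print(tokenizer.encode("Ultralytics YOLO is fast and accurate"))
```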
Tokens are fundamental across various AI domains. Here are two concrete examples:
Machine Translation: In services like Google Translate, an input sentence in one language is first tokenized. These tokens are processed by a sequence-to-sequence model (often a Transformer), which then generates tokens representing the translated sentence in the target language. The choice of tokenization significantly impacts translation accuracy and fluency. Large Language Models (LLMs) such as GPT-4, as well as encoder models like BERT, rely heavily on token processing for tasks including translation, text generation, and sentiment analysis. Techniques such as prompt tuning and prompt chaining work by manipulating input token sequences to guide model behavior.
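As a rough illustration of this tokenize-translate-detokenize flow, the hedged sketch below uses the Hugging Face `transformers` translation pipeline; the `t5-small` checkpoint is an illustrative assumption, and the pipeline handles tokenization and detokenization internally.

```python
from transformers import pipeline

# The pipeline tokenizes the input, runs the sequence-to-sequence model,
# and detokenizes the generated output tokens back into text.
translator = pipeline("translation_en_to_de", model="t5-small")

result = translator("Ultralytics YOLO is fast and accurate")
print(result[0]["translation_text"])
```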
Computer Vision with Transformers: While traditionally associated with NLP, tokens are now central to advanced computer vision models like Vision Transformers (ViTs). In a ViT, an image is divided into fixed-size, non-overlapping patches (e.g., 16x16 pixels). Each patch is treated as a 'visual token'. These tokens are linearly embedded and fed into a Transformer architecture, which uses attention mechanisms to analyze relationships between different parts of the image. This approach is used for tasks like image classification, object detection, and image segmentation. Models like the Segment Anything Model (SAM) utilize this token-based approach. Even in convolutional models like Ultralytics YOLOv8 or the newer Ultralytics YOLO11, the grid cell system used for detection can be viewed as an implicit form of spatial tokenization.
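The sketch below shows, under simplified assumptions, how an image can be turned into visual tokens in the ViT style: non-overlapping 16x16 patches are extracted with PyTorch tensor operations and projected to embeddings. The patch size and embedding dimension are illustrative, and real ViT implementations add positional information and a class token on top of this.

```python
import torch
import torch.nn as nn

image = torch.randn(1, 3, 224, 224)  # a batch with one RGB image
patch_size, embed_dim = 16, 768

# Extract non-overlapping 16x16 patches and flatten each one
patches = image.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
patches = patches.contiguous().view(1, 3, -1, patch_size, patch_size)  # (1, 3, 196, 16, 16)
patches = patches.permute(0, 2, 1, 3, 4).flatten(2)                    # (1, 196, 768)

# A linear projection gives one embedding per patch -> 196 visual tokens
projection = nn.Linear(3 * patch_size * patch_size, embed_dim)
visual_tokens = projection(patches)
print(visual_tokens.shape)  # torch.Size([1, 196, 768])
```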
Understanding tokens is fundamental to grasping how AI models interpret and process information. As AI evolves, the concept of tokens and the methods for creating them will remain central to handling diverse data types and building more sophisticated models for applications ranging from medical image analysis to autonomous vehicles. Platforms like Ultralytics HUB provide tools to manage datasets and train models, often involving data that is implicitly or explicitly tokenized.