Glossary

Large Language Model (LLM)


Large Language Models (LLMs) represent a significant advancement in the field of Artificial Intelligence (AI), particularly within Natural Language Processing (NLP). These models are characterized by their immense scale, often containing billions of parameters, and are trained on vast datasets comprising text and code. This extensive training enables LLMs to understand context, generate coherent and human-like text, translate languages, answer questions, and perform a wide array of language-based tasks with remarkable proficiency. They are a specific type of Deep Learning (DL) model, driving innovation across numerous applications.

Definition

A Large Language Model is fundamentally a sophisticated neural network (NN), typically based on the Transformer architecture. The "large" in LLM refers to the huge number of parameters (the values adjusted during training), which can range from billions to trillions. More parameters generally allow the model to learn more complex patterns from data. LLMs learn these patterns through unsupervised learning on massive text corpora gathered from the internet, books, and other sources. This process helps them grasp grammar, facts, reasoning abilities, and even biases present in the data. A core capability is predicting the next word in a sequence, which forms the basis for tasks like text generation and question answering. Well-known examples include OpenAI's GPT series (e.g., GPT-4), Meta AI's Llama models (e.g., Llama 3), Google DeepMind's Gemini, and Anthropic's Claude.
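The next-word prediction objective can be illustrated with a toy bigram frequency model. This is only a sketch of the *prediction task* — real LLMs learn these statistics with a neural network over tokens, not a lookup table — and the tiny corpus here is invented for the example:

```python
from collections import Counter, defaultdict

# Invented toy corpus; real LLMs train on billions of documents.
corpus = "the cat sat on the mat the cat ate".split()

# Count how often each word follows another word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent word seen after `word`, or None if unseen."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" follows "the" more often than "mat" does
```

An LLM generalizes this idea: instead of raw counts over word pairs, it learns a probability distribution over the next token conditioned on the entire preceding context.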

Applications

The versatility of LLMs allows them to be applied across diverse domains. Here are two concrete examples:

  • Conversational AI: LLMs power sophisticated chatbots and virtual assistants like ChatGPT and Google Assistant, enabling more natural and context-aware interactions compared to older rule-based systems. They can handle customer service inquiries, provide information, and engage in complex dialogues.
  • Content Creation and Summarization: Businesses and individuals use LLMs to generate marketing copy, write articles, create code snippets, and summarize lengthy documents (Text Summarization). Tools like Microsoft Copilot integrate LLMs to assist users in various writing and coding tasks.

Key Concepts

Understanding LLMs involves familiarity with several related concepts:

  • Foundation Models: LLMs are considered a type of foundation model, meaning they are large models trained on broad data that can be adapted (fine-tuned) for various downstream tasks.
  • Attention Mechanisms: Crucial to the Transformer architecture, attention allows the model to weigh the importance of different words in the input sequence when generating output, enabling better handling of long-range dependencies and context. The seminal paper introducing this is "Attention Is All You Need".
  • Prompt Engineering: This is the practice of designing effective inputs (prompts) to guide the LLM towards generating the desired output. The quality of the prompt significantly influences the model's response.
  • Tokenization: LLMs process text by breaking it down into smaller units called tokens (words, subwords, or characters). The way text is tokenized affects model performance and computational cost.
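The attention mechanism described above can be sketched in plain Python. This is a simplified single-head scaled dot-product attention over small lists of vectors, without the learned query/key/value projections or batching of a real Transformer layer:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, over lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # weights are positive and sum to 1
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Because the weights form a convex combination, each output vector is a weighted blend of the values, with more weight on values whose keys resemble the query — this is how the model emphasizes relevant context positions, including distant ones.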

While LLMs excel at language tasks, they differ from models primarily designed for Computer Vision (CV), such as Ultralytics YOLO models used for object detection. However, the rise of Multi-modal Models and Vision Language Models is bridging this gap, combining language understanding with visual processing. Platforms like Ultralytics HUB facilitate the training and deployment of various AI models, including those for vision tasks.
