
2024 Starts with a Generative AI Wave

A look at the exciting AI innovations from the first quarter of 2024. We'll cover breakthroughs like OpenAI's Sora AI, Neuralink's brain chip, and the latest LLMs.


The AI community seems to make headlines almost daily. The first few months of 2024 have been exciting and packed full of new AI innovations. From powerful new large language models to human brain implants, 2024 is shaping up to be amazing.

We are seeing AI transform industries, making information more accessible, and even taking the first steps toward merging our minds with machines. Let's rewind the first quarter of 2024 and take a closer look at the progress made in AI in just a few months.

LLMs are Trending

Large language models (LLMs), designed to understand, generate, and manipulate human language based on vast amounts of text data, took center stage in the first quarter of 2024. Many major tech companies released their own LLMs, each with unique capabilities. The incredible success of earlier models like GPT-3 inspired this trend. Here are some of the most notable LLM releases from early 2024.

Anthropic's Claude 3

Anthropic released Claude 3 on March 14, 2024. The Claude 3 model comes in three versions: Opus, Sonnet, and Haiku, each serving different markets and purposes. Haiku, the quickest model, is optimized for fast, basic responses. Sonnet balances speed with intelligence and is targeted at enterprise applications. Opus, the most advanced version, delivers unparalleled intelligence and reasoning and is ideal for complex tasks and achieving top benchmarks.

Claude 3 boasts many advanced features and improvements:

  • Enhanced Multilingual Conversations: Improved abilities in languages including Spanish, Japanese, and French.
  • Advanced Vision Features: Capable of handling a variety of visual formats.
  • Minimized Refusals: Shows fewer unnecessary refusals, indicating an improved contextual grasp.
  • Extended Context Window: Offers a 200K-token context window and can process inputs of over 1 million tokens for customers who need it.
Fig 1. Claude 3 is more contextually aware than previous versions.

Databricks' DBRX

DBRX is an open, general-purpose LLM released by Databricks on March 27, 2024. It performs strongly across benchmarks covering language understanding, programming, and mathematics, surpassing other established open models while being approximately 40% smaller than comparable models.

Fig 2. Comparing DBRX with other models.

DBRX was trained using next-token prediction with a fine-grained mixture-of-experts (MoE) architecture, which is why it shows significant improvements in training and inference performance. The architecture lets the model predict the next word in a sequence more accurately by consulting a set of specialized submodels (the "experts"), each good at handling different types of information or tasks.
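To make the routing idea concrete, here is a minimal mixture-of-experts layer in PyTorch. This is our own toy sketch for illustration, not DBRX's actual fine-grained implementation; the dimensions, expert count, and top-k routing choice are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy mixture-of-experts layer: a learned router sends each token
    to its top-k expert feed-forward networks and mixes their outputs."""

    def __init__(self, d_model: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model), nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        weights = F.softmax(self.router(x), dim=-1)        # (B, S, n_experts)
        top_w, top_idx = weights.topk(self.top_k, dim=-1)  # keep the k best experts
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)    # renormalize their weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[..., k] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += top_w[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

moe = MoELayer(d_model=64)
tokens = torch.randn(2, 16, 64)  # a dummy batch of token embeddings
print(moe(tokens).shape)         # torch.Size([2, 16, 64])
```

Only the selected experts run for each token, which is how MoE models grow total capacity without a proportional increase in compute per token.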

Google’s Gemini 1.5

Google introduced Gemini 1.5, a compute-efficient, multimodal AI model that can analyze extensive text, video, and audio data, on February 15, 2024. The model improves on its predecessor in performance, efficiency, and capability. A key feature of Gemini 1.5 is its breakthrough in long-context understanding: the model can handle up to 1 million tokens consistently. Gemini 1.5's capabilities also stem from a new MoE-based architecture.

Fig 3. Comparing context lengths of popular LLMs.

Here are some of Gemini 1.5's most interesting features:

  • Improved Data Handling: Allows direct uploads of large PDFs, code repositories, or lengthy videos as prompts. The model can reason across modalities and output text.
  • Multiple File Uploads and Queries: Developers can now upload multiple files and ask questions about them (sketched in code after this list).
  • Can Be Used For Different Tasks: It's optimized to scale across diverse tasks, with improvements in areas like math, science, reasoning, multilinguality, video understanding, and code.
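As a rough sketch of that multi-file workflow, here is what a query could look like with the google-generativeai Python SDK. The file names are hypothetical and the API surface may change, so treat this as illustrative rather than definitive:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Upload large files once, then reference them alongside a text prompt
# (both file paths here are hypothetical).
report = genai.upload_file("quarterly_report.pdf")
demo_video = genai.upload_file("product_demo.mp4")

model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(
    [report, demo_video, "Summarize the report and flag anything the video contradicts."]
)
print(response.text)
```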

Stunning Visuals from AI

The first quarter of 2024 has unveiled generative AI models that can create visuals so real they've sparked debates on the future of social media and AI's progress. Let's dive into the models stirring up the conversation.

OpenAI’s Sora 

OpenAI, the creator of ChatGPT, announced a state-of-the-art text-to-video deep learning model called Sora on February 15, 2024. Sora can generate minute-long videos of high visual quality from textual user prompts.

For example, take a look at the following prompt. 

“A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures.” 

And, here’s a frame from the output video. 

Fig 4. A frame from a video generated by Sora.

Sora's architecture makes this possible by blending diffusion models for texture generation with transformer models for structural coherence. So far, access to Sora has been limited to red teamers and a select group of visual artists, designers, and filmmakers so OpenAI can understand the risks and gather feedback.
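While OpenAI hasn't published Sora's full details, its public report describes a diffusion transformer that iteratively denoises "spacetime patches" of latent video. The sketch below is a conceptual toy of that loop; the shapes, step size, and stand-in denoiser are all made up for illustration and are not Sora's architecture:

```python
import torch
import torch.nn as nn

class PatchDenoiser(nn.Module):
    """Stand-in diffusion transformer: attends over spacetime patches
    and predicts the noise to remove at each sampling step."""

    def __init__(self, d_model: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.noise_head = nn.Linear(d_model, d_model)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        return self.noise_head(self.encoder(patches))

denoiser = PatchDenoiser()
x = torch.randn(1, 128, 64)  # 128 latent "spacetime patches", starting as pure noise
for _ in range(50):          # iteratively peel away a fraction of predicted noise
    with torch.no_grad():
        x = x - 0.02 * denoiser(x)
# x now stands in for denoised latent patches a decoder would render into frames
```

A real sampler would follow a proper noise schedule and condition on the text prompt; this loop only shows the shape of the idea.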

Stability AI’s Stable Diffusion 3 

Stability AI announced the arrival of Stable Diffusion 3, a text-to-image generation model, on February 22, 2024. The model combines a diffusion transformer architecture with flow matching. Stability AI has yet to release a technical paper, but there are a few key features to look out for.

Fig 5. The output image based on the prompt: “Epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says "Stable Diffusion 3" made out of colorful energy”

The latest model of Stable Diffusion offers improved performance, image quality, and accuracy in creating images with multiple subjects. Stable Diffusion 3 will also come in a range of models from 800 million to 8 billion parameters, letting users choose based on their needs for scalability and detail.
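Flow matching, one of the two techniques named above, is easiest to see in a toy example: learn a velocity field that carries noise samples to data samples along straight-line paths. The two-dimensional sketch below is entirely our own illustration and bears no resemblance to SD3's actual latent-space model:

```python
import torch
import torch.nn as nn

# Toy rectified-flow / flow-matching training loop: the network learns a
# time-conditioned velocity field transporting noise to "data" along
# linear interpolation paths.
velocity = nn.Sequential(nn.Linear(3, 128), nn.SiLU(), nn.Linear(128, 2))
opt = torch.optim.Adam(velocity.parameters(), lr=1e-3)

for step in range(1000):
    x1 = torch.randn(256, 2) * 0.1 + 2.0        # stand-in "data" samples
    x0 = torch.randn(256, 2)                    # noise samples
    t = torch.rand(256, 1)                      # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1                  # point on the straight-line path
    target = x1 - x0                            # constant velocity along that path
    pred = velocity(torch.cat([xt, t], dim=1))  # condition the field on time
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At sampling time, integrating the learned velocity field from t=0 to t=1 moves a noise sample onto the data distribution, typically in fewer steps than classic diffusion sampling.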

Google’s Lumiere 

On January 23, 2024, Google launched Lumiere, a text-to-video diffusion model. Lumiere uses an architecture called Space-Time-U-Net, or STUNet for short. It helps Lumiere understand where things are and how they move in a video. By doing so, it can generate smooth and lifelike videos.

Fig 6. A frame from a video generated based on the prompt: “Panda play ukulele at home.”

With the capability to generate 80 frames per video, Lumiere is pushing boundaries and setting new standards for video quality in the AI space. Here are some of Lumiere’s features:

  • Image-to-Video: Starting from an image and a prompt, Lumiere can animate images into videos.
  • Stylized Generation: Lumiere can create videos in specific styles using a single reference image.
  • Cinemagraphs: Lumiere can animate specific regions within an image to create dynamic scenes, such as a particular object moving while the rest of the scene remains static.
  • Video Inpainting: It can modify parts of a video, such as changing the attire of people within it or altering background details.

The Future Seems to Be Here

The beginning of 2024 has also brought about many AI innovations that feel like something out of a sci-fi movie. Things we once would have called impossible are now in active development. The future doesn't feel so far off with the following breakthroughs.

Elon Musk’s Neuralink

Elon Musk’s Neuralink successfully implanted its wireless brain chip in a human on January 29, 2024. This is a huge step toward connecting human brains to computers. Elon Musk shared that Neuralink’s first product, named ‘Telepathy,’ is in the pipeline. 

Fig 7. The Neuralink Implant

The goal is to enable users, particularly those who have lost limb functionality, to control devices effortlessly through their thoughts. The potential applications extend beyond convenience. Elon Musk imagines a future where individuals with paralysis can communicate easily.

Disney's HoloTile Floor 

On January 18, 2024, Walt Disney Imagineering unveiled the HoloTile Floor. It has been dubbed the world's first multi-person, omnidirectional treadmill floor.

Fig 8. Disney Imagineer Lanny Smoot poses on his latest innovation, the HoloTile floor.

It can move any person or object standing on it, almost like telekinesis, creating an immersive virtual and augmented reality experience: you can walk in any direction and avoid collisions while on it. Disney's HoloTile Floor could also be placed on theatrical stages, letting performers dance and move in creative ways.

Apple’s Vision Pro

On February 2, 2024, Apple’s much-anticipated Vision Pro headset hit the market. It has an array of features and applications designed to redefine the virtual and augmented reality experience. The Vision Pro headset caters to a diverse audience by blending entertainment, productivity, and spatial computing. Apple proudly announced that over 600 apps, ranging from productivity tools to gaming and entertainment services, were optimized for the Vision Pro at its launch.

Cognition’s Devin

On March 12, 2024, Cognition released a software engineering assistant called Devin. Devin is the world's first attempt at an autonomous AI software engineer. Unlike traditional coding assistants that offer suggestions or complete specific tasks, Devin is designed to handle entire software development projects from initial concept to completion.

It can learn new technologies, build and deploy full apps, find and fix bugs, train its own models, contribute to open-source and production codebases, and even take on real development jobs from sites like Upwork. 

Fig 9. Comparing Devin with other models.

Devin was evaluated on SWE-bench, a challenging benchmark that asks agents to resolve real-world GitHub issues found in open-source projects like Django and scikit-learn. It correctly resolved 13.86% of the issues end-to-end, compared to the previous state-of-the-art of 1.96%.

Honorable Mentions

There’s been so much happening that covering everything in this article isn’t possible, but here are a few more honorable mentions.

  • NVIDIA's LATTE3D, announced on March 21, 2024, is a text-to-3D AI model that instantly creates 3D representations from text prompts.
  • Midjourney's new text-to-video generator, teased by CEO David Holz, started training in January and is expected to launch soon.
  • Advancing the AI PC revolution, Lenovo released the ThinkBook 13x with E Ink Prism technology and high-performance AI laptops on January 8, 2024.

Stay Updated on AI Trends with Us!

The start of 2024 saw groundbreaking advancements in AI and many major technological milestones. But this is just the start of what AI can do. If you want to learn more about the latest AI developments, Ultralytics has got you covered.

Check out our GitHub repository to see our latest contributions in computer vision and AI. You can also look at our solutions pages to see how AI is being used in industries like manufacturing and healthcare.
