A look at the exciting AI innovations from the first quarter of 2024. We'll cover breakthroughs like OpenAI's Sora, Neuralink's brain chip implant, and the latest LLMs.
The AI community seems to make headlines almost daily, and the first few months of 2024 were packed with new innovations. From powerful new large language models to human brain implants, 2024 is shaping up to be a remarkable year.
We are seeing AI transform industries, make information more accessible, and even take the first steps toward merging our minds with machines. Let's rewind the first quarter of 2024 and take a closer look at the progress made in AI in just a few months.
Large language models (LLMs), designed to understand, generate, and manipulate human language based on vast amounts of text data, took center stage in the first quarter of 2024. Many major tech companies released their own LLMs, each with unique capabilities, a trend inspired by the incredible success of earlier models like GPT-3. Here are some of the most notable LLM releases from early 2024.
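The core training objective these models share is next-token prediction: given the words so far, predict what comes next. As a toy illustration (not any vendor's actual implementation), here is a minimal bigram predictor that picks the most likely next word from simple co-occurrence counts:

```python
from collections import Counter, defaultdict

# Toy corpus; real LLMs train on trillions of tokens, not one sentence.
corpus = "the cat sat on the mat and the cat slept".split()

# Count how often each word follows each other word (bigram statistics).
following = defaultdict(Counter)
for prev_word, next_word in zip(corpus, corpus[1:]):
    following[prev_word][next_word] += 1

def predict_next(word):
    """Return the word most frequently observed after `word`."""
    candidates = following.get(word)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" twice, "mat" once → "cat"
```

Real LLMs replace the count table with a neural network over long contexts, but the prediction loop is conceptually the same.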
Anthropic announced Claude 3 on March 4, 2024. The Claude 3 family comes in three versions: Opus, Sonnet, and Haiku, each serving different markets and purposes. Haiku, the quickest model, is optimized for fast, basic responses. Sonnet balances speed with intelligence and is targeted at enterprise applications. Opus, the most advanced version, delivers the strongest intelligence and reasoning and is ideal for complex tasks, achieving top results on many benchmarks.
Claude 3 also brings a range of advanced features and improvements, including vision capabilities and a 200K-token context window.
Databricks DBRX is an open, general-purpose LLM released by Databricks on March 27, 2024. DBRX performs strongly across a range of benchmarks, including language understanding, programming, and mathematics, surpassing other established open models while being approximately 40% smaller than comparable models.
DBRX was trained using next-token prediction with a fine-grained mixture-of-experts (MoE) architecture, which is why it shows significant improvements in training and inference performance. The architecture allows the model to predict the next word in a sequence more accurately by consulting a small, input-dependent subset of specialized submodels (the "experts"), each of which handles different types of information or tasks.
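The efficiency win of an MoE comes from routing each input to only a few top-scoring experts instead of running the whole network. Here is a minimal, hypothetical sketch of top-k routing; the tiny lambda "experts" and fixed gate scores are stand-ins for DBRX's learned sub-networks and router, not its actual code:

```python
import math

def softmax(scores):
    """Convert raw scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Stand-in "experts": tiny functions instead of large neural sub-networks.
experts = [
    lambda x: 2.0 * x,   # expert 0
    lambda x: x + 1.0,   # expert 1
    lambda x: x * x,     # expert 2
    lambda x: -x,        # expert 3
]

def moe_forward(x, gate_scores, top_k=2):
    """Route input x to the top_k highest-scoring experts and combine
    their outputs, weighted by renormalized gate weights."""
    ranked = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([gate_scores[i] for i in chosen])
    # Only the chosen experts run; the rest stay idle (the compute saving).
    return sum(w * experts[i](x) for w, i in zip(weights, chosen))

# Gate scores would come from a learned router; fixed here for illustration.
output = moe_forward(3.0, gate_scores=[0.1, 2.0, 1.5, -0.5], top_k=2)
```

With these scores, only experts 1 and 2 execute, so compute scales with `top_k` rather than with the total number of experts.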
Google introduced Gemini 1.5, a compute-efficient, multimodal AI model that can analyze extensive text, video, and audio data, on February 15, 2024. The model improves on its predecessor in performance, efficiency, and capabilities. A key feature of Gemini 1.5 is its breakthrough in long-context understanding: the model can consistently handle up to 1 million tokens. Gemini 1.5's capabilities are also thanks to a new MoE-based architecture.
Gemini 1.5 also introduces a number of other interesting features beyond its long context window.
The first quarter of 2024 has unveiled generative AI models that can create visuals so real they've sparked debates on the future of social media and AI's progress. Let's dive into the models stirring up the conversation.
OpenAI, the creator of ChatGPT, announced a state-of-the-art text-to-video deep learning model called Sora on February 15, 2024. Sora can generate minute-long videos with high visual quality from textual user prompts.
For example, take a look at the following prompt.
“A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures.”
And here’s a frame from the output video.
Sora’s architecture makes this possible: it is a diffusion model built on a transformer backbone that operates on spacetime patches of video, combining the visual fidelity of diffusion with the structural coherence of transformers. So far, access to Sora has been limited to red teamers and a select group of visual artists, designers, and filmmakers so OpenAI can assess risks and gather feedback.
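Diffusion models generate content by starting from pure noise and repeatedly denoising it toward the training data distribution. Here is a toy one-dimensional sketch of that reverse process; `denoise_step` is a hypothetical stand-in for the learned neural denoiser, not OpenAI's implementation:

```python
import random

random.seed(0)

TARGET = 5.0  # stand-in for the "clean" data the model learned to produce

def denoise_step(x, step, total_steps):
    """Hypothetical stand-in for a learned denoiser: nudge the sample
    a fraction of the way toward the learned data."""
    return x + (TARGET - x) / (total_steps - step)

def sample(total_steps=50):
    x = random.gauss(0.0, 10.0)  # start from pure noise
    for step in range(total_steps):
        x = denoise_step(x, step, total_steps)
    return x

print(round(sample(), 2))  # converges to 5.0 regardless of the noisy start
```

In a real model, `x` is a tensor of image (or spacetime) patches and the denoiser is a large network, but the iterative noise-to-signal loop is the same idea.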
Stability AI announced Stable Diffusion 3, a text-to-image generation model, on February 22, 2024. The model combines a diffusion transformer architecture with flow matching. Stability AI has yet to release a technical paper, but there are a few key features to look out for.
The latest model of Stable Diffusion offers improved performance, image quality, and accuracy when creating images with multiple subjects. Stable Diffusion 3 will also be offered in a range of sizes, from 800 million to 8 billion parameters, allowing users to choose a model based on their specific needs for scalability and detail.
On January 23, 2024, Google launched Lumiere, a text-to-video diffusion model. Lumiere uses an architecture called Space-Time U-Net, or STUNet for short, which lets the model reason about both where things are in a frame and how they move over time. By generating the entire temporal duration of a video in a single pass, it can produce smooth and lifelike motion.
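The key idea behind a space-time U-Net is to downsample the video in both the spatial and the temporal dimensions, reason about the compressed representation, then upsample. As a purely conceptual illustration (not Google's code), here is average pooling applied across the time axis as well as the spatial axes:

```python
def downsample_spacetime(video, t_factor=2, s_factor=2):
    """Average-pool a video given as video[t][y][x] over blocks of
    t_factor frames and s_factor x s_factor pixels, so downstream
    layers can reason jointly about space and time at lower cost."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    out = []
    for t in range(0, T, t_factor):
        frame = []
        for y in range(0, H, s_factor):
            row = []
            for x in range(0, W, s_factor):
                block = [video[t + dt][y + dy][x + dx]
                         for dt in range(t_factor)
                         for dy in range(s_factor)
                         for dx in range(s_factor)]
                row.append(sum(block) / len(block))
            frame.append(row)
        out.append(frame)
    return out

# Two 2x2 frames collapse into a single 1x1 frame.
pooled = downsample_spacetime([[[0, 0], [0, 0]], [[8, 8], [8, 8]]])
```

Pooling over time as well as space is what distinguishes this from a standard image U-Net, which would process each frame independently.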
With the capability to generate 80 frames per video, Lumiere is pushing boundaries and setting new standards for video quality in the AI space. Its features include text-to-video and image-to-video generation, stylized generation, and video inpainting.
The beginning of 2024 has also brought many AI innovations that feel like something out of a sci-fi movie. Things we previously would have called impossible are now being actively worked on. The future doesn’t feel so far off with the following developments.
Elon Musk’s Neuralink successfully implanted its wireless brain chip in a human on January 29, 2024. This is a huge step toward connecting human brains to computers. Elon Musk shared that Neuralink’s first product, named ‘Telepathy,’ is in the pipeline.
The goal is to enable users, particularly those who have lost limb functionality, to control devices through their thoughts alone. The potential applications extend beyond convenience: Musk envisions a future where individuals with paralysis can communicate easily.
On January 18, 2024, Walt Disney Imagineering unveiled the HoloTile Floor. It has been dubbed the world's first multi-person, omnidirectional treadmill floor.
It can move under any person or object, almost like telekinesis, to create an immersive virtual and augmented reality experience: multiple people can walk on it in any direction without colliding. Disney’s HoloTile Floor can also be installed on theatrical stages, allowing performers to dance and move in creative ways.
On February 2, 2024, Apple’s much-anticipated Vision Pro headset hit the market. It has an array of features and applications designed to redefine the virtual and augmented reality experience. The Vision Pro headset caters to a diverse audience by blending entertainment, productivity, and spatial computing. Apple proudly announced that over 600 apps, ranging from productivity tools to gaming and entertainment services, were optimized for the Vision Pro at its launch.
On March 12, 2024, Cognition released a software engineering assistant called Devin, billed as the world's first autonomous AI software engineer. Unlike traditional coding assistants that offer suggestions or complete specific tasks, Devin is designed to handle entire software development projects from initial concept to completion.
It can learn new technologies, build and deploy full apps, find and fix bugs, train its own models, contribute to open-source and production codebases, and even take on real development jobs from sites like Upwork.
Devin was evaluated on SWE-bench, a challenging benchmark that asks agents to resolve real-world GitHub issues found in open-source projects like Django and scikit-learn. It correctly resolved 13.86% of the issues end-to-end, compared to the previous state-of-the-art of 1.96%, roughly a sevenfold improvement.
There’s been so much happening that covering everything in this article isn’t possible, but many other releases from early 2024 also deserve an honorable mention.
The start of 2024 saw groundbreaking advancements in AI and many major technological milestones. But this is just the start of what AI can do. If you want to learn more about the latest AI developments, Ultralytics has got you covered.
Check out our GitHub repository to see our latest contributions in computer vision and AI. You can also look at our solutions pages to see how AI is being used in industries like manufacturing and healthcare.
Begin your journey with the future of machine learning