
2024 starts with a generative AI wave

A look at the exciting AI innovations from the first quarter of 2024. We'll cover breakthroughs like OpenAI's Sora, Neuralink's brain chip, and the latest LLMs.

The AI community seems to make headlines almost daily. The first few months of 2024 have been exciting and packed full of new AI innovations. From powerful new large language models to human brain implants, 2024 is shaping up to be amazing.

We are seeing AI transform industries, make information more accessible, and even take the first steps toward merging our minds with machines. Let's rewind the first quarter of 2024 and take a closer look at the progress made in AI in just a few months.

Stunning visuals from AI

The first quarter of 2024 has unveiled generative AI models that can create visuals so real they've sparked debates on the future of social media and AI's progress. Let's dive into the models stirring up the conversation.

OpenAI’s Sora 

OpenAI, the creator of ChatGPT, announced Sora, a state-of-the-art text-to-video deep learning model, on February 15, 2024. Sora can generate minute-long videos with high visual quality from textual user prompts.

For example, take a look at the following prompt. 

“A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures.” 

And, here’s a frame from the output video. 

Fig 4. A frame from a video generated by Sora.

Sora makes this possible by combining a diffusion process with a transformer architecture that operates on "spacetime patches" of video, with diffusion refining visual detail and the transformer keeping the overall structure coherent. So far, access to Sora has been given to red teamers and a select group of visual artists, designers, and filmmakers so OpenAI can assess risks and gather feedback.
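OpenAI hasn't released Sora's code, but the announced recipe of running a diffusion model with a transformer over spacetime patches can be sketched at a very high level. Everything below (class names, tensor sizes, conditioning shapes) is purely illustrative and is not OpenAI's implementation:

```python
import torch
import torch.nn as nn

class SpacetimePatchDenoiser(nn.Module):
    """Toy diffusion transformer: denoise video latents split into spacetime patches.
    Illustrative only; Sora's real architecture and sizes are not public."""

    def __init__(self, patch_dim=256, n_heads=8, n_layers=6):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=patch_dim, nhead=n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.time_embed = nn.Linear(1, patch_dim)     # diffusion timestep conditioning
        self.text_embed = nn.Linear(512, patch_dim)   # stand-in for a text encoder output
        self.head = nn.Linear(patch_dim, patch_dim)   # predicts the noise for each patch

    def forward(self, noisy_patches, t, text_features):
        # noisy_patches: (batch, num_spacetime_patches, patch_dim)
        cond = self.time_embed(t) + self.text_embed(text_features)  # (batch, patch_dim)
        x = noisy_patches + cond.unsqueeze(1)                       # broadcast conditioning to every patch
        return self.head(self.transformer(x))                       # predicted noise, same shape as input

# One denoising step on random data, just to show the shapes involved.
model = SpacetimePatchDenoiser()
patches = torch.randn(2, 1024, 256)   # e.g. 1024 spacetime patches per clip
t = torch.rand(2, 1)                  # diffusion timestep in [0, 1]
text = torch.randn(2, 512)            # pooled prompt embedding
predicted_noise = model(patches, t, text)
print(predicted_noise.shape)          # torch.Size([2, 1024, 256])
```

In a full system, this denoising step would be applied repeatedly, starting from pure noise and gradually producing clean video latents that a decoder turns into frames.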

Stability AI’s Stable Diffusion 3 

Stability AI announced Stable Diffusion 3, a text-to-image generation model, on February 22, 2024. The model combines a diffusion transformer architecture with flow matching. A technical paper has not yet been released, but there are a few key features to look out for.

Fig 5. The output image based on the prompt: “Epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says "Stable Diffusion 3" made out of colorful energy”

The latest Stable Diffusion model offers improved performance, image quality, and accuracy when creating images with multiple subjects. Stable Diffusion 3 will also come in a range of model sizes, from 800 million to 8 billion parameters, letting users choose based on their specific needs for scalability and detail.
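Stability AI hasn't published the technical details yet, but flow matching itself is a known training objective: instead of predicting noise as in classic diffusion, the model learns the velocity that carries a sample along a straight path between data and noise. A minimal, purely illustrative sketch of that loss (not Stability AI's code; the model and tensor shapes are stand-ins) could look like this:

```python
import torch

def flow_matching_loss(model, x0, text_cond):
    """Toy rectified-flow / flow-matching loss: learn the velocity that moves
    data toward noise along a straight path. Illustrative only."""
    noise = torch.randn_like(x0)              # sample from the prior
    t = torch.rand(x0.shape[0], 1, 1, 1)      # random time in [0, 1] per sample
    x_t = (1 - t) * x0 + t * noise            # point on the straight path from data to noise
    target_velocity = noise - x0              # constant velocity along that path
    predicted_velocity = model(x_t, t.flatten(), text_cond)
    return torch.mean((predicted_velocity - target_velocity) ** 2)

# Example with a stand-in model that just returns zeros of the right shape:
dummy_model = lambda x_t, t, cond: torch.zeros_like(x_t)
x0 = torch.randn(4, 4, 32, 32)                # e.g. a batch of latent images
loss = flow_matching_loss(dummy_model, x0, text_cond=None)
print(loss.item())
```

The appeal of this objective is that the learned paths are straight, which tends to make sampling simpler and faster than with standard diffusion schedules.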

Google’s Lumiere 

On January 23, 2024, Google launched Lumiere, a text-to-video diffusion model. Lumiere uses an architecture called Space-Time U-Net, or STUNet for short, which downsamples and upsamples the video in both space and time. This lets the model keep track of where things are and how they move across the whole clip, so it can generate smooth, lifelike videos in a single pass.

Fig 6. A frame from a video generated based on the prompt: “Panda play ukulele at home.”

With the capability to generate 80 frames per video, Lumiere is pushing boundaries and setting new standards for video quality in the AI space. Here are some of Lumiere’s features:

  • Image-to-Video: Starting from an image and a prompt, Lumiere can animate images into videos.
  • Stylized Generation: Lumiere can create videos in specific styles using a single reference image.
  • Cinemagraphs: Lumiere can animate specific regions within an image to create dynamic scenes, such as a particular object moving while the rest of the scene remains static.
  • Video Inpainting: It can modify parts of a video, such as changing the attire of people within it or altering background details.
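To make the Space-Time U-Net idea a little more concrete, here is a toy sketch of a network that downsamples and upsamples a clip in both its spatial and temporal dimensions, which is the core trick STUNet relies on. The layer sizes and names are made up for illustration and are not Google's code:

```python
import torch
import torch.nn as nn

class TinySpaceTimeUNet(nn.Module):
    """Toy illustration of the 'space-time U-Net' idea: the video tensor is
    downsampled and upsampled in both its spatial and temporal dimensions,
    so the whole clip is processed in one pass. Not Google's Lumiere code."""

    def __init__(self, channels=16):
        super().__init__()
        # Conv3d kernels span (time, height, width)
        self.encode = nn.Conv3d(3, channels, kernel_size=3, stride=(2, 2, 2), padding=1)
        self.middle = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.decode = nn.ConvTranspose3d(channels, 3, kernel_size=4, stride=(2, 2, 2), padding=1)

    def forward(self, video):
        # video: (batch, channels=3, frames, height, width)
        h = torch.relu(self.encode(video))   # halves frames, height, and width together
        h = torch.relu(self.middle(h))
        return self.decode(h)                # restores the original length and resolution

clip = torch.randn(1, 3, 16, 64, 64)         # a 16-frame RGB clip
out = TinySpaceTimeUNet()(clip)
print(out.shape)                             # torch.Size([1, 3, 16, 64, 64])
```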

The future seems to be here

The beginning of 2024 has also brought about many AI innovations that feel like something out of a sci-fi movie. Things we would have called impossible not long ago are now being actively worked on. The future doesn't feel so far off with the following developments.

Disney's HoloTile Floor 

On January 18, 2024, Walt Disney Imagineering unveiled the HoloTile Floor, which has been dubbed the world's first multi-person, omnidirectional treadmill floor.

Fig 8. Disney Imagineer Lanny Smoot poses on his latest innovation, the HoloTile floor.

It can move under any person or object, almost like telekinesis, to create an immersive virtual and augmented reality experience. Multiple users can walk in any direction and avoid collisions while on it. Disney's HoloTile Floor can also be placed on theatrical stages, letting performers and props move in creative ways.

Apple’s Vision Pro

On Feb 2, 2024, Apple’s much-anticipated Vision Pro headset hit the market. It has an array of features and applications designed to redefine the virtual and augmented reality experience. The Vision Pro headset caters to a diverse audience by blending entertainment, productivity, and spatial computing. Apple proudly announced that over 600 apps, ranging from productivity tools to gaming and entertainment services, were optimized for the Vision Pro at its launch.

Cognition’s Devin

On March 12, 2024, Cognition introduced a software engineering assistant called Devin, the world's first attempt at an autonomous AI software engineer. Unlike traditional coding assistants that offer suggestions or complete specific tasks, Devin is designed to handle entire software development projects from initial concept to completion.

It can learn new technologies, build and deploy full apps, find and fix bugs, train its own models, contribute to open-source and production codebases, and even take on real development jobs from sites like Upwork. 

Fig 9. Comparing Devin with other models.

Devin was evaluated on SWE-bench, a challenging benchmark that asks agents to resolve real-world GitHub issues found in open-source projects like Django and scikit-learn. It correctly resolved 13.86% of the issues end-to-end, compared to the previous state-of-the-art of 1.96%.
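Conceptually, an SWE-bench-style end-to-end check boils down to applying the agent's patch to the repository and seeing whether the project's tests pass afterward. The snippet below is a rough sketch of that idea, not the official SWE-bench harness; the function names and test commands are placeholders:

```python
import subprocess

def is_issue_resolved(repo_dir, patch_path, test_command):
    """Apply an agent-generated patch and run the project's test suite.
    Rough sketch of an SWE-bench-style check, not the official harness.
    patch_path should be an absolute path to the generated diff."""
    applied = subprocess.run(["git", "-C", repo_dir, "apply", patch_path])
    if applied.returncode != 0:
        return False                              # the patch didn't even apply cleanly
    tests = subprocess.run(test_command, cwd=repo_dir, shell=True)
    return tests.returncode == 0                  # resolved only if the tests now pass

def resolve_rate(results):
    """Percentage of benchmark issues resolved end-to-end (results is a list of booleans)."""
    return 100.0 * sum(results) / len(results)

# e.g. resolve_rate([True, False, True, ...]) gives the kind of percentage
# behind Devin's reported 13.86% score.
```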

Honorable mentions

There's been so much happening that covering everything in this article isn't possible, but here are a few more honorable mentions.

  • NVIDIA's LATTE3D, announced on March 21, 2024, is a text-to-3D AI model that instantly creates 3D representations from text prompts.
  • Midjourney's new text-to-video generator, teased by CEO David Holz, started training in January and is expected to launch soon.
  • Advancing the AI PC revolution, Lenovo released the ThinkBook 13x with E Ink Prism technology and high-performance AI laptops on January 8, 2024.

