Grüner Scheck
Link in die Zwischenablage kopiert

The Latest OpenAI Updates: Canvas, Vision Fine-Tuning, and More

Join us as we take a closer look at the recent ChatGPT updates released by OpenAI. We'll explore Canvas, fine-tuning for vision capabilities, and the latest Search feature.

After we last looked at OpenAI's o1 models in September (which were designed to improve reasoning), many new and exciting features have been added to ChatGPT. Some of these releases are geared toward developers, and others are designed to refine user experience. Overall, each upgrade helps make interactions with ChatGPT more intuitive and effective.

Updates like Canvas, designed for collaborative writing and coding, and fine-tuning for vision capabilities that improves how ChatGPT works with images, have sparked a lot of interest, encouraging users to explore more creative possibilities. Meanwhile, technical upgrades, like new APIs and fairness test reports, address aspects like model integration, and ethical AI practices. Let’s dive in and get a better understanding of the latest ChatGPT features from OpenAI!

An Overview of OpenAI’s Canvas Feature

Canvas is the first major update to ChatGPT’s user interface (UI) since its release. It is a new interface with a two-screen layout, prompts on the left sidebar, and responses in the right side window. The new UI eliminates the usual workflow of a chat-like single-screen structure and moves to a two-screen layout that suits multitasking purposes to boost productivity.

Fig 1. Canvas Brings UI Updates to ChatGPT.

Before Canvas was introduced, working with long-form documents on ChatGPT meant having to scroll up and down quite a bit. In the new layout, prompts are displayed on the left sidebar, and the text document or code snippet occupies the majority of the screen. If needed, you can even customize the size of the left sidebar and the output screen. Also, you can select a portion of the text or a section of code and edit the specific section without altering the entire document.

Fig 2. Edit Specific Sections of Text Using Canvas.

If you use Canvas, you’ll notice there's no specific button or toggle to open it on the ChatGPT interface. Instead, when you're working with the GPT-4o model, Canvas opens automatically if it detects that you're editing, writing, or coding. For simpler prompts, it stays inactive. If you want to open it manually, you can use prompts like "Open the Canvas" or "Get me the Canvas layout."

Currently, Canvas is in beta and available only with GPT-4o. However, OpenAI has mentioned that Canvas will be available for all free users when it is out of beta.

ChatGPT’s API Updates

OpenAI has released three new ChatGPT API updates aimed at improving efficiency, scalability, and versatility. Let’s take a closer look at each of these updates.

Model Distillation

Using the Model Distillation feature through the OpenAI APIs, developers can use the outputs of advanced models like GPT-4o or o1-preview to enhance the performance of smaller, cost-efficient models like GPT-4o mini. Model distillation is a process that involves training smaller models to mimic the behavior of more advanced ones, making them more efficient for specific tasks.

Before this feature was introduced, developers had to manually coordinate a variety of tasks using different tools. These tasks included generating datasets, measuring model performance, and fine-tuning models, which often made the process complex and error-prone. The Model Distillation update lets developers use Stored Completions, a tool that lets them automatically generate datasets by capturing and storing the input-output pairs produced by advanced models through the API.

Another feature of Model Distillation, Evals (currently in beta), helps measure how well a model performs on specific tasks, without needing to create custom evaluation scripts or using separate tools. Using datasets generated with Stored Completions and evaluating performance with Evals, developers can fine-tune their own custom GPT models.

Fig 3. You can use Evals to measure model performance.

Prompt Caching

Oftentimes when building AI applications, especially chatbots, the same context (the background information or previous conversation history needed to understand the current request) will be used repeatedly for multiple API calls. Prompt Caching makes it possible for developers to reuse recently used input tokens (segments of text that the model processes to understand the prompt and generate a response), helping to reduce cost and latency.

From October 1st, OpenAI has automatically applied Prompt Caching to its models like GPT-4o, GPT-4o mini, o1-preview, and o1-mini. This means that when developers use the API to interact with a model with a long prompt (over 1,024 tokens), the system saves the parts it has already processed. 

This way, if the same or similar prompts are used again, it can skip recalculating those parts. The system automatically caches the longest part of the prompt it has previously encountered, starting with 1,024 tokens and adding in chunks of 128 tokens as the prompt gets longer.

Realtime API

Creating a voice assistant generally involves needing to transcribe audio to text, process the text, and then convert it back to audio to play the response. OpenAI’s Realtime API aims to handle this entire process with a single API request. By making the process simpler, the API enables real-time conversations with AI. 

For example, a voice assistant integrated with the Realtime API can perform specific actions, like placing an order or finding information, based on user requests. The API makes the voice assistant more responsive and able to adapt quickly to users' needs. The Realtime API became available through public beta on October 1st, with six voices. On October 30th, five more voices were added, making a total of eleven available voices.

Fig 4. An example of using the Realtime API for practicing conversations in a new language.

Fine-tuning ChatGPT for Vision Tasks

Originally, the GPT-4o vision language model could only be fine-tuned and customized using text-only datasets. Now, with the release of the vision fine-tuning API, developers can train and customize GPT-4o using image datasets. Since its release, vision fine-tuning has become a major topic of interest among developers and computer vision engineers.

To fine-tune GPT-4o’s vision capabilities, developers can use image datasets that range from as few as 100 images to as many as 50,000 images. After ensuring the dataset matches the format required by OpenAI, it can be uploaded to the Openai platform, and the model can be finetuned for specific applications. 

For instance, Automat, an automation company, used a dataset of screenshots to train GPT-4o to be able to identify UI elements on a screen based on a description. This helps streamline Robotic Process Automation (RPA) by making it easier for bots to interact with user interfaces. Instead of relying on fixed coordinates or complex selector rules, the model can identify UI elements based on simple descriptions, making automation setups more adaptable and easier to maintain when interfaces change.

Fig 5. Using a fine-tuned version of GPT-4o model to detect UI elements.

ChatGPT Fairness and Bias Detection

Ethical concerns surrounding AI applications are a prominent topic of conversation as AI becomes more and more advanced. Because ChatGPT’s responses are based on user-provided prompts and data available on the Internet, it can be challenging to fine-tune its language to be responsible all the time. Reports state that ChatGPT’s answers are biased on name, gender, and race. To address this issue, OpenAI’s in-house team conducted a first-person fairness test.

Names often carry subtle cues about our culture and geographical factors. In most cases, ChatGPT will ignore the subtle cues in the names. However, in some cases, names that reflect race or culture lead to different responses from ChatGPT, with about 1% of these reflecting harmful language. Eliminating biases and harmful language is a challenging task for a language model. However, by sharing these findings publicly and acknowledging the model’s limitations, OpenAI helps users refine their prompts to achieve more neutral, unbiased answers. 

Fig 6. An example of differing responses due to the user’s name.

Understanding ChatGPT Search

When ChatGPT was first launched, there was discussions in the AI community about whether it could replace traditional web browsing. Now, many users are using ChatGPT instead of Google Search

OpenAI’s new update, the Search feature, takes this a step further. With Search, ChatGPT generates up-to-date responses and includes links to relevant sources. As of October 31st, the Search feature is available to all ChatGPT Plus and Team users, making ChatGPT function more like an AI-powered search engine.

Fig 7. An example of using ChatGPT’s new Search feature.

The Road Ahead

ChatGPT's recent updates focus on making AI more useful, flexible, and fair. The new Canvas feature helps users work more efficiently, while vision fine-tuning allows developers to customize models to better handle visual tasks. Addressing fairness and reducing bias are also key priorities, ensuring AI works well for everyone, regardless of who they are. Whether you are a developer fine-tuning models or just using the latest features, ChatGPT is evolving to meet a wide range of needs. With real-time capabilities, visual integration, and a focus on responsible use, these updates building a more trustworthy and reliable AI experience for everyone.

Explore more about AI by visiting our GitHub repository and joining our community. Learn more about AI applications in self-driving and healthcare.

Facebook-LogoTwitter-LogoLinkedIn-LogoKopier-Link-Symbol

Lies mehr in dieser Kategorie

Lass uns gemeinsam die Zukunft
der KI gestalten!

Beginne deine Reise in die Zukunft des maschinellen Lernens