OpenAI Unveils GPT-4o, the Most Advanced AI Language Model Yet

New Model Handles Text, Audio, and Images, Enabling Natural Human-Computer Interaction

In recent days, rumors circulated that OpenAI would use Monday's announcement to unveil an AI search engine to rival Google. Instead, the company introduced GPT-4o, a new model powering ChatGPT that its creators claim is not only more advanced than previous versions but the most sophisticated generative language model ever developed.

GPT-4o accepts any combination of text, audio, and images as input and can generate output in all three formats. The model can hold spoken conversations and, according to OpenAI, lets users interrupt it mid-speech while it responds almost as quickly as a human would. It can also recognize emotions, marking a significant step forward in natural human-computer interaction.
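To make that claim concrete, here is a minimal sketch of how a developer might send mixed text-and-image input to GPT-4o through OpenAI's Python SDK; the prompt and image URL are hypothetical placeholders, and the snippet is an illustration rather than part of OpenAI's announcement.

```python
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

# A single request can combine text and an image; the model replies in text.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is happening in this photo?"},
            # Hypothetical placeholder URL, for illustration only.
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```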

New Capabilities

During the presentation, OpenAI demonstrated GPT-4o translating live between English and Italian, helping a researcher solve a linear equation written on paper in real time, and offering breathing guidance simply by listening to the user's breathing.

The "o" in GPT-4o stands for "omni," referencing the model's multimodal capabilities. OpenAI stated that GPT-4o was trained on text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. This differs from the company's previous models, GPT-3.5 and GPT-4, which allowed users to ask questions by speaking but then transcribed the speech to text. This process removed tone and emotion and made interactions slower.

OpenAI is making the new model available to everyone, including free ChatGPT users, over the next few weeks. Additionally, a desktop version of ChatGPT is being released, initially for Mac, with paid users gaining access starting today.
