OpenAI just debuted GPT-4o, a new kind of AI model that you can communicate with in real time via live voice conversation, video streams from your phone, and text. The model is rolling out over the next few weeks and will be free for all users through both the GPT app and the web interface, according to the company. Users who subscribe to OpenAI’s paid tiers, which start at $20 per month, will be able to make more requests. 

OpenAI CTO Mira Murati led the live demonstration of the new release one day before Google is expected to unveil its own AI advancements at its flagship I/O conference on Tuesday, May 14. 

GPT-4 offered similar capabilities, giving users multiple ways to interact with OpenAI’s AI offerings. But it siloed them in separate models, leading to longer response times and presumably higher computing costs. GPT-4o has now merged those capabilities into a single model, which Murati called an “omnimodel.” That means faster responses and smoother transitions between tasks, she said.

The result, the company’s demonstration suggests, is a conversational assistant much in the vein of Siri or Alexa but capable of fielding much more complex prompts.

“We’re looking at the future of interaction between ourselves and the machines,” Murati said of the demo. “We think that GPT-4o is really shifting that paradigm into the future of collaboration, where this interaction becomes much more natural.”

Barret Zoph and Mark Chen, both researchers at OpenAI, walked through a number of applications for the new model. Most impressive was its facility with live conversation. You could interrupt the model during its responses, and it would stop, listen, and adjust course. 

OpenAI showed off the ability to change the model’s tone, too. Chen asked the model to read a bedtime story “about robots and love,” quickly jumping in to demand a more dramatic voice. The model got progressively more theatrical until Murati demanded that it pivot quickly to a convincing robot voice (which it excelled at). While there were predictably some short pauses during the conversation while the model reasoned through what to say next, it stood out as a remarkably naturally paced AI conversation. 

The model can reason through visual problems in real time as well. Using his phone, Zoph filmed himself writing an algebra equation (3x + 1 = 4) on a sheet of paper, having GPT-4o follow along. He instructed it not to provide answers, but instead to guide him much as a teacher would.

“The first step is to get all the terms with x on one side,” the model said in a friendly tone. “So, what do you think we should do with that plus one?”

GPT-4o will store records of users’ interactions with it, meaning the model “now has a sense of continuity across all your conversations,” according to Murati. Other highlights include live translation, the ability to search through your conversations with the model, and the power to look up information in real time. 

As is the nature of a live demo, there were hiccups and glitches. GPT-4o’s voice might jump in awkwardly during the conversation. It appeared to comment on one of the presenters’ outfits even though it wasn’t asked to. But it recovered well when the demonstrators told the model it had erred. It seems to be able to respond quickly and helpfully across several mediums that other models have not yet merged as effectively. 

Previously, many of OpenAI’s most powerful features, like reasoning through image and video, were behind a paywall. GPT-4o marks the first time they’ll be opened up to the wider public, though it’s not yet clear how many interactions you’ll be able to have with the model before being charged. OpenAI says paying subscribers will “continue to have up to five times the capacity limits of our free users.” 

Additional reporting by Will Douglas Heaven.

​ Artificial intelligence – MIT Technology Review

about Infinite Loop Digital

We support businesses by identifying requirements and helping clients integrate AI seamlessly into their operations.

Gartner
Gartner Digital Workplace Summit Generative Al

GenAI sessions:

  • 4 Use Cases for Generative AI and ChatGPT in the Digital Workplace
  • How the Power of Generative AI Will Transform Knowledge Management
  • The Perils and Promises of Microsoft 365 Copilot
  • How to Be the Generative AI Champion Your CIO and Organization Need
  • How to Shift Organizational Culture Today to Embrace Generative AI Tomorrow
  • Mitigate the Risks of Generative AI by Enhancing Your Information Governance
  • Cultivate Essential Skills for Collaborating With Artificial Intelligence
  • Ask the Expert: Microsoft 365 Copilot
  • Generative AI Across Digital Workplace Markets
10 – 11 June 2024

London, U.K.