"OpenAI Announces New Multimodal Desktop GPT with New Voice and Vision Capabilities"

GPT-4o can recognize and respond to screenshots, photos, documents, or charts uploaded to it. The new GPT-4o model can also recognize facial expressions and information written by hand on paper. OpenAI said the improved model and accompanying chatbot can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, "which is similar to human response time in a conversation". . . .

It displayed a better conversational capability, where users can interrupt it and begin new or modified queries, and it is also versed in 50 languages. In one onstage live demonstration, the Voice Mode was able to translate back and forth between Murati speaking Italian and Barret Zoph, OpenAI’s head of post-training, speaking English.



Author: Charles W. Bailey, Jr.
