OpenAI has been demonstrating to some of its customers a new multimodal AI model that can both talk to you and recognize objects, according to a new report from The Information. Citing unnamed sources who have seen it, the outlet says this could be part of what the company plans to show on Monday.
The new model reportedly offers faster, more accurate interpretation of images and audio than the company's existing separate transcription and text-to-speech models. It would apparently be able to help customer service agents "better understand the intonation of callers' voices or whether they're being sarcastic," and "theoretically," the model could help students with math or translate real-world signs, writes The Information.
The outlet's sources say the model can outperform GPT-4 Turbo at "answering some types of questions," but is still prone to confidently getting things wrong.
It's possible OpenAI is also readying a new built-in ChatGPT ability to make phone calls, according to developer Ananay Arora, who posted the above screenshot of call-related code. Arora also spotted evidence that OpenAI had provisioned servers intended for real-time audio and video communication.
None of this would be GPT-5, if it's being unveiled next week. CEO Sam Altman has explicitly denied that the upcoming announcement has anything to do with the model that's supposed to be "materially better" than GPT-4. The Information writes that GPT-5 may be publicly released by the end of the year.