I saw an AI generate images offline in real-time, and I still can't believe it

It was only a little more than a year ago that I started hearing about Stable Diffusion and Midjourney and the ability to create images from nothing. Just string a few words together, and a generative AI model sitting on a server transforms those written words into a graphic image. Magic.

Everything has progressed so fast and so frenetically since then. And suddenly, I was standing in the middle of MediaTek’s booth at MWC, watching an Android phone running the Dimensity 9300 chipset generate AI images on the fly.

The model generated and improved the image with every letter I typed, in real time.

Every letter and word I typed triggered the Stable Diffusion model and changed the image to fit my description more accurately. In real time. Zero lag, zero wait, zero servers. Everything was local and offline. I was dumbstruck.

Just last year, Qualcomm was happy to show off (at MWC too) a Stable Diffusion model that could generate an AI image locally in under 15 seconds. We found that impressive then, especially compared to Midjourney’s more time-consuming and server-demanding generation.

But now that I’ve seen real-time generation in action, those 15 seconds seem like a lagfest. Oh, what a difference a year makes!

Now that I’ve seen real-time AI generation in action, anything else feels like a lagfest.

The Dimensity 9300 was built from the ground up to handle more on-device AI capabilities, so that wasn’t the only demo MediaTek was touting. However, the others weren’t as impressive or as eye-catching: local AI summaries, photo expansion, and Magic Eraser-like photo manipulation. Most of these features have become commonplace now, with Google and Samsung boasting them in their Pixel software and Galaxy AI suite, respectively.

Robert Triggs / Android Authority

Then there was a local video generation model, which creates an image and animates it as a series of GIFs to make a video out of it. I tried it a couple of times. It took over 50 seconds and wasn’t always accurate, so you can imagine that it didn’t catch my eye as much as the real-time image model.

MediaTek also showed off a real-time AI avatar maker that uses the camera to capture live footage of a person and animates it in multiple styles. The animation was a second or two behind the person’s actual movements, so it was not too laggy, but the generated image reminded me of the early days of Dall-E. Again, this was running locally and offline, which explains those issues. It’s still impressive tech, of course, but it didn’t feel “there” in the same way as the real-time image generation model.

As you can tell by now, I really liked that first demo. It just felt like the tech had finally arrived. And the fact that you can do it locally, without the extra costs of servers and the privacy concerns of sending requests online, is what makes this more practical to me.
