Image AI Stable Diffusion creates VR dream worlds
Image: Scottie Fox via Twitter
A developer previews the future of virtual reality with generative AI using Stable Diffusion.
Generative AI systems for text, images, audio, video and 3D have made huge strides recently. They could change how people work, and in some cases already do, by enabling humans to create sophisticated audiovisual media – or simply better text.
Generative AI is also driving the spread of 3D content, much as smartphone cameras did for photography. The well-known Silicon Valley venture capital firm Sequoia Capital believes that current generative AI systems are at the forefront of a computing revolution.
A developer is now demonstrating the potential of generative AI with a VR world generated by the open-source image AI Stable Diffusion.
Stable Diffusion for virtual reality
The developer combines Stable Diffusion with the visual development environment TouchDesigner and calls the result an “immersive real-time latent space”. He sees his video as proof of the technology’s future potential and announces further improvements. According to the developer, you can move freely through the Stable Diffusion VR world shown in the demo.
According to the developer, the fact that objects in the video keep changing the longer you look at them is a side effect of the current Stable Diffusion implementation: the image AI assumes it could have drawn an object better if you had looked at it longer, and so generates a new variant.
Considerable technical effort – with prospects for rapid improvement
Besides Stable Diffusion, the developer uses a second AI system: Intel’s MiDaS is responsible for the 3D representation of the environment. The MiDaS model estimates depth from a single image, and the Stable Diffusion generations are then projected onto the resulting 3D geometry.
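The article does not detail how the depth map and the generated image are combined. As a rough illustration only, the sketch below assumes a simple pinhole camera: each pixel of a single depth map is back-projected into 3D and paired with the color of the generated image at that pixel. MiDaS typically predicts relative inverse depth, so the hypothetical helper `depth_from_inverse` converts it using an assumed scale `s` and shift `t`; the intrinsics `fx`, `fy`, `cx`, `cy` are likewise placeholders, not values from the developer’s setup.

```python
def depth_from_inverse(inv_depth, s=1.0, t=0.0, eps=1e-6):
    """Convert a relative inverse-depth grid to depth, assuming depth = 1 / (s*d + t).

    s and t are assumed calibration values; relative inverse depth is only
    defined up to such a scale and shift. eps avoids division by zero.
    """
    return [[1.0 / max(s * d + t, eps) for d in row] for row in inv_depth]


def backproject(depth, image, fx, fy, cx, cy):
    """Lift each pixel (u, v) with depth z into camera space using a pinhole model.

    Returns a list of ((X, Y, Z), color) pairs, so the generated image's colors
    end up attached to 3D points. fx, fy, cx, cy are hypothetical intrinsics.
    """
    points = []
    for v, (depth_row, color_row) in enumerate(zip(depth, image)):
        for u, (z, color) in enumerate(zip(depth_row, color_row)):
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append(((x, y, z), color))
    return points


# Usage: a 2x2 toy inverse-depth map and a matching 2x2 "image" of color labels.
depth = depth_from_inverse([[0.5, 0.5], [0.5, 0.5]])  # every pixel at depth 2.0
image = [["red", "green"], ["blue", "white"]]
for point, color in backproject(depth, image, fx=1.0, fy=1.0, cx=0.5, cy=0.5):
    print(point, color)
```

A real system would build a mesh or textured surface rather than a loose point list, but the per-pixel geometry is the same idea.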
The demo runs in real time but requires enormous computing power: according to the developer, it consumes 40 credits per hour on an Nvidia A100 in Google Colab. The demo itself was created on an Nvidia RTX 2080 Ti with 11 GB of VRAM.
The MiDaS model runs continuously on every frame, while Stable Diffusion generates at a predefined rate. To further reduce the computational load, the system renders only the image in the field of view instead of the full 360-degree environment. In the demo, the same image is rendered for each eye, so stereoscopic 3D is not yet supported, but according to the developer this will “definitely be improved”.
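The scheduling described above can be sketched as follows, assuming a simple frame loop. The names `schedule`, `in_view` and `eye_positions` and the parameter `diffusion_every` are hypothetical; the developer has not published code, so this only mirrors the three ideas in the paragraph: depth on every frame, diffusion at a fixed interval, and rendering limited to the field of view, with both eyes currently sharing one camera.

```python
from dataclasses import dataclass


@dataclass
class FrameWork:
    """Which models a hypothetical render loop runs on a given frame."""
    frame: int
    run_depth: bool
    run_diffusion: bool


def schedule(num_frames, diffusion_every):
    """Depth estimation on every frame; diffusion only every `diffusion_every` frames."""
    return [
        FrameWork(f, run_depth=True, run_diffusion=(f % diffusion_every == 0))
        for f in range(num_frames)
    ]


def in_view(yaw_deg, view_yaw_deg, fov_deg):
    """True if a direction (yaw in degrees) lies inside the horizontal field of view."""
    # Wrap the difference into [-180, 180) so 350 deg and 10 deg count as 20 deg apart.
    diff = (yaw_deg - view_yaw_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_deg / 2.0


def eye_positions(head_x, ipd, stereo=False):
    """Horizontal camera positions for the (left, right) eyes.

    In mono mode, as in the demo, both eyes share one camera; a stereo mode
    would offset each eye by half the assumed interpupillary distance `ipd`.
    """
    if not stereo:
        return (head_x, head_x)
    return (head_x - ipd / 2.0, head_x + ipd / 2.0)


# Usage: 6 frames, diffusion every 3rd frame, 90-degree field of view.
for work in schedule(6, diffusion_every=3):
    print(work)
print(in_view(350.0, 10.0, 90.0))  # direction 20 degrees off-center
print(eye_positions(0.0, 0.064))
```

With a 90-degree field of view, for example, only a quarter of the 360-degree panorama needs to be rendered at any moment.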
“Stable Diffusion’s speed is skyrocketing these days, but we still need better optimizations,” the developer writes. He can’t say when the demo or something similar might be released as a test version. Currently, the code is spread across two neural networks and three different hardware setups, and bringing it all together would take more effort than he could manage on his own.
Carmack’s Vision: Automated VR Worlds for Every Video
Meanwhile, star developer and former Oculus CTO John Carmack has weighed in on Twitter. A longtime virtual reality enthusiast who now works on AI, he knows both technologies well. His dream is automatically generated photogrammetric 3D worlds “built from every film and video ever shot,” Carmack writes.
There are still many technical challenges to solve, especially geometric ones such as merging views from different camera positions, he says. But according to Carmack, “it feels like we’re on the verge of neural models solving everything, including inpainting.”
His vision is a generative AI system that creates 3D worlds based on any given video. “I’m sure there are experiments with this already, but if it gets out of the lab like Stable Diffusion did, it will be fantastic,” Carmack writes.