Intel Labs Introduces Latent Diffusion Model for 3D Imaging

Intel Labs, in collaboration with Blockade Labs, introduced the Latent Diffusion Model for 3D (LDM3D), a new diffusion model that uses generative AI to create realistic 3D visual content.

LDM3D is the industry's first model to generate a depth map using the diffusion process to create 3D images with 360-degree views that are vivid and immersive. It has the potential to revolutionize content creation, metaverse applications and digital experiences, transforming a wide range of industries, from entertainment and gaming to architecture and design.

“Generative AI technology aims to further augment and enhance human creativity and save time,” explained Vasudev Lal, AI/ML research scientist at Intel Labs. “However, most current generative AI models are limited to generating 2D images, and only very few can generate 3D images from text prompts. Unlike existing latent stable diffusion models, LDM3D allows users to generate an image and a depth map from a given text prompt using almost the same number of parameters. It provides more accurate relative depth for each pixel in an image compared to standard post-processing methods for depth estimation and saves developers significant time in scene development.”

Why it matters
Closed ecosystems limit scale, and Intel's commitment to the true democratization of AI will enable broader access to the benefits of AI through an open ecosystem. One area that has seen significant advances in recent years is computer vision, particularly generative AI.

However, many of today's advanced generative AI models are limited to generating only 2D images. Unlike existing diffusion models, which typically generate only 2D RGB images from text prompts, LDM3D allows users to generate both an image and a depth map from a given text prompt. Using almost the same number of parameters as Stable Diffusion, LDM3D provides more accurate relative depth for each pixel in an image than standard post-processing methods for depth estimation.
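To make the workflow concrete, below is a minimal sketch of a single text-to-RGB-plus-depth call. It assumes the Hugging Face diffusers integration of LDM3D (the StableDiffusionLDM3DPipeline class and the Intel/ldm3d-4c checkpoint released on the Hugging Face Hub); the article itself does not describe a specific interface.

```python
# Minimal sketch: one diffusion pass yields both an RGB image and its depth map.
# Assumes the diffusers library and the Intel/ldm3d-4c checkpoint are available.
import torch
from diffusers import StableDiffusionLDM3DPipeline

pipe = StableDiffusionLDM3DPipeline.from_pretrained(
    "Intel/ldm3d-4c", torch_dtype=torch.float16
).to("cuda")

prompt = "a serene tropical beach at sunset, 360-degree panorama"
output = pipe(prompt)

# The pipeline returns aligned RGB and depth outputs from the same pass.
rgb_image = output.rgb[0]
depth_image = output.depth[0]
rgb_image.save("beach_rgb.png")
depth_image.save("beach_depth.png")
```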

This research could revolutionize the way we interact with digital content, allowing users to experience their text prompts in previously inconceivable ways. The images and depth maps generated by LDM3D let users turn the textual description of a serene tropical beach, a modern skyscraper, or a sci-fi universe into a detailed 360-degree panorama. This ability to capture depth information can instantly improve realism and overall immersion, enabling innovative applications for industries ranging from entertainment and gaming to interior design and real estate listings, as well as virtual museums and immersive virtual reality (VR) experiences.

How it works
LDM3D was trained on a dataset constructed from a subset of 10,000 samples from the LAION-400M database, which contains over 400 million image-caption pairs. The team used the Dense Prediction Transformer (DPT) depth estimation model (previously developed at Intel Labs) to annotate the training corpus. The large DPT model provides highly accurate relative depth for each pixel in an image.
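As an illustration of this annotation step, the sketch below runs DPT depth estimation on a single image using the public Intel/dpt-large checkpoint from the transformers library; the team's actual large-scale annotation pipeline is not detailed in the article.

```python
# Hedged sketch: relative depth estimation for one image with DPT,
# using the publicly available Intel/dpt-large checkpoint.
import torch
from PIL import Image
from transformers import DPTImageProcessor, DPTForDepthEstimation

processor = DPTImageProcessor.from_pretrained("Intel/dpt-large")
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-large")

image = Image.open("sample.jpg")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth  # relative depth per pixel

# Resize the prediction back to the original image resolution.
depth = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],  # PIL size is (W, H); interpolate wants (H, W)
    mode="bicubic",
    align_corners=False,
).squeeze()
```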

The LAION-400M dataset was built for research purposes to enable larger-scale model training experiments for researchers and other interested communities.

The LDM3D model was trained on an Intel AI supercomputer powered by Intel Xeon processors and Intel Habana Gaudi AI accelerators. The resulting model and pipeline combine the generated RGB image and depth map into 360-degree views for immersive experiences.
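As a rough illustration of how an RGB image and a depth map can be fused into 3D content, the hypothetical sketch below back-projects an equirectangular RGB-D panorama into a colored point cloud with NumPy. The actual pipeline (and the DepthFusion application described next) is considerably more sophisticated.

```python
# Hypothetical sketch: back-project an equirectangular RGB-D panorama
# into a colored 3D point cloud. Assumes a 3-channel RGB PNG and a
# single-channel depth PNG of the same resolution.
import numpy as np
from PIL import Image

rgb = np.asarray(Image.open("beach_rgb.png").convert("RGB")) / 255.0  # (H, W, 3)
depth = np.asarray(Image.open("beach_depth.png").convert("F"))        # (H, W)

h, w = depth.shape
# Map each pixel to spherical angles: longitude spans 360°, latitude 180°.
lon = (np.arange(w) / w - 0.5) * 2.0 * np.pi
lat = (np.arange(h) / h - 0.5) * np.pi
lon, lat = np.meshgrid(lon, lat)

# Treat depth as the radial distance from the viewer.
x = depth * np.cos(lat) * np.sin(lon)
y = depth * np.sin(lat)
z = depth * np.cos(lat) * np.cos(lon)

points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
colors = rgb.reshape(-1, 3)
print(points.shape, colors.shape)  # one colored 3D point per pixel
```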

To demonstrate the potential of LDM3D, researchers at Intel and Blockade developed DepthFusion, an application that leverages standard 2D RGB photos and depth maps to create immersive, interactive 360-degree viewing experiences. DepthFusion uses TouchDesigner, a node-based visual programming language for real-time interactive multimedia content, to transform text prompts into interactive, immersive digital experiences.

LDM3D is a single model that creates both an RGB image and its depth map in one pass, reducing memory footprint and improving latency.
