Alibaba’s TaoAvatar sets a new standard for photorealistic 3D avatars rendered in real time, bringing AR communication a step closer to everyday use.
The technology combines 3D Gaussian Splatting (3DGS) with a teacher-student network approach to create fully controllable human avatars. These digital representations not only achieve impressive visual quality but also run at 90 frames per second on mobile devices such as the Apple Vision Pro – a crucial factor for practical use in AR applications. The avatars follow a parametric SMPL-X template with consistent topology, allowing precise control over poses, gestures and facial expressions.
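The control mechanism behind a parametric template like SMPL-X is linear blend skinning: points bound to the template follow a weighted mix of per-joint transforms. The sketch below is a minimal, assumption-laden illustration of that idea applied to Gaussian centers – the function name, shapes, and toy data are ours, not TaoAvatar's API.

```python
import numpy as np

def lbs_deform(points, weights, joint_transforms):
    """Deform points (e.g. Gaussian centers) with linear blend skinning.

    points:           (N, 3) rest-pose positions
    weights:          (N, J) skinning weights, each row summing to 1
    joint_transforms: (J, 4, 4) homogeneous per-joint transforms
    """
    n = points.shape[0]
    homo = np.hstack([points, np.ones((n, 1))])                    # (N, 4)
    # Blend the per-joint transforms using the skinning weights
    blended = np.einsum('nj,jab->nab', weights, joint_transforms)  # (N, 4, 4)
    deformed = np.einsum('nab,nb->na', blended, homo)              # (N, 4)
    return deformed[:, :3]

# Toy example: two points, each fully bound to one of two joints;
# the second joint translates by +1 along x.
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
w = np.array([[1.0, 0.0], [0.0, 1.0]])
T = np.stack([np.eye(4), np.eye(4)])
T[1, 0, 3] = 1.0
print(lbs_deform(pts, w, T))  # second point moves from x=1 to x=2
```

Because every Gaussian inherits its motion from the template this way, changing the pose or expression parameters moves the whole avatar consistently.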
Unlike previous technologies, TaoAvatar only requires multi-view camera sequences as input and achieves 2.4 dB better PSNR image quality than comparable systems. At the same time, the technology reduces memory requirements by 70% compared to NeRF-based approaches.
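For context, PSNR (peak signal-to-noise ratio) is a standard image-quality metric derived from the mean squared error between a reconstruction and a reference; a 2.4 dB gain corresponds to roughly a 1.7× reduction in MSE. A minimal implementation, assuming images normalized to [0, 1]:

```python
import numpy as np

def psnr(reference, reconstruction, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images in [0, max_val]."""
    mse = np.mean((reference - reconstruction) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of 0.1 gives MSE = 0.01, i.e. 20 dB
a = np.zeros((4, 4))
b = np.full((4, 4), 0.1)
print(round(psnr(a, b), 2))  # 20.0
```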
Technical innovation on several levels
At the heart of the system is a hybrid representation that combines SMPL-X meshes with 3D Gaussian textures, enabling both precise geometric control and convincing dynamic appearance. Particularly noteworthy is the teacher-student framework:
- The StyleUnet teacher network captures high-frequency details through position-based deformation maps
- The MLP student network is optimized for mobile devices and ensures 90 FPS at 2K resolution
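The distillation principle behind this setup can be sketched in a few lines: the lightweight student is trained to reproduce the outputs of the heavyweight teacher rather than the raw training data. The example below is a deliberately simplified stand-in – both "networks" are linear models fit in closed form – so the idea stays runnable; it is not TaoAvatar's actual StyleUnet/MLP pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))          # pose-like input features
W_teacher = rng.normal(size=(8, 4))
Y_teacher = X @ W_teacher              # teacher's predicted deformations

# Student: fit against the teacher's outputs, not against ground truth.
# (A real distillation would minimize this mismatch by gradient descent.)
W_student, *_ = np.linalg.lstsq(X, Y_teacher, rcond=None)
distill_error = np.abs(X @ W_student - Y_teacher).max()
print(distill_error)  # near zero: the student matches the teacher
```

The payoff is that only the cheap student has to run on-device at 90 FPS, while the expensive teacher is used solely during training.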
To develop the technology, the research team used the TalkBody4D dataset with 59-camera recordings at 20 FPS and 3K×4K resolution. The integration of Audio2BS technology also enables synchronization of lip movements, facial expressions and gestures with spoken language.
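Audio-driven animation systems of this kind typically predict blendshape coefficients from audio features, then apply them as a weighted sum of per-expression vertex offsets. The sketch below shows only that second, geometric step; the function name and toy data are illustrative assumptions, not Alibaba's Audio2BS implementation.

```python
import numpy as np

def apply_blendshapes(neutral, deltas, coeffs):
    """Blend a neutral mesh with expression offsets.

    neutral: (V, 3) rest-pose vertices
    deltas:  (B, V, 3) per-blendshape vertex offsets
    coeffs:  (B,) activation weights, typically in [0, 1]
    """
    return neutral + np.einsum('b,bvd->vd', coeffs, deltas)

# Toy mesh with one "jaw open" blendshape that lowers the chin vertex
neutral = np.zeros((3, 3))
jaw_open = np.zeros((1, 3, 3))
jaw_open[0, 2, 1] = -1.0            # chin vertex moves down when fully open
mesh = apply_blendshapes(neutral, jaw_open, np.array([0.5]))
print(mesh[2, 1])  # -0.5: jaw half open
```

An audio model would emit a new coefficient vector per frame, so lip shapes track the spoken phonemes frame by frame.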
Areas of application and future prospects
The technology developed by Alibaba researchers opens up a wide range of possible applications:
- Life-size AR shopping assistants for 3D product demonstrations
- Holographic meetings with emotional expressiveness
- AI customer service with natural body language
Despite this impressive progress, there are still challenges in modeling extreme facial expressions and the high computational cost of initial template creation (approximately 8 hours per avatar). However, with the planned release of the code and dataset via Hugging Face, the technology should soon find wider application.
Summary:
- TaoAvatar creates photorealistic 3D avatars with consistent topology
- Real-time rendering at 90 FPS on mobile devices and AR headsets
- Hybrid architecture combines 3D Gaussian splatting with parametric models
- 70% memory savings compared to conventional methods
- Applications in e-commerce, AR communication and AI assistance
- Integration of audio-to-facial expression synchronization for natural interactions
Source: TaoAvatar