What does the Digital Humans topic page cover?

It groups WindFlash AI Daily items related to Digital Humans, including summaries, sources, and related report links.

How often is the Digital Humans topic page updated?

The page updates when new daily report items are published with the same topic tag.

AI Topic: Digital Humans

JoVA: A Joint Self-Attention Framework for Synchronized Video-Audio Generation

Important

A collaboration between HKU and ByteDance introduces JoVA, a streamlined framework for high-fidelity joint video and audio generation. By using a single joint self-attention layer for cross-modal interaction, JoVA removes the need for the complex external fusion modules found in traditional cascaded or end-to-end models. It addresses the critical challenge of lip-syncing with a Mouth-Aware Supervision strategy, which applies weighted flow matching losses to mouth regions mapped into latent space. Using a diverse dataset of approximately 1.9 million samples, the model achieves a state-of-the-art LSE-C score of 6.64, outperforming existing solutions in both temporal alignment and audio-visual consistency. For developers, the research offers a more efficient architectural blueprint for multimodal diffusion models and a simpler path toward realistic digital human synthesis.
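To make the idea of a weighted flow matching loss on mouth regions more concrete, here is a minimal sketch in plain Python. It is not JoVA's implementation: the function name `mouth_weighted_fm_loss`, the `mouth_weight` parameter, the linear-path velocity target `x1 - x0`, and the normalization by the sum of weights are all assumptions chosen for illustration, and the latent is flattened to a simple list.

```python
def mouth_weighted_fm_loss(v_pred, x0, x1, mouth_mask, mouth_weight=2.0):
    """Weighted flow matching loss over a flattened latent (illustrative only).

    v_pred      -- predicted velocity per latent element
    x0, x1      -- noise sample and data sample per latent element
    mouth_mask  -- 1 where the element falls in the mouth region, else 0
    mouth_weight -- hypothetical extra weight for mouth elements
    """
    if not (len(v_pred) == len(x0) == len(x1) == len(mouth_mask)):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    weight_sum = 0.0
    for vp, a, b, m in zip(v_pred, x0, x1, mouth_mask):
        target = b - a                      # assumed linear-path velocity target
        w = mouth_weight if m else 1.0      # up-weight mouth elements
        total += w * (vp - target) ** 2
        weight_sum += w
    # Assumed design choice: normalize by total weight so the scale stays
    # comparable when the mask size changes.
    return total / weight_sum


# Toy example: four latent elements, the last two lie in the mouth region.
x0 = [0.0, 0.0, 0.0, 0.0]
x1 = [1.0, 1.0, 1.0, 1.0]
mask = [0, 0, 1, 1]
v_pred = [1.0, 1.0, 0.0, 1.0]               # one error, inside the mouth region

print(round(mouth_weighted_fm_loss(v_pred, x0, x1, mask), 4))                    # 0.3333
print(round(mouth_weighted_fm_loss(v_pred, x0, x1, mask, mouth_weight=1.0), 4))  # 0.25
```

In the toy example the same prediction error costs more when it falls inside the masked region, which is the intuition behind supervising the mouth area more heavily than the rest of the frame.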

机器之心 | Dec 30, 07:17 AM

December 30, 2025

JoVA: A Joint Self-Attention Framework for Synchronized Video-Audio Generation
