Abstract:
Synthesizing a real-time, high-resolution, lip-synchronized digital human is a challenging task. Although the Wav2Lip model represents a remarkable advancement in real-time lip-sync, the clarity of its output remains limited. To address this, we enhanced the Wav2Lip model and trained it on a high-resolution video dataset produced in our laboratory. Experimental results indicate that the improved Wav2Lip model produces digital humans with greater clarity than the original model while preserving its real-time performance and accurate lip-sync. We deployed the improved model in a government-service interface application to generate a government digital human. Testing revealed that this government digital human interacts seamlessly with users in real time, delivering clear visuals and synthesized speech that closely resembles a human voice.
Key words: digital human; lip-sync; video generation; talking face generation
DOI: 10.11916/j.issn.1005-9113.24056
CLC Number: TP39