|
| Abstract: |
| Synthesizing a real-time, high-resolution, lip-synced digital human is a challenging task. Although the Wav2Lip model represents a remarkable advance in real-time lip-sync, the clarity of its output is still limited. To address this, we improved the Wav2Lip model and trained it on a high-resolution video dataset produced in our laboratory. Experimental results indicate that the improved Wav2Lip model produces digital humans with greater clarity than the original model, while maintaining real-time performance and accurate lip-sync. We deployed the improved Wav2Lip model in a government interface application to generate a government digital human. Testing showed that this government digital human can interact with users in real time, delivering clear visuals and synthesized speech that closely resembles a human voice. |
| Key words: digital human; lip-sync; high-resolution; video generation; talking face generation |
| DOI:10.11916/j.issn.1005-9113.24056. |
| CLC Number: TP39 |
| Fund: |
|
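| Description in Chinese (translated into English): |
| Real-Time High-Definition Interactive Digital Human. 方海泉, 余点. School of Public Administration, Zhejiang University of Technology, Hangzhou 310023. Abstract: Synthesizing a real-time, high-definition, lip-synced digital human is a challenging task. The Wav2Lip model has reached the state of the art in real-time performance and lip synchronization, but its output is not sharp enough. To generate a real-time, high-definition, lip-synced digital human, this paper improves the Wav2Lip model and trains it on a high-resolution dataset that we produced. Experiments show that the improved Wav2Lip model synthesizes a clearer digital human than the original Wav2Lip model, with little loss in real-time performance or lip synchronization. Furthermore, we applied the improved Wav2Lip model to government services and built a government digital human. Testing shows that this government digital human can interact with people in real time, with clear video and synthesized speech similar to a real human voice, achieving the desired digital human effect. Keywords: digital human; lip-sync; high-resolution; video generation; talking face generation |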