Supervised by the Ministry of Industry and Information Technology of the People's Republic of China. Sponsored by Harbin Institute of Technology. Editor-in-chief: Yu Zhou. ISSN 1005-9113, CN 23-1378/T.

Related citation: Haiquan Fang, Dian Yu. The Real-time and High-resolution Interactive Digital Human[J]. Journal of Harbin Institute of Technology (New Series), 2025, 32(5): 41-51. DOI: 10.11916/j.issn.1005-9113.24056.
The Real-time and High-resolution Interactive Digital Human
Author Name | Affiliation
Haiquan Fang | School of Public Administration, Zhejiang University of Technology, Hangzhou 310023, China
Dian Yu | School of Public Administration, Zhejiang University of Technology, Hangzhou 310023, China
Abstract:
Synthesizing a real-time, high-resolution, lip-synced digital human is a challenging task. Although the Wav2Lip model is a notable advance in real-time lip synchronization, the visual clarity of its output remains limited. To address this, we improved the Wav2Lip model and trained it on a high-resolution video dataset produced in our laboratory. Experimental results show that the improved model generates digital humans with greater clarity than the original Wav2Lip while largely maintaining its real-time performance and accurate lip synchronization. We then deployed the improved model in a government-services application to build a government digital human. Testing showed that this digital human interacts with users smoothly in real time, with clear visuals and synthesized speech that closely resembles a human voice.
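For readers unfamiliar with the baseline, the sketch below illustrates two preprocessing steps from the publicly released original Wav2Lip inference code. It does not show the improved model described in this paper, whose changes are not detailed on this page.

The first step slices the audio mel spectrogram into one fixed window per video frame. The second builds the generator's 6-channel visual input by zeroing the lower (mouth) half of the target face crop and concatenating it with an unmasked reference crop.

The constants follow the original Wav2Lip defaults as we understand them: 80 mel frames per second, a 16-frame window, and 96×96 face crops. The function names are hypothetical. The small 96×96 crop is plausibly one reason the original model looks blurry when composited into high-resolution video.

```python
import numpy as np

# Assumed original-Wav2Lip defaults (16 kHz audio, hop length 200 -> 80 mel frames/s).
MEL_FRAMES_PER_SECOND = 80
MEL_STEP_SIZE = 16  # mel frames per window, about 0.2 s of audio
FACE_SIZE = 96      # side length of the square face crop fed to the generator


def mel_chunks(mel: np.ndarray, fps: float, step: int = MEL_STEP_SIZE) -> list:
    """Split a (n_mels, T) spectrogram into one `step`-wide window per video frame."""
    if mel.shape[1] < step:
        raise ValueError("audio is shorter than a single mel window")
    frames_per_video_frame = MEL_FRAMES_PER_SECOND / fps
    chunks = []
    i = 0
    while True:
        start = int(i * frames_per_video_frame)
        if start + step > mel.shape[1]:
            # Last window: align it to the end of the audio and stop.
            chunks.append(mel[:, mel.shape[1] - step:])
            break
        chunks.append(mel[:, start:start + step])
        i += 1
    return chunks


def build_generator_input(target_face: np.ndarray, reference_face: np.ndarray) -> np.ndarray:
    """Stack a mouth-masked target crop with a reference crop into an (H, W, 6) array."""
    if target_face.shape != reference_face.shape or target_face.shape[-1] != 3:
        raise ValueError("expected two (H, W, 3) face crops of equal size")
    masked = target_face.copy()
    masked[target_face.shape[0] // 2:, :, :] = 0  # hide the lower half, where the mouth is
    return np.concatenate([masked, reference_face], axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mel = rng.standard_normal((80, 100))  # placeholder spectrogram, about 1.25 s of audio
    windows = mel_chunks(mel, fps=25)
    print(len(windows), windows[0].shape)

    face = rng.random((FACE_SIZE, FACE_SIZE, 3))
    x = build_generator_input(face, face)  # original inference reuses the same frame as reference
    print(x.shape)
```

At 25 fps, each video frame advances the audio window by only 3.2 mel frames, while each window spans 16. Consecutive windows therefore overlap heavily, which gives the generator audio context around every frame.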
Key words: digital human; lip-sync; high-resolution; video generation; talking face generation
DOI: 10.11916/j.issn.1005-9113.24056
CLC Number: TP39
Fund:
Descriptions in Chinese:
  

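Real-time High-definition Interactive Digital Human

Haiquan Fang, Dian Yu

School of Public Administration, Zhejiang University of Technology, Hangzhou 310023

Abstract: Synthesizing a real-time, high-definition, lip-synced digital human is a challenging task. The Wav2Lip model has reached the state of the art in real-time performance and lip synchronization, but its output is not sharp enough. To generate a real-time, high-definition, lip-synced digital human, this paper improves the Wav2Lip model and trains it on a high-resolution dataset that we produced. Experiments show that the digital humans synthesized by the improved Wav2Lip model are sharper than those of the original Wav2Lip model, with little loss in real-time performance or lip synchronization. We further applied the improved Wav2Lip model to government services and built a government digital human. Tests show that it can interact with people in real time, with clear video and synthesized speech similar to a real human voice, achieving the desired digital-human effect.

Key words: digital human; lip-sync; high resolution; video generation; talking face generation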
