Supervised by: Ministry of Industry and Information Technology of the People's Republic of China
Sponsored by: Harbin Institute of Technology
Editor-in-chief: Yu Zhou
ISSN: 1005-9113
CN: 23-1378/T

The Real-time and High-resolution Interactive Digital Human
Authors:
Haiquan Fang*, School of Public Administration, Zhejiang University of Technology, Hangzhou 310023, China
Dian Yu, School of Public Administration, Zhejiang University of Technology, Hangzhou 310023, China
Abstract:
Synthesizing a real-time, high-resolution, lip-synced digital human is a challenging task. Although the Wav2Lip model represents a remarkable advance in real-time lip-sync, the clarity of its output is still limited. To address this, we enhanced the Wav2Lip model and trained it on a high-resolution video dataset produced in our laboratory. Experimental results indicate that the improved Wav2Lip model produces digital humans with greater clarity than the original model while maintaining its real-time performance and accurate lip-sync. We implemented the improved model in a government interface application, generating a government digital human. Testing showed that this government digital human can interact seamlessly with users in real time, delivering clear visuals and synthesized speech that closely resembles a human voice.
Key words: digital human; lip-sync; video generation; talking face generation
DOI:10.11916/j.issn.1005-9113.24056
CLC Number: TP39