Micro-video event detection method with deep multimodal uncertainty

SU Yuting; WANG Fuyou; JING Peiguang

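Journal of Harbin Institute of Technology. Supervised by: Ministry of Industry and Information Technology of the People's Republic of China; Sponsored by: Harbin Institute of Technology; Editor-in-chief: LI Longqiu; ISSN 0367-6234; CN 23-1235/T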

Cite this article: SU Yuting, WANG Fuyou, JING Peiguang. Micro-video event detection method with deep multimodal uncertainty[J]. Journal of Harbin Institute of Technology, 2024, 56(5): 36. DOI: 10.11918/202207110

Affiliation: School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China

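Abstract:

With the rapid growth of micro-videos, micro-video event detection has attracted increasing attention. Existing micro-video event detection studies generally use deep neural networks to produce deterministic detection results, but these networks ignore the effect of uncertainty, so even incorrect predictions can be made with overconfident decisions. To address this problem, this paper proposes a micro-video event detection method based on a deep multimodal uncertainty network. First, the method embeds a variational layer into a conventional domain separation network to obtain predictive distributions. Then, visual and audio modality information is fed into the network, and the independence and correlation losses constructed by the method yield uncertainty-aware predictive distributions for the audio shared and private domains and for the visual shared and private domains. Finally, an uncertainty discrimination rule is proposed to screen the predictive distributions of these four domains and obtain the final prediction. Experiments were conducted on public datasets (UCF-101 and HMDB51) and on a newly constructed micro-video event detection dataset. The results show that, across different deep classification methods and different datasets, the proposed method not only achieves higher classification accuracy but can also estimate the uncertainty of its outputs, and it is robust to audio interference.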
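The abstract outlines three steps: a variational layer that turns point predictions into predictive distributions, independence and correlation losses that separate shared and private domains, and an uncertainty discrimination rule that chooses among the four domain distributions. The following Python sketch illustrates these ideas under explicit assumptions and is not the authors' implementation. It assumes a mean-field Gaussian linear layer sampled with the reparameterization trick, Monte Carlo averaging to form the predictive distribution, predictive entropy as the uncertainty score, and a rule that keeps the lowest-entropy domain; only the independence term is illustrated, modeled here with the soft orthogonality ("difference") loss of the original domain separation network. The class `VariationalLinear`, all function names, and the domain names are hypothetical.

```python
import math
import random

# Hypothetical sketch: names, shapes, and the entropy-based rule are assumptions,
# not the paper's implementation.


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class VariationalLinear:
    """Assumed mean-field Gaussian linear layer: each weight w ~ N(mu, sigma^2)."""

    def __init__(self, in_dim, out_dim, init_sigma=0.1, seed=0):
        init_rng = random.Random(seed)
        self.mu = [[init_rng.gauss(0.0, 0.5) for _ in range(in_dim)]
                   for _ in range(out_dim)]
        self.sigma = [[init_sigma] * in_dim for _ in range(out_dim)]
        self.rng = random.Random(seed + 1)  # separate stream for weight noise

    def sample_logits(self, x):
        # Reparameterization trick: w = mu + sigma * eps, eps ~ N(0, 1),
        # with fresh noise drawn on every call.
        return [sum((m + s * self.rng.gauss(0.0, 1.0)) * xi
                    for m, s, xi in zip(mu_row, sigma_row, x))
                for mu_row, sigma_row in zip(self.mu, self.sigma)]


def predictive_distribution(layer, x, n_samples=50):
    """Monte Carlo estimate of p(y | x): mean softmax over sampled weights."""
    probs = [0.0] * len(layer.mu)
    for _ in range(n_samples):
        sample = softmax(layer.sample_logits(x))
        probs = [acc + p / n_samples for acc, p in zip(probs, sample)]
    return probs


def predictive_entropy(probs):
    """Entropy of a class distribution, used here as the uncertainty score."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_by_uncertainty(domain_probs):
    """Assumed rule: keep the domain with the lowest predictive entropy."""
    best = min(domain_probs, key=lambda d: predictive_entropy(domain_probs[d]))
    probs = domain_probs[best]
    label = max(range(len(probs)), key=probs.__getitem__)
    return best, label, predictive_entropy(probs)


def orthogonality_loss(shared, private):
    """DSN-style difference loss ||S^T P||_F^2 (rows = samples, cols = features)."""
    total = 0.0
    for i in range(len(shared[0])):
        for j in range(len(private[0])):
            g = sum(s_row[i] * p_row[j] for s_row, p_row in zip(shared, private))
            total += g * g
    return total


if __name__ == "__main__":
    x = [0.2, -1.0, 0.5, 0.3]  # toy feature vector, reused for every head for brevity
    names = ["visual_shared", "visual_private", "audio_shared", "audio_private"]
    heads = {n: VariationalLinear(4, 3, seed=10 * k) for k, n in enumerate(names)}
    dists = {n: predictive_distribution(h, x) for n, h in heads.items()}
    print(select_by_uncertainty(dists))
```

Entropy is only one possible uncertainty score; the variance of the sampled probabilities, or the mutual information between predictions and weights, would slot into `select_by_uncertainty` in the same way.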

Key words: deep neural network; micro-video event detection; domain separation network; variational layer; modal uncertainty

DOI: 10.11918/202207110

CLC number: TP183

Document code: A

Foundation item: National Natural Science Foundation of China (61802277)

