Abstract:With the rapid development of micro-videos, the task of micro-video event detection is receiving more and more attention. Existing micro-video event detection studies commonly use deep neural networks to obtain definitive detection results. But these networks that ignore the effect of uncertainty may lead to false predictions yielding definitive results. To address these problems, in this paper, a micro-video event detection method with multimodal uncertainty network was proposed. Firstly, the proposed method embeds a variational layer in a traditional domain separation network, which was used to obtain predictive distributions. Then the visual modal information and the acoustic modal information was fed into the network, and the independence and correlation losses were constructed to obtain visual-audio shared domain predictive distributions and visual-audio private domain predictive distributions. Finally, an uncertainty discriminant was proposed to filter the prediction distribution of the four domains, so as to get the final prediction results. The experiments were performed on the public dataset(UCF-101 and HMDB51) and the newly constructed micro-video event detection dataset. Experimental results show that the proposed method not only has higher classification accuracy on different datasets but also can estimate the uncertainty of the output results. It also shows robustness against audio interference.