Cite this article: | 张澜,朱新山,王泽平,薛俊韬.面向图像修复的桥式注意力取证网络[J].哈尔滨工业大学学报,2025,57(4):62.DOI:10.11918/202404005 |
| ZHANG Lan, ZHU Xinshan, WANG Zeping, XUE Juntao. Bridge-type attention forensics network for image inpainting[J]. Journal of Harbin Institute of Technology, 2025, 57(4): 62. DOI: 10.11918/202404005 |
|
摘要: |
为提升多媒体信息的可靠性,减轻图像伪造事件对于社会造成的负面影响,亟需发展图像修复取证技术,检测并定位图像的篡改区域。本研究提出了一种面向图像修复的桥式注意力取证网络,该网络直接接收篡改后的图像,端到端的输出图像中被篡改的区域,网络采用编码器-解码器架构作为基础框架。首先,编码器选用Swin Transformer和RepVGG两个主干网络以提取多域修复特征。然后,使用桥式注意力模块连接两个主干网络的同级阶段,来增加编码器在局部和全局维度上的建模能力。最后,在编码器和解码器中间搭建了语义对齐融合模块,消除了两个主干网络提取的特征之间的语义不一致,有助于提高网络的取证性能。在不同修复取证数据集上的实验结果表明,所提出的模型与其他主流取证模型相比,能够更准确地对修复区域进行定位。特别是在有挑战性的DeepFillV2数据集和Diffusion数据集上,所提出的BAFNet分别取得了91.37%和82.34%的IoU分数,相比于主流的取证网络MVSS-Net, IoU指标分别提升了8.77%和10.46%。另外,综合多个实验结果,BAFNet在取证性能和模型复杂度之间取得了很好的平衡。 |
关键词: 图像修复取证 深度取证网络 操作痕迹 多域修复特征 桥式注意力 语义对齐融合 |
DOI:10.11918/202404005 |
CLC number: TN911.73 |
Document code: A |
Funding: National Natural Science Foundation of China (2,3) |
|
Bridge-type attention forensics network for image inpainting |
ZHANG Lan, ZHU Xinshan, WANG Zeping, XUE Juntao
|
(School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China)
|
Abstract: |
To enhance the reliability of multimedia information and mitigate the negative social impact of image forgery, there is an urgent need for image inpainting forensics techniques that detect and localize the tampered regions of an image. This paper proposes a bridge-type attention forensics network (BAFNet) for image inpainting. Built on an encoder-decoder architecture, the network takes a tampered image as input and outputs the tampered regions end to end. First, the encoder uses two backbones, Swin Transformer and RepVGG, to extract multi-domain inpainting features. Then, a bridge-type attention module connects the same-level stages of the two backbones, strengthening the encoder's modeling capability in both the local and global dimensions. Finally, a semantic alignment fusion module placed between the encoder and the decoder eliminates the semantic inconsistencies between the features extracted by the two backbones, which helps improve forensic performance. Experimental results on different inpainting forensics datasets show that the proposed model localizes inpainted regions more accurately than other mainstream forensic models. In particular, on the challenging DeepFillV2 and Diffusion datasets, BAFNet achieves IoU scores of 91.37% and 82.34%, respectively, improvements of 8.77% and 10.46% over the mainstream forensic network MVSS-Net. Taken together, the experimental results indicate that BAFNet strikes a good balance between forensic performance and model complexity. |
Key words: image inpainting forensics; deep forensic network; manipulation traces; multi-domain inpainting features; bridge-type attention; semantic alignment fusion |
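
The abstract reports localization accuracy as IoU (intersection over union) between the predicted and the ground-truth tampering masks. The following is a minimal sketch of how such a score can be computed for one image; it is not the authors' evaluation code. The function name `mask_iou`, the nested-list mask representation, and the convention that two empty masks score 1.0 are all assumptions made for illustration.

```python
def mask_iou(pred, truth):
    """Intersection over union of two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration; not taken from the paper.
    """
    same_height = len(pred) == len(truth)
    if not same_height or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels marked as tampered in both masks
    union = 0         # pixels marked as tampered in at least one mask
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            p, t = bool(p), bool(t)
            intersection += p and t
            union += p or t

    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if union == 0 else intersection / union


# Toy 3x4 masks: 1 marks a pixel labelled as inpainted.
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
pred = [[0, 1, 1, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]

score = mask_iou(pred, truth)
print(f"IoU = {score:.2%}")  # 3 shared pixels out of 5 in the union
```

In this toy case the masks share 3 pixels and cover 5 pixels in total, so the IoU is 60%. A dataset-level figure such as those quoted in the abstract would typically aggregate per-image scores, but the exact aggregation used in the paper is not stated here.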