Incorporating semantic object features into a visual attention model

LI Na; ZHAO Xinbo

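Supervised by: Ministry of Industry and Information Technology of the People's Republic of China; Sponsored by: Harbin Institute of Technology; Editor-in-chief: LI Longqiu; ISSN 0367-6234; CN 23-1235/T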

Cite this article: LI Na, ZHAO Xinbo. Incorporating semantic object features into a visual attention model[J]. Journal of Harbin Institute of Technology, 2020, 52(5): 99. DOI: 10.11918/201905181

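LI Na^1,2, ZHAO Xinbo^1,2
(1. School of Computer Science, Northwestern Polytechnical University, Xi'an 710129; 2. Shaanxi Provincial Key Laboratory of Speech and Image Information Process, Xi'an 710129)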

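Abstract:

Visual attention modeling, a key technique for predicting how human attention is distributed while viewing a scene, is widely applied in many areas of computer vision. Traditional visual attention models concentrate on human fixation points, and the saliency maps they compute mostly reflect eye movement information rather than the semantic information perceived by the brain. To address this problem, this paper proposes a visual attention model that incorporates semantic object features. First, the eye tracking database VOC2012-E is built to study and record the eye movement data of ordinary people observing natural scenes. Then, inspired by semantic segmentation, Fully Convolutional Networks (FCN) are used to extract semantic object features, and the FCN is improved with the PReLU activation function and the Adam optimization function so that it extracts semantic object features more effectively, imitating the brain's perception of such features. Next, 28 low-level features that attract attention at the level of the human subconscious, such as orientation, color, and intensity, are extracted. Finally, a Support Vector Machine (SVM) maps the extracted semantic object features and low-level features into the human visual space, and real eye movement data are introduced for supervised training, yielding a visual attention model that can predict human visual saliency maps. Experimental results show that, compared with 8 other classical models and 4 state-of-the-art models on the VOC2012-E and MIT300 databases, the proposed model performs better and has greater biological advantages.

Keywords: visual attention model; semantic object features; FCN; SVM; deep learning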

DOI: 10.11918/201905181

CLC number: TP391

Document code: A

Funding: National Natural Science Foundation of China (6,6)
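
The final stage described in the abstract, in which an SVM maps per-pixel features to saliency under supervision from recorded fixations, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the SVM is taken to be linear and is trained here by plain hinge-loss subgradient descent, the feature vectors have only two hypothetical components (one semantic object score and one low-level contrast value) instead of the full feature set, and the function names, hyperparameters, and toy data are all invented for the example.

```python
import random


def train_linear_svm(samples, labels, epochs=500, lr=0.05, reg=0.01, seed=0):
    """Train a linear SVM with hinge loss by stochastic subgradient descent.

    samples: per-pixel feature vectors (lists of floats)
    labels:  +1 for pixels near recorded fixations, -1 for other pixels
    All hyperparameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    b = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # L2 regularisation shrinks the weights at every step.
            w = [wj * (1.0 - lr * reg) for wj in w]
            if margin < 1.0:
                # Hinge loss is active: push the hyperplane away from this sample.
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def saliency_score(w, b, x):
    """Signed SVM score for one pixel; larger means more likely to attract gaze."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Hypothetical toy data: [semantic object score, low-level contrast] per pixel.
fixated = [[0.9, 0.8], [0.8, 0.6], [0.95, 0.7], [0.7, 0.9]]
background = [[0.1, 0.2], [0.2, 0.1], [0.05, 0.3], [0.3, 0.15]]
samples = fixated + background
labels = [1] * len(fixated) + [-1] * len(background)

w, b = train_linear_svm(samples, labels)
scores = [saliency_score(w, b, x) for x in samples]
```

Evaluating `saliency_score` at every pixel of a new image and rescaling the result to an image range would give a saliency map in the sense used above; how the fixation data are sampled into positive and negative pixels is a design choice the abstract does not specify.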

Incorporating semantic object features into a visual attention model

LI Na^1,2,ZHAO Xinbo^1,2

(1.School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China; 2.Shaanxi Provincial Key Laboratory of Speech and Image Information Process, Xi’an 710129, China)

Abstract:

Visual attention modeling is a key technique for predicting how human attention is distributed when observing a scene, and it is widely used in many areas of computer vision. Traditional visual attention models focus on human eye fixation points, so the saliency maps they compute mainly reflect eye movement information and fail to capture the semantic information perceived by the brain. To address this problem, a visual attention model incorporating semantic object features is proposed. First, the eye tracking database VOC2012-E was established to study and record the eye movement data of ordinary observers viewing natural scenes. Then, inspired by image semantic segmentation, Fully Convolutional Networks (FCN) were used to extract semantic object features. To extract these features more effectively, the FCN8s network was improved with the PReLU activation function and the Adam optimization function, so as to mimic the brain's perception of semantic object features. Next, 28 low-level features that attract attention at the human subconscious level, such as orientation, color, and intensity, were extracted. Finally, a Support Vector Machine (SVM) was used to map the extracted semantic object features and low-level features into the human visual space, with real eye movement data introduced for supervised training, yielding a visual attention model that can predict human visual saliency maps. Experimental results show that, on the VOC2012-E and MIT300 databases, the proposed model performs better and has greater biological advantages than eight classical models and four advanced models.

Key words: visual attention model; semantic object features; FCN; SVM; deep learning
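
The two changes to the FCN named in the abstract, the PReLU activation and the Adam optimization function, can be illustrated in isolation. The sketch below uses the standard textbook definitions of both; it does not reproduce the authors' network, and the scalar toy objective, learning rate, and function names are assumptions made only for the example.

```python
import math


def prelu(x, a):
    """PReLU: identity for positive inputs, learnable slope `a` for negative ones."""
    return x if x > 0 else a * x


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns (theta, m, v).

    m and v are running averages of the gradient and squared gradient,
    and t is the 1-based step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Toy objective (an assumption for illustration): minimise (theta - 3)^2.
theta, m, v = 0.0, 0.0, 0.0
first_theta = None
for t in range(1, 2001):
    grad = 2.0 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
    if t == 1:
        first_theta = theta  # the first step has magnitude close to lr
```

In a real network the slope `a` of PReLU is itself a trained parameter, and Adam is applied to every weight tensor rather than to a single scalar; deep learning frameworks provide both as built-in components.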
