A collaborative image recognition method based on semantic level of text

DUAN Xiping; LIU Jiafeng; WANG Jianhua; TANG Xianglong


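Supervised by: Ministry of Industry and Information Technology of the People's Republic of China. Sponsored by: Harbin Institute of Technology. Editor-in-Chief: LI Longqiu. ISSN 0367-6234. CN 23-1235/T.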


Cite this article: DUAN Xiping, LIU Jiafeng, WANG Jianhua, TANG Xianglong. A collaborative image recognition method based on semantic level of text[J]. Journal of Harbin Institute of Technology, 2014, 46(3): 49. DOI:10.11918/j.issn.0367-6234.2014.03.009

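DUAN Xiping^1,2,3, LIU Jiafeng^1, WANG Jianhua^2,3, TANG Xianglong^1
(1. School of Computer Science and Technology, Harbin Institute of Technology, 150001 Harbin, China; 2. Computer Science and Information Engineering College, Harbin Normal University, 150025 Harbin, China; 3. Heilongjiang Provincial Key Laboratory of Intelligence Education and Information Engineering, 150025 Harbin, China)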

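Abstract:

Image recognition that relies solely on low-level visual information suffers from low accuracy. Considering that many images contain text, this paper proposes a semantic-level text-collaborative image recognition method that uses the text in an image to assist recognition. The method first locates text blocks in the image with a text localization method and applies segmentation, binarization, and feature extraction to them. It then obtains the text semantics, extracts low-level visual information from the image, and computes the correlation between the two modalities to obtain the collaborative posterior probability. Finally, it computes the joint posterior probability and recognizes the image as the class with the maximum joint posterior probability. On a self-built database of sports video frames, compared with a single-modality method represented by naive Bayes, the method achieves higher accuracy with all three visual features tested. The experimental results show that text collaboration effectively assists image recognition and gives better recognition performance.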

Keywords: text localization; image recognition; multi-modal

DOI: 10.11918/j.issn.0367-6234.2014.03.009


Funding: Supported by the National Natural Science Foundation of China (7,2).

A collaborative image recognition method based on semantic level of text

DUAN Xiping^1,2,3, LIU Jiafeng^1, WANG Jianhua^2,3, TANG Xianglong^1

(1. School of Computer Science and Technology, Harbin Institute of Technology, 150001 Harbin, China; 2. Computer Science and Information Engineering College, Harbin Normal University, 150025 Harbin, China; 3. Heilongjiang Provincial Key Laboratory of Intelligence Education and Information Engineering, 150025 Harbin, China)

Abstract:

Image recognition that relies only on low-level visual features from a single modality has low accuracy. Because many images contain embedded text, a collaborative method that uses this text to aid image recognition is proposed. The method has three steps. First, the text is localized, segmented, and binarized, its features are extracted, and its semantics are obtained. Second, visual features of the image are extracted and the correlation between the visual and textual modalities is computed, which yields the collaborative posterior probability. Finally, for each image class, the joint posterior probability is calculated from the previous two items, and a new image is assigned to the class with the maximal joint posterior probability. Experiments on a self-built data set of sports video frames showed that the proposed method achieved higher accuracy than the single-modality method with each of three different visual features.
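
The final step, choosing the class with the maximal joint posterior probability, can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the fusion rule (multiplying a visual-only posterior by a Laplace-smoothed keyword likelihood and renormalizing), the function names, the class labels, and the toy numbers are all hypothetical and are not the authors' implementation.

```python
from collections import Counter, defaultdict


def count_text_by_class(samples):
    """Count how often each recognized keyword co-occurs with each class.

    `samples` is an iterable of (keyword, class_label) pairs. This counting
    stands in for the "correlation of the two modalities" (an assumption).
    """
    counts = defaultdict(Counter)
    for keyword, label in samples:
        counts[label][keyword] += 1
    return counts


def text_likelihood(counts, keyword, label, vocab_size, alpha=1.0):
    """Laplace-smoothed estimate of P(keyword | class)."""
    total = sum(counts[label].values())
    return (counts[label][keyword] + alpha) / (total + alpha * vocab_size)


def joint_posterior(visual_posterior, counts, keyword, vocab_size):
    """Fuse a visual-only posterior with text evidence and renormalize.

    Hypothetical fusion rule: P(c | visual, text) is taken to be
    proportional to P(c | visual) * P(text | c).
    """
    scores = {
        label: p_visual * text_likelihood(counts, keyword, label, vocab_size)
        for label, p_visual in visual_posterior.items()
    }
    normalizer = sum(scores.values())
    return {label: score / normalizer for label, score in scores.items()}


def recognize(visual_posterior, counts, keyword, vocab_size):
    """Return the class with the maximal joint posterior probability."""
    posterior = joint_posterior(visual_posterior, counts, keyword, vocab_size)
    return max(posterior, key=posterior.get)


# Toy training data: keywords read from frames, paired with frame labels.
training_samples = [
    ("NBA", "basketball"), ("NBA", "basketball"), ("NBA", "basketball"),
    ("score", "basketball"),
    ("FIFA", "football"), ("FIFA", "football"), ("FIFA", "football"),
    ("score", "football"),
]
counts = count_text_by_class(training_samples)
vocab_size = len({keyword for keyword, _ in training_samples})

# Made-up visual-only posterior that slightly favors the wrong class.
visual_posterior = {"basketball": 0.45, "football": 0.55}

result = recognize(visual_posterior, counts, "NBA", vocab_size)
print(result)
```

In this toy case the visual evidence alone slightly favors `football`, while the keyword `NBA` shifts the joint posterior toward `basketball`, which is the kind of correction the text-collaborative idea aims for.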

Key words: text localization; image recognition; multi-modal
