Cite this article: | ZHANG Li, CHEN Kang, SUN Guanghui. Review of unlabeled image-text cross-modal retrieval based on real-valued features[J]. Journal of Harbin Institute of Technology, 2024, 56(9): 1. DOI:10.11918/202404027 |
|
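Abstract: |
To examine the state of the art and the key open problems in image-text cross-modal retrieval based on real-valued features for unlabeled datasets (hereafter, cross-modal retrieval), this paper analyzes and summarizes the current literature in the field. Cross-modal retrieval takes a query in one modality and retrieves the relevant samples from another modality. First, a taxonomy based on time complexity is introduced, dividing existing cross-modal retrieval methods into feature-based methods and score-based methods. Second, the research status of each of the two categories is described, and the main problems each currently faces are analyzed and discussed. Third, the two mainstream datasets and the commonly used evaluation metrics for cross-modal retrieval are introduced, and the performance of the two categories on public datasets is compared and analyzed. Finally, the key problems that remain to be solved in cross-modal retrieval are summarized. The survey shows that, although existing cross-modal retrieval methods have made significant progress, several key problems remain open, and these problems mark important directions for future work in the field. |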
Keywords: image-text cross-modal retrieval; multimodal learning; real-valued features; feature-based methods; score-based methods |
DOI:10.11918/202404027 |
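CLC number: TP391.4 |
Document code: A |
Funding: National Key Research and Development Program of China (2020AAA0106502); National Natural Science Foundation of China (62073105); Open Research Project of the State Key Laboratory of Robotics and System (SKLRS-2019-KF-14, SKLRS-202003D) |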
|
Review of unlabeled image-text cross-modal retrieval based on real-valued features |
ZHANG Li¹, CHEN Kang¹, SUN Guanghui²
|
(1. Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China; 2. School of Astronautics, Harbin Institute of Technology, Harbin 150001, China)
|
Abstract: |
To investigate the current state of development and the key open issues in image-text cross-modal retrieval based on real-valued features for unlabeled datasets (hereinafter referred to as cross-modal retrieval), this paper analyzes and summarizes the existing literature. Cross-modal retrieval is the task of retrieving, from one modality, the samples relevant to a query given in another modality. First, using a classification based on time complexity, existing cross-modal retrieval methods are divided into feature-based methods and score-based methods. Second, the research status of the two categories is described, and the main issues each currently faces are analyzed and discussed. Third, two mainstream datasets and the commonly used evaluation metrics for cross-modal retrieval are introduced, and the performance of the two categories on public datasets is compared and analyzed. Finally, the key issues still to be addressed in cross-modal retrieval are summarized. The review indicates that, although existing cross-modal retrieval methods have made significant progress, several key issues remain unresolved, and these issues represent important directions for future work in the field. |
Key words: image-text cross-modal retrieval; multimodal learning; real-valued feature; feature-based method; score-based method |