Survey of Research on Deep Learning Image-Text Cross-Modal Retrieval
Cited by: 17
Abstract: With the rise of deep neural networks, multimodal learning has attracted wide attention. Cross-modal retrieval is an important branch of multimodal learning. Its aim is to uncover the relationships between samples of different modalities, that is, to use a sample from one modality to retrieve samples from another modality that have similar semantics. In recent years, cross-modal retrieval has become a frontier and a focus of academic research in China and abroad, and it is an important direction for the future development of information retrieval. This paper first focuses on the latest progress in deep learning based image-text cross-modal retrieval and reviews in detail the development of methods based on real-valued representation learning and on binary representation learning. Real-valued representation methods are used to improve cross-modal semantic relevance and thereby retrieval accuracy, while binary representation methods are used to improve the efficiency of image-text cross-modal retrieval and to reduce storage space. Second, the paper summarizes the public datasets commonly used in cross-modal retrieval and compares the performance of different algorithms on different datasets. It also summarizes and analyzes specific applications of image-text cross-modal retrieval in public security, media, and medicine. Finally, in light of existing techniques, it discusses development trends and future research directions in the field.
Authors: LIU Ying; GUO Yingying; FANG Jie; FAN Jiulun; HAO Yu; LIU Jiming (Center for Image and Information Processing, Xi'an University of Posts and Telecommunications, Xi'an 710121, China; International Joint Research Center for Wireless Communication and Information Processing Technology of Shaanxi Province, Xi'an 710121, China; Key Laboratory of Electronic Information Application Technology for Crime Scene Investigation, Ministry of Public Security, Xi'an University of Posts and Telecommunications, Xi'an 710121, China; School of Communications and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, China)
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), indexed in CSCD and the Peking University Core Journals list, 2022, Issue 3, pp. 489-511 (23 pages)
Funding: National Natural Science Foundation of China (62071378).
Keywords: cross-modal retrieval; deep learning; feature learning; image-text matching; real-valued representation; binary representation
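The abstract contrasts real-valued representations (aimed at accuracy) with binary representations (aimed at speed and storage). The sketch below illustrates that trade-off only in miniature and is not a method from the survey. It assumes the text and image embeddings already lie in a shared space. The vectors, the item names, and the helpers `rank_real` and `rank_binary` are hypothetical, and simple sign thresholding stands in for a learned hash function.

```python
import math

def cosine(u, v):
    """Cosine similarity between two real-valued vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def binarize(v):
    """Sign-threshold a vector into bits (assumption: stand-in for a learned hash)."""
    return tuple(1 if x > 0 else 0 for x in v)

def hamming(a, b):
    """Number of positions at which two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def rank_real(query, gallery):
    """Rank gallery items by descending cosine similarity to the query."""
    return sorted(gallery, key=lambda name: cosine(query, gallery[name]), reverse=True)

def rank_binary(query, gallery):
    """Rank gallery items by ascending Hamming distance between binary codes."""
    q = binarize(query)
    return sorted(gallery, key=lambda name: hamming(q, binarize(gallery[name])))

# Hypothetical toy embeddings: one text query and three images in a shared space.
text_query = [0.9, -0.2, 0.4, -0.7]
image_gallery = {
    "img_dog": [0.8, -0.1, 0.5, -0.6],
    "img_car": [-0.7, 0.6, -0.3, 0.2],
    "img_cat": [0.5, 0.3, 0.2, -0.9],
}

real_ranking = rank_real(text_query, image_gallery)
binary_ranking = rank_binary(text_query, image_gallery)
print(real_ranking)
print(binary_ranking)
```

In this toy case both rankings agree. In general, binary codes discard magnitude information, so they can rank less precisely, in exchange for needing one bit per dimension and cheap bitwise comparison.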