Abstract
A deep-supervision learning framework is applied to cross-modal image-text retrieval. It bridges the heterogeneity gap between data modalities and learns a common representation of heterogeneous data end to end, preserving both semantic discrimination and modality invariance. This paper builds a dual-modal CNN model for images and text and improves the loss function to optimize training, so that the network is supervised in learning the cross-modal transformation functions. The Pascal Sentence dataset is extended with image-text content from five additional categories; the model parameters are tuned on the training set and the best model is saved. Experimental results show that the improved algorithm reaches an image-text matching accuracy of up to 98.2%, and that the improved loss function raises the mean average precision (MAP) to 0.716, which is 6.2% higher than the MAP of the conventional deep learning algorithm ACMR. This indicates that the improved algorithm effectively increases the precision of cross-modal image-text retrieval.
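The MAP figure reported above can be read more easily with a small worked example. The following is a minimal sketch of how mean average precision is commonly computed for retrieval, assuming that a retrieved item counts as relevant when it shares the query's category label. The abstract does not state the exact evaluation protocol, so this relevance rule, the function names, and the toy labels are all illustrative assumptions, not the authors' implementation.

```python
from typing import Hashable, List, Sequence


def average_precision(query_label: Hashable, ranked_labels: Sequence[Hashable]) -> float:
    """Average precision for one query.

    ranked_labels holds the category labels of the retrieved items, best match first.
    Assumption: an item is relevant if its label equals the query's label.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(query_labels: Sequence[Hashable],
                           rankings: Sequence[Sequence[Hashable]]) -> float:
    """Mean of the per-query average precision values."""
    aps: List[float] = [average_precision(q, r) for q, r in zip(query_labels, rankings)]
    return sum(aps) / len(aps) if aps else 0.0


# Toy example: two image queries, each retrieving three text items.
queries = ["cat", "dog"]
retrieved = [["cat", "dog", "cat"], ["cat", "dog", "dog"]]
map_value = mean_average_precision(queries, retrieved)
print(round(map_value, 4))
```

In this toy case the first query has relevant items at ranks 1 and 3 and the second at ranks 2 and 3, so the score rewards rankings that place same-category items near the top.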
Authors
JIAO Long (焦隆)
XU Huiming (徐慧铭)
CHENG Hai (程海)
School of Electronic Engineering, Heilongjiang University, Harbin 150080, China
Source
Journal of Natural Science of Heilongjiang University (《黑龙江大学自然科学学报》)
CAS
2021, No. 2, pp. 246-252 (7 pages)
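Funding
National Natural Science Foundation of China (61471158)
Basic Research Project of the Fundamental Research Funds for Heilongjiang Provincial Universities (KJCX201904)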