Abstract
In recent years, with the development of information technology, multimedia data such as images, text, video, and audio have grown explosively. When dealing with such large volumes of data, some traditional retrieval methods become inefficient and cannot reach satisfactory accuracy within an acceptable time. The massive amount of data also incurs a huge storage cost. Hashing has been proposed to address these problems. It first maps data from their original representations to binary codes, so that similar data points receive similar codes (small Hamming distance) and dissimilar points receive different codes (large Hamming distance). Similarity comparisons can then be carried out very efficiently in the learned Hamming space using XOR operations. Moreover, representing data with binary codes rather than the original high-dimensional features dramatically reduces the storage cost. Owing to its efficient indexing and fast querying, hashing has received extensive attention in cross-modal retrieval, and many cross-modal hashing methods have been proposed.
However, existing cross-modal hashing methods still face several problems. (1) Most methods try to preserve only the pairwise similarity between samples and ignore their relative similarity, i.e., the ranking information; since ranking information is important for retrieval, these methods are sub-optimal. (2) Many hashing methods rely on a pairwise similarity matrix, which makes their time complexity O(n^2), so they cannot be extended directly to large-scale datasets. (3) To simplify the discrete optimization problem, many methods relax the discrete constraint and learn an approximate solution for the hash codes, which may introduce a large quantization error. To overcome these problems, we propose a new method named Ranking-based Supervised Discrete Cross-modal Hashing (RSDCH for short). RSDCH consists of a ranking learning step and a hashing learning step. In the first step, it learns ranking information by embedding the manifold structure and semantic labels of the data, and generates a ranking score matrix. In the second step, it jointly learns the hash codes of the training samples and the corresponding hash functions while preserving the learned ranking information. To make the method scalable to large-scale datasets, anchor sampling is used, so that the time complexity is linear in the number of training samples. To learn high-quality hash codes, two effective similarity-preserving strategies are designed. To avoid the quantization error introduced by relaxation, an alternating optimization algorithm is designed that learns the binary codes discretely.
We conducted comparative experiments on two widely used multi-label datasets, MIRFlickr-25K and NUS-WIDE, with three evaluation metrics: Mean Average Precision (MAP), Normalized Discounted Cumulative Gain (NDCG), and the Precision-Recall Curve. The results show that RSDCH outperforms several state-of-the-art methods, including both non-deep and deep cross-modal hashing methods. Ablation experiments verify the necessity and effectiveness of each module in the RSDCH model. Finally, additional experiments on convergence, parameter sensitivity, and training efficiency further demonstrate that the proposed method is effective.
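The XOR-based comparison mentioned in the abstract can be illustrated with a minimal sketch. It is not the RSDCH method itself: it only shows how binary hash codes, once learned, could be compared and ranked by Hamming distance. The 8-bit codes and the function names `hamming_distance` and `rank_by_hamming` are hypothetical and chosen for illustration.

```python
from typing import List


def hamming_distance(a: int, b: int) -> int:
    """Count the differing bits between two hash codes stored as integers."""
    # XOR leaves a 1 exactly where the two codes disagree.
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: List[int]) -> List[int]:
    """Return database indices sorted by ascending Hamming distance to the query."""
    return sorted(range(len(database)),
                  key=lambda i: hamming_distance(query, database[i]))


# Hypothetical 8-bit codes. In cross-modal retrieval the query code might come
# from a text hash function and the database codes from an image hash function.
query_code = 0b10110010
database_codes = [0b10110011, 0b01001101, 0b10100010, 0b11111111]

ranking = rank_by_hamming(query_code, database_codes)
print(ranking)
```

Storing each code as a single integer keeps the comparison to one XOR and one bit count, which is the reason Hamming-space search is cheap compared with distances over high-dimensional real-valued features.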
Authors
李慧琼
王永欣
陈振铎
罗昕
许信顺
LI Hui-Qiong; WANG Yong-Xin; CHEN Zhen-Duo; LUO Xin; XU Xin-Shun (School of Software, Shandong University, Jinan 250101)
Source
《计算机学报》
EI
CAS
CSCD
Peking University Core Journal
2021, Issue 8, pp. 1620-1635 (16 pages)
Chinese Journal of Computers
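Funding
National Natural Science Foundation of China (61991411, 61872428)
Shandong Provincial Key Research and Development Program (2019JZZY010127)
Shandong Provincial Natural Science Foundation (ZR2019ZD06, ZR2020QF036)
Fundamental Research Funds of Shandong University (2019GN075)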
Keywords
cross-modal retrieval
learning to hash
ranking-based hashing
discrete optimization
similarity preserving