摘要
在当前大规模数据检索任务中,学习型哈希方法能够学习紧凑的二进制编码,在节省存储空间的同时能快速地计算海明空间内的相似度,因此近似最近邻检索常使用哈希的方式来完善快速最近邻检索机制。对于目前大多数哈希方法都采用离线学习模型进行批处理训练,在大规模流数据的环境下无法适应可能出现的数据变化而使得检索效率降低的问题,提出在线哈希方法并学习适应性的哈希函数,从而在输入数据的过程中连续学习,并且能实时地应用于相似性检索。首先,阐释了学习型哈希的基本原理和实现在线哈希的内在要求;接着,从在线条件下流数据的读取模式、学习模式以及模型更新模式等角度介绍在线哈希不同的学习方式;而后,将在线学习算法分为六类:基于主-被动算法、基于矩阵分解技术、基于无监督聚类、基于相似性监督、基于互信息度量和基于码本监督,并且分析这些算法的优缺点及特点;最后,总结和讨论了在线哈希的发展方向。
In the current large-scale data retrieval tasks,learning to hash methods can learn compact binary codes,which saves storage space and can quickly calculate the similarity in Hamming space.Therefore,for approximate nearest neighbor search,hashing methods are often used to improve the mechanism of fast nearest neighbor search.In most current hashing methods,the offline learning models are used for batch training,which cannot adapt to possible data changes appeared in the environment of large-scale streaming data,resulting in reduction of retrieval efficiency.Therefore,the adaptive hash functions were proposed and learnt in online hashing methods,which realize the continuous learning in the process of inputting data and make the methods can be applied to similarity retrieval in real-time.Firstly,the basic principles of learning to hash and the inherent requirements to realize online hashing were explained.Secondly,the different learning methods of online hashing were introduced from the perspectives such as the reading method,learning mode,and model update method of streaming data under online conditions.Thirdly,the online learning algorithms were further divided into six categories,that is,categories based on passive-aggressive algorithms,matrix factorization technology,unsupervised clustering,similarity supervision,mutual information measurement,codebook supervision respectively.And the advantages,disadvantages and characteristics of these algorithms were analyzed.Finally,the development directions of online hashing were summarized and discussed.
作者
郭一村
陈华辉
GUO Yicun;CHEN Huahui(Faculty of Electrical Engineering and Computer Science,Ningbo University,Ningbo Zhejiang 315000,China)
出处
《计算机应用》
CSCD
北大核心
2021年第4期1106-1112,共7页
journal of Computer Applications
基金
国家自然科学基金资助项目(61572266)。
关键词
在线学习
学习型哈希
无监督学习
监督学习
最近邻检索
online learning
learning to hash
unsupervised learning
supervised learning
nearest neighbor search