Abstract
Approximate nearest neighbor (ANN) search has become one of the most important technologies for efficient retrieval of large-scale data in the era of artificial intelligence. As an efficient approach to ANN search, hashing has received wide attention, and a large number of methods have been proposed. So far, however, no work has comprehensively analyzed and surveyed the state-of-the-art hashing methods. To address this gap, this paper first systematically introduces the basics of hashing, including distance calculation, loss functions, discrete constraints, and out-of-sample learning. It then compares the strengths and weaknesses of state-of-the-art hashing methods and evaluates their performance on widely used databases. Finally, it summarizes the key open problems of hashing methods and points out several potential research directions. This survey is intended as a useful reference for researchers designing effective and efficient hashing methods.
Authors
费伦科
秦建阳
滕少华
张巍
刘冬宁
侯艳
Fei Lun-ke; Qin Jian-yang; Teng Shao-hua; Zhang Wei; Liu Dong-ning; Hou Yan (School of Computers, Guangdong University of Technology, Guangzhou 510006, China)
Source
《广东工业大学学报》
CAS
2020, No. 3, pp. 23-35 (13 pages)
Journal of Guangdong University of Technology
Funding
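National Natural Science Foundation of China (61702110, 61603100, 61972102)
Natural Science Foundation of Guangdong Province (2019A1515011811)
Key-Area Research and Development Program of Guangdong Province (2020B010166006)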
Keywords
approximate nearest neighbor search
hash learning
hashing
data retrieval