A spatial locality-sensitive hashing method for visual search (cited by: 4)

Locality-sensitive hashing approach based on semantic space for visual retrieval


Abstract

Objective: Visual retrieval must accurately and efficiently retrieve the most relevant visual content from large-scale image or video datasets. However, because such datasets contain large volumes of image data with high-dimensional features, existing methods struggle to deliver both fast retrieval and good retrieval quality. Hashing is a widely studied solution for approximate nearest neighbor search; it converts high-dimensional data items into a low-dimensional representation, namely a hash code made up of a sequence of bits. Locality-sensitive hashing (LSH) is a data-independent, unsupervised hashing algorithm with asymptotic theoretical guarantees on performance, and it is considered one of the most common methods for fast nearest-neighbor search in high-dimensional space. Nevertheless, if the number of hash functions k is set too small, too many data items fall into each hash bucket, which increases the query response time. By contrast, if k is set too large, the number of data items in each hash bucket is reduced. Moreover, to achieve the desired search accuracy, LSH usually needs long hash codes, which reduces the recall rate.
Although using multiple hash tables can alleviate this problem, it significantly increases memory cost and query time. In addition, because of the semantic gap between the visual semantic space and the metric space, LSH may not achieve good search performance.

Method: For visual retrieval over high-dimensional image and video data, we propose a hashing algorithm called weighted semantic locality-sensitive hashing (WSLSH), which is based on feature space partitioning, to address the aforementioned drawbacks of LSH. When building the index, WSLSH considers the distance relationship between reference and query features, partitions the reference feature space twice using a two-layer visual dictionary, and applies weighted semantic locality-sensitive hashing within each subspace, thereby forming a hierarchical index structure. During large-scale retrieval, the algorithm can rapidly narrow the target down to a small range and then query it precisely. Next, dynamic variable-length hash codes are used within a hash table to probe multiple hash buckets, which reduces the number of hash tables while preserving retrieval performance. Together, these two improvements greatly increase retrieval speed. In addition, to address the random instability of LSH, statistical information reflecting the semantics of the reference feature space is introduced into the LSH function, and a simple projection semantic hash function is designed to keep retrieval performance stable.

Result: Experimental results on the Holidays, Oxford5k, and DataSetB datasets show that retrieval accuracy and retrieval speed are both improved in comparison with representative unsupervised hashing methods. WSLSH achieves the shortest average retrieval time (0.03425 s) on DataSetB.
When the encoding length is 64 bits, the mean average precision (mAP) of WSLSH on the three datasets improves by 1.2% to 32.6%, 1.7% to 19.1%, and 2.6% to 28.6%, respectively. WSLSH is not highly sensitive to changes in the size of the reference feature subset involved, so its retrieval time does not change significantly, which reflects its advantage for large-scale datasets. As the encoding length increases, the performance of WSLSH improves gradually. At an encoding length of 64 bits, WSLSH obtains the highest precision and recall on the three datasets, outperforming the other compared methods.

Conclusion: LSH is improved by partitioning the feature space twice, weighting the number of times each reference feature is hash-indexed, dynamically using variable-length hash codes, and introducing a simple projection semantic hash function. The resulting WSLSH algorithm retrieves faster than existing work. With long encodings, WSLSH achieves better performance than the compared methods and shows high application value for large-scale image datasets.
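
The trade-off on the number of hash functions k described in the abstract can be illustrated with a minimal sketch. This is plain sign-random-projection LSH on synthetic Gaussian vectors, not the WSLSH algorithm itself; the function names (`make_hyperplanes`, `hash_code`, `build_table`, `query`, `demo`) and all parameter values are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def make_hyperplanes(k, dim, seed=0):
    """Draw k random Gaussian hyperplanes; each one contributes one hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]


def hash_code(vec, planes):
    """Return a k-bit code: bit i is 1 when vec is on the non-negative side of plane i."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def build_table(data, planes):
    """Group item indices by hash code, one bucket per distinct code."""
    table = defaultdict(list)
    for idx, vec in enumerate(data):
        table[hash_code(vec, planes)].append(idx)
    return table


def query(vec, table, planes):
    """Return the candidate indices stored in the bucket that vec hashes to."""
    return table.get(hash_code(vec, planes), [])


def demo(n=2000, dim=16):
    """Print bucket statistics for several k on synthetic data (assumed sizes)."""
    rng = random.Random(42)
    data = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for k in (2, 8, 16):
        table = build_table(data, make_hyperplanes(k, dim, seed=k))
        avg = n / len(table)
        print(f"k={k:2d}  non-empty buckets={len(table):5d}  avg bucket size={avg:.1f}")


if __name__ == "__main__":
    demo()
```

With a small k the table has at most 2^k buckets, so each query has to scan many candidates; with a large k the buckets shrink and true neighbors are more likely to be split across buckets, which is the recall problem that multiple hash tables are normally used to offset.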

Authors: Huang Xiaoyan; Sun Bin; Yang Zhanyuan; Zhu Yingying; Tian Qi (College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518000, China; Huawei Technologies Co., Ltd., Shenzhen 518000, China)

Affiliations: College of Computer Science and Software Engineering, Shenzhen University; Huawei Technologies Co., Ltd.

Source: Journal of Image and Graphics (中国图象图形学报), indexed in CSCD and the Peking University Core Journals list, 2021, No. 7, pp. 1568-1582 (15 pages)

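Funding: National Natural Science Foundation of China (62072318); Natural Science Foundation of Guangdong Province (2021A1515012014); Shenzhen Science and Technology Program, Basic Research General Project (JCYJ20190808172007500).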

Keywords: feature space partitioning; locality-sensitive hashing (LSH); dynamic variable-length hashing code; visual retrieval; nearest neighbor search

Classification: TP37 [Automation and Computer Technology: Computer System Architecture]


