Approximate Nearest Neighbor Search Algorithm: Locality-Sensitive Hashing

Cited by: 8

Abstract: Finding the nearest neighbor of a query point is one of the main tasks in information processing and related fields. Large datasets require fast retrieval algorithms, and the commonly used ones are mainly tree-based, but their efficiency degrades when the data points are high-dimensional. Locality-sensitive hashing (LSH) is currently the fastest algorithm for high-dimensional search. This paper analyzes in detail how LSH is implemented in Hamming space and in Euclidean space, and studies, through analysis and experiments, the collision probability of data points, the space and time cost, and the effect of parameter tuning on the algorithm's performance. Finally, it discusses the merits and drawbacks of the algorithm and the feasibility of applying LSH to visual clustering.

Authors: Gao Haolin, Xu Xu, Li Bicheng

Affiliation: Information Engineering University

Source: Journal of Information Engineering University (《信息工程大学学报》), 2013, No. 3, pp. 332-340 (9 pages)

Funding: National Natural Science Foundation of China (60872142)

Keywords: approximate nearest neighbor (ANN); locality-sensitive hashing (LSH); exact Euclidean locality-sensitive hashing (E2LSH); visual clustering

Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology]
