
Probabilistic Analysis of Locality Sensitive Hashing Data Structures
Abstract: Locality Sensitive Hashing (LSH) offers strong asymptotic bounds on query cost and space consumption for nearest neighbor search in high-dimensional spaces. In the traditional analysis model, LSH is regarded as a randomized algorithm whose only source of uncertainty is the random choice of hash functions. The paper calls the collision probability obtained under this model the hash-function-based collision probability. The paper analyzes LSH theoretically under a different model, with two motivations: 1) under the existing analysis model, achieving the theoretical performance requires generating a random data structure for each query, which is impractical; 2) the performance metric that practitioners care about is the expected success probability of a random query over a single randomly generated data structure. To this end, the paper derives, for Hamming distance, the probability that a random pair of data points collides under any single hash function. The paper calls the collision probability derived under this model the random-input-based collision probability. It also proves that, in Hamming space, the two collision probabilities are exactly equal.
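The equality stated in the abstract can be checked numerically on a small example. The sketch below is not taken from the paper: it assumes the standard bit-sampling hash family for Hamming space, where a hash function returns one fixed coordinate of the input, and the function names `hash_function_based` and `random_input_based` are hypothetical. It computes both probabilities exactly by enumeration for dimension 6 and distance 2.

```python
from fractions import Fraction
from itertools import combinations, product


def hash_function_based(x, y):
    """Fixed pair (x, y); the sampled bit index is chosen uniformly at random."""
    dim = len(x)
    agree = sum(1 for i in range(dim) if x[i] == y[i])
    return Fraction(agree, dim)


def random_input_based(dim, dist, index):
    """Fixed bit index; the pair is uniform over all ordered pairs at distance `dist`."""
    collisions = 0
    total = 0
    for x in product((0, 1), repeat=dim):
        for flipped in combinations(range(dim), dist):
            # y is x with exactly the positions in `flipped` inverted,
            # so x and y collide on `index` iff `index` was not flipped.
            total += 1
            if index not in flipped:
                collisions += 1
    return Fraction(collisions, total)


dim, dist = 6, 2
x = (0, 1, 1, 0, 1, 0)
y = (1, 1, 1, 0, 0, 0)  # differs from x in positions 0 and 4

p_hash = hash_function_based(x, y)
p_input = [random_input_based(dim, dist, i) for i in range(dim)]
print(p_hash, p_input)
```

Under this assumed hash family, both quantities equal 1 - d/D (here 2/3), and the random-input-based value is the same for every choice of bit index. This is only an illustration of the claim on one small instance, not the paper's proof.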
Authors: LU Kejing (陆可镜), WANG Hongya (王洪亚); College of Computer and Technology, Donghua University, Shanghai 201620, China
Source: Intelligent Computer and Applications (《智能计算机与应用》), 2016, No. 5, pp. 9-10, 16 (3 pages)
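Funding: National Natural Science Foundation of China (61370205); Natural Science Foundation of Shanghai (13ZR1400800); Fundamental Research Funds for the Central Universities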
Keywords: Locality Sensitive Hashing; probability of collision; probabilistic analysis of algorithms
