摘要
寻找查询点的最近邻是信息处理相关领域的主要任务之一。在数据规模较大时需要采用快速检索算法,常用的快速检索算法主要是基于树的算法,但是当数据点维数较高时,这些算法的效率会变低。位置敏感哈希是当前解决高维搜索的最快的算法,文章对汉明空间、欧式空间下的位置敏感哈希算法的实现方案进行了详细分析,对算法中数据点冲突概率、空间时间消耗、参数调整对算法性能的影响进行了详尽的研究和试验,最后讨论算法的优点和缺点,说明了算法应用于视觉聚类的可能性。
Finding nearest neighbor is a main task in information processing field. The fast searching algorithm is needed in large scale database, and tree-based methods are frequently used for fast retrieval. But when the dimension of data point is high, they will become inefficient. Locality Sensi- tive Hashing is the fastest method for solving fast high dimension searching currently. This paper ex- plores the implementation of Locality Sensitive Hashing in hamming space and Euclidean space, and studies the data point collision probability, space and time consuming, the effect of parameter tuning through experiments. Finally discussed are the merits and drawbacks of this algorithm and the feasi- bility of applying LSH in visual clustering.
出处
《信息工程大学学报》
2013年第3期332-340,共9页
Journal of Information Engineering University
基金
国家自科学基金资助项目(60872142)
关键词
近似最近邻搜索
位置敏感哈希
精确欧式距离位置敏感哈希
视觉聚类
approximate nearest neighbor(ANN)
locality sensitive Hashing(LSH)
exact euclide- an locality sensitive Hashing( E2LSH)
visual clustering