期刊文献+

基于特征选择的局部敏感哈希位选择算法 被引量:1

Algorithm of locality sensitive hashing bit selection based on feature selection
下载PDF
导出
摘要 作为主流的信息检索方法,局部敏感哈希往往需要生成较长的哈希码才能达到检索要求。然而,长哈希码需要消耗巨大的存储空间且携带大量的冗余哈希位。为了解决此问题,采用特征工程中10种简单高效的选择算法从长局部敏感哈希码中选择信息量丰富的哈希位,去除冗余、无效的哈希位。这10种选择算法使用不同的方式来刻画每一个哈希位的性能或两个哈希位之间的相关性,如方差、汉明距离等。通过去除长哈希码中性能较差或具有高相关性的哈希位进行哈希位的选择。将选择后的哈希码与原哈希码的性能进行比较。在4个常用数据集上的实验结果表明,去除冗余哈希位后的哈希码与原哈希码的性能几乎相同,且其哈希位的去除比率能达到30%~70%。 Locality sensitive hashing is one of the most popular information retrieval methods,which needs to generate long hashing bits to meet the retrieval requirement.However,a long hashing bits requires huge storage space,and contains plenty of redundant hashing bits.In order to solve this problem,ten simple and efficient selection algorithms in feature engineering were adopted to extract the hashing bits which carry the largest amount of information from the long hashing bits which were generated by locality sensitive hashing,and the redundant and useless hash bits were removed.Those ten algorithms tried to capture the performance of each hashing bit or the correlation among bits,such as variance and hamming distance.During selection process,the useless or high-correlated hashing bits were removed.Then the selected hashing bits were compared with the original long hashing bits.The experimental results on four common datasets show that the selected hashing bits works as well as the original hashing bits,and their reduction ratio can reach from 30%to 70%.
作者 周文桦 刘华文 李恩慧 ZHOU Wenhua;LIU Huawen;LI Enhui(College of Mathematics and Computer Science,Zhejiang Normal University,Jinhua 321001,China)
出处 《大数据》 2021年第6期67-77,共11页 Big Data Research
基金 国家自然科学基金资助项目(No.61976195)。
关键词 近似近邻搜索 哈希学习 哈希位选择 特征选择 降维 approximate nearest neighbor search hashing learning hashing bit selection feature selection dimensionality reduction
  • 相关文献

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部