期刊文献+

最优分数位minwise哈希算法的研究 被引量:1

Research on Optimal Fractional Bit Minwise Hashing
下载PDF
导出
摘要 在信息检索中,minwise哈希算法用于估值集合的相似度;b位minwise哈希算法则通过存储哈希值的b位来估算相似度,从而节省了存储空间和计算时间。分数位minwise哈希算法对各种精度和存储空间需求有着更加广泛的可选择性。对于给定的分数位f,构建f的方式有很多。分析了有限的分数位组合方式,给出最优化分数位的理论分析。大量的实验验证了此方法的有效性。 In information retrieval,minwise hashing algorithm is often used to estimate similarities among documents,and b-bit minwise hashing is capable of gaining substantial advantages in terms of computational efficiency and storage space by only storing the lowest b bits of each(minwise) hashed value(e.g.,b=1 or 2).Fractional bit minwise hashing has a wider range of selectivity for accuracy and storage space requirements.For the fixed fraction f,there are so many combinations of f.We theoretically analyzed limited combinations of fractional bit.The optimal fractional bit was found.Experimental results demonstrate the effectiveness of this method.
出处 《计算机科学》 CSCD 北大核心 2012年第8期182-185,共4页 Computer Science
基金 国家自然科学基金项目(M0921005 60873081 60970095 61003033) 湖南省杰出青年基金(11JJ1012) 教育部新世纪优秀人才支持计划(NCET-10-0787)资助
关键词 相似度估值 哈希 最优分数位 Similarity estimation Hasing Optimal fractional bit
  • 相关文献

参考文献1

二级参考文献2

共引文献68

同被引文献11

  • 1Broder A Z,Charikar M,Frieze A M,et al. Min-Wise Independent Permutations[J]. Journal of Computer Systems and Sciences,2000,60(3) :630-659.
  • 2Feigenblat G,Porat E,Shiftan A. Exponential Time Improvement for Min-Wise Based Algorithms[C]// SODA’11 Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms. San Francisco :SIAM,2011:57-66.
  • 3Kane D M,Nelson J,Woodruff D P. An Optimal Algorithm for the Distinct Elements Problem[C]// PODS’10 Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. New York:ACM,2010:41-52.
  • 4Li Ping,Knig A C. b-Bit Minwise Hashing[C]// Proceedings of the 19th International Conference on World Wide Web. [S. l.] :ACM,2010:671-680.
  • 5Li Ping,Knig A C,Gui W. b-Bit Minwise Hashing for Estimating Three-Way Similarities[C]//Neural Information Processing Systems (NIPS). Vancouver:[s. n.],2010:1387-1395.
  • 6Li Ping,Moore J,Knig A C. b-Bit Minwise Hashing for Large-Scale Linear SVM[J]. Neural Information Processing Systems ,2011:101-109.
  • 7Li Ping,Knig A C. Accurate Estimators for Improving Minwise Hashing and b-Bit Minwise Hashing[R/OL]. [2014-06-27]. http://arxiv.org/pdf/1108.0895.pdf.
  • 8Bray T,Paoli J. Sperberg-McQueen:Extensible Markup Language(XML) 1.0[R/OL]. [2014-03-27]. http://www.w3.org/TR/REC-xml/.
  • 9袁鑫攀,龙军,张祖平,桂卫华.Near-duplicate document detection with improved similarity measurement[J].Journal of Central South University,2012,19(8):2231-2237. 被引量:2
  • 10袁鑫攀,龙军,张祖平,罗跃逸,张昊,桂卫华.连接位Minwise Hash算法的研究[J].计算机研究与发展,2013,50(4):883-890. 被引量:3

引证文献1

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部