An Improved LSH/MinHash Collaborative Filtering Algorithm (times cited: 5)

Abstract: In recent years, many recommender systems based on collaborative filtering have been applied successfully. However, as the numbers of users and items in a system keep growing, the cost of similarity computation rises sharply, and the scalability of collaborative filtering recommender systems has become an increasingly prominent problem. This paper proposes an improved LSH/MinHash algorithm based on approximate nearest neighbor search and applies it to clustering library resources. The goal is to cluster high-dimensional, large-volume data within reasonable time complexity, reduce the amount of similarity computation, and improve the scalability of the algorithm. Experiments show that the algorithm achieves high efficiency and accuracy.
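The abstract does not describe the authors' specific improvements, so the following is only a sketch of the standard MinHash plus banding LSH scheme that the title builds on (Broder's MinHash, with the banding technique as presented in Rajaraman and Ullman's Mining of Massive Datasets), not the paper's algorithm. It assumes each library resource is represented as a set, for example the IDs of readers who borrowed it. The function names, the (a*x + b) mod P hash family, the toy data, and the band/row values are all illustrative assumptions.

```python
import random
import zlib
from collections import defaultdict
from itertools import combinations

# Mersenne prime 2^61 - 1 for the hash family h(x) = (a*x + b) mod P (an assumed choice).
_PRIME = (1 << 61) - 1


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) coefficients for num_hashes random linear hash functions."""
    rng = random.Random(seed)
    return [(rng.randrange(1, _PRIME), rng.randrange(0, _PRIME))
            for _ in range(num_hashes)]


def minhash_signature(items, params):
    """MinHash signature of a non-empty set: the minimum hashed value per hash function."""
    # crc32 gives a stable integer id per item (built-in hash() of str is salted per process).
    ids = [zlib.crc32(str(x).encode("utf-8")) for x in items]
    return [min((a * x + b) % _PRIME for x in ids) for a, b in params]


def estimate_jaccard(sig1, sig2):
    """The fraction of positions where two signatures agree estimates Jaccard similarity."""
    return sum(u == v for u, v in zip(sig1, sig2)) / len(sig1)


def lsh_candidate_pairs(signatures, bands, rows):
    """Banding LSH: two keys become a candidate pair if any band of rows matches exactly."""
    candidates = set()
    for band in range(bands):
        buckets = defaultdict(list)
        for key, sig in signatures.items():
            if len(sig) < bands * rows:
                raise ValueError("signature shorter than bands * rows")
            buckets[tuple(sig[band * rows:(band + 1) * rows])].append(key)
        for keys in buckets.values():
            # Only keys that landed in the same bucket are paired up.
            candidates.update(combinations(sorted(keys), 2))
    return candidates


if __name__ == "__main__":
    # Hypothetical toy data: each resource is the set of reader IDs that borrowed it.
    resources = {
        "book_a": {"u1", "u2", "u3", "u4", "u5"},
        "book_b": {"u1", "u2", "u3", "u4", "u6"},
        "book_c": {"u7", "u8", "u9"},
    }
    params = make_hash_params(num_hashes=100)  # 20 bands x 5 rows
    sigs = {k: minhash_signature(v, params) for k, v in resources.items()}
    for a, b in sorted(lsh_candidate_pairs(sigs, bands=20, rows=5)):
        # Similarity is evaluated only for candidate pairs, not for all pairs.
        print(a, b, round(estimate_jaccard(sigs[a], sigs[b]), 2))
```

With b bands of r rows, a pair whose Jaccard similarity is s becomes a candidate with probability 1 - (1 - s^r)^b. Tuning b and r therefore trades recall against the number of pairs that still need a similarity computation, which is where the savings over all-pairs comparison come from.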

Authors: Bian Yijie (卞艺杰), Chen Chao (陈超), Ma Lingling (马玲玲), Chen Yuanlei (陈远磊)

Institution: Business School, Hohai University

Source: Computer and Modernization (《计算机与现代化》), 2013, No. 12, pp. 19-22, 26 (5 pages)

Keywords: library; personalized recommendation; collaborative filtering; LSH

CLC number: TP301.6 [Automation and Computer Technology: Computer System Architecture]
