Analysis and Comparison of Similarity Algorithms (相似度算法分析与比较研究)

Published English title: Research on the Analysis and Comparison on Similarity Algorithm

Cited by: 6
Abstract: To address the inconvenience caused by redundant information in RSS readers, the paper first preprocesses the text with Chinese word segmentation and TF-IDF weighting for similarity calculation, then applies three similarity algorithms (Levenshtein distance, cosine similarity, and the Jaccard coefficient) to identify redundant items. It discusses the characteristics of each method in detail, analyzes and compares their strengths and weaknesses from a practical standpoint, and selects the Jaccard algorithm to implement a data filtering mechanism.
Authors: Chen Tian (陈天), Liu Wenhao (刘文浩)
Source: Modern Computer (《现代计算机》), 2012, No. 12, pp. 18-20 (3 pages)
Keywords: computer application technology; TF-IDF; similarity calculation; ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System)
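
The three measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the text has already been segmented into tokens (the paper uses ICTCLAS for Chinese segmentation), it uses raw term counts where the paper applies TF-IDF weights, and the function names and the `threshold` value in `filter_redundant` are hypothetical choices made only for illustration.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Normalize the edit distance into a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def cosine_similarity(tokens_a, tokens_b) -> float:
    """Cosine of the angle between two term-count vectors.
    Assumption: raw counts stand in for the TF-IDF weights used in the paper."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_similarity(tokens_a, tokens_b) -> float:
    """Size of the token-set intersection divided by the size of the union."""
    sa, sb = set(tokens_a), set(tokens_b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def filter_redundant(items, threshold=0.8):
    """Keep an item only if its Jaccard similarity to every item kept so far
    is below the threshold. The threshold value is a hypothetical default."""
    kept = []
    for tokens in items:
        if all(jaccard_similarity(tokens, k) < threshold for k in kept):
            kept.append(tokens)
    return kept


if __name__ == "__main__":
    feed = [
        ["rss", "reader", "update"],
        ["rss", "reader", "update"],   # exact duplicate of the first item
        ["weather", "today"],
    ]
    print(levenshtein("kitten", "sitting"))
    print(filter_redundant(feed))
```

In this sketch the Jaccard measure works on token sets, so it ignores word order and repetition, while Levenshtein compares raw character sequences and cosine similarity takes term frequency into account. The filter compares each new item against every item already kept, which is quadratic in the number of items.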