
Optimization of Search Results Based on a Clustering Algorithm (基于搜索结果的聚类算法)
Abstract: Current search engines return many redundant results and do not classify them. This paper proposes an optimization algorithm for web search results based on an improved DBSCAN (density-based spatial clustering of applications with noise) algorithm, which clusters and classifies the results effectively. The algorithm selects the pages whose search weight exceeds a given threshold, extracts each page's feature values and candidate keywords, marks the feature range, and then compares page similarity to eliminate as many duplicate and redundant pages as possible. It also classifies the results according to the pages' candidate keywords, which improves the precision of the search results and user satisfaction and makes the search engine behave more intelligently.
Source: Computer and Modernization (《计算机与现代化》), 2012, No. 11, pp. 35-38 (4 pages)
Keywords: density-based clustering algorithm (DBSCAN); page similarity; clustering; redundant pages
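The pipeline described in the abstract (weight filtering, feature extraction, similarity comparison, density-based clustering, keyword-based classification) can be illustrated with a minimal sketch. It is not the paper's improved DBSCAN variant, whose details the abstract does not give. It assumes plain DBSCAN, word sets as page features, Jaccard similarity, and the most frequent word in a cluster as its category label. The names `tokens`, `jaccard`, `dbscan`, and `optimize_results`, the `(url, weight, text)` tuple layout, and the default thresholds are all hypothetical.

```python
from collections import Counter

def tokens(text):
    """Assumed feature extraction: the set of lowercase words in a page."""
    return set(text.lower().split())

def jaccard(a, b):
    """Similarity of two word sets, in [0, 1]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def dbscan(features, eps, min_pts):
    """Plain DBSCAN with distance = 1 - Jaccard similarity.
    Returns one label per item; -1 marks noise (a page with no near-duplicates)."""
    n = len(features)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n)
                if 1.0 - jaccard(features[i], features[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # former noise becomes a border point
            if labels[j] is not None:
                continue                 # already assigned to a cluster
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels

def top_keyword(texts):
    """Assumed category label: the most frequent word across the given texts."""
    counts = Counter(w for t in texts for w in t.lower().split())
    return counts.most_common(1)[0][0]

def optimize_results(pages, min_weight=0.5, eps=0.5, min_pts=2):
    """pages: list of (url, weight, text) tuples.
    Returns (category, url) pairs with one representative page per cluster."""
    kept = [p for p in pages if p[1] >= min_weight]   # weight threshold
    labels = dbscan([tokens(p[2]) for p in kept], eps, min_pts)
    groups = {}
    for page, label in zip(kept, labels):
        groups.setdefault(label, []).append(page)
    result = []
    for label, members in groups.items():
        if label == -1:
            # Noise pages have no near-duplicates, so each one is kept.
            result.extend((top_keyword([p[2]]), p[0]) for p in members)
        else:
            # Redundant pages collapse to the highest-weight member.
            best = max(members, key=lambda p: p[1])
            result.append((top_keyword([p[2] for p in members]), best[0]))
    return result
```

In this sketch, `eps` sets how dissimilar two pages may be while still counting as redundant, and `min_pts` sets how many similar pages are needed to form a cluster. Both would need tuning against real search results.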