Abstract
Current search engines return many redundant pages, and their results are not classified. This paper proposes an optimization algorithm for web search results based on an improved DBSCAN (density-based spatial clustering of applications with noise) algorithm, which clusters and classifies the results effectively. From all search results, the algorithm selects the pages whose search weight exceeds a given threshold, extracts each page's feature values and candidate keywords, and marks the range of the features. It then compares page similarity to eliminate as many duplicate and redundant pages as possible. At the same time, it classifies the pages according to their candidate keywords. This improves the precision of the search results and user satisfaction with them, and makes the search engine behave more intelligently.
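The pipeline described in the abstract can be sketched in Python as below. The stages are: a threshold on search weight, feature extraction, density-based clustering of near-duplicates, and keyword-based classification. This is a minimal illustration under stated assumptions, not the authors' improved DBSCAN, because the abstract does not describe their modifications. The sketch runs textbook DBSCAN with cosine distance over term-frequency vectors. Several things are hypothetical choices made only for illustration:

- the helper names `features`, `cosine`, `dbscan` and `optimize`;
- the stopword list;
- the parameters `weight_min`, `eps` and `min_pts`;
- the rule of keeping the highest-weighted page per cluster;
- using a group's most frequent term as its "candidate keyword".

```python
import math
import re
from collections import Counter

# Hypothetical sketch: plain DBSCAN, NOT the paper's improved variant.
# Illustrative stopword list; the paper does not say how features are extracted.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "on", "with"}


def features(text):
    """Term-frequency vector standing in for a page's feature values (assumption)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def dbscan(vecs, eps, min_pts):
    """Textbook DBSCAN with distance = 1 - cosine; label -1 marks noise.

    min_pts counts the point itself, as in the original DBSCAN definition.
    """
    labels = [None] * len(vecs)

    def region(i):
        return [j for j in range(len(vecs)) if 1.0 - cosine(vecs[i], vecs[j]) <= eps]

    cluster = 0
    for i in range(len(vecs)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue  # already assigned (including the border case above)
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # j is a core point: keep expanding
                queue.extend(neighbours)
        cluster += 1
    return labels


def optimize(results, weight_min, eps=0.3, min_pts=2):
    """results: list of (url, weight, text). Returns {keyword: [url, ...]}."""
    # Step 1: keep only pages whose search weight reaches the threshold.
    kept = [r for r in results if r[1] >= weight_min]
    vecs = [features(text) for _, _, text in kept]
    labels = dbscan(vecs, eps, min_pts)

    rep, terms = {}, {}
    for (url, weight, _), vec, label in zip(kept, vecs, labels):
        # Each cluster is one group of near-duplicates; each noise page stands alone.
        key = ("noise", url) if label == -1 else label
        terms.setdefault(key, Counter()).update(vec)
        if key not in rep or weight > rep[key][1]:
            rep[key] = (url, weight)  # keep the highest-weighted page per group

    # Step 2: file each surviving page under its group's most frequent term.
    categories = {}
    for key, (url, _) in rep.items():
        keyword = terms[key].most_common(1)[0][0] if terms[key] else "misc"
        categories.setdefault(keyword, []).append(url)
    return categories


results = [
    ("https://example.com/a", 0.9, "Python DBSCAN clustering tutorial"),
    ("https://example.com/b", 0.8, "Python DBSCAN clustering tutorial guide"),
    ("https://example.com/c", 0.7, "Java servlet deployment"),
    ("https://example.com/d", 0.2, "Python DBSCAN clustering tutorial copy"),
]
print(optimize(results, weight_min=0.5))
# {'python': ['https://example.com/a'], 'java': ['https://example.com/c']}
```

DBSCAN needs no preset number of clusters and labels isolated pages as noise. As a result, unique pages survive untouched while each dense group of near-duplicates collapses to one representative. The neighbour scan here is O(n²); a real system would likely use an index or a bounded candidate set instead.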
Source
Computer and Modernization (《计算机与现代化》), 2012, No. 11, pp. 35-38 (4 pages)