Abstract
Traditional web page ranking algorithms tend to ignore the topic relevance of search results and are prone to topic drift. To address this, the paper proposes a page ranking algorithm that incorporates the PCM clustering algorithm, which improves the topic relevance of search results and reduces topic drift. First, for a query on a given topic, the random walk method (RWM) is used to compute the symmetric social distance (SSD) between pairs of web pages. Second, the pages are clustered using SSD and the PCM clustering algorithm to obtain the communities of related topics, and the probability that each member belongs to its community is computed. Finally, the pages are ranked according to these membership probabilities and each page's recommendation degree. Experimental results show that, compared with the PageRank algorithm, the proposed algorithm returns search results whose page topics are more relevant. Moreover, because the ranking is performed for a specific topic, the algorithm reduces topic drift.
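The three steps above can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the authors' implementation, because the abstract does not precisely define SSD, the PCM objective, or the recommendation degree. The sketch assumes the following:

1. The pages returned for a topic query form an undirected, connected link graph, with hyperlinks symmetrised.
2. SSD is the random-walk commute time, that is, the expected hitting time from a to b plus the expected hitting time from b to a, which is symmetric by construction.
3. PCM means possibilistic c-means. Because only pairwise distances are available, it is replaced here by its medoid-based relational form (possibilistic c-medoids), with the scale η_i fixed from the initial medoids.
4. The recommendation degree is replaced by a hypothetical stand-in, each page's share of links, multiplied by the page's best community typicality.

All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical input: undirected, connected link graph of the pages returned for one
# topic query (assumption: hyperlinks are symmetrised); every page has >= 1 neighbour.
Graph = Dict[int, List[int]]
Dist = Dict[Tuple[int, int], float]


def hitting_times(graph: Graph, target: int, tol: float = 1e-9,
                  max_iter: int = 100_000) -> Dict[int, float]:
    """Expected steps for a simple random walk from each page to first reach `target`.

    Solves h(target) = 0, h(v) = 1 + mean(h(u) for u in neighbours(v)) by fixed-point iteration.
    """
    h = {v: 0.0 for v in graph}
    for _ in range(max_iter):
        new = {v: 0.0 if v == target else 1.0 + sum(h[u] for u in nbrs) / len(nbrs)
               for v, nbrs in graph.items()}
        if max(abs(new[v] - h[v]) for v in graph) < tol:
            return new
        h = new
    return h


def symmetric_distance(graph: Graph) -> Dist:
    """Assumed SSD: commute time H(a -> b) + H(b -> a), symmetric by construction."""
    hit = {t: hitting_times(graph, t) for t in graph}  # hit[t][s] = steps from s to t
    return {(a, b): hit[b][a] + hit[a][b] for a in graph for b in graph}


def possibilistic_c_medoids(nodes: Sequence[int], dist: Dist, medoids: Sequence[int],
                            m: float = 2.0, max_iter: int = 50
                            ) -> Tuple[List[int], List[Dict[int, float]]]:
    """Possibilistic c-means typicalities with medoid prototypes (relational variant)."""
    medoids = list(medoids)
    # eta_i: mean squared distance from the initial medoid to the other pages (assumed scale).
    eta = [sum(dist[(c, k)] ** 2 for k in nodes) / (len(nodes) - 1) for c in medoids]

    def typicalities(meds: List[int]) -> List[Dict[int, float]]:
        # PCM typicality: t_ik = 1 / (1 + (d_ik^2 / eta_i) ** (1 / (m - 1)))
        return [{k: 1.0 / (1.0 + (dist[(c, k)] ** 2 / e) ** (1.0 / (m - 1.0))) for k in nodes}
                for c, e in zip(meds, eta)]

    for _ in range(max_iter):
        typ = typicalities(medoids)
        # New medoid: the page minimising the typicality-weighted squared distance.
        new = [min(nodes, key=lambda j, t=t: sum(t[k] ** m * dist[(j, k)] ** 2 for k in nodes))
               for t in typ]
        if new == medoids:
            break
        medoids = new
    return medoids, typicalities(medoids)


def rank_pages(graph: Graph, typ: List[Dict[int, float]]) -> List[int]:
    """Score = best community typicality x a stand-in recommendation degree (link share)."""
    total = sum(len(nbrs) for nbrs in graph.values())
    rec = {v: len(nbrs) / total for v, nbrs in graph.items()}  # hypothetical, not the paper's
    score = {v: max(t[v] for t in typ) * rec[v] for v in graph}
    return sorted(graph, key=lambda v: (-score[v], v))


# Usage: two triangles of pages joined by a single link (2-3).
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
dist = symmetric_distance(graph)
medoids, typ = possibilistic_c_medoids(sorted(graph), dist, medoids=[0, 5])
ranking = rank_pages(graph, typ)
print(medoids, ranking)
```

In this toy graph, each triangle forms its own community. The bridge pages 2 and 3 become the medoids and, having the most links, rank first. PCM typicalities need not sum to 1 across communities. Normalising them per page would give values closer to the "membership probability" wording of the abstract.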
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
Peking University Core Journal
2013, Issue 4, pp. 144-149 (6 pages)
Funding
Science and Technology Project funded by the Jiangxi Provincial Department of Education (GJJ11463)