Abstract
This paper proposes a web page clustering algorithm, PageCluster, and an improved variant, ImPageCluster, that addresses the problem of overlapping. Both methods take into account the web structure and page hyperlinks, as well as the importance of each page, which is described by its in-weight and out-weight. Experiments show that, compared with traditional clustering methods, the proposed algorithms run faster and achieve higher accuracy.
Funding
Sponsored by the Huo Ying-Dong Education Foundation of China (91101)