A web page clustering algorithm called PageCluster and the improved algorithm ImPageCluster solving overlapping are proposed. These methods not only take the web structure and page hyperlink into account, but also con...A web page clustering algorithm called PageCluster and the improved algorithm ImPageCluster solving overlapping are proposed. These methods not only take the web structure and page hyperlink into account, but also consider the importance of each page which is described as in-weight and out-weight. Compared with the traditional clustering methods, the experiments show that the runtimes of the proposed algorithms are less with the improved accuracies.展开更多
基金Sponsored bythe Huo Ying-Dong Education Foundation of China(91101)
文摘A web page clustering algorithm called PageCluster and the improved algorithm ImPageCluster solving overlapping are proposed. These methods not only take the web structure and page hyperlink into account, but also consider the importance of each page which is described as in-weight and out-weight. Compared with the traditional clustering methods, the experiments show that the runtimes of the proposed algorithms are less with the improved accuracies.