
Application of Combined Hierarchical Clustering and Flat Partitioning to Web Page Classification (cited by: 2)

Combinations of Layered Clustering & Plan Partition and Its Application in Web Page Classification
Abstract: This paper studies a method that combines hierarchical (layered) clustering with flat (planar) partitioning and applies it to Web page classification. It first describes the characteristics and complexity of the feature distribution of samples in Web page classification.

Hierarchical clustering has strengths and weaknesses:
- It produces nested, layered clusters and is relatively accurate.
- Its high computational complexity makes it unsuitable for large sample sets.

The K-means algorithm has the opposite problem. It is fast, but its result depends heavily on the choice of initial cluster centers, and it often clusters irregularly distributed samples poorly.

The paper therefore proceeds in two stages:
1. Hierarchical clustering is run on a small subset of the samples to determine the initial cluster centers.
2. K-means is run on the whole sample set, starting from those centers.

This strategy avoids the computational cost of hierarchical clustering while exploiting the speed of K-means. At the same time, the accuracy of hierarchical clustering gives a reliable way to choose the initial centers. The paper reports experimental results for pure K-means and for the combined hierarchical-clustering/flat-partitioning method on the text (Web page) classification problem.
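The two-stage strategy in the abstract (hierarchical clustering on a sample to seed K-means on the full set) can be sketched roughly as below. This is a minimal illustration under assumptions the source does not state, not the authors' implementation:
- Pages are already represented as numeric feature vectors.
- Distance is Euclidean.
- The hierarchical stage uses centroid linkage.
- The sample is drawn uniformly at random.

The names `hierarchical_centers`, `kmeans` and `hybrid_cluster` and the parameter `sample_size` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance (an assumption; cosine distance is also common for text).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(cluster: List[Vector]) -> Vector:
    # Component-wise mean of a non-empty list of vectors.
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))


def hierarchical_centers(sample: List[Vector], k: int) -> List[Vector]:
    # Agglomerative (bottom-up) clustering with centroid linkage (assumed):
    # start with one cluster per point, repeatedly merge the two clusters whose
    # centroids are closest until k remain. Roughly O(n^3) in len(sample),
    # which is why it is only applied to a small subset.
    clusters = [[p] for p in sample]
    while len(clusters) > k:
        cents = [centroid(c) for c in clusters]
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(cents[i], cents[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return [centroid(c) for c in clusters]


def kmeans(points: List[Vector], centers: List[Vector],
           max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    # Standard Lloyd-style K-means starting from the supplied centers.
    centers = list(centers)
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = centroid(members)
    return labels, centers


def hybrid_cluster(points: List[Vector], k: int, sample_size: int,
                   seed: int = 0) -> Tuple[List[int], List[Vector]]:
    # Stage 1: hierarchical clustering on a random subset gives k initial centers.
    # Stage 2: K-means over the whole set refines them.
    if sample_size < k:
        raise ValueError("sample_size must be at least k")
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    return kmeans(points, hierarchical_centers(sample, k))
```

As a usage example, `hybrid_cluster(points, k=2, sample_size=4)` on two well-separated groups of three 2-D points assigns each group its own label. The sample must contain at least `k` points, and should be large enough to touch every natural cluster; otherwise the hierarchical stage cannot supply a center near each one.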
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2004, No. 35, pp. 139-141, 204 (4 pages)
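Funding: Scientific Research Project of the Zhejiang Provincial Department of Education (No. 20030717); Zhejiang Normal University Key Discipline of Computer Application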
Keywords: text clustering; hierarchical clustering; K-means; machine learning; computational complexity; layered clustering; flat partitioning; Web page classification