Abstract
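Scientific literature reflects the direction of scientific and technological development, and analyzing it helps to identify the technological frontier accurately. This paper proposes an improved algorithm based on hierarchical clustering for clustering scientific literature, in order to identify the innovative design directions that the literature focuses on. By observing how the number of outliers changes under different distance conditions, the algorithm automatically computes and determines the termination condition required by hierarchical clustering. This removes the need to supply the termination condition in advance, a known drawback of hierarchical clustering, while retaining its high clustering accuracy; the complexity of the improved algorithm is the same as that of ordinary hierarchical clustering. The improved algorithm was applied to cluster 200 documents, and a comparison experiment against the k-means algorithm shows that it clusters well, which verifies its feasibility.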
The literature reflects the development of technology. In the field of technical research, literature analysis helps researchers stay at the forefront. Choosing appropriate parameter values is a critical problem for conventional clustering algorithms. To solve this problem, this paper proposes an improved hierarchical clustering algorithm, which combines outlier detection with clustering, to analyze the innovative fields addressed in the literature. Treating outliers as important information, the algorithm stops the clustering process according to the change in the number of outliers under different distance conditions. The algorithm preserves cluster quality without additional parameters, and its complexity is the same as that of the conventional algorithm. To verify these advantages, 200 documents are used to evaluate clustering performance, and the improved hierarchical clustering yields better results than the k-means algorithm.
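The idea of stopping the clustering by watching the outlier count can be illustrated with a minimal sketch. The abstract does not give the paper's actual stopping rule, so everything below is an assumption made for illustration: outliers are taken to be singleton clusters, the linkage is single linkage (cutting a single-linkage dendrogram at height t is equivalent to taking connected components of the graph with edges of length at most t), and the hypothetical helper `choose_threshold` picks the distance that follows the largest drop in the outlier count.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_at_threshold(points, threshold):
    """Single-linkage clusters obtained by cutting at `threshold`.

    Implemented with union-find: any two points closer than or equal to
    the threshold end up in the same cluster.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if euclidean(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def count_outliers(clusters):
    """Assumption: an outlier is a cluster containing a single point."""
    return sum(1 for c in clusters if len(c) == 1)


def choose_threshold(points, thresholds):
    """Hypothetical stopping rule, not taken from the paper.

    Counts outliers at each candidate distance (ascending order) and
    returns the distance right after the largest drop in that count,
    together with the list of counts.
    """
    counts = [count_outliers(cluster_at_threshold(points, t)) for t in thresholds]
    drops = [counts[k] - counts[k + 1] for k in range(len(counts) - 1)]
    best = max(range(len(drops)), key=lambda k: drops[k])
    return thresholds[best + 1], counts


# Toy data: two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
threshold, counts = choose_threshold(points, [0.5, 1.5, 5.0, 20.0])
clusters = cluster_at_threshold(points, threshold)
print(threshold, counts, len(clusters))
```

In a document-clustering setting, the points would be replaced by document feature vectors (for example term-weight vectors) and the Euclidean distance by whatever dissimilarity measure is appropriate; the pairwise scan here is quadratic in the number of documents and is meant only to show the shape of the idea.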
Source
《数值计算与计算机应用》
CSCD
PKU Core Journal
2009, No. 4, pp. 277-287 (11 pages)
Journal on Numerical Methods and Computer Applications
Funding
Supported by the National Natural Science Foundation of China (Grant Nos. 50505017, 50775111)
Keywords
hierarchical clustering
outlier detection
innovative design
innovative field identification