Abstract
In conventional co-word clustering analysis, the co-word matrix does not fully reflect the associations between keywords. This paper proposes a method that builds the co-word matrix from the co-occurrence of high-frequency keywords across content in multiple formats within the same article. Traditional clustering algorithms perform poorly when clusters are non-spherical or differ greatly in size, so an improved CURE clustering algorithm is used to cluster the co-word matrix. The improved co-word clustering analysis is applied to lung cancer literature in PubMed, and the experiment demonstrates its feasibility.
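The matrix construction described in the abstract can be illustrated with a minimal sketch. The abstract does not give the details, so the following are assumptions made for illustration only: the field names (`title`, `abstract`, `mesh`), the equal field weights, the substring matching, and the rule that a keyword pair is counted once per field in which both keywords appear. The function name `build_coword_matrix` and the toy documents are hypothetical and do not come from the paper.

```python
from itertools import combinations

# Hypothetical field weights; the abstract does not specify any weighting.
FIELD_WEIGHTS = {"title": 1.0, "abstract": 1.0, "mesh": 1.0}


def build_coword_matrix(docs, terms, field_weights=FIELD_WEIGHTS):
    """Return a symmetric co-word matrix as a list of lists.

    A keyword pair gains `weight` each time both keywords occur in the
    same field of the same document (an assumed counting rule).
    """
    index = {term: i for i, term in enumerate(terms)}
    n = len(terms)
    matrix = [[0.0] * n for _ in range(n)]
    for doc in docs:
        for field, weight in field_weights.items():
            text = doc.get(field, "").lower()
            # Naive substring matching; a real pipeline would tokenize
            # and normalize terms before matching.
            present = [t for t in terms if t.lower() in text]
            for a, b in combinations(present, 2):
                i, j = index[a], index[b]
                matrix[i][j] += weight
                matrix[j][i] += weight
    return matrix


# Toy records invented for this example, not PubMed data.
docs = [
    {
        "title": "EGFR mutation in lung cancer",
        "abstract": "EGFR and gefitinib response",
        "mesh": "lung cancer; EGFR",
    },
    {
        "title": "Gefitinib therapy",
        "abstract": "gefitinib for lung cancer",
        "mesh": "gefitinib",
    },
]
terms = ["lung cancer", "egfr", "gefitinib"]
matrix = build_coword_matrix(docs, terms)
```

Counting per field lets a pair that co-occurs in several parts of one article contribute more than a pair that co-occurs in only one part, which is one possible reading of "multiple formats"; other readings, such as counting a pair once per article across the union of fields, are equally compatible with the abstract.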
Source
Electronic Science and Technology (《电子科技》)
2016, No. 2, pp. 53-57 (5 pages)
Funding
National High Technology Research and Development Program of China (863 Program), Grant No. 2014AA021502