Abstract
To address two problems in text clustering, namely that the number of clusters cannot change dynamically and that the resulting text classification is not precise enough, this paper introduces and improves the Growing Hierarchical Self-Organizing Map (GHSOM) algorithm to raise text clustering precision, and applies the improved GHSOM algorithm to build a knowledge map of civil aviation regulations. GHSOM has a multi-layer hierarchical structure in which each layer contains several independent growing SOMs; by growing in size, these maps describe the data set in more detail to a certain extent and improve classification. On this basis, the laws and regulations of the civil aviation domain are taken as the sample data set. Chinese word segmentation, keyword extraction, and document vectorization are combined with the improved GHSOM algorithm to cluster the texts and, finally, to construct the civil aviation regulation knowledge map. Experimental results show that the proposed algorithm has significant text clustering ability: the knowledge map built with it achieves good classification results, and evaluation metrics such as precision and recall are further improved.
Authors
张浩洋
周良
ZHANG Hao-yang; ZHOU Liang (School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China)
Source
《计算机科学》
CSCD
Peking University Core Journals (北大核心)
2020, Issue S01, pp. 429-435 (7 pages)
Computer Science