Abstract
As online information grows exponentially, organizing massive volumes of text efficiently has become a basic requirement for information search at many terminals. Drawing on the associative-memory principle of neural networks, this paper proposes an improved self-organizing map (SOM) neural network clustering algorithm to index and categorize such information. The improved SOM clustering algorithm works in three stages. It first preprocesses the text and computes word weights. It then trains the SOM network. Finally, it clusters repeatedly to refine each text category into smaller groups, which yields a concept space. Experimental results show that the algorithm categorizes and organizes text well and makes text retrieval easier.
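The abstract names three stages: weighting preprocessed text, training an SOM, and re-clustering to refine categories. The following is only a minimal sketch of that pipeline under stated assumptions, not the authors' improved algorithm. TF-IDF stands in for the word-weight formula, which the abstract does not give. A standard Kohonen SOM is trained rather than the paper's improved variant. The re-clustering rule, which re-runs a small SOM on any cluster larger than `max_size`, is hypothetical. All function names and parameters (`tfidf_vectors`, `train_som`, `recluster`, grid size, epochs) are illustrative. Tokenization and stop-word removal are assumed to be done already.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Map tokenized documents to L2-normalized TF-IDF vectors over a shared vocabulary.

    Assumption: TF * log(N / df) stands in for the paper's (unspecified) word-weight
    formula. Documents are assumed already preprocessed (segmented, stop words removed).
    """
    vocab = sorted({w for doc in docs for w in doc})
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = max(len(doc), 1)
        vec = [tf[w] / length * math.log(n / df[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest (squared Euclidean) to x."""
    return min(weights, key=lambda node: sum((wi - xi) ** 2 for wi, xi in zip(weights[node], x)))


def train_som(vectors, rows, cols, epochs=100, lr0=0.5, seed=0):
    """Train a standard (not 'improved') Kohonen SOM on a rows x cols grid."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)                   # learning rate decays linearly
        sigma = max(sigma0 * (1 - frac), 0.5)   # neighborhood radius shrinks
        for x in rng.sample(vectors, len(vectors)):  # visit inputs in random order
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma * sigma))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


def cluster(vectors, weights):
    """Group document indices by their best-matching unit."""
    groups = {}
    for idx, x in enumerate(vectors):
        groups.setdefault(best_matching_unit(weights, x), []).append(idx)
    return groups


def recluster(vectors, indices=None, rows=2, cols=2, max_size=3, depth=0, max_depth=2):
    """Hypothetical refinement: re-run a small SOM on any cluster larger than max_size."""
    if indices is None:
        indices = list(range(len(vectors)))
    subset = [vectors[i] for i in indices]
    weights = train_som(subset, rows, cols, seed=depth)
    groups = []
    for members in cluster(subset, weights).values():
        original = [indices[m] for m in members]  # map back to original document indices
        if len(original) > max_size and depth < max_depth:
            groups.extend(recluster(vectors, original, rows, cols, max_size, depth + 1, max_depth))
        else:
            groups.append(original)
    return groups


if __name__ == "__main__":
    docs = [["neural", "network", "som"], ["som", "cluster", "neural"],
            ["stock", "market", "price"], ["market", "price", "trade"]]
    _, vecs = tfidf_vectors(docs)
    print(recluster(vecs))  # list of groups of document indices
```

Each returned group plays the role of a fine-grained text category. The concept-space construction mentioned in the abstract is not sketched, because the abstract gives no detail about it.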
Source
Journal of Modern Information (《现代情报》)
Peking University core journal
2007, No. 9, pp. 162-164 (3 pages)
Keywords
SOM clustering
concept space
text categorization