Abstract
Text clustering is a typical high-dimensional clustering problem whose core task is to find an effective algorithm for clustering text vectors. This paper proposes TCBSA, a two-phase text clustering algorithm based on the self-organising map (SOM) neural network and the artificial immune network aiNet. In the first phase, the SOM clusters the documents and maps the high-dimensional text data onto a two-dimensional plane; in the second phase, aiNet then clusters the text. The method exploits the SOM's strength in reducing the dimensionality of high-dimensional data and thereby compensates for aiNet's poor clustering ability on high-dimensional data. Simulation experiments indicate that the algorithm is feasible, has a degree of adaptive ability, and achieves good clustering results.
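The two-phase idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' TCBSA implementation: the abstract gives no parameters or update rules, so the grid size, learning-rate schedule, clone count, and suppression threshold below are all assumptions, and the second stage is a heavily simplified aiNet-style loop (clone, mutate toward the antigen, suppress near-duplicates) that omits the random antibody injection and affinity-proportional cloning of the full algorithm. All function names are hypothetical.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Return the (row, col) of the SOM node whose weight vector is closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)

def train_som(data, rows=6, cols=6, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Phase 1: train a small SOM. Grid size and schedules are assumed values."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every node, shape (rows, cols, 2).
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total
            lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5     # shrinking neighbourhood radius
            bmu = np.array(best_matching_unit(weights, x))
            d2 = np.sum((grid - bmu) ** 2, axis=-1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))    # Gaussian neighbourhood function
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights

def project(weights, data):
    """Map each high-dimensional vector to the 2-D grid position of its BMU."""
    return np.array([best_matching_unit(weights, x) for x in data], dtype=float)

def suppress_network(cells, threshold):
    """Network suppression: drop cells closer than `threshold` to a kept cell."""
    kept = []
    for c in cells:
        if all(np.linalg.norm(c - k) >= threshold for k in kept):
            kept.append(c)
    return np.array(kept)

def ainet_like(points, n_antibodies=20, generations=15, n_clones=5, threshold=1.0, seed=0):
    """Phase 2: simplified aiNet-style loop on the 2-D points (the antigens)."""
    rng = np.random.default_rng(seed)
    antibodies = rng.uniform(points.min(axis=0), points.max(axis=0), size=(n_antibodies, 2))
    for _ in range(generations):
        memory = []
        for ag in points:
            best = antibodies[np.argmin(np.linalg.norm(antibodies - ag, axis=1))]
            # Clone the best antibody and mutate each clone toward the antigen.
            clones = best + rng.random((n_clones, 1)) * (ag - best)
            memory.append(clones[np.argmin(np.linalg.norm(clones - ag, axis=1))])
        antibodies = suppress_network(np.array(memory), threshold)
    return antibodies

def assign(points, antibodies):
    """Label each point with the index of its nearest memory antibody."""
    d = np.linalg.norm(points[:, None, :] - antibodies[None, :, :], axis=-1)
    return np.argmin(d, axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic stand-in for document vectors: two groups in 50 dimensions.
    docs = np.vstack([rng.normal(0.2, 0.05, (20, 50)), rng.normal(0.8, 0.05, (20, 50))])
    coords = project(train_som(docs), docs)
    memory = ainet_like(coords)
    print(assign(coords, memory))
```

In practice the input would be document vectors built under the vector space model (for example, TF-IDF weights) rather than synthetic data, and the suppression threshold would need tuning against the SOM grid size, since it determines how many memory antibodies, and therefore clusters, survive.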
Source
《计算机应用与软件》
CSCD
2010, No. 5, pp. 118-120, 124 (4 pages)
Computer Applications and Software
Funding
Scientific Research Project of the Education Department of Henan Province (2008A510007)
Keywords
Text clustering
Similarity
Vector space model
Artificial immune network
Self-organising neural network