Abstract
This paper proposes a text clustering method that combines statistical dimensionality reduction based on word mutual information (WMI) values with a Kohonen network (self-organizing feature map, SOFM). The WMI approach reduces dimensionality by considering the mutual information between text feature terms, which improves the efficiency of feature selection and makes it more practical. Text clustering is then performed with a Kohonen network whose learning-rate function is an annealing function that decreases monotonically over time. Experimental results show that the combined method achieves higher clustering precision than general dimensionality-reduction methods.
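The clustering stage described above can be illustrated with a minimal sketch of a one-dimensional Kohonen map whose learning rate decreases monotonically over time. The abstract does not give the exact annealing function, the neighbourhood function, or the map topology, so the exponential decay, the Gaussian neighbourhood, and all function and parameter names below (`learning_rate`, `train_som`, `eta0`, and so on) are assumptions made for illustration, not the authors' implementation. The input vectors stand in for document vectors after dimensionality reduction; the WMI feature-selection step itself is not shown.

```python
import math
import random


def learning_rate(t, total_steps, eta0=0.5):
    """Assumed annealing schedule: decays monotonically from eta0 as t grows."""
    return eta0 * math.exp(-3.0 * t / total_steps)


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x (squared Euclidean)."""
    return min(
        range(len(weights)),
        key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)),
    )


def train_som(data, n_units, epochs=50, eta0=0.5, seed=0):
    """Train a 1-D Kohonen map on a list of equal-length feature vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    total_steps = epochs * len(data)
    sigma0 = max(n_units / 2.0, 1.0)  # initial neighbourhood width (assumed)
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            eta = learning_rate(t, total_steps, eta0)
            # The neighbourhood shrinks on the same time scale as the learning rate.
            sigma = max(sigma0 * math.exp(-3.0 * t / total_steps), 0.1)
            bmu = best_matching_unit(weights, x)
            for j in range(n_units):
                # Gaussian neighbourhood: units near the winner move more.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                weights[j] = [w + eta * h * (v - w) for w, v in zip(weights[j], x)]
            t += 1
    return weights


def assign_clusters(weights, data):
    """Label each vector with the index of its best-matching unit."""
    return [best_matching_unit(weights, x) for x in data]
```

Calling `train_som(vectors, n_units=k)` followed by `assign_clusters(weights, vectors)` yields one cluster label per document. Both the learning rate and the neighbourhood width are annealed here, so early updates order the map coarsely and later updates fine-tune individual units.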
Source
《计算机应用》
CSCD
Peking University Core Journal (北大核心)
2005, Issue 10, pp. 2328-2330, 2333 (4 pages)
Journal of Computer Applications
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60275020)