Abstract
Feature selection is a key step in Chinese text categorisation: the quality of the selected features directly affects categorisation accuracy. After analysing several existing feature selection methods, this paper proposes a feature selection method based on class-discriminating words. Experimental results show that the proposed method achieves higher categorisation efficiency than traditional methods, which verifies its effectiveness.
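The abstract does not define the class-discriminating-word method, so the sketch below covers only the traditional baselines named in the keywords: information gain, mutual information and expected cross entropy. It uses common textbook definitions, not necessarily the variants used in the paper. The following are all illustrative assumptions: probabilities estimated from document frequencies, base-2 logarithms, documents represented as (set of already segmented terms, class label) pairs, and mutual information aggregated as the maximum over classes. The function names are hypothetical.

```python
import math
from collections import Counter

def _probs(docs, term):
    """Estimate P(c), P(t), P(c|t) and P(c|not t) from document counts (assumed estimator)."""
    n = len(docs)
    class_counts = Counter(label for _, label in docs)
    with_t = Counter(label for terms, label in docs if term in terms)
    without_t = Counter(label for terms, label in docs if term not in terms)
    n_t = sum(with_t.values())
    n_not_t = n - n_t
    p_c = {c: k / n for c, k in class_counts.items()}
    p_t = n_t / n
    p_c_given_t = {c: with_t[c] / n_t if n_t else 0.0 for c in class_counts}
    p_c_given_not_t = {c: without_t[c] / n_not_t if n_not_t else 0.0 for c in class_counts}
    return p_c, p_t, p_c_given_t, p_c_given_not_t

def _plogp_sum(dist):
    """Sum of p * log2(p) over a distribution, treating 0 * log 0 as 0."""
    return sum(p * math.log2(p) for p in dist.values() if p > 0)

def information_gain(docs, term):
    """IG(t) = -sum P(c)logP(c) + P(t) sum P(c|t)logP(c|t) + P(~t) sum P(c|~t)logP(c|~t)."""
    p_c, p_t, p_ct, p_cnt = _probs(docs, term)
    return -_plogp_sum(p_c) + p_t * _plogp_sum(p_ct) + (1 - p_t) * _plogp_sum(p_cnt)

def expected_cross_entropy(docs, term):
    """ECE(t) = P(t) * sum_c P(c|t) * log2(P(c|t) / P(c))."""
    p_c, p_t, p_ct, _ = _probs(docs, term)
    return p_t * sum(p * math.log2(p / p_c[c]) for c, p in p_ct.items() if p > 0)

def mutual_information(docs, term):
    """Max over classes of log2(P(c|t) / P(c)); classes where t never occurs score -inf."""
    p_c, _, p_ct, _ = _probs(docs, term)
    return max((math.log2(p / p_c[c]) for c, p in p_ct.items() if p > 0),
               default=float("-inf"))

def select_features(docs, k, score=information_gain):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    vocab = set().union(*(terms for terms, _ in docs))
    return sorted(vocab, key=lambda t: (-score(docs, t), t))[:k]

docs = [
    ({"football", "match"}, "sports"),
    ({"football", "player"}, "sports"),
    ({"stock", "market"}, "finance"),
    ({"stock", "match"}, "finance"),
]
print(select_features(docs, 2))  # ['football', 'stock']
```

In this toy corpus, "football" and "stock" each occur in only one class, so they get the maximum score under all three measures. "match" occurs equally in both classes and scores 0.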
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
Peking University Core Journal (北大核心)
2013, No. 3, pp. 193-195 (3 pages)
Keywords
Text categorisation
Feature selection
Class-discriminating words
Information gain
Mutual information
Expected cross entropy