Abstract
Preprocessing is an important step in text classification: its quality affects not only classification accuracy but also training time and classification speed. This paper applies a text preprocessing method based on part-of-speech selection and compares it experimentally with the traditional method. The results show that the method reduces the feature dimensionality while maintaining classification performance. The experiments indicate that the method achieves good classification results.
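The idea of selecting features by part of speech can be illustrated with a minimal sketch. The abstract does not say which parts of speech are retained, which tag set is used, or what the traditional baseline looks like in detail, so everything below is an assumption made for illustration: the kept tags (`n`, `v`, `a`), the stop-word list, the helper names, and the toy pre-tagged documents are all hypothetical.

```python
# Assumption: the abstract does not state which parts of speech are kept;
# nouns ("n"), verbs ("v") and adjectives ("a") are used purely for illustration.
KEPT_POS = {"n", "v", "a"}

# Hypothetical stop-word list standing in for the traditional baseline.
STOP_WORDS = {"the", "a", "of", "is", "and"}


def stopword_features(tagged_doc):
    """Baseline: keep every token that is not in the stop-word list."""
    return [word for word, _ in tagged_doc if word not in STOP_WORDS]


def pos_features(tagged_doc, kept_pos=KEPT_POS):
    """POS-based selection: keep only tokens whose tag is in kept_pos."""
    return [word for word, pos in tagged_doc if pos in kept_pos]


def vocabulary(docs, extract):
    """Feature space: the set of distinct terms that survive preprocessing."""
    vocab = set()
    for doc in docs:
        vocab.update(extract(doc))
    return vocab


# Toy documents, already segmented and tagged as (word, tag) pairs.
docs = [
    [("the", "u"), ("market", "n"), ("rose", "v"), ("sharply", "d"), ("today", "t")],
    [("the", "u"), ("team", "n"), ("won", "v"), ("the", "u"), ("final", "n"), ("easily", "d")],
]

baseline_vocab = vocabulary(docs, stopword_features)
pos_vocab = vocabulary(docs, pos_features)

# Compare the feature dimensionality of the two preprocessing schemes.
print(len(baseline_vocab), len(pos_vocab))
```

On this toy input the POS filter yields a smaller vocabulary than stop-word removal alone, which is the kind of dimensionality reduction the abstract reports; whether classification performance is preserved has to be measured on a real corpus.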
Source
《情报科学》
CSSCI
PKU Core Journal
2009, No. 5, pp. 717-719, 738 (4 pages)
Information Science
Keywords
text categorization
stop words
part of speech
text preprocessing