Abstract
This paper proposes an improved relative-entropy method for feature selection, one of the key technologies in text categorization. The method rests on the observation that the category of a text is usually determined by a limited number of key words. It applies the principle of relative entropy to select, as classification features, the words that best distinguish texts inside a category from texts outside it. Experimental results on a specific text corpus show that the method reduces the dimensionality of the text feature space and improves classification precision.
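The abstract does not give the paper's exact scoring formula, so the following is only a minimal sketch under stated assumptions. It treats the in-class and out-of-class texts as two word distributions, P_in and P_out, and scores each word w by its term in the relative entropy D(P_in || P_out) = Σ_w P_in(w) log(P_in(w) / P_out(w)). Words with large positive scores are frequent inside the category and rare outside it, so they are kept as features. The Laplace smoothing constant `alpha` and the helper names `relative_entropy_scores` and `select_features` are hypothetical choices made for illustration, not taken from the paper.

```python
import math
from collections import Counter


def relative_entropy_scores(docs, labels, target, alpha=1.0):
    """Score each word by its term in D(P_in || P_out).

    docs:   list of token lists
    labels: class label of each document
    target: the category whose in-class vs. out-of-class contrast is scored
    alpha:  Laplace smoothing constant (an assumption; avoids log(0))
    """
    in_counts, out_counts = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        # Pool word counts for in-class and out-of-class texts separately.
        (in_counts if label == target else out_counts).update(tokens)

    vocab = set(in_counts) | set(out_counts)
    v = len(vocab)
    in_total = sum(in_counts.values()) + alpha * v
    out_total = sum(out_counts.values()) + alpha * v

    scores = {}
    for w in vocab:
        p_in = (in_counts[w] + alpha) / in_total
        p_out = (out_counts[w] + alpha) / out_total
        # Positive when w is relatively more frequent inside the category.
        scores[w] = p_in * math.log(p_in / p_out)
    return scores


def select_features(docs, labels, target, k):
    """Return the k highest-scoring words (ties broken alphabetically)."""
    scores = relative_entropy_scores(docs, labels, target)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    docs = [
        ["goal", "match", "team"],
        ["team", "score", "goal"],
        ["stock", "market", "price"],
        ["market", "team", "price"],
    ]
    labels = ["sports", "sports", "finance", "finance"]
    print(select_features(docs, labels, "sports", 2))
```

On this toy corpus the sketch picks `goal` and `match` for the `sports` category. `team` scores lower because it also appears in a finance text, which is the in-class vs. out-of-class contrast the abstract describes. Selecting only the top k words per category is what reduces the feature dimensionality.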
Source
Computer Engineering (《计算机工程》), 2011, No. 10, pp. 167-169 (3 pages)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
Keywords
feature selection
relative entropy
text categorization
corpus