Abstract
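The inequality maximum entropy model has been fairly successful at alleviating overfitting in text classification tasks, but the feature selection algorithm it uses cannot fully exploit the model's main strength. To address this problem, the paper proposes an improved sequential forward selection algorithm to raise the recognition rate in text classification tasks. Experimental results show that the algorithm selects representative text features more accurately and brings some improvement to the classification performance of the inequality maximum entropy model.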
The inequality maximum entropy method, with its flexible modeling capability, alleviates data sparseness more successfully than other probabilistic models in text classification tasks, but the feature selection algorithm used by the model cannot fully exploit this advantage. This paper proposes a new feature selection method that improves the recognition rate in text classification. Experimental results show that the algorithm selects representative features more effectively and improves text classification performance.
Source
《计算机工程》
CAS
CSCD
PKU Core Journals (北大核心)
2009, Issue 18, pp. 182-184 (3 pages)
Computer Engineering
Keywords
inequality maximum entropy (不等式最大熵)
feature selection (特征选择)
text classification (文本分类)