Abstract
This paper analyzes several common evaluation functions for feature selection, applies a term-weighting function to feature selection, and proposes a new text feature selection evaluation function based on an improved TFIDF, called TFIDF-Dac. To improve the class-discriminating ability of feature terms, the new function incorporates information about how each feature term is distributed across categories, which compensates for a shortcoming of traditional TFIDF. Experiments show that the improved feature selection method effectively improves text classification accuracy.
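As a point of reference, the sketch below shows how a term-weighting function can serve as a feature selection score. It uses only the traditional TFIDF weight, summed over all documents; it is not TFIDF-Dac, whose formula is not given in this record. The function names `tfidf_feature_scores` and `select_features`, the length-normalized term frequency, and the natural-log IDF are illustrative assumptions.

```python
import math
from collections import Counter


def tfidf_feature_scores(docs):
    """Score each term by its TF-IDF weight summed over all documents.

    docs is a list of token lists. This is plain TFIDF (an assumed,
    common variant), not the TFIDF-Dac function from the paper.
    """
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    scores = Counter()
    for tokens in docs:
        tf = Counter(tokens)
        for term, count in tf.items():
            idf = math.log(n_docs / df[term])
            # Term frequency is normalized by document length.
            scores[term] += (count / len(tokens)) * idf
    return scores


def select_features(docs, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = tfidf_feature_scores(docs)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

Note that this score never looks at class labels, so a term concentrated in one category and a term spread evenly over all categories can receive the same value. That is the kind of gap the abstract says TFIDF-Dac addresses by adding between-class distribution information.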
Source
Modern Computer (《现代计算机》)
2009, No. 7, pp. 34-36, 86 (4 pages in total)
Keywords
Text Classification
Feature Selection
Evaluation Function
TFIDF