Abstract
After analyzing and comparing several feature selection methods used in text categorization, this paper proposes TDF, a feature selection method based on term frequency and inverse document frequency. TDF is evaluated with two classification algorithms, KNN and Naive Bayes. Experimental results show that TDF achieves better classification accuracy than the other methods compared.
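The abstract does not say how TDF combines term frequency and inverse document frequency, so this is not the paper's method. The Python sketch below shows a generic TF-IDF style feature ranking under stated assumptions. Each term's score is its total frequency in the training corpus multiplied by log(N/df), where N is the number of documents and df is the number of documents that contain the term. The k highest-scoring terms are then kept as features for a classifier such as KNN or Naive Bayes. The function name `select_features_tfidf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def select_features_tfidf(docs, k):
    """Hypothetical helper: rank terms by corpus TF * log(N / df), keep the top k.

    This is a generic TF-IDF ranking, NOT the TDF formula from the paper,
    which the abstract does not give.
    """
    n_docs = len(docs)
    tf = Counter()  # total occurrences of each term across all documents
    df = Counter()  # number of documents in which each term appears
    for doc in docs:
        tf.update(doc)
        df.update(set(doc))
    scores = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    # Highest score first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


if __name__ == "__main__":
    # Toy tokenized corpus (assumed data, for illustration only).
    corpus = [
        ["price", "market", "stock", "market"],
        ["stock", "goal", "match"],
        ["goal", "match", "team", "match"],
    ]
    selected, _ = select_features_tfidf(corpus, 3)
    print(selected)  # ['market', 'match', 'price']
```

A term that occurs in every document scores log(1) = 0, so the IDF factor pushes common, non-discriminating terms down the ranking. This sketch ignores class labels. Supervised criteria such as information gain or chi-square, which are often compared in this setting, do use them.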
Source
Microcomputer Information (《微计算机信息》)
Peking University core journal (北大核心)
2006, Issue 08X, pp. 24-26 (3 pages)
Control & Automation
Funding
Natural Science Foundation of Henan Province (0211050100)
Keywords
text categorization
feature selection
term frequency
inverse document frequency