Abstract
Feature selection is an important problem in text classification. This paper first presents an improved document frequency measure. It then introduces attribute dependence into ID3 and proposes an attribute reduction algorithm based on the optimized ID3. Building on these two components, it proposes a new feature selection method: the improved document frequency is used for initial feature selection, and the proposed attribute reduction algorithm then eliminates redundant features. Simulation results show that the new feature selection method is effective.
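The abstract does not define the improved document frequency or exactly how attribute dependence modifies ID3, so the following is only a minimal sketch of the two-stage idea under stated assumptions. Stage 1 uses plain document frequency thresholding (keep a term if DF ≥ `min_df`), not the paper's improved variant. Stage 2 substitutes the standard rough-set dependency degree γ_C(D) = |POS_C(D)| / |U| with greedy backward elimination for the paper's ID3-based reduction; no decision tree is built here. The function names, the `min_df` threshold, the 0/1 term-presence encoding, and the toy corpus are all hypothetical.

```python
from collections import Counter, defaultdict


def document_frequency(docs):
    """Count, for each term, how many documents contain it (standard DF)."""
    df = Counter()
    for terms in docs:
        df.update(set(terms))  # count each term at most once per document
    return df


def df_filter(docs, min_df):
    """Stage 1 (assumed): keep terms whose DF is at least min_df."""
    df = document_frequency(docs)
    return sorted(term for term, count in df.items() if count >= min_df)


def dependency(rows, labels, attrs):
    """Rough-set dependency degree gamma_attrs(D) = |POS_attrs(D)| / |U|.

    Objects are grouped by their values on attrs; a group lies in the
    positive region only if all of its objects share one decision label.
    """
    if not rows:
        return 0.0
    groups = defaultdict(lambda: [0, set()])  # key -> [object count, labels seen]
    for row, label in zip(rows, labels):
        group = groups[tuple(row[a] for a in attrs)]
        group[0] += 1
        group[1].add(label)
    positive = sum(count for count, seen in groups.values() if len(seen) == 1)
    return positive / len(rows)


def eliminate_redundant(rows, labels, attrs):
    """Stage 2 (hypothetical stand-in for the paper's ID3-based reduction):
    drop an attribute whenever the remaining set keeps the full set's
    dependency degree."""
    full = dependency(rows, labels, attrs)
    kept = list(attrs)
    for attr in attrs:
        trial = [a for a in kept if a != attr]
        if dependency(rows, labels, trial) == full:
            kept = trial
    return kept


def select_features(docs, labels, min_df):
    """Two-stage pipeline: DF pre-selection, then redundancy elimination."""
    candidates = df_filter(docs, min_df)
    # Assumed encoding: each document becomes a 0/1 term-presence vector.
    rows = [{t: int(t in doc) for t in candidates} for doc in docs]
    return eliminate_redundant(rows, labels, candidates)


# Hypothetical toy corpus: each document is a set of terms.
DOCS = [
    {"ball", "goal", "team"},
    {"ball", "team", "match"},
    {"stock", "market", "team"},
    {"stock", "market", "price"},
]
LABELS = ["sport", "sport", "finance", "finance"]

if __name__ == "__main__":
    print(select_features(DOCS, LABELS, min_df=2))  # ['stock']
```

On this toy corpus, stage 1 keeps `ball`, `market`, `stock`, and `team` (DF ≥ 2), and stage 2 reduces them to `stock` alone, since that term by itself already separates the two classes. Greedy elimination is order-dependent; the sketch simply visits candidates alphabetically, so a different order may return a different but equally consistent subset.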
Source
Journal of Jinan University (Natural Science & Medicine Edition)
Indexed in: CAS, CSCD, Peking University Core Journals
2010, No. 1, pp. 20-23 (4 pages)
Funding
Sichuan Province Science and Technology Program (2008GZ0003)
Sichuan Province Key Science and Technology Research Project (07GG006-019)
Keywords
feature selection
document frequency
ID3 algorithm
attribute reduction