Abstract
To address the loss of useful information caused by threshold filtering in traditional feature selection, a rough-set-based method for text feature selection is proposed. Starting from the core, the method uses attribute significance and attribute dependency as heuristic information to select features, which reduces the dimensionality of the text feature vectors. Experimental results show that the algorithm is easy to implement, effectively reduces the number of features, and improves classification efficiency.
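The abstract does not give the algorithm's details, so the following is only a minimal sketch of a generic rough-set greedy reduction in the same spirit: it computes the core from the dependency degree, then adds the attribute with the largest dependency gain until the full dependency is reached. The function names `dependency` and `select_features`, the toy term-presence table, and the tie-breaking rule are all assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict


def dependency(rows, labels, attrs):
    """Dependency degree: share of objects in the positive region.

    An object is in the positive region when every object that agrees with
    it on `attrs` also carries the same class label.
    """
    if not rows:
        return 0.0
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    positive = sum(
        len(members)
        for members in blocks.values()
        if len({labels[i] for i in members}) == 1
    )
    return positive / len(rows)


def select_features(rows, labels):
    """Greedy selection that starts from the core (illustrative sketch)."""
    all_attrs = list(range(len(rows[0])))
    full = dependency(rows, labels, all_attrs)

    # Core: attributes whose removal lowers the dependency degree.
    core = [
        a for a in all_attrs
        if dependency(rows, labels, [b for b in all_attrs if b != a]) < full
    ]

    selected = list(core)
    # Add the attribute with the largest dependency gain (its significance)
    # until the selected subset preserves the full dependency degree.
    while dependency(rows, labels, selected) < full:
        rest = [a for a in all_attrs if a not in selected]
        best = max(rest, key=lambda a: dependency(rows, labels, selected + [a]))
        selected.append(best)
    return sorted(selected)


# Hypothetical toy data: columns are term-presence flags, label = term0 AND term1.
rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1), (0, 0, 1)]
labels = ["N", "N", "N", "Y", "Y", "N"]
print(select_features(rows, labels))  # -> [0, 1]
```

In this toy table the third column is irrelevant to the label, so it is neither in the core nor added by the greedy step. When several attributes give the same gain, `max` keeps the first one, which is an arbitrary choice made here for determinism.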
Source
《计算机应用与软件》
CSCD
2010, No. 3, pp. 4-5, 74 (3 pages)
Computer Applications and Software
Funding
National Natural Science Foundation of China (60573179)
Keywords
Rough set
Feature selection
Attribute significance
Attribute dependency