Abstract
Term weighting assigns each term a weight according to its contribution to classification; it is a strategy for improving text categorization performance. We propose two new term weighting schemes based on class space density: tf*ICSDF and ICSDF-based. In experiments on the RCV1-4 and 20 Newsgroups datasets with a support vector machine (SVM) classifier, the proposed schemes improve text categorization performance over traditional term weighting schemes such as prob-based, tf*icf, and icf-based.
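The abstract does not give the ICSDF formula, so the following is only a minimal sketch under stated assumptions. It contrasts the baseline inverse class frequency, icf(t) = log(|C| / cf(t)) with cf(t) the number of classes containing term t, against one possible class-space-density factor. That factor is an assumption, not necessarily the authors' definition: it takes the density of t in class c to be the fraction of class-c documents containing t, n_c(t)/N_c, and uses log(1 + |C| / Σ_c n_c(t)/N_c). The helper names `class_stats`, `icf`, `icsdf`, and `weight_doc` and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def class_stats(docs, labels):
    """Per class: number of documents N_c and per-term document counts n_c(t)."""
    n_docs = Counter(labels)
    term_df = {c: Counter() for c in n_docs}
    for tokens, c in zip(docs, labels):
        term_df[c].update(set(tokens))  # count each term once per document
    return n_docs, term_df

def icf(term, term_df):
    """Baseline inverse class frequency: log(|C| / cf(t))."""
    cf = sum(1 for c in term_df if term_df[c][term] > 0)
    return math.log(len(term_df) / cf) if cf else 0.0

def icsdf(term, n_docs, term_df):
    """ASSUMED class-space-density factor: log(1 + |C| / sum_c n_c(t)/N_c)."""
    density = sum(term_df[c][term] / n_docs[c] for c in term_df)
    return math.log(1 + len(term_df) / density) if density else 0.0

def weight_doc(tokens, global_factor):
    """tf * global factor for every term in one document."""
    tf = Counter(tokens)
    return {t: f * global_factor(t) for t, f in tf.items()}

# Hypothetical toy training set with two classes.
docs = [["ball", "goal", "team"], ["goal", "vote"],
        ["vote", "party"], ["team", "party", "vote"]]
labels = ["sport", "sport", "politics", "politics"]
n_docs, term_df = class_stats(docs, labels)

print(icf("vote", term_df), icf("team", term_df))  # both 0.0: each occurs in every class
print(icsdf("vote", n_docs, term_df) < icsdf("team", n_docs, term_df))  # True
```

The toy data shows why a density-based factor can help. icf gives "vote" and "team" the same zero weight because both appear in every class. The assumed density factor still separates them: "vote" occurs in a larger share of each class's documents (density 1.5 versus 1.0), so it receives the lower weight.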
Source
Journal of Jilin University (Information Science Edition)
CAS
2017, No. 1, pp. 92-97 (6 pages)
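Funding
Changchun Municipal Science and Technology Bureau funded project (14KP009)
Jilin Provincial Department of Science and Technology funded project (20130206041GX)
Jilin Provincial Development and Reform Commission funded project (2015Y56 [2013]779)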
Keywords
term weighting
class space density
text categorization
machine learning