
Text Feature Selection Algorithm Based on Variance (一种基于方差的文本特征选择算法)

Cited by: 6
Abstract: Traditional feature selection algorithms for Chinese text categorization perform poorly when the feature dimension is low. To address this, the paper proposes an evaluation function built on the idea of variance that selects terms carrying strong class information, improving classification accuracy on rare classes without lowering overall categorization performance. The new method, the Variance Feature Selection Method (VFSM), is compared with nine commonly used feature selection algorithms, including Document Frequency (DF), Information Gain (IG), Mutual Information (MI), and Chi-square Statistics (CHI). Using a centroid-vector (Rocchio) classifier on the TanCorpV1.0 corpus, the experiments show that VFSM has a clear advantage in low-dimensional feature spaces and achieves substantially higher macro-averaged F1 than the traditional methods.
Authors: Yuan Yi (袁轶), Wang Xinfang (王新房)
Source: Computer Engineering (《计算机工程》), indexed in CAS and CSCD, 2012, No. 12, pp. 155-157, 161 (4 pages)
Keywords: text categorization; feature selection; variance; class information; macro-averaged measures
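
The abstract does not give the paper's evaluation function, so the following is only a minimal sketch of the general idea of scoring terms by variance, not the VFSM formula itself. It assumes that a term's score is the population variance of its per-class document rate (the fraction of documents in each class that contain the term); the function names `variance_scores` and `select_top_k` and the toy corpus are hypothetical. Normalizing by class size is one way to keep terms from rare classes competitive, which is the behavior the abstract emphasizes.

```python
from collections import defaultdict
from statistics import pvariance


def variance_scores(docs, labels):
    """Score each term by the variance of its per-class document rate.

    Assumption (not taken from the paper): a term that appears in most
    documents of one class and few documents of the others has a high
    variance across classes and therefore carries class information.
    """
    class_sizes = defaultdict(int)
    doc_freq = defaultdict(lambda: defaultdict(int))  # term -> class -> count
    for tokens, label in zip(docs, labels):
        class_sizes[label] += 1
        for term in set(tokens):  # count each term once per document
            doc_freq[term][label] += 1

    classes = sorted(class_sizes)
    scores = {}
    for term, per_class in doc_freq.items():
        # Dividing by class size puts rare and frequent classes on one scale.
        rates = [per_class.get(c, 0) / class_sizes[c] for c in classes]
        scores[term] = pvariance(rates)
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus: "today" occurs in every class, "orchid" only in a rare class.
docs = [
    ["stock", "market", "today"],
    ["stock", "bank", "today"],
    ["match", "goal", "today"],
    ["rare", "orchid", "today"],
]
labels = ["finance", "finance", "sports", "botany"]

scores = variance_scores(docs, labels)
top = select_top_k(scores, 3)
```

In this toy setup a term spread evenly over all classes, such as "today", scores zero, while a term confined to the single-document "botany" class scores as high as a term confined to the larger "finance" class.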

