Improvement of Feature Weighting Algorithm in Text Classification
Times cited: 14
Abstract: TFIDF is a common method for representing the weights of document features. It is simple and easy to implement, but it ignores how each feature term is distributed across the categories, so it cannot truly reflect a term's contribution to distinguishing each class. To address this shortcoming, this paper proposes BOR-TFIDF, which readjusts each feature term's discriminative power with respect to each class, i.e., corrects each term's weight, and uses a classifier to verify its effectiveness. Experiments show that the method outperforms the original TFIDF algorithm and that the improvement strategy is feasible.
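The abstract does not give the BOR-TFIDF formula, so the sketch below shows only the baseline it improves on, plain TFIDF, and the information that baseline discards. It assumes the common form w(t, d) = tf(t, d) · log(N / df(t)), with raw term counts and a natural log and no smoothing or normalization. TFIDF has many variants, and this record does not state which one the paper uses. The toy corpus, its class labels, and the helper `class_doc_freq` are hypothetical illustrations, not the authors' code.

```python
import math
from collections import Counter


def tfidf(docs):
    """Plain TF-IDF (assumed variant): w(t, d) = tf(t, d) * ln(N / df(t)).

    docs: list of token lists. Returns one {term: weight} dict per document.
    Note that no class label is used anywhere in this computation.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # each term counted at most once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights


def class_doc_freq(docs, labels):
    """Hypothetical helper: per-class document frequency of each term,
    i.e. the class distribution that plain TF-IDF ignores."""
    dist = {}
    for doc, label in zip(docs, labels):
        for t in set(doc):
            dist.setdefault(t, Counter())[label] += 1
    return dist


# Hypothetical toy corpus: "goal" occurs only in sports documents,
# while "season" is split evenly between sports and finance.
docs = [
    ["goal", "season", "match"],  # sports
    ["goal", "league"],           # sports
    ["stock", "season"],          # finance
    ["stock", "bond"],            # finance
]
labels = ["sports", "sports", "finance", "finance"]

weights = tfidf(docs)
dist = class_doc_freq(docs, labels)
# Both terms have df = 2, so both get ln(4 / 2) in document 0,
# even though only "goal" actually separates the two classes.
print(weights[0]["goal"], weights[0]["season"])
print(dict(dist["goal"]), dict(dist["season"]))
```

Because `tfidf` never looks at `labels`, a term concentrated in one class and a term spread across classes receive identical weights whenever their document frequencies match. The per-class counts from `class_doc_freq` are the kind of signal a class-aware reweighting such as BOR-TFIDF would need to draw on. The exact way the paper combines them is not described here.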
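Source: Journal of Nanjing Normal University (Engineering and Technology Edition), CAS-indexed, 2008, No. 4, pp. 95-98, 149 (5 pages in total)
Funding: Ministry of Education Start-up Fund for Returned Overseas Scholars; Open Project Fund of the Institute of Software, Chinese Academy of Sciences (SYSKF0701); Fuzhou University Science and Technology Development Fund (2005-XQ-13); Fujian Provincial Department of Education Fund (JB06023)
Keywords: text classification; feature weight; TFIDF; class difference; BOR-TFIDF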

References (9)

  • [2] Sebastiani F. Machine learning in automated text categorization[J]. ACM Computing Surveys, 2002, 34(1): 1-47.
  • [3] Lewis D D. Naive (Bayes) at forty: the independence assumption in information retrieval[C]// The 10th European Conf on Machine Learning. New York: Springer-Verlag, 1998.
  • [4] Yang Y, Liu X. A re-examination of text categorization methods[C]// SIGIR'99. New York: ACM Press, 1999: 42-49.
  • [5] Yang Y, Chute C G. An example-based mapping method for text categorization and retrieval[J]. ACM Trans on Information Systems, 1994, 12(3): 252-277.
  • [6] Han E H, Karypis G. Centroid-based document classification: analysis and experimental results[C]// Proc of PKDD'00. London: Springer-Verlag, 2000: 424-431.
  • [7] Schapire R E, Singer Y. Improved boosting algorithms using confidence-rated predictions[C]// Proc of the 11th Annual Conf on Computational Learning Theory. Madison: ACM Press, 1998: 80-91.
  • [8] Joachims T. Text categorization with support vector machines: learning with many relevant features[C]// The 10th European Conf on Machine Learning. Berlin: Springer, 1998: 137-142.
  • [12] Li Ronglu. Text classification system[DB/OL]. http://www.nlp.org.cn/docs/download.php?doc_id=102. 2004-08-19. (in Chinese)
  • [13] Lewis D D. Reuters-21578, test collections[R/OL]. http://www.daviddlewis.com/resources/testcollections/reuters21578/. 1996.

Co-cited references: 117

Citing documents: 14

Second-level citing documents: 190
