
Feature weighting scheme based on category information and term entropy (cited by: 4)
Abstract: Feature weighting schemes based on category information do not express the relationship between features and categories accurately enough: features with the same category frequency cannot be compared in terms of their ability to discriminate between categories, so the distribution of a feature within each category must also be considered. This paper combines the inverse category frequency (ICF) of a feature with its within-category entropy in the term weight calculation and constructs two supervised feature weighting schemes. Experimental results on a Uygur text classification corpus show that the method clearly improves the spatial distribution of the samples and raises the micro-averaged F1 score of Uygur text classification.
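The abstract does not give the two weighting formulas, so the following is only a minimal sketch of the general idea, not the authors' scheme. It assumes ICF is computed as log2(1 + C / cf(t)), that within-category entropy is the entropy of a term's occurrence counts across the documents of one category (normalized to [0, 1]), and that the two are combined as tf × ICF × (1 + entropy). The function name `icf_entropy_weights` and all of these formula choices are hypothetical.

```python
import math
from collections import Counter, defaultdict


def icf_entropy_weights(docs, labels):
    """Weight each term as tf * ICF * (1 + normalized within-category entropy).

    docs   : list of token lists
    labels : list of category labels, one per document
    All formulas here are illustrative assumptions, not the paper's.
    """
    n_classes = len(set(labels))
    class_sizes = Counter(labels)
    tfs = [Counter(doc) for doc in docs]

    # For every category, collect each term's per-document counts.
    class_counts = defaultdict(lambda: defaultdict(list))
    for tf, y in zip(tfs, labels):
        for term, n in tf.items():
            class_counts[y][term].append(n)

    # Category frequency: number of categories in which a term occurs.
    cf = Counter()
    for terms in class_counts.values():
        for term in terms:
            cf[term] += 1

    def icf(term):
        # Assumed form of inverse category frequency.
        return math.log2(1 + n_classes / cf[term])

    def entropy(term, y):
        # Entropy of the term's occurrences over the documents of category y;
        # high when the term is spread evenly across the category.
        counts = class_counts[y][term]
        total = sum(counts)
        h = -sum((n / total) * math.log2(n / total) for n in counts)
        max_h = math.log2(class_sizes[y]) if class_sizes[y] > 1 else 1.0
        return h / max_h

    return [
        {term: n * icf(term) * (1 + entropy(term, y)) for term, n in tf.items()}
        for tf, y in zip(tfs, labels)
    ]


if __name__ == "__main__":
    docs = [["a", "b"], ["a", "c"], ["d", "b"], ["d", "d"]]
    labels = ["x", "x", "y", "y"]
    for weights in icf_entropy_weights(docs, labels):
        print(weights)
```

In this toy corpus, terms `a` and `c` both occur in only one category and so have the same ICF, but `a` appears in every document of category `x` while `c` appears in just one. Under the assumed combination the entropy factor gives `a` the larger weight, which is the kind of distinction the abstract says category frequency alone cannot make.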
Authors: Alimjan Aysa; Yin Xiaoyu; Kurban Ubul; Li Zhe (Network & Information Technology Center, Xinjiang University, Urumqi 830046, China; School of Information Science & Engineering, Xinjiang University, Urumqi 830046, China)
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2019, No. 11, pp. 3237-3239, 3285 (4 pages)
Funding: Natural Science Foundation of Xinjiang Uygur Autonomous Region (2016D01C068)
Keywords: text classification; text feature; term weighting; category frequency
