
Research on Feature Extraction Algorithm of News Text Based on NewTF-IDF (cited by: 6)
Abstract News text spans many categories and its content is varied. To better extract the topical feature words of a text, a new feature extraction algorithm, NewTF-IDF, is proposed. The traditional TF-IDF algorithm weights term frequency only by inverse document frequency. It ignores how other factors, such as part of speech, word frequency, word position, and word span, affect a word's information content, and it ignores how a word's distribution across different documents affects keyword importance. NewTF-IDF improves on TF-IDF in two respects, combined multi-feature factors and dispersion, so that feature words are weighted in a more principled way. Experiments show that NewTF-IDF performs better at feature word extraction.
Authors HUANG Min; YAN Sixian (School of Software, Zhengzhou University of Light Industry, Zhengzhou 450002, China; School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China)
Source Journal of Hubei Minzu University (Natural Science Edition), CAS, 2021, No. 2, pp. 187-192 (6 pages)
Funding Key Scientific Research Project of Higher Education Institutions of Henan Province (19A520009).
Keywords feature extraction; TF-IDF; feature factor; dispersion; NewTF-IDF
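
For orientation, the traditional TF-IDF baseline that the abstract contrasts against can be sketched as below. This is the textbook formulation (relative term frequency multiplied by the logarithm of inverse document frequency), not the paper's NewTF-IDF: the abstract does not give the formulas for the combined feature factors or the dispersion term, so they are not reproduced here. The function names `tf_idf` and `top_terms` and the toy documents are illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classic TF-IDF over tokenized documents.

    docs: list of token lists. Returns one {term: weight} dict per document.
    Assumption: tf = count / document length, idf = ln(N / df), no smoothing.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(doc_weights, k=3):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    return sorted(doc_weights, key=lambda t: (-doc_weights[t], t))[:k]


# Toy corpus (hypothetical, already tokenized).
docs = [
    ["stock", "market", "rises", "market"],
    ["team", "wins", "match"],
    ["market", "team", "news"],
]
weights = tf_idf(docs)
print(top_terms(weights[0], k=2))
```

In this baseline a word's weight depends only on how often it occurs in a document and in how many documents it appears. Part of speech, position, span, and the evenness of its distribution across documents play no role, which is the limitation the abstract says NewTF-IDF addresses.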