An Improved Naive Bayes Algorithm Based on TF-IDF (cited by: 17)
Abstract Text classification algorithms typified by naive Bayes commonly give every feature the same weight and consider only a single indicator. To address this, the paper proposes TF-IDF-DL naive Bayes, an improved naive Bayes algorithm based on TF-IDF. Building on TF-IDF, the algorithm introduces a decentralized word-frequency factor and a feature-word position factor to make the feature weights more accurate. The algorithm is evaluated on the Sogou news dataset from Sogou Labs. The experiments show that introducing TF-IDF-DL into naive Bayes classification yields good accuracy, recall, and F1 in text classification: compared with the TF-IDF-dist Bayes scheme from similar domestic research, classification accuracy improves by 8.6%, recall by 11.7%, and F1 by 7.4%. The algorithm therefore improves classification performance and also achieves reasonably good results on categories that are hard to distinguish.
Authors XU Tian-hua (许甜华); WU Ming-li (吴明礼) (School of Informatics, North China University of Technology, Beijing 100144, China)
Source Computer Technology and Development (《计算机技术与发展》), 2020, No. 2, pp. 75-79 (5 pages)
Funding National Natural Science Foundation of China (61672040)
Keywords naive Bayes; TF-IDF algorithm; decentralization; location information; feature weight
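
The abstract describes replacing uniform feature weights in naive Bayes with TF-IDF-based weights. The following is a minimal sketch of that baseline idea only, under stated assumptions: it is a plain TF-IDF-weighted multinomial naive Bayes, not the paper's TF-IDF-DL method, because the abstract does not define the decentralized word-frequency factor or the position factor. The function names (`tfidf_weights`, `train`, `predict`), the smoothed IDF formula, the Laplace smoothing, and the toy documents are all illustrative choices rather than anything taken from the paper.

```python
import math
from collections import Counter, defaultdict


def tfidf_weights(docs):
    """Return one {term: tf-idf} dict per tokenized document.

    Assumption: smoothed IDF, ln((1 + N) / (1 + df)) + 1, so every weight is positive.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        length = max(len(doc), 1)  # guard against empty documents
        weights.append({
            t: (c / length) * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        })
    return weights


def train(docs, labels):
    """Fit a multinomial naive Bayes model whose 'counts' are TF-IDF weights."""
    vocab = {t for doc in docs for t in doc}
    class_term = defaultdict(Counter)  # per-class sum of TF-IDF weights
    class_docs = Counter(labels)
    for w, y in zip(tfidf_weights(docs), labels):
        for t, v in w.items():
            class_term[y][t] += v
    model = {}
    for c, n_c in class_docs.items():
        total = sum(class_term[c].values())
        log_prior = math.log(n_c / len(docs))
        # Laplace (add-one) smoothing over the vocabulary
        log_like = {
            t: math.log((class_term[c][t] + 1.0) / (total + len(vocab)))
            for t in vocab
        }
        model[c] = (log_prior, log_like)
    return model


def predict(model, doc):
    """Return the class with the highest log-posterior; unseen terms are skipped."""
    def score(c):
        log_prior, log_like = model[c]
        return log_prior + sum(log_like[t] for t in doc if t in log_like)
    return max(model, key=score)


# Toy data, invented for illustration only (not the Sogou news dataset).
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "goal"],
    ["stock", "market", "bank"],
    ["bank", "loan", "market"],
]
labels = ["sports", "sports", "finance", "finance"]
model = train(docs, labels)
print(predict(model, ["goal", "team", "coach"]))
print(predict(model, ["bank", "loan"]))
```

A TF-IDF-DL variant would presumably multiply additional factors into each term weight inside `tfidf_weights`, but their exact form would have to come from the full paper.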