Text Classification Method of Naive Bayes Algorithm Based on MapReduce (cited 6 times)
Abstract: To address the low classification performance of the traditional serial Naive Bayes algorithm, a parallelized Naive Bayes classification method is proposed. Multinomial Naive Bayes is adopted and a Hadoop cluster is built. Feature words are first selected with the chi-square test. The weight of each feature term is then computed with the term frequency-inverse document frequency (TF-IDF) method, and the weights are summed for each class. Finally, the weights are applied in the Naive Bayes formula to obtain the classification result. Experimental results show that, compared with the traditional Naive Bayes method, the parallelized method designed on this cluster improves precision, recall, and F1 by at least 7.66%, 7.56%, and 11.98%, respectively, and takes less time, which indicates that the method improves the time efficiency of text processing.
Authors: ZHANG Chenyue; LIU Lizhi; DENG Kaiwei; LIU Jie (Hubei Key Laboratory of Intelligent Robot (Wuhan Institute of Technology), Wuhan 430205, China)
Source: Journal of Wuhan Institute of Technology (《武汉工程大学学报》), CAS, 2021, No. 1, pp. 102-105 (4 pages)
Funding: 2017 Guiding Project of the Scientific Research Program of the Hubei Provincial Department of Education (B2017051).
Keywords: Naive Bayes; classification; parallelization; MapReduce
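
The pipeline summarized in the abstract (TF-IDF weights per feature term, per-class weight sums, weights substituted into the multinomial Naive Bayes formula) can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the toy corpus, the function names, the smoothed IDF formula, and the add-one smoothing are all assumptions made for illustration. The map and reduce steps are simulated in plain Python rather than run on Hadoop, and the chi-square feature selection step is omitted.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: (class label, tokenized document).
TRAIN = [
    ("sports", ["match", "team", "goal", "team"]),
    ("sports", ["team", "coach", "goal"]),
    ("tech", ["cpu", "cluster", "hadoop"]),
    ("tech", ["hadoop", "cluster", "node", "cluster"]),
]

def idf_table(docs):
    """Smoothed inverse document frequency per term (assumed formula)."""
    n = len(docs)
    df = Counter(t for _, toks in docs for t in set(toks))
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}

def map_phase(label, tokens, idf):
    """Mapper: emit ((label, term), TF-IDF weight) for one document."""
    tf = Counter(tokens)
    for term, count in tf.items():
        yield (label, term), (count / len(tokens)) * idf[term]

def reduce_phase(pairs):
    """Reducer: sum the weights that share the same (label, term) key."""
    sums = defaultdict(float)
    for key, weight in pairs:
        sums[key] += weight
    return sums

def train(docs):
    idf = idf_table(docs)
    emitted = [kv for label, toks in docs for kv in map_phase(label, toks, idf)]
    term_weight = reduce_phase(emitted)
    # Total weight per class, used as the denominator of the conditional.
    class_total = defaultdict(float)
    for (label, _), weight in term_weight.items():
        class_total[label] += weight
    doc_counts = Counter(label for label, _ in docs)
    priors = {c: math.log(k / len(docs)) for c, k in doc_counts.items()}
    return {"idf": idf, "term_weight": term_weight,
            "class_total": class_total, "priors": priors}

def classify(model, tokens):
    """Pick the class with the highest log prior plus log weighted likelihoods."""
    vocab_size = len(model["idf"])
    scores = {}
    for label, prior in model["priors"].items():
        score = prior
        for term in tokens:
            if term not in model["idf"]:
                continue  # ignore terms never seen in training
            w = model["term_weight"].get((label, term), 0.0)
            # Add-one smoothing over the TF-IDF weights (an assumption).
            score += math.log((w + 1.0) / (model["class_total"][label] + vocab_size))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAIN)
print(classify(model, ["hadoop", "cluster"]))
```

On a real cluster the mapper and reducer would run as separate distributed tasks keyed on (class, term); here they are ordinary functions so the data flow can be followed end to end.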