An improved Naive Bayes classification algorithm for unbalanced data sets (cited by: 7)
Abstract When the classes in a training set are unevenly represented and the data are sparse, the Naive Bayes algorithm does not classify accurately enough. To address this problem, a Naive Bayes text classification algorithm based on data smoothing and a weighted complementary set is proposed. The algorithm uses data smoothing to compute a compensation probability for features that are missing from the Bayes model, which overcomes the data sparseness problem. It also represents each class by the features of that class's complementary set, which counters the tendency of a classifier trained on unevenly distributed classes to favor large classes and neglect small ones. Experimental results show that, when the sample set is unbalanced, the proposed algorithm classifies better than the traditional Naive Bayes algorithm.
Source Journal of Natural Science of Heilongjiang University (《黑龙江大学自然科学学报》), CAS, Peking University Core Journal, 2015, No. 5, pp. 681-686 (6 pages)
Funding Natural Science Foundation of Heilongjiang Province (ZD201403); Special Research Fund for Public Welfare Forestry Industry (201504307)
Keywords Naive Bayes; text categorization; data smoothing; weighted complementary set
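
The abstract names two ingredients: smoothing for features that never appear in a class, and scoring each class through the features of its complementary set. The record does not give the paper's formulas, so the following is only a minimal sketch under stated assumptions: it uses additive (Laplace) smoothing and the standard complement Naive Bayes weighting with normalized log weights, which may differ from the authors' smoothing and weighting scheme. The function names `train_complement_nb` and `predict` and the parameter `alpha` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_complement_nb(docs, labels, alpha=1.0):
    """Estimate per-class token weights from the complement of each class.

    docs   : list of token lists
    labels : list of class labels, same length as docs
    alpha  : additive smoothing constant (an assumption of this sketch,
             not the smoothing method of the paper)
    """
    vocab = sorted({tok for doc in docs for tok in doc})

    # Token counts per class and over the whole training set.
    class_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        class_counts[label].update(doc)
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)
    total_tokens = sum(total_counts.values())

    weights = {}
    for label, counts in class_counts.items():
        # Tokens belonging to every class except `label` (the complement set).
        comp_total = total_tokens - sum(counts.values())
        denom = comp_total + alpha * len(vocab)
        # Smoothed log-probability of each token in the complement set;
        # alpha keeps unseen tokens from getting probability zero.
        log_probs = {
            tok: math.log((total_counts[tok] - counts[tok] + alpha) / denom)
            for tok in vocab
        }
        # Normalize so that no class dominates through weight magnitude alone.
        norm = sum(abs(w) for w in log_probs.values())
        weights[label] = {tok: w / norm for tok, w in log_probs.items()}
    return weights


def predict(weights, doc):
    """Return the class whose complement set fits the document worst."""
    tf = Counter(doc)

    def complement_score(label):
        w = weights[label]
        # Tokens outside the training vocabulary are ignored.
        return sum(n * w[tok] for tok, n in tf.items() if tok in w)

    return min(weights, key=complement_score)
```

Because each class is scored from the pooled counts of all other classes, a small class is estimated from a comparatively large complement, which is the intuition behind the claim that the approach is less biased toward large classes.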