
Class-specific text classification algorithm based on exponential family (cited by: 2)
Abstract: In text classification, choosing an efficient classification algorithm is the key to improving classification accuracy and shortening classification time. This paper proposes a class-specific multinomial naive Bayes classification algorithm based on the exponential family (exponential family-multinomial naive Bayes, EF-MNB). The distributions of N classes are constructed from the multinomial model. A class-specific feature selection algorithm is used to obtain the feature subset of the N-th class and the feature probability density function (PDF) of the corresponding class. The original PDF estimate expressions of the N classes are then constructed through the exponential family. Given the training sets of the N classes, the optimal PDF estimate for the N-th class is obtained, and the classification rule is formulated on the basis of Bayes' theorem. Simulation results show that, compared with the hierarchical classification algorithm based on latent Dirichlet allocation and support vector machine (LDA-SVM), the improved hyper-sphere support vector machine (IHS-SVM) text classification algorithm, and the principal component analysis-k-nearest-neighbor (PCA-KNN) hybrid classification algorithm, the EF-MNB class-specific classification algorithm achieves higher classification accuracy while requiring little time.
Authors: LIU Yun (刘云); HUANG Rongcheng (黄荣乘) (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, P.R. China)
Source: Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), indexed in CSCD and the Peking University Core Journals list, 2019, No. 5, pp. 694-701 (8 pages)
Funding: National Natural Science Foundation of China (61262040)
Keywords: exponential family; class-specific feature selection; class-conditional probability density function; multinomial naive Bayes classifier; text classification
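The abstract describes the pipeline only in words: per-class multinomial models, a class-specific feature subset for each class, an exponential-family form of the class PDF, and a Bayes decision rule. The following is a minimal sketch of that general idea, not the authors' EF-MNB. Everything specific in it is an assumption made for illustration: the pooled-corpus reference distribution, the feature score (each word's contribution to the KL divergence from the reference), the lumping of unselected words into a single "rest" bin, the Laplace smoothing, and the hypothetical function names `train` and `predict`.

```python
import math
from collections import Counter

def train(docs, labels, k=2, alpha=1.0):
    """Fit one smoothed multinomial per class and keep k class-specific words.

    docs: list of token lists; labels: list of class ids.
    Returns {class: (log_prior, eta, eta_rest)} where eta holds the
    log-ratio (natural-parameter difference) against a pooled reference.
    """
    vocab = sorted({w for d in docs for w in d})
    k = min(k, len(vocab) - 1)  # leave at least one word for the "rest" bin
    pooled = Counter(w for d in docs for w in d)
    n_all = sum(pooled.values())
    # Reference distribution (assumption): all classes pooled together.
    ref = {w: (pooled[w] + alpha) / (n_all + alpha * len(vocab)) for w in vocab}

    model = {}
    for c in sorted(set(labels)):
        counts = Counter(w for d, y in zip(docs, labels) if y == c for w in d)
        n_c = sum(counts.values())
        theta = {w: (counts[w] + alpha) / (n_c + alpha * len(vocab)) for w in vocab}
        # Feature score (assumption): per-word contribution to KL(theta || ref).
        score = {w: theta[w] * math.log(theta[w] / ref[w]) for w in vocab}
        subset = sorted(vocab, key=lambda w: (-score[w], w))[:k]
        # Multinomial in exponential-family form: log p = sum_j x_j * log(theta_j) + const,
        # so the log-likelihood ratio against the reference is linear in the counts.
        eta = {w: math.log(theta[w] / ref[w]) for w in subset}
        rest_theta = 1.0 - sum(theta[w] for w in subset)
        rest_ref = 1.0 - sum(ref[w] for w in subset)
        eta_rest = math.log(rest_theta / rest_ref)
        log_prior = math.log(labels.count(c) / len(labels))
        model[c] = (log_prior, eta, eta_rest)
    return model

def predict(model, doc):
    """Bayes rule: pick the class with the largest log prior + log-likelihood ratio."""
    counts = Counter(doc)

    def log_posterior(c):
        log_prior, eta, eta_rest = model[c]
        in_subset = sum(counts[w] for w in eta)
        rest = len(doc) - in_subset  # all other tokens fall into the "rest" bin
        return log_prior + sum(counts[w] * e for w, e in eta.items()) + rest * eta_rest

    return max(model, key=log_posterior)

# Toy data, invented for this sketch.
docs = [["ball", "goal", "team"], ["team", "goal", "score"],
        ["stock", "market", "price"], ["price", "market", "trade"]]
labels = ["sport", "sport", "finance", "finance"]
model = train(docs, labels, k=2)
print(predict(model, ["goal", "team", "win"]))
```

Because every class is compared against the same reference distribution, scores computed on different feature subsets remain comparable, which is the point of a class-specific formulation. How the paper itself selects features and estimates the optimal PDF is not stated in the abstract.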
