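Abstract: In text classification, choosing an efficient classification algorithm is key to improving classification accuracy and shortening classification time. This paper proposes a class-specific multinomial naive Bayes classification algorithm based on the exponential family (exponential family-multinomial naive Bayes, EF-MNB). The distributions of N classes are constructed from the multinomial model. A class-specific feature selection algorithm yields the feature subset of the N-th class and the corresponding class feature probability density function (PDF). The exponential family is then used to construct the original PDF estimate expressions for the N classes; given training sets for the N classes, the optimal PDF estimate of the N-th class is obtained, and a classification rule is formulated from Bayes' theorem. Simulation results show that, compared with the hierarchical analysis classification algorithm based on latent Dirichlet allocation and support vector machine (LDA-SVM), the improved hyper-sphere support vector machine (IHS-SVM) text classification algorithm, and the hybrid classification algorithm based on principal component analysis and k-nearest-neighbor (PCA-KNN), the EF-MNB class-specific classification algorithm achieves higher classification accuracy in only a small amount of time.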
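The abstract gives no formulas, so the following is only a minimal sketch of the baseline it builds on: a plain multinomial naive Bayes classifier that picks the class maximizing the log prior plus the summed log word likelihoods (Bayes' theorem with a multinomial class model, itself a member of the exponential family). The class-specific feature selection and the exponential-family PDF estimation of EF-MNB are not reproduced here. The function names `train_mnb` and `predict_mnb`, the Laplace smoothing parameter `alpha`, and the toy documents are all illustrative assumptions.

```python
import math
from collections import Counter


def train_mnb(docs, labels, alpha=1.0):
    """Fit per-class log priors and Laplace-smoothed log word probabilities.

    docs is a list of token lists; labels is the matching list of class names.
    alpha is an assumed smoothing constant, not a value from the abstract.
    """
    vocab = sorted({w for d in docs for w in d})
    log_prior, log_like = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d)
        total = sum(counts.values()) + alpha * len(vocab)
        log_like[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return log_prior, log_like


def predict_mnb(model, doc):
    """Return the class with the highest posterior score for a token list."""
    log_prior, log_like = model
    scores = {}
    for c in log_prior:
        # Words outside the training vocabulary are ignored.
        scores[c] = log_prior[c] + sum(
            log_like[c][w] for w in doc if w in log_like[c]
        )
    return max(scores, key=scores.get)


# Hypothetical toy corpus, for illustration only.
docs = [
    ["goal", "match", "team"],
    ["team", "score", "goal"],
    ["stock", "market", "price"],
    ["price", "market", "trade"],
]
labels = ["sports", "sports", "finance", "finance"]
model = train_mnb(docs, labels)
print(predict_mnb(model, ["goal", "team"]))
```

A class-specific variant would, under the abstract's description, restrict each class's likelihood to its own selected feature subset before comparing posteriors; how the resulting scores are made comparable across classes is the part the paper's PDF estimation addresses and is left out of this sketch.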
Funding: Project (No. Z111020834) supported by 08 Special Talent Fund of Northwest A&F University, China
Abstract: Biological sequence comparison is a fundamental task in computational biology. Based on the hydropathy profile of amino acids, a protein sequence is represented as a string over three letters. Three curves of the resulting three-letter sequence are defined to describe the protein sequence. A new method for analyzing the similarity/dissimilarity of protein sequences is proposed, based on the conditional probability of the protein sequence. Finally, the ND6 (NADH dehydrogenase subunit 6) protein sequences of eight species are taken as an example to illustrate the new approach. The results demonstrate that the method is convenient and efficient.
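The abstract does not state which amino acids fall into which hydropathy class or how the conditional probabilities are compared, so the sketch below only illustrates the general idea under explicit assumptions. It assumes a three-class grouping (internal `I`, external `E`, ambivalent `A`), estimates first-order conditional probabilities P(next letter | current letter) from adjacent pairs, and compares two sequences by the sum of absolute differences between their probability tables. The grouping, the function names, and the distance measure are all hypothetical choices, not the paper's definitions.

```python
from collections import Counter
from itertools import product

# Assumed three-class hydropathy grouping (not taken from the abstract).
GROUPS = {"I": "FILMV", "E": "DEHKNQR", "A": "ACGPSTWY"}
TO_LETTER = {aa: g for g, members in GROUPS.items() for aa in members}


def reduce_sequence(protein):
    """Map each amino acid to its class letter; unknown residues raise KeyError."""
    return "".join(TO_LETTER[aa] for aa in protein.upper())


def conditional_probs(reduced):
    """Estimate P(next = b | current = a) from adjacent letter pairs."""
    pair_counts = Counter(zip(reduced, reduced[1:]))
    first_counts = Counter(reduced[:-1])
    return {
        (a, b): pair_counts[(a, b)] / first_counts[a] if first_counts[a] else 0.0
        for a, b in product(GROUPS, repeat=2)
    }


def dissimilarity(p, q):
    """Sum of absolute differences between two conditional-probability tables."""
    return sum(abs(p[k] - q[k]) for k in p)


# Hypothetical short fragments, for illustration only (not real ND6 data).
seq_a = reduce_sequence("MKVLAAGE")
seq_b = reduce_sequence("MKVLSTGD")
print(dissimilarity(conditional_probs(seq_a), conditional_probs(seq_b)))
```

Applied to the eight ND6 sequences, a measure of this kind would produce a pairwise dissimilarity matrix from which species relationships could be read; the actual curves and measure used in the paper may differ.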