Abstract
The promoter region of a gene controls the initiation of its transcription. Eukaryotic promoter prediction is therefore one of the most important problems in DNA sequence analysis, and also a very difficult one. An effective approach is to estimate the positional densities of oligonucleotides in promoter sequences with a Gaussian mixture model (GMM) and to use these densities as the feature vector. However, the number of mixture components G is usually chosen to be very large, so training the model takes a long time. Because the positional distribution differs from one oligonucleotide to another, this paper uses fuzzy clustering to determine the optimal number of GMM components for each oligonucleotide separately, which improves the accuracy of the estimated positional distributions and reduces the computation time. A weighted naïve Bayes classifier based on the least squares method is then proposed and applied to human promoter recognition, further improving recognition accuracy. Simulation results show that the proposed approach predicts promoters effectively.
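The positional-density idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`oligo_positions`, `fit_gmm_1d`, `gmm_density`), the quantile-based initialisation, and the toy data are all assumptions made for illustration, and the number of components is passed in by hand rather than chosen by fuzzy clustering as the paper proposes.

```python
import math


def oligo_positions(sequences, oligo):
    """Collect every start index at which `oligo` occurs in any sequence."""
    positions = []
    for seq in sequences:
        start = seq.find(oligo)
        while start != -1:
            positions.append(start)
            start = seq.find(oligo, start + 1)  # step by 1 to keep overlapping hits
    return positions


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, n_components, n_iter=100, min_var=1e-3):
    """Fit a 1-D Gaussian mixture by plain EM; returns (weights, means, variances).

    `n_components` is supplied by the caller (hypothetical stand-in for the
    per-oligonucleotide value the paper derives from fuzzy clustering).
    """
    data = sorted(data)
    n = len(data)
    # Assumed initialisation: means at evenly spaced quantiles of the data.
    means = [data[int((k + 0.5) * n / n_components)] for k in range(n_components)]
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, min_var)
    variances = [overall_var] * n_components
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, min_var)  # floor avoids collapsing variances
    return weights, means, variances


def gmm_density(x, weights, means, variances):
    """Evaluate the fitted mixture density at position x."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


# Toy positions clustered around 10 and 50 (made-up data, not from the paper).
toy_positions = [9, 10, 11, 10, 49, 50, 51, 50]
weights, means, variances = fit_gmm_1d(toy_positions, n_components=2)
```

In this sketch, the value of `gmm_density` at each position of a candidate sequence would serve as one entry of the feature vector. Choosing a small `n_components` per oligonucleotide, rather than one large value for all of them, is what keeps the EM loop cheap.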
Source
Journal of Circuits and Systems (《电路与系统学报》)
CSCD
Peking University Core Journal (北大核心)
2010, No. 4, pp. 33-37 (5 pages)
Funding
National Natural Science Foundation of China (Grant No. 50877004)
Keywords
promoter
oligonucleotide
fuzzy clustering
Gaussian mixture model (GMM)
least squares method
weighted Bayesian classifier (WNB)