Abstract
With the progress of genome research, machine learning methods have become widely used for gene recognition. These methods include neural network algorithms, dynamic programming, rule-based methods, decision trees and probabilistic reasoning. This paper describes a gene recognition system based on a hidden Markov model (HMM), a precise probabilistic model of DNA sequences. It presents the EM algorithm used to train the model and the Viterbi algorithm used to analyze sequences. Tested on the public data set of Burset & Guigó, the system reaches a sensitivity (Sn) of 68% and a specificity (Sp) of 88% at the nucleotide level, and an Sn of 60% and an Sp of 63% at the exon level.
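The Viterbi decoding step and the Sn/Sp scores mentioned above can be illustrated with a minimal Python sketch. It assumes a hypothetical two-state HMM (N = non-coding, C = coding) whose start, transition and emission probabilities are invented for illustration. The paper's actual state topology and EM-trained parameters are not reproduced, and EM training itself is not shown. The scoring function uses the nucleotide-level definitions Sn = TP/(TP+FN) and Sp = TP/(TP+FP) associated with the Burset & Guigó evaluation. `viterbi` and `nucleotide_sn_sp` are hypothetical helper names.

```python
import math

# Hypothetical two-state model: "N" = non-coding, "C" = coding.
# All probabilities are illustrative assumptions, NOT the paper's trained parameters.
STATES = ("N", "C")
START = {"N": 0.9, "C": 0.1}
TRANS = {
    "N": {"N": 0.9, "C": 0.1},
    "C": {"N": 0.1, "C": 0.9},
}
EMIT = {
    "N": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},  # AT-rich
    "C": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},  # GC-rich
}


def viterbi(seq):
    """Return the most probable state path for seq, computed in log space."""
    if not seq:
        return ""
    log = math.log
    # score[s]: best log-probability of any path that ends in state s at the current base
    score = {s: log(START[s]) + log(EMIT[s][seq[0]]) for s in STATES}
    backptrs = []
    for base in seq[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            # Pick the predecessor state that maximizes the path score into s
            prev = max(STATES, key=lambda p: score[p] + log(TRANS[p][s]))
            new_score[s] = score[prev] + log(TRANS[prev][s]) + log(EMIT[s][base])
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state
    state = max(STATES, key=score.get)
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


def nucleotide_sn_sp(predicted, actual, coding="C"):
    """Nucleotide-level Sn = TP/(TP+FN) and Sp = TP/(TP+FP) (assumed Burset & Guigo style)."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == coding and a == coding for p, a in pairs)
    fn = sum(p != coding and a == coding for p, a in pairs)
    fp = sum(p == coding and a != coding for p, a in pairs)
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp
```

For example, decoding `"A" * 10 + "G" * 10` with these toy parameters labels the AT-rich half N and the GC-rich half C, because the emission gain over the ten G bases outweighs the cost of a single N→C transition. Working in log space avoids the numerical underflow that multiplying many small probabilities would cause on real genomic sequences.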
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2003, No. 24, pp. 69-71 (3 pages)
Funding
Supported by the Major Research Plan of the National Natural Science Foundation of China (Grant No. 90203011)