Abstract
Hidden Markov models (HMMs) are important statistical models for sequence analysis. In recent years they have been applied successfully in many machine learning areas, particularly protein family recognition. This is largely because the rapid growth of biological data has brought two fields, computational science and biology, together. This paper examines multiple sequence alignment and profile HMMs, and discusses the basic algorithms of HMMs and how to build an HMM. An experiment in protein family recognition and classification based on E-values and training scores is presented.
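The abstract mentions the basic HMM algorithms without detailing them here. As an illustration only, the Python sketch below implements the forward algorithm, one standard HMM algorithm, which computes the probability of an observed sequence under a given model. The two-state model, its parameters, and the three-letter residue alphabet are hypothetical toy values, not taken from the paper. A brute-force sum over all hidden state paths is included as a cross-check.

```python
import itertools

# Hypothetical toy HMM (illustrative values only, not from the paper).
states = ("S1", "S2")
start_p = {"S1": 0.6, "S2": 0.4}
trans_p = {"S1": {"S1": 0.7, "S2": 0.3}, "S2": {"S1": 0.4, "S2": 0.6}}
emit_p = {
    "S1": {"L": 0.5, "K": 0.4, "E": 0.1},
    "S2": {"L": 0.1, "K": 0.3, "E": 0.6},
}


def forward(obs, states, start_p, trans_p, emit_p):
    """Return P(obs | model) using the forward algorithm (plain probabilities)."""
    # alpha[s] = P(o_1..o_t, hidden state at time t = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        # Sum over every predecessor state r, then emit the current symbol from s.
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


def brute_force(obs, states, start_p, trans_p, emit_p):
    """Cross-check: sum the joint probability over every possible state path."""
    total = 0.0
    for path in itertools.product(states, repeat=len(obs)):
        p = start_p[path[0]] * emit_p[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans_p[path[t - 1]][path[t]] * emit_p[path[t]][obs[t]]
        total += p
    return total


seq = "LKE"
print(forward(seq, states, start_p, trans_p, emit_p))
print(brute_force(seq, states, start_p, trans_p, emit_p))
```

The forward recursion costs O(T·N²) for a sequence of length T and N states, whereas the brute-force sum grows as Nᵀ. With plain probabilities the values underflow on long protein sequences, so practical implementations usually work in log space or rescale alpha at each step.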
Source
Journal of Sun Yat-sen University (Natural Science Edition)
CAS
CSCD
Peking University Core Journal
2005, No. 2, pp. 9-13 (5 pages)
Acta Scientiarum Naturalium Universitatis Sunyatseni
Funding
National Natural Science Foundation of China (Grant No. 10371135)
Keywords
hidden Markov models (HMMs)
multiple sequence alignment
protein family classification