Abstract
Amino acid sequence encoding has long been a main cause of the large input space faced by protein structure prediction algorithms. Only a better encoding of the amino acid sequence can lay the foundation for subsequent computational analysis. This paper proposes and implements a new hybrid encoding method that jointly considers the division of the amino acid sequence and long-range interaction effects. It uses orthogonal encoding to distinguish individual amino acids, a basic orthogonal matrix to capture the physical, chemical, and biological similarity between amino acids, and class-membership probabilities to capture the tendency of the amino acids in the current protein sequence to form different secondary structures. The method thereby improves the encoding of amino acid residue sequences. Using existing algorithms, the effect of different encoding schemes on protein secondary structure prediction is compared, and the results confirm that the proposed encoding improves prediction accuracy.
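The abstract does not give the window size, the similarity matrix, or the exact definition of the membership probabilities, so the following is only a minimal sketch of two of the named ingredients: a 20-dimensional orthogonal (one-hot) code per residue, and a per-residue estimate of secondary-structure tendency counted from labelled sequences. The state labels `H`, `E`, `C`, the sliding-window layout, and all function names are assumptions made for illustration and are not taken from the paper. The similarity-matrix component and long-range effects are not modelled here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
STATES = "HEC"  # assumed labels: helix, strand, coil


def one_hot(residue):
    """Orthogonal code: 20 positions, a single 1 marking the residue."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def propensities(sequences, labels):
    """Estimate P(state | residue) by counting over labelled sequences."""
    counts = {aa: Counter() for aa in AMINO_ACIDS}
    for seq, lab in zip(sequences, labels):
        for aa, state in zip(seq, lab):
            if aa in counts:
                counts[aa][state] += 1
    table = {}
    for aa, c in counts.items():
        total = sum(c.values())
        # Fall back to a uniform distribution for residues never observed.
        table[aa] = [c[s] / total if total else 1.0 / len(STATES) for s in STATES]
    return table


def encode_window(seq, center, half_width, table):
    """Concatenate per-residue features over a window centred on `center`."""
    features = []
    for i in range(center - half_width, center + half_width + 1):
        if 0 <= i < len(seq) and seq[i] in table:
            features += one_hot(seq[i]) + table[seq[i]]
        else:
            # Positions past the chain ends, or unknown residues, are zero-padded.
            features += [0.0] * (len(AMINO_ACIDS) + len(STATES))
    return features
```

Each window position contributes 23 values (20 one-hot plus 3 tendencies), so a window of half-width `w` yields a vector of length `23 * (2 * w + 1)` that could be passed to an existing classifier.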
Authors
LI Guan-yu (李冠宇)
ZHU Hong-ming (朱宏明)
ZHOU Wen-jun (周闻钧)
School of Software Engineering, Tongji University, Shanghai 200092, China
Source
Computer Knowledge and Technology (《电脑知识与技术》)
2008, No. 12, pp. 1713-1716 (4 pages)
Keywords
protein structure prediction
encoding scheme
machine learning