Abstract
The main problem in superimposing three-dimensional protein structures is that some amino acid residues are missing from the target structures, whereas most multiple-structure superposition methods require complete amino acid sequences. The common workaround is to delete the positions with missing residues outright, which makes the superposition inaccurate. Because homologous proteins have similar structures, a region missing from one structure may be present in another homologous structure. On this basis, we propose a novel, simple, and effective method (ITEMDM) for superimposing multiple protein structures with missing data. The method computes the superposition iteratively in the presence of missing data, and obtains the rotation matrix and translation vector with an optimized least-squares algorithm combined with matrix SVD decomposition. We used ITEMDM to superimpose proteins of the cytochrome C family and of the standard Fischer's database (67 protein pairs), and compared the results with those of other methods. Numerical experiments show that the algorithm has the following advantages: (1) compared with the THESEUS algorithm, it runs faster and needs fewer iterations; (2) compared with the PSSM algorithm, it gives more accurate results in less computing time. The results show that ITEMDM can better superimpose three-dimensional protein structures with missing data.
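The two ingredients named in the abstract, a least-squares fit solved by SVD and an iteration over missing data, can be illustrated with a minimal NumPy sketch. This is not the authors' ITEMDM implementation. The function names `kabsch` and `superpose_with_missing`, the use of NaN rows to mark missing residues, the mean-structure fill-in rule, and the stopping tolerance are all assumptions made for illustration. The sketch also assumes the structures are already aligned residue by residue and that every residue is observed in at least one structure.

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares rigid fit: find R, t minimizing sum ||R p_i + t - q_i||^2.

    P and Q are (n, 3) arrays of paired coordinates (standard Kabsch/SVD solution).
    """
    p_mean = P.mean(axis=0)
    q_mean = Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)          # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t

def superpose_with_missing(structures, n_iter=50, tol=1e-8):
    """Illustrative iteration (an assumption, not the published algorithm).

    Each structure is an (n, 3) array in which missing residues are NaN rows.
    Every structure is fitted to the current mean using its observed residues
    only, its missing residues are filled from the mean, and the mean is updated.
    """
    observed = [~np.isnan(X).any(axis=1) for X in structures]
    # Initial reference: per-residue average over the structures that have it.
    mean = np.nanmean(np.stack(structures), axis=0)
    for _ in range(n_iter):
        filled = []
        for X, obs in zip(structures, observed):
            R, t = kabsch(X[obs], mean[obs])
            Y = mean.copy()                    # missing rows take the mean coordinates
            Y[obs] = X[obs] @ R.T + t          # observed rows are rigidly transformed
            filled.append(Y)
        new_mean = np.mean(filled, axis=0)
        converged = np.linalg.norm(new_mean - mean) < tol
        mean = new_mean
        if converged:
            break
    return filled, mean
```

Fitting on observed residues only means that no residue position has to be deleted from the whole set just because one structure lacks it, which is the motivation the abstract gives for handling missing data explicitly.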
Source
Chinese Journal of Biochemistry and Molecular Biology (《中国生物化学与分子生物学报》)
CAS
CSCD
PKU Core
2017, No. 6, pp. 630-637 (8 pages)
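Funding
National Key Research and Development Program of China (Nos. 2016YFC1000307 and 2016YFB0201304)
National Natural Science Foundation of China (No. 21573274)
Sub-project of the National Key Research and Development Program of China (No. 2016YFC1000307-10)
Science and Technology Innovation Fund, General Program, of the Science and Technology Research Institute of the National Health and Family Planning Commission (No. 2017GJM04)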