Abstract
The three-dimensional structure of a protein determines its biological function, so studying structural similarity between proteins is important in computational molecular biology; structural similarity can also be a good predictor of functional correlation. In this paper, the Cα atom distance matrix of a protein is decomposed into many small sub-matrices that represent the protein's local structures. Statistical analysis of these local structures yields a local feature frequency vector, which is used to calculate the similarity between proteins. On this basis, a new method based on an adaptive local feature frequency vector (ALFF) is proposed for measuring the similarity of protein three-dimensional structures. When selecting local features, ALFF uses the OTSU algorithm to determine the most appropriate local feature size m, and MeanShift clustering to compute the number of representative local features k. Experimental results show that ALFF divides proteins into local substructures better and faster. Compared with methods that select these parameters manually, ALFF shows higher consistency with the SCOP protein structure classification and better accuracy in the comparison with TM-score.
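The pipeline described in the abstract (distance matrix, local sub-matrices, frequency vector, similarity) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how sub-matrices are extracted or how two frequency vectors are compared, so the sliding window along the main diagonal and the cosine similarity below are assumptions, and all function names are hypothetical. In ALFF the values of m and k are chosen by OTSU and MeanShift; here m and the representative features are simply fixed by hand.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between C-alpha coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def diagonal_submatrices(dmat, m):
    """Flatten every m x m block that slides along the main diagonal.

    Assumption: the abstract does not specify the extraction scheme.
    """
    n = len(dmat)
    return [
        [dmat[i + r][i + c] for r in range(m) for c in range(m)]
        for i in range(n - m + 1)
    ]

def frequency_vector(blocks, representatives):
    """Assign each block to its nearest representative and normalise the counts."""
    counts = [0] * len(representatives)
    for block in blocks:
        nearest = min(
            range(len(representatives)),
            key=lambda j: math.dist(block, representatives[j]),
        )
        counts[nearest] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def cosine_similarity(u, v):
    """Cosine similarity of two frequency vectors (an assumed choice of metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy input: six C-alpha atoms on a straight line, 3.8 angstroms apart.
coords = [(3.8 * i, 0.0, 0.0) for i in range(6)]
m = 3  # local feature size, fixed by hand in this sketch

# Two hand-written representative features (k = 2): an extended and a bent triplet.
representatives = [
    [0.0, 3.8, 7.6, 3.8, 0.0, 3.8, 7.6, 3.8, 0.0],
    [0.0, 3.8, 5.4, 3.8, 0.0, 3.8, 5.4, 3.8, 0.0],
]

blocks = diagonal_submatrices(distance_matrix(coords), m)
freq = frequency_vector(blocks, representatives)
print(freq, cosine_similarity(freq, freq))
```

Because every window of the toy chain is fully extended, all blocks fall on the first representative. Real proteins would spread their counts over many representatives, and two structures would be compared through their respective frequency vectors.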
Authors
张汝昌
邱杰
王明堂
陈庆锋
ZHANG Ruchang; QIU Jie; WANG Mingtang; CHEN Qingfeng (School of Computer Electronics and Information, Guangxi University, Nanning, Guangxi 530004, China; School of Computer Science and Engineering, Yulin Normal University, Yulin, Guangxi 537000, China)
Source
《广西师范大学学报(自然科学版)》
CAS
Peking University Core Journals (北大核心)
2020, Issue 6, pp. 40-50 (11 pages)
Journal of Guangxi Normal University: Natural Science Edition
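Funding
National Natural Science Foundation of China (61963004)
Key Project of the Guangxi Natural Science Foundation (2017GXNSFDA19803)
Guangxi Key Research and Development Program (桂科AB17195055)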
Keywords
protein structural similarity
local feature
distance matrix
clustering
frequency vector