Abstract
Structure files for 165 pairs of proteins were collected, and BLASTP was used to compare the similarity of each pair. A spherical polar coordinate system was established, and the radius, azimuth, and elevation were each bisected or trisected, dividing each protein into 8 or 27 blocks resembling spherical-shell fragments. On this basis, the similarities of 12 parameters were calculated in MATLAB. For both the bisected and trisected partitions, a full regression model, a stepwise regression model, and a correlation-based (filter) regression model relating the total similarity to the 12 parameter similarities were built in SPSS. A BP neural network model was built in MATLAB and compared with the linear regression models. According to the stepwise regression model for the bisected partition, the similarity of the number of atoms, the similarity of the numbers of C and N atoms, the position similarity of P and S atoms, and the density similarity were most significantly correlated with the overall similarity. The bisected partition gave better results than the trisected one, and the stepwise regression model performed best.
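The partitioning step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code. The function name `partition_atoms`, the choice of the atomic centroid as origin, and the use of equal-width bins over each coordinate's full range are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np

def partition_atoms(coords, k):
    """Label each atom with one of k**3 spherical-shell blocks (k=2 -> 8, k=3 -> 27)."""
    coords = np.asarray(coords, dtype=float)
    # Assumption: the origin of the spherical system is the centroid of the atoms.
    x, y, z = (coords - coords.mean(axis=0)).T
    r = np.sqrt(x**2 + y**2 + z**2)
    azimuth = np.arctan2(y, x)  # range [-pi, pi]
    # Elevation in [-pi/2, pi/2]; atoms exactly at the origin get elevation 0.
    ratio = np.divide(z, r, out=np.zeros_like(r), where=r > 0)
    elevation = np.arcsin(np.clip(ratio, -1.0, 1.0))

    def bin_index(values, low, high):
        # Assumption: equal-width bins; the upper edge falls into the last bin.
        idx = np.floor((values - low) / (high - low) * k).astype(int)
        return np.clip(idx, 0, k - 1)

    r_max = r.max() if r.max() > 0 else 1.0
    ri = bin_index(r, 0.0, r_max)
    ai = bin_index(azimuth, -np.pi, np.pi)
    ei = bin_index(elevation, -np.pi / 2, np.pi / 2)
    # Combine the three bin indices into a single block label in [0, k**3).
    return (ri * k + ai) * k + ei
```

Calling `partition_atoms(coords, 2)` on an N-by-3 array of atomic coordinates returns labels for the 8-block case, and `partition_atoms(coords, 3)` for the 27-block case. Per-block quantities such as atom counts could then be compared between two proteins.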
Source
《郑州大学学报(理学版)》
CAS
PKU Core (Peking University core journal list)
2016, Issue 2, pp. 105-109 (5 pages)
Journal of Zhengzhou University:Natural Science Edition
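Funding
National Natural Science Foundation of China, Youth Fund project (813D3150)
Special Scientific Research Fund for the Traditional Chinese Medicine Industry of China (201007001)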
Keywords
protein
similarity
regression analysis
stepwise regression
BP neural network