
The Model Study Between the Number of Tria-coupled Amino Acid and the Number of Protein Secondary Structure (蛋白质中三联氨基酸数与二级结构数的模型研究)

Cited by: 1
Abstract: The relationship between a protein's primary structure (its sequence) and its secondary structure is important in protein structure research, and this work studies that relationship by building models. Building on existing models in the literature that relate pair-coupled amino acids in the primary structure to secondary structure, a model is established between the number of tria-coupled amino acids in the primary structure and the number of secondary structures. The model reflects the relationship between sequence and secondary structure fairly accurately and is well suited to modeling data in which sequence length varies widely. Compared with the pair-coupled model, it achieves higher fitting accuracy, because tria-coupled amino acids carry more information about the coupling among amino acids. The modeling data set is large: the number of tria-coupled amino acid types is 4 200, so the number of variables is large, and the number of samples obtained from the DSSP database is 11 600. The results indicate that the PLS variable selection method is an effective way to build models on large data sets and can handle a modeling problem with 4 200 variables and 11 600 samples.
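To make the input variables described in the abstract concrete, here is a minimal sketch of counting three-residue groups in a sequence. The abstract does not say exactly how tria-coupled amino acids are defined or how the 4 200 types arise, so this sketch assumes overlapping, contiguous, ordered three-residue windows. The function names `count_triplets` and `feature_vector` and the example sequence are hypothetical.

```python
from collections import Counter


def count_triplets(sequence: str) -> Counter:
    """Count overlapping contiguous three-residue windows in a sequence.

    Assumption: a "tria-coupled amino acid" is treated here as an ordered
    run of three adjacent residues; the paper's definition may differ.
    """
    seq = sequence.upper()
    # A sequence shorter than 3 residues yields an empty Counter.
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))


def feature_vector(sequence: str, vocabulary: list[str]) -> list[int]:
    """Map a sequence to counts over a fixed, ordered list of triplets."""
    counts = count_triplets(sequence)
    return [counts.get(triplet, 0) for triplet in vocabulary]


# Hypothetical example sequence, not taken from the DSSP database.
example = "MKVLAAGMKV"
example_counts = count_triplets(example)
```

Each sequence then becomes one row of a count matrix whose columns are triplet types, which is the kind of large variable set on which the abstract says PLS variable selection was applied.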
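Author: Zhu Eryi (朱尔一)
Source: Journal of Xiamen University (Natural Science) (《厦门大学学报(自然科学版)》), 2009, No. 5, pp. 704-708 (5 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: Natural Science Foundation of Fujian Province (X0750052); Open Project of the State Key Laboratory of Marine Environmental Science (Xiamen University)
Keywords: protein secondary structure prediction; PLS variable selection; huge data modeling; tria-coupled amino acid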

References (6)

  • 1. Chou Kuochen. Using pair-coupled amino acid composition to predict protein secondary structure content [J]. Journal of Protein Chemistry, 1999, 18(4): 473-480.
  • 2. Zhu Eryi, Lin Yan. A variable selection method using partial least squares [J]. Computers and Applied Chemistry, 2007, 24(6): 741-745. Cited by: 8
  • 3. Zhu Eryi, Lin Yan, Zhuang Zanyong. Application of the partial least squares variable selection method to the analysis of drug sources [J]. Chinese Journal of Analytical Chemistry, 2007, 35(7): 973-977. Cited by: 8
  • 4. Chen Chao, Tian Yuanxin, Zou Xiaoyong, et al. Prediction of protein secondary structure content using support vector machine [J]. Talanta, 2007, 71(5): 2069-2073.
  • 5. Wolfgang Kabsch, Christian Sander. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features [J]. Biopolymers, 1983, 22(12): 2577-2637.
  • 6. Cuff J A, Barton G J. Evaluation and improvement of multiple sequence methods for protein secondary structure prediction [J]. Proteins, 1999, 34(4): 509-519.

