A Protein Subcellular Localization Prediction Method Based on Optimal Local Information Fusion (cited by: 3)

A Novel Approach for Prediction of Protein Subcellular Localization Using Optimal Local Information

Abstract: Based on the synthesis and sorting mechanism of proteins, a new method for predicting protein subcellular localization is proposed. An exhaustive search is first used to find, for each subcellular class, the optimal splice site between the sorting signal and the mature protein, dividing each protein sequence into two subsequences. The amino acid compositions of the two subsequences are computed and fused to form the feature vector of the whole sequence. An optimal sub-classifier is then built to recognize each class of proteins, and the sub-classifiers are combined into an ensemble classifier according to a maximum rule. In 5-fold cross-validation on the NNPSL dataset, overall prediction accuracies of 94.1% and 87.5% are obtained for the prokaryotic and eukaryotic subsets, respectively. The splice sites found in some protein sequences also agree with real biological phenomena, and can provide reference information for predicting protein cleavage sites.

English abstract: Prediction of protein subcellular localization can help infer the function of proteins and provide insight into the interactions between proteins. A novel approach based on the sorting mechanism of proteins is proposed for predicting the subcellular localization of proteins. For each kind of protein, an optimal splice site is found through an iterative search to divide the sequence into a sorting signal subsequence and a mature protein subsequence. When designing the classifier, a sub-classifier is built to discriminate each kind of protein from the rest; these sub-classifiers are then combined into an ensemble classifier to predict the subcellular localization of unknown proteins. In fivefold cross-validation tests on the NNPSL and TargetP datasets, overall accuracies of 94.1% and 87.5% are obtained for prokaryotic and eukaryotic proteins, respectively, on NNPSL, and overall accuracies of 90.2% and 93.9% are obtained for plant and non-plant proteins, respectively, on TargetP. The optimal splice sites found in this paper also coincide with biological facts for most kinds of proteins, which can help predict the cleavage sites of proteins.
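
The feature construction and maximum-rule ensemble described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state how candidate splice sites are scored or how the sub-classifiers are trained (the keywords suggest support vector machines), so `score` and the per-class decision functions are hypothetical placeholders supplied by the caller, and all function names are assumptions.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20 amino acid frequencies of seq (all zeros if seq is empty)."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def fused_features(seq, site):
    """Split seq after `site` residues and concatenate both compositions (40 values)."""
    return composition(seq[:site]) + composition(seq[site:])


def best_site(candidate_sites, score):
    """Exhaustive search: return the candidate site with the highest score.

    `score` is a hypothetical callable, for example the cross-validated
    accuracy of a sub-classifier trained on features split at that site.
    """
    return max(candidate_sites, key=score)


def predict(seq, sub_classifiers):
    """Maximum rule over one-vs-rest sub-classifiers.

    `sub_classifiers` maps a class label to (site, decision_fn), where
    decision_fn is a hypothetical function returning a confidence score
    for that class given the fused feature vector.
    """
    scores = {
        label: decision_fn(fused_features(seq, site))
        for label, (site, decision_fn) in sub_classifiers.items()
    }
    return max(scores, key=scores.get)
```

In this sketch each class keeps its own splice site, so the same sequence is split differently for each sub-classifier before the highest-scoring class is chosen. In practice `decision_fn` might wrap a trained SVM's decision value, which is an assumption rather than something the abstract specifies.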

Authors: ZHANG Shubo (张树波), LAI Jianhuang (赖剑煌), HE Jianguo (何建国)

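Affiliations: School of Mathematics and Computational Science, Sun Yat-sen University; School of Information Science and Technology, Sun Yat-sen University; School of Life Science and Technology, Sun Yat-sen University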

Source: Acta Scientiarum Naturalium Universitatis Sunyatseni (《中山大学学报（自然科学版）》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2008, No. 6, pp. 16-21 (6 pages)

Funding: National Natural Science Foundation of China (grants 60675016, 60633030)

Keywords: subcellular localization; N-terminal sorting signal; mature protein; support vector machine; splice site

Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]
