Prediction of Protein Subcellular Localization Using a Semi-Supervised Dimensionality Reduction Method
Abstract: A fused feature extraction method that combines pseudo amino acid composition (PseAA) and the position-specific scoring matrix (PSSM) is first adopted to represent protein sequences. This method converts each protein sequence into a feature vector that retains most of the original sequence information, but the resulting vectors are high-dimensional, which complicates the prediction of protein subcellular localization. At the same time, large numbers of proteins with labeled subcellular locations are currently difficult to obtain. To address both problems, a semi-supervised dimensionality reduction algorithm, Semi-Supervised Maximum Variance Projections (SS-MVP), is introduced to reduce the dimensionality of the feature vectors while extracting information useful for classification from both labeled and unlabeled samples. A support vector machine (SVM) is then applied to the reduced samples to predict the protein subcellular localization type. The experimental results show that this approach both simplifies the prediction system for protein subcellular localization and improves its classification performance.
Source: Journal of Shanghai Polytechnic University (《上海第二工业大学学报》), 2015, No. 3, pp. 260-265 (6 pages)
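Funding: National Natural Science Foundation of China (No. 61301249, No. 61272036); Natural Science Foundation of Shanghai (No. 15ZR1417000); Shanghai Municipal Education Commission Outstanding Young Teachers Program (No. ZZegd14001); Shanghai Polytechnic University Foundation (No. EGD14XQD13)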
Keywords: protein subcellular localization; prediction; semi-supervised; dimensionality reduction
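
The PseAA step named in the abstract can be illustrated with a minimal sketch of Chou-style pseudo amino acid composition. It is not the paper's implementation. The abstract does not state which physicochemical properties, how many correlation tiers, or what weight were used, so the single Kyte-Doolittle hydropathy property and the parameters `lam` and `weight` below are assumptions, and the function name `pse_aa` is hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index, used as the only physicochemical property.
# Assumption: the paper's actual property set is not given in the abstract.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(table):
    """Rescale property values to zero mean and unit standard deviation."""
    values = list(table.values())
    mu, sigma = mean(values), pstdev(values)
    return {aa: (v - mu) / sigma for aa, v in table.items()}


def pse_aa(sequence, lam=3, weight=0.05):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector.

    lam and weight are illustrative defaults, not values from the paper.
    """
    # Keep only the 20 standard residues; anything else is skipped.
    seq = [aa for aa in sequence.upper() if aa in KD_HYDROPATHY]
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    prop = standardize(KD_HYDROPATHY)

    # First 20 components: normalized amino acid frequencies.
    counts = Counter(seq)
    freqs = [counts[aa] / len(seq) for aa in AMINO_ACIDS]

    # Next lam components: sequence-order correlation factors, i.e. the mean
    # squared property difference between residues k positions apart.
    thetas = []
    for k in range(1, lam + 1):
        diffs = [(prop[seq[i]] - prop[seq[i + k]]) ** 2
                 for i in range(len(seq) - k)]
        thetas.append(sum(diffs) / len(diffs))

    # Joint normalization so that the whole vector sums to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

Calling `pse_aa("MKTAYIAKQRQISFVKSHFSRQ")` on this made-up sequence yields a 23-dimensional vector. In the approach the abstract describes, such a vector would be combined with PSSM-derived features before dimensionality reduction and SVM classification.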