Abstract
A new method is proposed for automatically identifying the quaternary structure type of a query protein from its sequence. First, the Pseudo Position-Specific Score Matrix (PsePSSM) method is used to extract features from the protein sequence. Features extracted this way retain as much of the original sequence information as possible, including sequence-order and evolutionary information. The resulting feature vectors, however, are high-dimensional, which complicates the prediction system. To address this, Maximum Variance Projections (MVP), a linear dimensionality reduction algorithm, is introduced to extract low-dimensional key features from the high-dimensional PsePSSM feature space. Finally, a classification algorithm is applied to the reduced features to predict the quaternary structure of unknown proteins. Experimental results show that dimensionality reduction both simplifies the prediction system and improves classification performance.
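The feature-extraction and reduction steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each protein already has a PSSM available as an L x 20 array, it uses a commonly described PsePSSM form (column averages plus lagged squared differences), and it substitutes ordinary PCA for MVP because the abstract gives no details of the MVP algorithm. The function names `pse_pssm` and `pca_reduce`, the row-wise standardization, and the `max_lag` parameter are all hypothetical choices made for illustration.

```python
import numpy as np


def pse_pssm(pssm, max_lag=2):
    """Map an (L x 20) PSSM to a fixed-length vector of 20 * (1 + max_lag) values.

    Assumed form: 20 column averages, then for each lag 1..max_lag the mean
    squared difference between rows that are `lag` positions apart.
    Requires L > max_lag.
    """
    pssm = np.asarray(pssm, dtype=float)
    # Standardize each row (sequence position) across the 20 amino-acid scores.
    mean = pssm.mean(axis=1, keepdims=True)
    std = pssm.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # avoid division by zero for constant rows
    z = (pssm - mean) / std

    feats = [z.mean(axis=0)]  # composition-like part: 20 column averages
    for lag in range(1, max_lag + 1):
        diff = z[:-lag] - z[lag:]  # rows i and i + lag
        feats.append((diff ** 2).mean(axis=0))  # sequence-order part
    return np.concatenate(feats)


def pca_reduce(X, n_components):
    """PCA via SVD, used here only as a stand-in for MVP (an assumption)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic PSSMs of varying length stand in for real profiles.
    pssms = [rng.normal(size=(rng.integers(30, 80), 20)) for _ in range(12)]
    X = np.vstack([pse_pssm(p, max_lag=2) for p in pssms])
    Z = pca_reduce(X, n_components=5)
    print(X.shape, Z.shape)
```

Any classifier can then be trained on the reduced matrix; the abstract does not name the one used, so none is shown here. The point of the sketch is only that sequences of different lengths map to vectors of equal size, and that the reduction step shrinks those vectors before classification.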
Source
Journal of Shanghai Polytechnic University (《上海第二工业大学学报》)
2013, No. 1, pp. 12-17 (6 pages)
Funding
Innovation Program of Shanghai Municipal Education Commission (No. 12YZ175)
Keywords
quaternary structure of protein
homo-oligomers
classification
dimension reduction