Abstract
The subcellular location of a protein is closely correlated with its biological function. With the rapid expansion of protein databases, there is an urgent need for powerful, high-throughput algorithms that predict protein subcellular location. Many prediction tools are built on the pseudo-amino acid composition (PseAAC), which requires choosing in advance an optimal value of the parameter λ that reflects sequence-order effects. To determine this value, a data analysis method, principal component analysis (PCA), is applied. First, λ is set to its maximum so that the features contain as much sequence-order information as possible; then PCA is used to extract the essential principal features. Experimental results show that the proposed method resolves the difficulty of determining the optimal λ and that its performance is better than that of existing predictors.
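The pipeline summarized in the abstract (PseAAC features with λ at its largest admissible value, then PCA, then a classifier) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the Kyte-Doolittle hydropathy scale as the only physicochemical property (Chou's original PseAAC averages several normalized properties), an assumed weight w = 0.05, takes the "maximum" λ to be one less than the shortest sequence length, computes PCA by power iteration with deflation, and uses the k-NN classifier named in the keywords (the BP neural network is not shown). The function names `pseaac`, `pca_fit`, `pca_transform` and `knn_predict`, the toy sequences and their labels are all hypothetical. Only the standard library is used (`math.dist` needs Python 3.8+).

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: Kyte-Doolittle hydropathy as the single physicochemical property.
# Chou's original PseAAC averages several normalized properties instead.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def standardize(prop):
    """Shift and scale a property to zero mean, unit SD over the 20 residues."""
    mean = sum(prop.values()) / len(prop)
    sd = math.sqrt(sum((v - mean) ** 2 for v in prop.values()) / len(prop))
    return {aa: (v - mean) / sd for aa, v in prop.items()}


H = standardize(KD)


def pseaac(seq, lam, w=0.05):
    """Return the 20 + lam PseAAC vector of seq; w = 0.05 is an assumed weight."""
    n = len(seq)
    if not 0 <= lam < n:
        raise ValueError("lam must satisfy 0 <= lam < len(seq)")
    if set(seq) - set(AMINO_ACIDS):
        raise ValueError("only the 20 standard residues are supported")
    counts = Counter(seq)
    freqs = [counts[aa] / n for aa in AMINO_ACIDS]
    # theta_j: mean squared property difference between residues j positions apart
    thetas = [sum((H[seq[i + j]] - H[seq[i]]) ** 2 for i in range(n - j)) / (n - j)
              for j in range(1, lam + 1)]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


def pca_fit(X, k, iters=500, seed=0):
    """Mean and top-k principal axes of X (needs >= 2 rows), by power iteration."""
    n, d = len(X), len(X[0])
    mean = [sum(row[c] for row in X) / n for c in range(d)]
    Xc = [[row[c] - mean[c] for c in range(d)] for row in X]
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    axes = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            u = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in u))
            if norm == 0.0:
                break  # no variance left along this start vector
            v = [x / norm for x in u]
        # Rayleigh quotient = variance captured by this axis
        ev = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        axes.append(v)
        # Deflation: remove this component so the next pass finds the next one
        cov = [[cov[a][b] - ev * v[a] * v[b] for b in range(d)] for a in range(d)]
    return mean, axes


def pca_transform(X, mean, axes):
    """Project the centered rows of X onto the principal axes."""
    return [[sum((row[c] - mean[c]) * ax[c] for c in range(len(mean))) for ax in axes]
            for row in X]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip((math.dist(row, x) for row in train_X), train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data: sequences and labels are invented for illustration.
    train = [("MKKLLVAAILLAG", "membrane"), ("MDEEKRKQDESRK", "nucleus"),
             ("MKALIVLGLVLLS", "membrane"), ("MSDEKRRDEQKNE", "nucleus")]
    # Largest lambda every training sequence admits (theta_j needs L - j >= 1).
    lam = min(len(s) for s, _ in train) - 1
    X = [pseaac(s, lam) for s, _ in train]
    mean, axes = pca_fit(X, k=2)
    Z = pca_transform(X, mean, axes)
    query = pca_transform([pseaac("MKLLVLAIGLLAS", lam)], mean, axes)[0]
    print(knn_predict(Z, [y for _, y in train], query, k=1))
```

In this sketch the 20 + λ components always sum to 1, and PCA compresses them onto a few high-variance directions, so λ can be pushed to its upper bound without a grid search over λ; the number of retained components k then becomes the parameter to tune.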
Source
Journal of Dalian University of Technology (《大连理工大学学报》)
EI
CAS
CSCD
Peking University Core Journal
2012, No. 3, pp. 426-430 (5 pages)
Keywords
protein subcellular location
principal component analysis
pseudo-amino acid composition
k-NN classifier
BP neural network