Abstract
To improve the prediction accuracy of glycosylation sites, a new method combining principal component analysis (PCA) and independent component analysis (ICA) is proposed for the prediction and pattern analysis of O-linked glycosylation sites. Protein sequences with a window length of 51 are encoded using a sparse coding scheme. PCA is first applied to decorrelate the encoded sequences (removing second-order correlations) and to reduce their dimensionality. ICA is then trained on the result to extract independent components, which serve as feature vectors spanning a subspace (the main basis) for each class. A test sequence is projected onto every class subspace, the distance between the test vector and its reconstruction from each subspace is computed, and the sequence is assigned to the nearest class. Experimental results show that the proposed method achieves high prediction performance and is superior to the PCA subspace method.
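The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 20-letter alphabet, the function names (`sparse_encode`, `fit_subspace`, `reconstruction_distance`, `classify`), the symmetric FastICA variant with a tanh nonlinearity, and the least-squares reconstruction are all choices made here for illustration. The abstract does not specify the number of components or the ICA algorithm.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20-letter alphabet
WINDOW = 51                            # window length stated in the abstract


def sparse_encode(window):
    """One-hot ("sparse") encode a 51-residue window as a 51*20 vector.

    Characters outside the assumed alphabet (e.g. padding) stay all-zero.
    """
    if len(window) != WINDOW:
        raise ValueError(f"expected {WINDOW} residues, got {len(window)}")
    code = np.zeros((WINDOW, len(AMINO_ACIDS)))
    for pos, residue in enumerate(window):
        idx = AMINO_ACIDS.find(residue)
        if idx >= 0:
            code[pos, idx] = 1.0
    return code.ravel()


def fit_subspace(X, n_components, n_iter=200, seed=0):
    """Fit one class subspace: PCA whitening, then symmetric FastICA (tanh).

    n_components must not exceed the rank of the centred training data.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA via SVD: rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components]
    scale = s[:n_components] / np.sqrt(len(X))  # std. dev. along each PC
    Z = (Xc @ V.T) / scale                       # decorrelated, unit variance

    def decorrelate(W):
        # Symmetric orthogonalisation, equal to (W W^T)^(-1/2) W.
        U, _, Wt = np.linalg.svd(W)
        return U @ Wt

    rng = np.random.default_rng(seed)
    W = decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(n_iter):
        G = np.tanh(Z @ W.T)
        g_prime = (1.0 - G**2).mean(axis=0)
        W = decorrelate(G.T @ Z / len(Z) - g_prime[:, None] * W)
    # Columns of `basis` are the ICA basis vectors in the original space.
    basis = V.T @ (scale[:, None] * W.T)
    return mean, basis


def reconstruction_distance(x, mean, basis):
    """Distance between x and its least-squares reconstruction from the basis."""
    centred = np.asarray(x, dtype=float) - mean
    coef, *_ = np.linalg.lstsq(basis, centred, rcond=None)
    return float(np.linalg.norm(centred - basis @ coef))


def classify(x, subspaces):
    """Return the label whose subspace reconstructs x with the smallest error."""
    return min(subspaces,
               key=lambda label: reconstruction_distance(x, *subspaces[label]))
```

A hypothetical usage would be to build `subspaces = {"site": fit_subspace(X_pos, k), "non-site": fit_subspace(X_neg, k)}` from encoded training windows and call `classify(sparse_encode(window), subspaces)`. Note that in this simplified form, where all ICA components are kept, the ICA basis spans the same subspace as the retained principal components, so the reconstruction distance matches that of a plain PCA subspace; how the paper selects or uses the independent components to gain its reported advantage is not stated in the abstract.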
Source
Computers and Applied Chemistry (《计算机与应用化学》)
Indexed in: CAS, CSCD, PKU Core Journals (北大核心)
2011, No. 5, pp. 565-568 (4 pages)
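Funding
Youth Fund Project of Central South University of Forestry and Technology (101-0041)
Youth Fund Project of the Hunan Provincial Department of Education (06C902)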
Keywords
glycosylation site
principal component analysis
independent component analysis
positional probability functions