Abstract
To provide a theoretical basis for better understanding the function and properties of proteins, we propose a simple and effective feature extraction method for predicting the subcellular localization of proteins. First, sparse coding is combined with amino acid composition information to extract features from protein sequences. Next, the resulting features are integrated by multilayer pooling over dictionaries of different sizes. Finally, the pooled features are fed into a support vector machine for classification. As verified by the Jackknife test, the prediction success rates on the ZD98, CH317 and Gram1253 data sets were 95.9%, 93.4% and 94.7%, respectively. The experiments show that the classification method based on multilayer sparse coding can remarkably improve the accuracy of protein subcellular localization prediction.
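Two of the ingredients named in the abstract, amino acid composition and the Jackknife (leave-one-out) test, can be illustrated with a minimal sketch. It does not reproduce the paper's pipeline: the sparse coding, multilayer pooling and support vector machine steps are omitted, and a nearest-neighbour rule stands in for the classifier. All function names, the toy sequences and the toy labels are assumptions made for illustration only.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the 20-dimensional vector of residue frequencies."""
    counts = Counter(sequence.upper())
    # Non-standard symbols (e.g. X) are ignored when normalising.
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_neighbour_label(train_x, train_y, query):
    """Stand-in classifier (an assumption, NOT the SVM used in the paper)."""
    best = min(range(len(train_x)),
               key=lambda i: squared_distance(train_x[i], query))
    return train_y[best]


def jackknife_success_rate(features, labels):
    """Leave each sample out once, predict it from the rest,
    and return the fraction of correct predictions."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_neighbour_label(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Toy sequences and labels, invented for illustration only.
toy_sequences = ["AAAAGGGA", "AAGAGGAA", "KKKKLLLK", "KLKKLLKK"]
toy_labels = ["class_1", "class_1", "class_2", "class_2"]
toy_features = [amino_acid_composition(s) for s in toy_sequences]
rate = jackknife_success_rate(toy_features, toy_labels)
```

In a fuller implementation the composition vectors would presumably be replaced or augmented by pooled sparse codes before classification; how that is done is specified in the paper itself, not in this sketch.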
Authors
Xingjian Chen (陈行健)
Xuejiao Hu (胡雪娇)
Wei Xue (薛卫)
School of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, Jiangsu, China
Source
Chinese Journal of Biotechnology (《生物工程学报》)
CAS
CSCD
PKU Core (北大核心)
2019, Issue 4, pp. 687-696 (10 pages)
Funding
National Key Research and Development Program of China (No. 2017YFD0800204)
Fundamental Research Funds for the Central Universities (No. KYZ201600175)
Keywords
sparse coding
amino acid composition
multilayer pooling
support vector machine
subcellular localization prediction