Identification of DNA-binding Proteins Using Gapped-dipeptide Composition and Recursive Feature Elimination Algorithm
Abstract Identifying DNA-binding proteins (DBPs) is important for the functional annotation of genes and proteins in prokaryotes and eukaryotes. This study is the first to combine gapped-dipeptide composition (Gap DPC) with recursive feature elimination (RFE) to identify DBPs. A position specific scoring matrix (PSSM) was first obtained for each protein's amino acid sequence. Gap DPC features were then extracted from the PSSM, and the optimal features were selected with RFE. A support vector machine (SVM) served as the classifier, and the method was evaluated on the protein sequence datasets PDB396 and LB1068 with the jackknife cross-validation test. Accuracy, Matthews correlation coefficient, sensitivity, and specificity reached 93.43%, 0.86, 89.04%, and 96.00% on PDB396, and 86.33%, 0.73, 86.49%, and 86.18% on LB1068. These results are clearly better than those of related methods reported in the literature, and the approach provides a new model for identifying DBPs.
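The workflow in the abstract (PSSM, Gap DPC features, jackknife test, four metrics) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the Gap DPC formula, so `gap_dpc` uses one common PSSM-based formulation (the mean product of two PSSM columns at positions separated by `gap` residues). All function names are hypothetical, `nearest_centroid` is only a stand-in for the SVM, and the RFE selection step is omitted.

```python
import math
from typing import Callable, Dict, List

Matrix = List[List[float]]  # PSSM: one row of per-amino-acid scores per residue


def gap_dpc(pssm: Matrix, gap: int) -> List[float]:
    """Gapped-dipeptide features for one PSSM and one gap size.

    Assumed definition (not taken from the paper): feature (i, j) is the mean
    over positions k of pssm[k][i] * pssm[k + gap + 1][j]. gap=0 pairs adjacent
    residues. A 20-column PSSM yields 400 features.
    """
    n_pairs = len(pssm) - gap - 1
    if n_pairs <= 0:
        raise ValueError("sequence too short for this gap")
    n_cols = len(pssm[0])
    feats = []
    for i in range(n_cols):
        for j in range(n_cols):
            total = sum(pssm[k][i] * pssm[k + gap + 1][j] for k in range(n_pairs))
            feats.append(total / n_pairs)
    return feats


def nearest_centroid(train_X: Matrix, train_y: List[int], x: List[float]) -> int:
    """Stand-in classifier (NOT the paper's SVM): return the closer class mean."""
    best_label, best_dist = -1, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife(X: Matrix, y: List[int],
              fit_predict: Callable[[Matrix, List[int], List[float]], int]) -> List[int]:
    """Jackknife (leave-one-out) test: hold out each sample once, train on the rest."""
    preds = []
    for i in range(len(X)):
        preds.append(fit_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i]))
    return preds


def evaluate(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, MCC, sensitivity, and specificity for labels 1 (DBP) and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }
```

In use, each protein's PSSM would be turned into a feature vector by concatenating `gap_dpc(pssm, g)` over the chosen gap sizes, and `evaluate(y, jackknife(X, y, nearest_centroid))` would report the four metrics. A real replication would substitute an SVM and an RFE step from a machine-learning library.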
Authors TANG Ya-Dong; LIU Xiao; LIU Tai-Gang; XIE Lu; CHEN Lan-Ming (Laboratory of Quality and Safety Risk Assessment for Aquatic Products on Storage and Preservation (Shanghai), Ministry of Agriculture, College of Food Science and Technology, Shanghai 201306, China; College of Information Technology, Shanghai Ocean University, Shanghai 201306, China; Shanghai Center for Bioinformation Technology, Shanghai 201203, China)
Source Progress in Biochemistry and Biophysics (indexed in SCIE, CAS, CSCD, and the Peking University Core Journals list), 2018, Issue 4, pp. 453-459 (7 pages)
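Funding Supported by the National Natural Science Foundation of China (31671946, 11601324) and the Science and Technology Commission of Shanghai Municipality (17050502200)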
Keywords DNA-binding proteins; gapped-dipeptide composition; position specific scoring matrix; recursive feature elimination algorithm; support vector machine classifier