Predicting potential cancer genes by integrating network properties, sequence features and functional annotations (cited by: 1)

Abstract: The discovery of novel cancer genes is one of the main goals in cancer research. Bioinformatics methods can be used to accelerate cancer gene discovery, which may help in the understanding of cancer and the development of drug targets. In this paper, we describe a classifier to predict potential cancer genes that we have developed by integrating multiple lines of biological evidence, including protein-protein interaction network properties, and sequence and functional features. We detected 55 features that were significantly different between cancer genes and non-cancer genes. Fourteen cancer-associated features were chosen to train the classifier. Four machine learning methods (logistic regression, support vector machines (SVMs), BayesNet and decision tree) were explored in the classifier models to distinguish cancer genes from non-cancer genes. The prediction power of the different models was evaluated by 5-fold cross-validation. The area under the receiver operating characteristic curve for the logistic regression, SVM, BayesNet and J48 tree models was 0.834, 0.740, 0.800 and 0.782, respectively. Finally, the logistic regression classifier with multiple biological features was applied to the genes in the Entrez database, and 1976 cancer gene candidates were identified. We found that the integrated prediction model performed much better than the models based on the individual biological evidence, and that the network and functional features had stronger predictive power than the sequence features in predicting cancer genes.
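The evaluation described in the abstract (train a classifier, score it by 5-fold cross-validation, report the area under the ROC curve) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, only logistic regression is shown, the folds are not stratified, and every function name here (`auc`, `fit_logistic`, `cross_val_auc`, `make_toy_data`) is hypothetical and chosen for illustration.

```python
import math
import random


def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def predict(w, x):
    """Logistic model: the last entry of w is the bias term."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Batch gradient descent on the logistic loss (no regularisation)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def cross_val_auc(X, y, k=5, seed=0):
    """Mean AUC over k held-out folds (simple shuffled split, not stratified)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    aucs = []
    for held_out in folds:
        test = set(held_out)
        train = [i for i in idx if i not in test]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores = [predict(w, X[i]) for i in held_out]
        aucs.append(auc([y[i] for i in held_out], scores))
    return sum(aucs) / k


def make_toy_data(n=200, seed=1):
    """Synthetic stand-in for gene features: two informative, one noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # 1 = "cancer gene", 0 = "non-cancer gene" (toy labels)
        X.append([rng.gauss(label, 1.0),
                  rng.gauss(0.5 * label, 1.0),
                  rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


X, y = make_toy_data()
mean_auc = cross_val_auc(X, y)
print(f"mean 5-fold AUC on toy data: {mean_auc:.3f}")
```

The printed value describes the toy data only and is not comparable to the AUCs reported in the abstract. Comparing several models, as the paper does, would amount to swapping `fit_logistic` and `predict` for another learner while keeping the same folds.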
Source: 《Science China(Life Sciences)》 (中国科学(生命科学英文版)), SCIE, CAS, 2013, Issue 8, pp. 751-757 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (31000591, 31000587, 31171266)
Keywords: cancer gene; logistic regression; network property; sequence feature; functional annotation; oncogene; predictive power; logistic regression analysis; integration

