Predicting potential cancer genes by integrating network properties, sequence features and functional annotations (Cited by: 1)



Abstract: The discovery of novel cancer genes is one of the main goals in cancer research. Bioinformatics methods can be used to accelerate cancer gene discovery, which may help in the understanding of cancer and the development of drug targets. In this paper, we describe a classifier for predicting potential cancer genes that we developed by integrating multiple types of biological evidence, including protein-protein interaction network properties, sequence features and functional features. We detected 55 features that were significantly different between cancer genes and non-cancer genes. Fourteen cancer-associated features were chosen to train the classifier. Four machine learning methods, logistic regression, support vector machines (SVMs), BayesNet and decision tree (J48), were explored in the classifier models to distinguish cancer genes from non-cancer genes. The prediction power of the different models was evaluated by 5-fold cross-validation. The area under the receiver operating characteristic curve for the logistic regression, SVM, BayesNet and J48 tree models was 0.834, 0.740, 0.800 and 0.782, respectively. Finally, the logistic regression classifier with multiple biological features was applied to the genes in the Entrez database, and 1976 cancer gene candidates were identified. We found that the integrated prediction model performed much better than the models based on individual types of biological evidence, and that the network and functional features had stronger predictive power than the sequence features in predicting cancer genes.
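The evaluation described in the abstract (5-fold cross-validation scored by the area under the ROC curve) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the data are synthetic, the three features are hypothetical stand-ins for the fourteen selected features, and every function name (`roc_auc`, `stratified_folds`, `fit_logistic`, `predict_proba`, `cross_validated_auc`) is an assumption introduced here. The abstract does not say whether per-fold AUCs were averaged or predictions were pooled; this sketch averages per-fold values.

```python
import math
import random


def roc_auc(labels, scores):
    """AUC via the rank-sum identity: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with roughly equal class ratios."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def predict_proba(w, x):
    """Logistic model; the last entry of w is the bias term."""
    z = sum(wj * v for wj, v in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Unregularised logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, v in enumerate(xi):
                grad[j] += err * v
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def cross_validated_auc(X, y, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, and average the k AUCs."""
    aucs = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        w = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores = [predict_proba(w, X[i]) for i in test_idx]
        aucs.append(roc_auc([y[i] for i in test_idx], scores))
    return sum(aucs) / len(aucs)


# Synthetic stand-in data: 60 "cancer" and 140 "non-cancer" genes, 3 features.
rng = random.Random(42)
y = [1] * 60 + [0] * 140
X = [[rng.gauss(1.0 if label else 0.0, 1.0) for _ in range(3)] for label in y]
mean_auc = cross_validated_auc(X, y)
print(f"mean 5-fold AUC: {mean_auc:.3f}")
```

Stratifying the folds keeps the cancer/non-cancer ratio similar in every fold, so each held-out fold contains both classes and its AUC is well defined.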

Authors: LIU Wei, XIE HongWei

Affiliation: College of Mechanical & Electronic Engineering and Automatization

Source: Science China (Life Sciences) [中国科学（生命科学英文版）], SCIE, CAS, 2013, Issue 8, pp. 751-757 (7 pages)

Funding: Supported by the National Natural Science Foundation of China (31000591, 31000587, 31171266)

Keywords: cancer gene; logistic regression; network property; sequence feature; functional annotation; predictive power; integration

Classification code: R730.4 [Medicine and Health: Oncology]



