Abstract
Colorectal cancer is one of the most common malignant tumors of the digestive system, and its mortality ranks third among malignant tumors in developed countries. This paper identifies disease genes for colorectal cancer through biological analysis and data mining. First, based on the GSE9348 gene expression dataset from the GEO database, 339 differentially expressed genes in colorectal cancer are screened with P < 0.05 and fold change > 2 using the LIMMA package in R. Second, the known colorectal cancer disease genes recorded in the OMIM database are combined with the STRING database to obtain a protein-protein interaction (PPI) network of the differentially expressed genes and the known disease genes. Next, module analysis of the PPI network is performed with the ClusterONE plugin for Cytoscape, yielding a subnetwork of 53 genes. Finally, topological analysis of this subnetwork identifies five new candidate disease genes for colorectal cancer: FOS, CCND1, CEBPB, EGR1 and NOS3. In addition, the newly identified genes are validated by functional enrichment analysis and literature mining.
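Two steps of the workflow summarized in the abstract, the threshold screen and the topological ranking, can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which used LIMMA in R and Cytoscape. The gene labels, statistics and edges are made-up toy values. Reading "fold change > 2" as |log2 FC| > 1 in either direction is an assumption. Node degree is also an assumed measure, since the abstract does not say which topological measures were used.

```python
import math
from collections import defaultdict


def filter_degs(stats, p_cut=0.05, fc_cut=2.0):
    """Return genes with p < p_cut and |log2 fold change| > log2(fc_cut).

    `stats` maps a gene label to (log2_fold_change, p_value).
    Treating up- and down-regulation symmetrically is an assumption.
    """
    log_cut = math.log2(fc_cut)
    return sorted(
        gene
        for gene, (log_fc, p_value) in stats.items()
        if p_value < p_cut and abs(log_fc) > log_cut
    )


def degree_ranking(edges):
    """Rank nodes of an undirected network by degree (ties broken by name)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    return sorted(
        ((gene, len(adj)) for gene, adj in neighbors.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical toy data, not taken from GSE9348 or STRING.
toy_stats = {
    "GENE_A": (2.3, 0.001),
    "GENE_B": (-1.4, 0.020),
    "GENE_C": (0.6, 0.010),   # fails the fold-change cutoff
    "GENE_D": (1.8, 0.200),   # fails the p-value cutoff
}
toy_edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_X"),
    ("GENE_B", "GENE_X"),
    ("GENE_A", "GENE_Y"),
]

degs = filter_degs(toy_stats)
ranking = degree_ranking(toy_edges)
print(degs)
print(ranking)
```

In this toy network GENE_A has the most interaction partners and is ranked first. A real analysis would likely combine several centrality measures rather than rely on degree alone.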
Authors
Wu Huihui; Tang Xuqing (College of Science, Jiangnan University, Wuxi 214122, China; Wuxi Engineering Research Center for Biocomputing, Jiangnan University, Wuxi 214122, China)
Source
Journal of Data Acquisition and Processing (《数据采集与处理》)
CSCD
PKU Core Journal (北大核心)
2018, Issue 4, pp. 654-661 (8 pages)
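Funding
Supported by the National Natural Science Foundation of China (11371174)
Supported by the International Science and Technology Cooperation Research Program (2011DFR 70500)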
Keywords
colorectal cancer
protein interaction network
clustering analysis
network topology analysis
functional enrichment analysis