
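A Pattern-Recognition Classification Model for Smoking-Related Methylation in Lung Adenocarcinoma and Identification of Its Signature Genes (cited by: 2)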

Genome-Wide Smoke Related Methylation Signature Genes Identification for Lung Adenocarcinomas
Abstract: Smoking is an important contributing factor to lung cancer. To lay a foundation for understanding the disease mechanism in never-smoker lung adenocarcinoma patients, we used genome-wide gene methylation (ME) values and bioinformatics methods to build a pattern-recognition classifier that separates current smokers from never smokers, and to identify the methylation signature genes behind it. Methylation microarray data are extremely high-dimensional with few samples, noisy, and highly correlated, and information saturation across the whole genome can swamp the few dozen true signature genes. To counter this, we applied, for the first time, an iterative multi-step screening method. Rather than relying on a single criterion, it filters the genome-wide methylation data by significance of differential methylation, by the relationship between methylation levels and gene expression levels, by biological function, and by contribution to current/never-smoker classification. Using 127 lung adenocarcinoma samples from the TCGA database as the training set and 64 EDRN lung adenocarcinoma samples as an independent test set, we identified 48 key genes. With the methylation values of these 48 signature genes alone, classification accuracy was 87.5% on the TCGA training set (sensitivity 87.2%, specificity 87.8%) and 76.4% on the EDRN independent test set (sensitivity 80.2%, specificity 73.6%). Cross-study comparison showed that 17 of the 48 genes have already been reported in other studies as important for cancer development; our results further demonstrate the importance of their methylation. Ingenuity Pathway Analysis (IPA) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses at the gene regulatory network and metabolic pathway levels indicated that the signature genes are closely linked to cancer development, biological function, and cell development.
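The abstract names the screening criteria but not the tests, thresholds, or classifier used. The following is therefore only a minimal sketch of the general idea, not the authors' method. It assumes a Welch t statistic for the significance step, a negative Pearson correlation between methylation and expression for the second step, and treats "current smoker" as the positive class when computing sensitivity and specificity. The gene names, cutoffs, and toy values are all hypothetical, and the biological-function and classification-importance steps are omitted.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (0.0 if both are constant)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se else 0.0


def pearson(x, y):
    """Pearson correlation coefficient (0.0 if either input is constant)."""
    mx, my = mean(x), mean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sqrt(sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y))
    return num / den if den else 0.0


def screen_genes(methylation, expression, is_smoker, t_cut=2.0, r_cut=-0.3):
    """Keep genes that pass two successive filters.

    Step 1 (assumed): |Welch t| between smokers and never smokers >= t_cut.
    Step 2 (assumed): methylation/expression correlation <= r_cut.
    """
    kept = []
    for gene, values in methylation.items():
        smokers = [v for v, s in zip(values, is_smoker) if s]
        never = [v for v, s in zip(values, is_smoker) if not s]
        if abs(welch_t(smokers, never)) < t_cut:
            continue  # methylation does not differ enough between groups
        if pearson(values, expression[gene]) > r_cut:
            continue  # methylation is not inversely related to expression
        kept.append(gene)
    return kept


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity; truthy labels are the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical toy data: three smokers followed by three never smokers.
is_smoker = [True, True, True, False, False, False]
methylation = {
    "GENE_A": [0.80, 0.85, 0.78, 0.30, 0.35, 0.28],
    "GENE_B": [0.50, 0.52, 0.49, 0.51, 0.50, 0.48],
    "GENE_C": [0.70, 0.75, 0.72, 0.20, 0.25, 0.22],
}
expression = {
    "GENE_A": [1.0, 0.8, 1.1, 5.0, 4.8, 5.2],
    "GENE_B": [3.0, 3.1, 2.9, 3.0, 3.2, 2.8],
    "GENE_C": [5.0, 5.2, 5.1, 1.0, 1.2, 0.9],
}

selected = screen_genes(methylation, expression, is_smoker)
metrics = classification_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 1, 0])
print(selected, metrics)
```

Applying the filters in sequence means each later, more expensive criterion only sees genes that survived the cheaper ones, which is one plausible reading of "multi-step screening" for high-dimensional, small-sample data.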
Source: Chinese Journal of Biomedical Engineering (《中国生物医学工程学报》; indexed in CAS, CSCD, and the Peking University Core Journals list), 2016, No. 3, pp. 301-309 (9 pages)
Funding: National Natural Science Foundation of China (31271351)
Keywords: lung adenocarcinoma; methylation values; smoke exposure; pattern recognition; classification