Abstract
Objective: To identify molecular subtypes of stage II-III colorectal carcinoma (CRC) by integrating gene expression profiles with copy number variation (CNV) profiles, and to explore the relationship between each subtype and postoperative recurrence.
Methods: Gene expression profiles of CRC and the corresponding CNV profiles were downloaded from public databases and processed with batch-effect adjustment, quartile normalization, missing-value estimation and feature filtering. The processed mRNA and CNV profiles were integrated with the Bayesian consensus clustering (BCC) algorithm to derive CRC subtypes. Using the patients' recurrence and survival follow-up data, the prognostic value of each subtype was evaluated by survival analysis, and gene set enrichment analysis software was used to compare the biological signals enriched in the different subtypes. All profile data analyses were performed in the R-3.0.1 environment, and statistical analyses were performed with SPSS 16.0.
Results: Tumor tissue profiles from 335 patients with TNM stage II-III CRC and follow-up information were obtained from public databases. After feature selection, 1578 mRNA probes and 345 CNV locations were used for integrative BCC clustering. BCC divided the 335 cases into four subtypes. This classification was significantly associated with subtypes based on gene expression alone (Cramer's V = 0.49, P < 0.001), subtypes based on CNV alone (Cramer's V = 0.51, P < 0.001) and the expression-based subtypes published by the data provider (Marisa's subclasses; Cramer's V = 0.32, P < 0.001). BCC-I had the best prognosis, BCC-IV the worst, and BCC-(II+III) an intermediate prognosis; the difference among the three groups was significant (log-rank P < 0.001), with HR = 1.55 (95% CI: 1.22-1.99) in a univariate Cox model. Enrichment analysis showed that the most strongly differing biological function between BCC-IV and BCC-I was DNA damage repair signaling (involving 52 genes). When the BCC-IV and BCC-I samples were regrouped using the DNA damage repair genes, the new grouping had better prognostic value than the BCC subtypes, yet it was significantly correlated with them (Cramer's V = 0.39, P < 0.001).
Conclusion: The BCC method can effectively integrate different types of genomic data for the molecular classification of colorectal carcinoma. The BCC-IV subtype has the worst prognosis, which may be associated with reduced DNA damage repair capacity.
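The concordance between classifications above is reported as Cramer's V. As a minimal sketch of how that statistic is commonly computed (the standard uncorrected form, V = sqrt(chi-square / (n * (min(rows, columns) - 1)))), the Python below compares two labelings of the same samples. The function name `cramers_v` and the toy labels are hypothetical illustrations; they are not the study's data, and the authors report using R and SPSS rather than this code.

```python
from collections import Counter
from math import sqrt


def cramers_v(labels_a, labels_b):
    """Uncorrected Cramer's V between two categorical labelings of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same samples")
    n = len(labels_a)
    row_totals = Counter(labels_a)            # marginal counts of labeling A
    col_totals = Counter(labels_b)            # marginal counts of labeling B
    joint = Counter(zip(labels_a, labels_b))  # contingency-table cell counts
    k = min(len(row_totals), len(col_totals))
    if n == 0 or k < 2:
        raise ValueError("each labeling needs at least two categories")

    # Pearson chi-square statistic: sum over cells of (observed - expected)^2 / expected
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    return sqrt(chi2 / (n * (k - 1)))


if __name__ == "__main__":
    # Hypothetical toy labels for eight samples (not from the study).
    integrative = ["I", "I", "II", "II", "III", "III", "IV", "IV"]
    expression_only = ["A", "A", "B", "B", "B", "C", "C", "C"]
    print(round(cramers_v(integrative, expression_only), 3))
```

A value of 0 indicates no association between the two labelings and 1 indicates that one labeling fully determines the other, so the reported values of 0.32 to 0.51 indicate moderate agreement rather than identical groupings.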
Source
Journal of Zhejiang University (Medical Sciences)
Indexed in: CAS, CSCD, Peking University Core Journals
2014, Issue 4, pp. 420-426 (7 pages)