Abstract
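Objective To construct a lung cancer risk prediction model for the Chinese Han population by combining genetic factors with smoking information. Methods Using genome-wide association study (GWAS) data from the Chinese Han population, samples were divided by geographic origin into a training set (Nanjing and Shanghai: 1 473 cases vs. 1 962 controls) and a testing set (Beijing and Wuhan: 858 cases vs. 1 115 controls). Previously reported lung cancer susceptibility loci were systematically compiled, loci with independent effects were selected in the training set by backward stepwise selection, and a weighted method was used to estimate each individual's genetic score for modeling. Three risk prediction models were built in the training set: one based on smoking information (the smoking model), one based on the genetic score (the genetic model) and one combining smoking and genetic information (the combined model). Their performance in predicting lung cancer risk was evaluated with the receiver operating characteristic (ROC) curve, the area under the curve (AUC), the net reclassification improvement (NRI) and the integrated discrimination improvement (IDI), and the models were then validated in the testing set. Results In the training set, the AUCs of the combined, smoking and genetic models were 0.69 (0.67-0.71), 0.65 (0.63-0.66) and 0.60 (0.59-0.62), respectively. In both the training and testing sets, the combined model predicted risk better than either the smoking model or the genetic model, and the differences were statistically significant (P<0.001). Reclassification analysis showed that, compared with the smoking model, the combined model increased NRI by 4.57% (2.23%-6.91%) and IDI by 3.11% (2.52%-3.69%) in the training set; in the testing set, NRI and IDI increased by 2.77% and 3.16%, respectively. Conclusion The genetic score significantly improves the predictive performance of a traditional lung cancer risk model. The risk prediction model built by combining genetic factors and smoking information can be used to identify individuals at high risk of lung cancer in the Chinese Han population.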
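The weighted genetic score and the AUC used to compare the models can be sketched in a few lines. This is a minimal sketch under stated assumptions: each SNP is weighted by the natural log of its per-allele odds ratio (a common wGRS convention; the abstract does not spell out the exact weights), and the AUC is computed as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control. The function names and the odds ratios in the usage lines are hypothetical, not values from the study.

```python
import math
from typing import Sequence


def weighted_grs(risk_alleles: Sequence[int], odds_ratios: Sequence[float]) -> float:
    """Hypothetical helper: weighted genetic risk score for one person.

    Each SNP contributes (risk-allele count: 0, 1 or 2) * ln(per-allele OR).
    The ln(OR) weighting is an assumed convention, not taken from the study.
    """
    if len(risk_alleles) != len(odds_ratios):
        raise ValueError("need one risk-allele count per SNP")
    return sum(g * math.log(or_) for g, or_ in zip(risk_alleles, odds_ratios))


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Hypothetical helper: AUC as P(case score > control score), ties counted as 1/2."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Made-up odds ratios for three SNPs, for illustration only.
    illustrative_ors = [1.2, 1.3, 1.1]
    print(weighted_grs([2, 1, 0], illustrative_ors))
    print(auc([3.0, 2.0], [1.0, 2.0]))  # prints 0.875
```

The pairwise AUC loop is quadratic and only meant to make the definition explicit; for thousands of cases and controls a rank-based formula gives the same value much faster.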
Objective To evaluate the predictive power of a lung cancer risk model that combines traditional epidemiological factors with genetic factors. Methods Data from our previous genome-wide association study (GWAS) of lung cancer in Chinese populations were divided into a training set (Nanjing and Shanghai: 1 473 cases vs. 1 962 controls) and a testing set (Beijing and Wuhan: 858 cases vs. 1 115 controls). All single nucleotide polymorphisms (SNPs) previously reported to be associated with lung cancer risk were systematically collected, and stepwise logistic regression was used in the training set to select those with independent effects. A weighted genetic risk score (wGRS) was then calculated for each individual. To evaluate the contribution of genetic factors, three risk models were built in the training set: a smoking model (based on smoking status), a genetic risk model (based on the genetic risk score) and a combined model (based on smoking status and the genetic risk score). The predictive performance of the models was evaluated using receiver operating characteristic (ROC) curves, the area under the curve (AUC), the net reclassification improvement (NRI) and the integrated discrimination improvement (IDI), and the results were further verified in the testing set. Results In the training set, the AUCs of the smoking, genetic risk and combined models were 0.65 (0.63-0.66), 0.60 (0.59-0.62) and 0.69 (0.67-0.71), respectively. The predictive power of both the smoking model and the genetic risk model was significantly lower than that of the combined model (P<0.001). Compared with the smoking model, the combined model increased NRI by 4.57% (2.23%-6.91%) and IDI by 3.11% (2.52%-3.69%) in the training set, and both differences were statistically significant (P<0.001). In the testing set, NRI increased by 2.77%, which was not statistically significant (P=0.069), while IDI increased by 3.16%, which was statistically significant (P<0.001). Conclusion This study showed that combining 14 genetic variants with traditional epidemiological factors could improve the predictive power of a lung cancer risk model. The model could be used to screen for people at high risk of lung cancer in Chinese populations and provide evidence for the early diagnosis and treatment of lung cancer.
Source
Chinese Journal of Epidemiology (《中华流行病学杂志》), 2015, No. 10, pp. 1047-1052 (6 pages)
Indexed in: CAS, CSCD, Peking University Core Journals
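Funding
Key Program of the National Natural Science Foundation of China (81230067)
Priority Academic Program Development of Jiangsu Higher Education Institutions (Public Health and Preventive Medicine)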
Keywords
Lung cancer
Genome-wide association study
Risk prediction model