Abstract
Lung squamous cell carcinoma (LUSC) samples from The Cancer Genome Atlas (TCGA) database were used as the dataset to study, at the whole-genome level, how gene expression changes from normal lung tissue to stage I disease, to identify early markers closely associated with the onset of LUSC, and to build a tumor prediction model based on early marker genes. Early markers of LUSC were identified by a screening procedure that combines pattern recognition classification with gene pathway and functional analysis, and the prediction model was built with Fisher discriminant analysis. The screening yielded 12 early markers of LUSC: CLDN18, CD34, ESAM, JAM2, CDH5, F11, F8, CFD, MRC1, MARCO, SFTPA2, and SFTPA1. Machine learning models built on these markers classified early-stage LUSC samples and normal lung tissue samples with an accuracy above 98%. The early tumor prediction model built from SFTPA1 and ESAM achieved a sensitivity of 99.18% and a specificity of 100% on normal lung tissue and stage I LUSC samples, and its classification accuracy on an independent validation set was above 90%. The 12 early molecular markers are candidate marker molecules for the diagnosis of LUSC, and the prediction model has high accuracy; both may support research on the pathogenesis of LUSC and early tumor prediction.
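The two-gene Fisher discriminant model mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the expression values are synthetic, the class means and group sizes are arbitrary, the helper names (`fit_fisher`, `predict`, `sensitivity_specificity`) are hypothetical, and the midpoint threshold assumes equal class priors. The two columns merely stand in for SFTPA1 and ESAM expression.

```python
import numpy as np

def fit_fisher(X0, X1):
    """Fit a two-class Fisher linear discriminant.

    Returns (w, c); a sample x is assigned to class 1 when x @ w > c.
    """
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Fisher direction: Sw^{-1} (m1 - m0)
    w = np.linalg.solve(Sw, m1 - m0)
    # Threshold at the midpoint of the projected class means
    # (an assumption: equal priors and misclassification costs)
    c = 0.5 * (m0 + m1) @ w
    return w, c

def predict(X, w, c):
    return (X @ w > c).astype(int)

def sensitivity_specificity(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)

# Synthetic two-gene expression data (NOT from TCGA); values are arbitrary.
rng = np.random.default_rng(0)
normal = rng.normal(loc=[8.0, 6.0], scale=0.5, size=(50, 2))   # class 0
tumor = rng.normal(loc=[3.0, 2.0], scale=0.5, size=(120, 2))   # class 1

w, c = fit_fisher(normal, tumor)
X = np.vstack([normal, tumor])
y = np.concatenate([np.zeros(50, dtype=int), np.ones(120, dtype=int)])

sens, spec = sensitivity_specificity(y, predict(X, w, c))
print(f"sensitivity={sens:.4f}, specificity={spec:.4f}")
```

Here sensitivity is the fraction of tumor samples classified as tumor and specificity is the fraction of normal samples classified as normal, matching how the two figures are reported in the abstract. A real evaluation would score a held-out set, as the abstract does with its independent validation set, rather than the training samples used here.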
Authors
尚文慧
王晓曦
李晓琴
高斌
SHANG Wenhui; WANG Xiaoxi; LI Xiaoqin; GAO Bin (College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124, China)
Source
Chinese Journal of Bioinformatics (《生物信息学》)
2020, No. 4, pp. 223-235 (13 pages)
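Funding
National Natural Science Foundation of China (No. 11572014)
Key Research and Development Project of the Ministry of Science and Technology of China (No. 2017YFC0111104)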
Keywords
Lung squamous cell carcinoma
Gene expression
Tumorigenesis
Early markers
Prediction model