Abstract
AIM: To verify gene expression profiles for colorectal cancer (CRC) using 12 publicly available microarray datasets.

METHODS: Logistic regression analysis was performed, and odds ratios for each gene were determined between CRC and controls. Twelve public microarray datasets (GSE 4107, 4183, 8671, 9348, 10961, 13067, 13294, 13471, 14333, 15960, 17538, and 18105), which included 519 cases of adenocarcinoma and 88 normal mucosa controls, were pooled and used to verify 17 selected genes from 3 published studies and to estimate their external generalizability.

RESULTS: We validated the 17 CRC-associated genes from the studies by Chang et al (Model 1: 5 genes), Marshall et al (Model 2: 7 genes) and Han et al (Model 3: 5 genes) and performed multivariate logistic regression analysis using the pooled 12 public microarray datasets as well as external validation. The Hosmer-Lemeshow (H-L) goodness-of-fit test was statistically significant (P = 0.044) for Model 2 of Marshall et al, indicating that observed event rates did not match expected event rates in subgroups of the model population. Models 1, 3 and 4 were well calibrated, that is, expected and observed event rates in subgroups were similar, with non-significant H-L P values of 0.460, 0.194 and 1.000, respectively. A 7-gene model of CPEB4, EIF2S3, MGC20553, MS4A1, ANXA3, TNFAIP6 and IL2RB was pairwise selected and showed the best results in logistic regression analysis (H-L P = 1.000, R² = 0.951, area under the curve = 0.999, accuracy = 0.968, specificity = 0.966 and sensitivity = 0.994).

CONCLUSION: A novel gene expression profile was associated with CRC and can potentially be applied to blood-based detection assays.
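
The calibration check reported in the results can be illustrated with a minimal sketch of the Hosmer-Lemeshow test in plain Python. This is not the authors' code: the function name `hosmer_lemeshow`, the default of 10 groups, and the toy inputs in the usage note are assumptions made for illustration. The sketch also assumes an even number of groups, so that the chi-square tail probability (with groups minus 2 degrees of freedom) has a simple closed form and no external library is needed.

```python
import math


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow goodness-of-fit test (illustrative sketch).

    y_true: observed outcomes (1 = case, 0 = control).
    y_prob: predicted probabilities from a fitted logistic model.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    if groups < 4 or groups % 2 != 0:
        raise ValueError("this sketch needs an even number of groups >= 4")
    if len(y_true) != len(y_prob) or len(y_true) < groups:
        raise ValueError("need equally long inputs with at least one sample per group")

    # Order samples by predicted risk only (stable sort keeps input order on ties).
    pairs = sorted(zip(y_prob, y_true), key=lambda t: t[0])
    n = len(pairs)

    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        mean_p = expected / size
        variance = size * mean_p * (1.0 - mean_p)
        if variance == 0.0:
            raise ValueError("a group has mean predicted probability of exactly 0 or 1")
        stat += (observed - expected) ** 2 / variance

    df = groups - 2
    # Chi-square survival function for even df = 2k:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = stat / 2.0
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return stat, df, p_value
```

A small P value (for example, below 0.05) suggests that observed and expected event rates differ across risk groups, as reported for Model 2, whereas a large P value is consistent with a well-calibrated model. For example, `hosmer_lemeshow([1, 0] * 10, [0.5] * 20, groups=4)` uses toy data in which half of the samples are events and every predicted probability is 0.5, so the test does not reject calibration.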