Abstract
Personalized medicine is critical for lung cancer treatment. Several gene signatures that classify lung cancer patients as high- or low-risk for cancer recurrence have been reported. The aim of this study is to identify a novel gene signature that predicts recurrence risk for non-small cell lung cancer patients more accurately than previous research and that clearly differentiates the high- and low-risk groups. To accomplish this, we employed an ensemble of feature selection algorithms, an ensemble of classification algorithms, and a genetic algorithm (an evolutionary search algorithm). Compared to one previous study, our 12-gene signature classifies patients more accurately in the training set (n = 256; 57.32% compared to 50.78%) and in the two test sets (n = 104 and n = 82; 67.07% compared to 54.9% and 57.32% compared to 54.8%, respectively), where prediction accuracy was computed as the average across the four classifiers. In Kaplan-Meier analysis of high- and low-risk patients, the 12-gene signature showed statistically significant risk differentiation in each data set: log-rank p < 0.001 in the training set and log-rank p < 0.05 in both test sets. Analysis of the posterior probabilities revealed a strong correlation between 5-year survival and the 12-gene signature. In addition, functional pathway analysis uncovered associations between the 12-gene signature and cancer-causing genes reported in the literature.