Abstract
Several gene signatures have been identified for building predictors of chemosensitivity in breast cancer. It is crucial to understand how each gene in a signature contributes to the prediction, i.e., to make the prediction model interpretable rather than using it as a black box. We used Random Forests (RFs) to build two interpretable predictors of pathologic complete response (pCR), each based on a different gene signature. One signature consisted of the top 31 probe sets (27 genes) differentially expressed between pCR and residual disease (RD), chosen from a previous study; the other consisted of the genes involved in the Notch signaling pathway (113 genes). Compared with the predictor in the previous study, both predictors had a higher accuracy (82% vs. 76% and 79% vs. 76%), a higher specificity (91% vs. 71% and 98% vs. 71%), and a higher positive predictive value (PPV) (68% vs. 52% and 73% vs. 52%). Furthermore, Random Forests were employed to calculate the importance of each gene in the two signatures. Our functional annotation suggested that the important genes identified by the feature selection scheme of Random Forests are biologically significant.
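For readers less familiar with the three performance measures reported above, the following is a minimal sketch of how accuracy, specificity, and PPV are conventionally derived from confusion-matrix counts. It assumes pCR is treated as the positive class and RD as the negative class. The function name `classification_metrics` and the example counts are hypothetical illustrations and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, specificity and PPV from confusion-matrix counts.

    Assumption: pCR is the positive class and RD is the negative class.
    tp/fp/tn/fn are true/false positives and true/false negatives.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    # Accuracy: fraction of all cases classified correctly.
    accuracy = (tp + tn) / total
    # Specificity: fraction of true RD cases predicted as RD.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    # PPV: fraction of predicted pCR cases that are truly pCR.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return {"accuracy": accuracy, "specificity": specificity, "ppv": ppv}


# Hypothetical counts for illustration only (NOT data from the study).
example = classification_metrics(tp=15, fp=5, tn=70, fn=10)
print(example)
```

Under these definitions, a predictor can raise specificity and PPV together by producing fewer false-positive pCR calls, which matches the pattern of improvements reported for both signatures.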