Neoadjuvant chemotherapy is a necessary treatment for breast cancer patients with large tumors. Patients who achieve a pathologic complete response (pCR) after this treatment usually have a more favorable prognosis than those who do not. Therefore, pCR is now considered the best prognosticator for patients receiving neoadjuvant chemotherapy. However, not all patients benefit from this treatment, so a way is needed to predict which patients will achieve pCR. Various gene signatures of chemosensitivity in breast cancer have been identified, from which such predictors can be built. Nevertheless, many of them have a prediction accuracy of around 80%. Identifying gene signatures that can be used to build high-accuracy predictors is therefore a prerequisite for their clinical testing and application. Furthermore, elucidating the importance of each individual gene in a signature is another pressing need before such a signature can be tested in clinical settings. In this study, a Genetic Algorithm (GA) and Sparse Logistic Regression (SLR), along with the t-test, were employed to identify one signature. It consisted of 28 probe sets selected by GA from the top 65 probe sets that were highly overexpressed between pCR and Residual Disease (RD), and it was used to build an SLR predictor of pCR (SLR-28). Tested on a training set (n = 81) and a validation set (n = 52), this predictor gave very precise predictions as measured by accuracy, specificity, sensitivity, positive predictive value, and negative predictive value, with the corresponding P values all reported as zero. Furthermore, this predictor identified 12 important genes in the 28-probe-set signature.
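For readers less familiar with the five measures named above, the following is a minimal sketch of how they are conventionally derived from a predictor's binary calls. The function name `classification_metrics` and the coding of pCR as 1 and RD as 0 are assumptions made for illustration; the abstract does not describe the authors' evaluation code.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute five standard measures for binary calls.

    Assumed coding (not stated in the abstract): 1 = pCR, 0 = RD.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # pCR called pCR
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # RD called RD
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # RD called pCR
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # pCR called RD

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

Calling `classification_metrics(true_labels, predicted_labels)` on the labels of a held-out validation set, rather than the training set, gives the less optimistic estimate of each measure.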
Our findings also demonstrated that the genes found most discriminative by SLR, as a group selected by GA, were not necessarily those with the smallest P values by t-test as individual genes, highlighting the ability of GA, as a multivariate technique, to capture interacting genes in pCR prediction. Our gene signature outperformed a signature found in one previous study, with a prediction accuracy of 92% vs 76%, demonstrating the potential of GA and SLR for identifying robust gene signatures for chemotherapy response prediction in breast cancer.
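The univariate ranking that the multivariate selection is contrasted with can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the abstract says only "t-test", so the use of Welch's unequal-variance statistic is an assumption, ranking by |t| stands in for ranking by P value, and the names `welch_t` and `rank_probe_sets` are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for two independent groups (>= 2 values each).

    Assumption: the abstract does not say which t-test variant was used.
    """
    # statistics.variance is the sample variance (n - 1 denominator).
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        raise ValueError("both groups have zero variance")
    return (mean(a) - mean(b)) / se


def rank_probe_sets(
    pcr: Dict[str, Sequence[float]],
    rd: Dict[str, Sequence[float]],
    top_k: int,
) -> List[str]:
    """Return the top_k probe-set names by decreasing |t| (pCR vs RD).

    Each probe set is scored on its own, so interactions between
    probe sets play no part in this ranking.
    """
    scores = {name: abs(welch_t(pcr[name], rd[name])) for name in pcr}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Because each probe set is scored in isolation here, a subset chosen jointly by a search procedure such as GA can legitimately differ from the head of this list.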