Abstract
Background: Colorectal cancer (CRC) is the second leading cause of cancer deaths and the third most common cancer worldwide. Identifying molecular subtypes of CRC and treating patients accordingly could yield better therapeutic outcomes than treating all CRC patients alike. Previous studies have highlighted both the burden of CRC as a major cause of mortality worldwide and the potential of molecular subtyping to tailor treatment strategies and improve patient outcomes.
Methods: This study proposes an unsupervised learning approach that combines hierarchical clustering with feature selection to identify molecular subtypes, and compares its performance with that of conventional methods. The model used gene expression data from CRC patients obtained from Kaggle and applied dimensionality reduction followed by Z-score-based outlier removal. Agglomerative hierarchical clustering was used to identify molecular subtypes, with a P-value-based approach for feature selection. Performance was evaluated using several classifiers, including a multilayer perceptron (MLP).
Results: The proposed methodology outperformed conventional methods, with the MLP classifier achieving the highest accuracy, 89%, after feature selection. The model identified molecular subtypes of CRC and distinguished them on the basis of their gene expression profiles.
Conclusion: This method could aid in developing tailored therapeutic strategies for CRC patients, although further validation and evaluation of its clinical significance are needed.
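The Methods steps can be illustrated with a minimal sketch in Python using scikit-learn. This is not the authors' implementation: the abstract does not state the dimensionality reduction technique, the Z-score cutoff, the number of subtypes, the statistical test behind the P-values, or the MLP architecture. PCA, a cutoff of 3, three clusters with Ward linkage, a one-way ANOVA with threshold 0.01, and a single 32-unit hidden layer are all assumptions made here for illustration, and the function name `subtype_pipeline` is hypothetical. The data are synthetic, not the Kaggle dataset.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import PCA
from sklearn.feature_selection import f_classif
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler


def subtype_pipeline(X, n_subtypes=3, n_components=5, z_max=3.0, alpha=0.01, seed=0):
    """Return (keep_mask, labels, selected_genes, accuracy) for a samples x genes matrix.

    All default parameter values are illustrative assumptions.
    """
    # Standardize each gene, then project samples onto a few principal components.
    X_std = StandardScaler().fit_transform(X)
    Z = PCA(n_components=n_components, random_state=seed).fit_transform(X_std)

    # Z-score outlier removal: drop samples with an extreme score on any component.
    z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    keep = (np.abs(z) < z_max).all(axis=1)
    X_kept, Z_kept = X_std[keep], Z[keep]

    # Agglomerative (Ward) clustering assigns a putative subtype to each kept sample.
    clusterer = AgglomerativeClustering(n_clusters=n_subtypes, linkage="ward")
    labels = clusterer.fit_predict(Z_kept)

    # P-value feature selection: keep genes whose one-way ANOVA p-value
    # across the clusters falls below alpha.
    _, p_values = f_classif(X_kept, labels)
    selected = np.flatnonzero(p_values < alpha)

    # Train an MLP to predict the cluster labels from the selected genes
    # and report accuracy on a held-out split.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_kept[:, selected], labels, test_size=0.3, stratify=labels, random_state=seed
    )
    mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=seed)
    accuracy = mlp.fit(X_tr, y_tr).score(X_te, y_te)
    return keep, labels, selected, accuracy


# Synthetic stand-in for an expression matrix: 150 samples x 200 genes,
# with three groups that differ only in the first 40 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 200))
X[50:100, :40] += 3.0
X[100:, :40] -= 3.0

keep, labels, selected, accuracy = subtype_pipeline(X)
print(f"{keep.sum()} samples kept, {selected.size} genes selected, accuracy {accuracy:.2f}")
```

Because the class labels here come from the clustering itself, the reported accuracy measures how well the selected genes reproduce the cluster assignments, not agreement with any clinically established subtype. Selecting features on the full dataset before splitting can also inflate that figure; a stricter evaluation would repeat the selection inside each training fold.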