Abstract
Objective: To explore, using microRNA omics data, the application of weighted random forest (WRF) to the classification and prediction of triple-negative breast cancer (TNBC), and to provide methodological support for disease diagnosis. Methods: Using TCGA breast cancer data as an example, a WRF classification model for TNBC was built and compared with five other models: random forest, logistic regression, support vector machine, LASSO, and ridge regression. Results: Across five evaluation metrics, the WRF model clearly outperformed the other five models, with a sensitivity of 0.852, specificity of 0.873, accuracy of 0.871, AUC of 0.862, and G-means of 0.861. Conclusion: The WRF-based classification model identified TNBC patients well and can serve as a methodological reference for the diagnosis of TNBC.
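The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the usual definition of G-means as the geometric mean of sensitivity and specificity; the abstract does not state the formula the authors used. The function name `binary_metrics` and all counts below are hypothetical and are not taken from the paper. AUC is omitted because it requires predicted scores rather than confusion-matrix counts.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Compute four metrics from confusion-matrix counts (hypothetical helper)."""
    sensitivity = tp / (tp + fn)              # true positive rate
    specificity = tn / (tn + fp)              # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # Assumed definition: geometric mean of sensitivity and specificity
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "g_mean": g_mean,
    }

# Hypothetical imbalanced test set: 20 positives, 180 negatives
m = binary_metrics(tp=18, fn=2, tn=160, fp=20)

# A classifier that labels every sample negative on the same data
trivial = binary_metrics(tp=0, fn=20, tn=180, fp=0)

print(m)
print(trivial)
```

On imbalanced data the all-negative classifier still reaches high accuracy while its G-mean drops to zero, which is one reason a balance-sensitive metric is often reported alongside accuracy for a minority class such as TNBC.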
Authors
Guo Zhifei (郭志飞)
Wang Bijue (王碧珏)
Yang Haitao (杨海涛)
Li Zhi (李治)
Wang Juping (王菊平)
Cao Hongyan (曹红艳)
Zhou Liye (周立业)
Guo Zhifei; Wang Bijue; Yang Haitao (Department of Health Management, Shanxi Medical University, Taiyuan 030001)
Source
Chinese Journal of Health Statistics (《中国卫生统计》)
CSCD
Peking University Core Journal (北大核心)
2020, Issue 6, pp. 809-812, 817 (5 pages)
Funding
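National Natural Science Foundation of China (71403156)
Shanxi Province Research Funding for Returned Overseas Scholars (2017-054)
Shanxi Province Applied Basic Research Program (201901D111204)
Natural Science Foundation of Hebei Province (H2019206558)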