Abstract
Objective: To develop a predictive model for invasive breast cancer using laboratory complete blood count (CBC) data and machine learning algorithms, and to assess its clinical application value. Methods: A total of 15,979 patient records from three hospitals in Beijing, dated January 2014 to June 2022, were collected retrospectively and divided into training, validation, and test sets. Recursive feature elimination (RFE) was used to select the feature variables. Five machine learning algorithms, namely support vector machine (SVM), random forest (RF), gradient boosting tree (GBT), logistic regression (LR), and k-nearest neighbors (KNN), were used to build models. Model performance was evaluated with four indicators, each reported with its 95% confidence interval (95% CI): area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy. The clinical effectiveness of the best model was validated using a confusion matrix. Results: Ten feature variables were included in the model: AGE, EO%, RBC, NEUT#, MCH, MPV, PDW, EO#, RDW-CV, and LYMPH#. The RF model performed best. In the test set it achieved an AUC of 0.923 (95% CI 0.890-0.955), a sensitivity of 91.4% (95% CI 0.876-0.901), a specificity of 83.8% (95% CI 0.832-0.837), and an accuracy of 84.2% (95% CI 0.835-0.840). In the clinical effectiveness validation, the best model reached an accuracy, sensitivity, and specificity of 85.40%, 72.97%, and 90.00%, respectively. Conclusion: The invasive breast cancer prediction model built from CBC data and machine learning algorithms has high sensitivity and specificity. As a convenient and efficient auxiliary tool, it could help clinicians identify patients at risk of invasive breast cancer at an early stage.
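The three confusion-matrix metrics named in the abstract can be illustrated with a minimal Python sketch. The counts below are hypothetical: the record does not state them, and they were chosen only because they happen to reproduce the reported 85.40%, 72.97%, and 90.00%. The Wilson score interval is likewise an assumption, since the abstract does not say how the 95% CIs were computed.

```python
from math import sqrt


def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positives among all actual positives
    specificity = tn / (tn + fp)          # true negatives among all actual negatives
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical counts, NOT taken from the article.
counts = dict(tp=27, fn=10, tn=90, fp=10)
sens, spec, acc = confusion_metrics(**counts)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")

# One possible 95% CI for sensitivity (method assumed, see above).
lo, hi = wilson_interval(counts["tp"], counts["tp"] + counts["fn"])
print(f"sensitivity 95% CI (Wilson): {lo:.3f}-{hi:.3f}")
```

With a positive group this small, the interval around sensitivity is wide, which is one reason to report CIs alongside point estimates.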
Authors
安旭
黄大伟
焦明远
周睿
王清涛
AN Xu; HUANG Dawei; JIAO Mingyuan; ZHOU Rui; WANG Qingtao (Department of Clinical Laboratory, Beijing Chaoyang Hospital, Capital Medical University, Beijing 100020, China; Department of Clinical Laboratory, Beijing Longfu Hospital, Beijing 100009, China; Department of Clinical Laboratory, Tongzhou Maternal and Child Health Hospital of Beijing, Beijing 101100, China)
Source
《标记免疫分析与临床》
CAS
2023, No. 4, pp. 665-671, 679 (8 pages)
Labeled Immunoassays and Clinical Medicine
Funding
Beijing Tongzhou District Science and Technology Plan Project, 2022 (No. 2022-TZFY-015-01).
Keywords
Breast cancer
Complete blood count
Machine learning
Prediction model