Abstract
Objective: To investigate the feasibility and performance of data mining techniques for predicting a CD4+ T lymphocyte count below 350 cells/μL and below 500 cells/μL in HIV-infected patients, so as to provide a reference for selecting a CD4+ T lymphocyte count prediction model. Methods: Total lymphocyte count (TLC), hemoglobin, platelet count, white blood cell count and red blood cell count from 2837 samples, collected from 345 HIV-infected men who have sex with men, were used to build five predictive models: Logistic regression, linear discriminant analysis, decision tree, backpropagation (BP) multilayer perceptron and radial basis function neural network. The models were tested by 10-fold cross-validation, and their performance was assessed and compared using sensitivity, specificity, positive predictive value, negative predictive value and area under the receiver operating characteristic (ROC) curve. Results: The areas under the ROC curve (Az) of the five models for predicting a CD4+ T lymphocyte count < 350 cells/μL were 0.698, 0.697, 0.706, 0.746 and 0.745, respectively, and those for predicting a CD4+ T lymphocyte count < 500 cells/μL were 0.735, 0.737, 0.724, 0.707 and 0.714, respectively. All of these were higher than the Az values obtained with TLC as a stand-alone predictor (0.657 and 0.691, respectively). Conclusions: Data mining models show a moderate level of performance in predicting CD4+ T lymphocyte count (Az between 0.697 and 0.746), and different models perform differently on different prediction tasks.
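The evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the 0.5 decision cutoff and the eight labels and scores below are hypothetical, made up purely for illustration. It assumes labels are coded 1 when the CD4+ T lymphocyte count is below the threshold and 0 otherwise, that a higher score means the model considers a low count more likely, and that both classes are present.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative one; ties count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT from the study.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.5]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed cutoff of 0.5

metrics = confusion_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

In a 10-fold cross-validation such as the one described, these measures would be computed on the held-out fold each time and then summarized across folds. The pairwise AUC is quadratic in sample size, which is acceptable for a few thousand samples; a rank-based formula would scale better.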
Source
Beijing Biomedical Engineering (《北京生物医学工程》)
2013, Issue 5, pp. 479-484 (6 pages)
Funding
Supported by a Major Special Project of the Beijing Municipal Science and Technology Commission (D0906003040591)