Abstract
Because the heterogeneity and correlation of sample data directly affect disease prediction results, this paper proposes an early diabetes prediction method based on patient similarity over multiple feature attributes. Exploiting the fact that patients share similar features, the method first pre-groups patients by mixed-attribute similarity and then feeds the grouping results into a random forest classifier for disease prediction. First, clinical concepts are taken as patient features, and the mixed similarity between patients is measured by using clustering to quantify the distances between features of different attribute types; based on this mixed similarity, the patient set is pre-grouped into multiple similar-patient groups. Finally, a random forest classifier performs fine-grained classification within the similar groups to produce the final disease prediction. Compared with random forest classification on the full sample data, classification accuracy improves by 8.3%; compared with random forest classification on similar groups built from a single attribute type, it improves by 5.1%. The results show that the proposed method achieves high prediction accuracy and can support the diagnosis and prediction of diabetes.
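The pre-grouping step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the distance formula, so this example assumes a Gower-style distance: range-normalized absolute difference for numeric features and simple mismatch for categorical ones. The sample records, the feature choice, and the helper names `mixed_distance` and `pre_group` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional, Sequence, Union

Value = Union[float, int, str]


def mixed_distance(a: Sequence[Value], b: Sequence[Value],
                   numeric_ranges: Sequence[Optional[float]]) -> float:
    """Gower-style distance between two patients with mixed attributes.

    numeric_ranges[i] is the value range (max - min) of feature i when it
    is numeric, or None when feature i is categorical. (Assumed scheme.)
    """
    total = 0.0
    for x, y, rng in zip(a, b, numeric_ranges):
        if rng is None:
            # Categorical feature: 0 if equal, 1 if different.
            total += 0.0 if x == y else 1.0
        else:
            # Numeric feature: absolute difference scaled to [0, 1].
            total += abs(x - y) / rng if rng > 0 else 0.0
    return total / len(numeric_ranges)


def pre_group(patients: Sequence[Sequence[Value]],
              medoid_indices: Sequence[int],
              numeric_ranges: Sequence[Optional[float]]) -> Dict[int, List[int]]:
    """Assign each patient index to the group of its nearest medoid."""
    groups: Dict[int, List[int]] = {m: [] for m in medoid_indices}
    for i, p in enumerate(patients):
        nearest = min(
            medoid_indices,
            key=lambda m: mixed_distance(p, patients[m], numeric_ranges),
        )
        groups[nearest].append(i)
    return groups


# Hypothetical records: (age, BMI, sex, family history of diabetes)
patients = [
    (34, 22.0, "F", "no"),
    (36, 23.0, "F", "no"),
    (61, 31.0, "M", "yes"),
    (58, 30.0, "M", "yes"),
]
numeric_ranges = [61 - 34, 31.0 - 22.0, None, None]
groups = pre_group(patients, medoid_indices=[0, 2], numeric_ranges=numeric_ranges)
```

In a full pipeline, each resulting group would then be passed to a random forest classifier (for instance scikit-learn's `RandomForestClassifier`), which is omitted here. The medoids are fixed by hand in this sketch; a clustering procedure such as k-medoids would normally select them.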
Authors
乔瀚
容芷君
许莹
但斌斌
赵慧
QIAO Han; RONG Zhi-jun; XU Ying; DAN Bin-bin; ZHAO Hui (Department of Industrial Engineering, Wuhan University of Science and Technology, Wuhan 430081, China; Wuhan Fifth Hospital, Wuhan 430050, China)
Source
《科学技术与工程》
Peking University Core Journal (北大核心)
2021, Issue 36, pp. 15497-15502 (6 pages)
Science Technology and Engineering
Funding
Enterprise Technology Innovation Project of the Wuhan Science and Technology Bureau (201901070211288).
Keywords
patient similarity
feature attribute
clustering
classification
diabetes prediction