Abstract
Predicting mortality in heart failure patients with the K-nearest neighbor (KNN) algorithm is an important means of positively affecting patient health. However, KNN has difficulty using a single distance to accurately measure the distance between samples that contain both discrete and continuous variables, and the voting method used by KNN cannot account for how the distance of each neighbor influences the predicted class of a test sample. To address these problems, a KNN mortality prediction model with a mixed weighted distance is proposed. First, the chi-square test and logistic regression with L1 regularization are used for feature selection and ranking. Next, a mixture of the Value Difference Metric (VDM) and the Manhattan distance is applied to calculate the distance between samples. Then, the softmin function is used to weight the distances, and the class of the test sample is output. Finally, experiments on data from 2 743 heart failure patients in the public MIMIC-Ⅲ database verify that the improved algorithm performs well in mortality prediction.
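The distance and weighting steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the unweighted sum of the VDM and Manhattan terms, the VDM exponent of 1, the softmin without a temperature parameter, and all function names (`vdm_tables`, `mixed_distance`, `softmin`, `predict`) are assumptions made for illustration. The feature selection step (chi-square test and L1-regularized logistic regression) is omitted, and continuous features are assumed to be already scaled to comparable ranges.

```python
import numpy as np

def vdm_tables(X_cat, y):
    """For each discrete column, map each value to the vector P(class | value)."""
    classes = np.unique(y)
    tables = []
    for j in range(X_cat.shape[1]):
        col = X_cat[:, j]
        table = {}
        for v in np.unique(col):
            mask = col == v
            table[v] = np.array([(y[mask] == c).mean() for c in classes])
        tables.append(table)
    return classes, tables

def mixed_distance(a_cat, a_num, b_cat, b_num, tables):
    """VDM (exponent 1, an assumption) on discrete features plus Manhattan on continuous ones."""
    d_cat = sum(
        np.abs(tables[j][a_cat[j]] - tables[j][b_cat[j]]).sum()
        for j in range(len(tables))
    )
    d_num = np.abs(a_num - b_num).sum()
    return d_cat + d_num

def softmin(d):
    """Softmin weights: smaller distances receive larger weights; weights sum to 1."""
    w = np.exp(-(d - d.min()))  # shift by the minimum for numerical stability
    return w / w.sum()

def predict(x_cat, x_num, X_cat, X_num, y, k=3):
    """Softmin-weighted vote among the k nearest neighbors under the mixed distance.

    Assumes every discrete value in the query also occurs in the training data.
    """
    classes, tables = vdm_tables(X_cat, y)
    dists = np.array([
        mixed_distance(x_cat, x_num, X_cat[i], X_num[i], tables)
        for i in range(len(y))
    ])
    nearest = np.argsort(dists)[:k]
    weights = softmin(dists[nearest])
    scores = np.array([weights[y[nearest] == c].sum() for c in classes])
    return classes[np.argmax(scores)]

# Toy data (invented for illustration): one discrete and one continuous feature.
X_cat = np.array([[0], [0], [1], [1]])
X_num = np.array([[0.1], [0.2], [0.9], [1.0]])
y = np.array([0, 0, 1, 1])

label = predict(np.array([0]), np.array([0.15]), X_cat, X_num, y, k=3)
```

In this sketch a distant neighbor still votes, but its vote is discounted exponentially in its distance, which is the behavior that plain majority voting lacks.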
Authors
付健
李灯熬
赵菊敏
FU Jian; LI Dengao; ZHAO Jumin (College of Data Science, Taiyuan University of Technology, Jinzhong 030600, China; College of Information and Computer, Taiyuan University of Technology, Jinzhong 030600, China)
Source
《太原理工大学学报》
CAS
Peking University Core Journals (北大核心)
2022, Issue 5, pp. 933-939 (7 pages)
Journal of Taiyuan University of Technology
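Funding
National Major Scientific Research Instrument Development Project (6202780085)
National Natural Science Foundation of China (62076177, 61772358)
Shanxi Province Key Core Technology and Generic Technology R&D Special Project (2020XXX007).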
Keywords
heart failure
mixed weighted distance
K-nearest neighbor algorithm
mortality prediction