Abstract
In recent years, with the rapid development of healthcare big data platforms, a growing volume of physical examination data has been integrated into them. How to mine and use massive healthcare data to improve the quality of medical services and of doctor-patient communication is a new challenge. This paper applies machine learning algorithms to 3,529,829 physical examination records from 45,374 examination users, carrying out exploratory data analysis and feature engineering. Building on a personal credit risk scoring model, the prediction model is changed from gradient boosted decision trees to a LASSO regression model, which increases the interpretability of the scorecard. Taking into account the application scenario and input data of physical examinations, a physical examination scoring model is then established. Experimental results show that, on the large physical examination dataset, the health index score is approximately normally distributed, which is consistent with the prior assumption of the linear regression model. The scoring model is both robust and discriminative. It can integrate the various examination indicators, describe a user's health status fairly objectively, reduce the communication cost between users and doctors, and encourage users to pay more attention to their overall health.
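The interpretability argument for the LASSO-based scorecard can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the four features, the penalty `alpha`, and the helper names `fit_lasso` and `score_breakdown` are all assumptions made for illustration. It shows how an L1 penalty drives uninformative coefficients to exactly zero, so that a score decomposes into a few per-indicator contributions.

```python
import random


def soft_threshold(value, threshold):
    """Soft-thresholding operator used by the L1 (LASSO) penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(X, y, alpha, n_iter=100):
    """Coordinate descent for (1/2n) * ||y - b0 - Xw||^2 + alpha * ||w||_1."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b0 = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = b0 + sum(
                    X[i][k] * w[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
        # Refit the unpenalized intercept on the current residuals.
        b0 = sum(
            y[i] - sum(X[i][k] * w[k] for k in range(p)) for i in range(n)
        ) / n
    return b0, w


def score_breakdown(b0, w, x):
    """Return the total score and the additive contribution of each feature."""
    contributions = [wj * xj for wj, xj in zip(w, x)]
    return b0 + sum(contributions), contributions


# Synthetic data (an assumption, not the paper's dataset): four standardized
# indicators, of which only the first two actually influence the target.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0.0, 0.1) for row in X]

b0, w = fit_lasso(X, y, alpha=0.1)
total, contributions = score_breakdown(b0, w, X[0])
print("intercept:", b0)
print("weights:", w)
print("score for first user:", total, "contributions:", contributions)
```

Because each score is the intercept plus a sum of per-indicator terms, a reader of the scorecard can see which indicators raised or lowered an individual's result, which a tree ensemble does not expose as directly.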
Source
Hans Journal of Data Mining (《数据挖掘》)
2021, Issue 1, pp. 1-10 (10 pages)