Funding: Supported in part by the Key Research and Development Program for Guangdong Province (No. 2019B010136001); in part by Hainan Major Science and Technology Projects (No. ZDKJ2019010); in part by the National Key Research and Development Program of China (No. 2016YFB0800803 and No. 2018YFB1004005); in part by the National Natural Science Foundation of China (No. 81960565, No. 81260139, No. 81060073, No. 81560275, No. 61562021, No. 30560161 and No. 61872110); in part by Hainan Special Projects of Social Development (No. ZDYF2018103 and No. 2015SF 39); and in part by the Hainan Association for Academic Excellence Youth Science and Technology Innovation Program (No. 201515).
Abstract: Objective: To determine the most influential data features and to develop machine learning approaches that best predict hospital readmissions among patients with diabetes. Methods: In this retrospective cohort study, we surveyed patient statistics and performed feature analysis to identify the data features most strongly associated with readmissions. Classification of all-cause, 30-day readmission outcomes was modeled using logistic regression, an artificial neural network, and Easy Ensemble. The F1 statistic, sensitivity, and positive predictive value were used to evaluate model performance. Results: We identified the 14 most influential data features (4 numeric and 10 categorical) and evaluated 3 machine learning models with several sampling methods (oversampling, undersampling, and hybrid techniques). The deep learning model offered no improvement over the traditional models (logistic regression and Easy Ensemble) for predicting readmission, whereas the other two algorithms led to much smaller differences between results on the training and testing datasets. Conclusions: Machine learning approaches applied to electronic health record data offer a promising method for improving readmission prediction in patients with diabetes. However, more work is needed to construct datasets with clinical variables beyond the standard risk factors and to fine-tune and optimize the machine learning models.
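The three evaluation measures named in the abstract above can be computed directly from binary predictions. The following is a minimal sketch, not the authors' code: the function name `classification_metrics`, the label convention (1 = readmitted within 30 days), and the sample labels are all assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Return sensitivity, positive predictive value (PPV), and F1.

    Assumes binary labels where 1 marks a readmission (hypothetical convention).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard each ratio against an empty denominator.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of PPV (precision) and sensitivity (recall).
    total = ppv + sensitivity
    f1 = 2 * ppv * sensitivity / total if total else 0.0
    return {"sensitivity": sensitivity, "ppv": ppv, "f1": f1}


# Illustrative labels only; these are not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

These measures are commonly preferred over plain accuracy when the positive class is rare, which may be why the study pairs them with resampling methods.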
Abstract: Electronic Health Records (EHRs) are the digital form of patients' medical reports or records. EHRs facilitate advanced analytics and support better decision-making on clinical data. Medical data are very complicated, and it is difficult to reach good results with a single classification algorithm. For this reason, we combine several classification techniques to obtain an efficient and accurate classification model; this combination is called an ensemble model. The goal is to predict new medical data with high accuracy in a short processing time. We propose a new ensemble model, MDRL, which is efficient across different datasets and gives the highest accuracy value. It saves processing time because, instead of running four different algorithms sequentially, it executes them in parallel. We implement five algorithms on five datasets: Heart Disease, Health General, Diabetes, Heart Attack, and Covid-19. The algorithms are Random Forest (RF), Decision Tree (DT), Logistic Regression (LR), Multi-layer Perceptron (MLP), and MDRL (our proposed ensemble model), which combines MLP, DT, RF, and LR. From our experiments, we conclude that our ensemble model has the best accuracy value for most datasets. We find that combining the Correlation Feature Selection (CFS) algorithm with our ensemble model gives the highest accuracy value. The accuracy values for our ensemble model based on CFS are 98.86, 97.96, 100, 99.33, and 99.37 for the heart disease, health general, Covid-19, heart attack, and diabetes datasets, respectively.
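The idea of running several base classifiers concurrently and combining their outputs can be sketched as follows. This is an illustration under stated assumptions, not the MDRL implementation: the abstract does not say how the four outputs are combined, so majority voting and the tie-break rule are assumptions, and `threshold_model` is a hypothetical stand-in for trained RF, DT, LR, and MLP models.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def threshold_model(feature_index, cutoff):
    """Hypothetical stand-in for a trained classifier: predicts 1 above a cutoff."""
    def predict(rows):
        return [1 if row[feature_index] > cutoff else 0 for row in rows]
    return predict


def parallel_majority_vote(models, rows):
    """Run every model on the same rows concurrently, then vote per row."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        # pool.map preserves model order; each result is one list of predictions.
        all_preds = list(pool.map(lambda model: model(rows), models))

    voted = []
    for votes in zip(*all_preds):  # one tuple of votes per input row
        counts = Counter(votes)
        top = max(counts.values())
        # Assumed tie-break: pick the smallest label among the most-voted ones.
        voted.append(min(label for label, n in counts.items() if n == top))
    return voted


# Four stand-in models and three illustrative feature rows (not study data).
models = [
    threshold_model(0, 0.5),
    threshold_model(1, 0.5),
    threshold_model(0, 0.3),
    threshold_model(1, 0.7),
]
rows = [(0.9, 0.8), (0.1, 0.2), (0.4, 0.6)]
print(parallel_majority_vote(models, rows))
```

Threads keep the sketch short; for CPU-bound training or inference in pure Python, `concurrent.futures.ProcessPoolExecutor` would be the more likely choice, with module-level callables in place of the lambda.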
Funding: Supported by the Dongguan City Medical and Health Research Project under Grant No. 200910515018; the Guangdong Provincial Department Cooperation Project under Grant No. 2009B090300362; the Sichuan Province Science & Technology Pillar Program under Grant No. 2010SZ0062; and the Fundamental Research Funds for the Central Universities under Grant No. ZYGX2009X016.
Abstract: Effective storage of healthcare information is the foundation for the rapid development of electronic health records (EHR). This paper presents research on the data model for EHR storage, focusing on the complex and abstract information model and data types in HL7 V3 (Health Level 7 Version 3.0) as well as on localized HL7 storage. Using healthcare information exchange and sharing standards may settle the problem of interoperability between heterogeneous systems. HL7 is the most widely accepted and used standard. HL7 standardizes the information format during transmission; nevertheless, it cannot directly guide the storage of healthcare data. HL7 V3 defines an abstract information model, the Reference Information Model (RIM), whose data types are complex. In this paper, we propose an approach for converting the abstract HL7 V3 information model to a relational data model. Our approach resolves the RIM's complex relationships and data types and localizes HL7 V3.
Abstract: Computational prediction of in-hospital mortality in the intensive care unit can help clinical practitioners guide care and make early decisions about interventions. As clinical data are complex and varied in their structure and components, continued innovation in modelling strategies is required to identify architectures that can best model outcomes. In this work, we trained a Heterogeneous Graph Model (HGM) on electronic health record (EHR) data and used the resulting embedding vector as additional input to a Convolutional Neural Network (CNN) model for predicting in-hospital mortality. We show that including time as a vector in the embedding captured the relationships between medical concepts, lab tests, and diagnoses, which enhanced predictive performance. We found that adding HGM to a CNN model increased mortality prediction accuracy by up to 4%. This framework serves as a foundation for future experiments involving different EHR data types on important healthcare prediction tasks.