Abstract: Partial differential equations (PDEs) are among the most fundamental tools for modeling dynamic systems. Existing PDE modeling methods are typically derived from established knowledge and known phenomena, a process that is time-consuming and labor-intensive. Recently, discovering governing PDEs from collected data via Physics-Informed Neural Networks (PINNs) has provided a more efficient way to analyze new dynamic systems and establish PDE models. This study proposes Sequentially Threshold Least Squares-Lasso (STLasso), a module constructed by incorporating Lasso regression into the Sequentially Threshold Least Squares (STLS) algorithm, which can complete sparse regression of PDE coefficients under an l0-norm constraint. It further introduces PINN-STLasso, a physics-informed neural network combined with Lasso sparse regression, which can find underlying PDEs from data with reduced data requirements and better interpretability. In addition, this research conducts experiments on canonical inverse PDE problems and compares the results with several recent methods. The results demonstrate that the proposed PINN-STLasso outperforms the other methods, achieving lower error rates even with less data.
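As a rough illustration of the sparse-regression idea in this abstract (not the authors' STLasso implementation), the sketch below alternates a Lasso fit with hard thresholding of small coefficients. The function names, the coordinate-descent solver, the penalty `alpha`, the cutoff `tol`, and the toy candidate library are all assumptions made for this example.

```python
import math


def soft_threshold(z, gamma):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, alpha, active, n_iter=200):
    """Coordinate-descent Lasso restricted to the columns listed in `active`.

    Minimises (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1.
    Columns outside `active` keep a coefficient of exactly zero.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, with w = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in active:
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def sequential_threshold_lasso(X, y, alpha=1e-3, tol=0.1, max_rounds=10):
    """Hypothetical sketch: fit Lasso, drop small terms, refit on survivors."""
    active = list(range(len(X[0])))
    w = lasso_cd(X, y, alpha, active)
    for _ in range(max_rounds):
        kept = [j for j in active if abs(w[j]) >= tol]
        if kept == active:
            break  # support no longer changes
        active = kept
        w = lasso_cd(X, y, alpha, active)
    return w


# Toy candidate library with three terms; only the first and third are active.
library = [[math.sin(i), math.cos(2 * i), (i % 5 - 2) / 2] for i in range(40)]
target = [2.0 * row[0] - 0.5 * row[2] for row in library]
coeffs = sequential_threshold_lasso(library, target)
print(coeffs)
```

In a PDE-discovery setting the columns of the library would be candidate terms such as derivatives of the solution, and the surviving nonzero coefficients would indicate which terms appear in the recovered equation.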
Funding: Yachao Dong is grateful for the financial support of the Fundamental Research Funds for the Central Universities (Grant No. DUT20RC(3)070).
Abstract: To study the dynamic behavior of a process, time-resolved data are collected at different time instants during each of a series of experiments, which are usually designed with the design of experiments or the design of dynamic experiments methodologies. To use such time-resolved data for modeling the dynamic behavior, dynamic response surface methodology (DRSM), a data-driven modeling method, has been proposed. Two approaches can be adopted to estimate the model parameters: stepwise regression, used in several previous publications, and Lasso regression, which is newly incorporated in this paper for the estimation of DRSM models. Here, we show that both approaches yield similarly accurate models, while the computational time of Lasso is on average two orders of magnitude smaller. Two case studies are performed to show the advantages of the proposed method. In the first case study, where the concentrations of different species are modeled directly, the DRSM method provides more accurate models than those in the literature. The second case study, where the reaction extents are modeled instead of the species concentrations, illustrates the versatility of the DRSM methodology. Therefore, DRSM with Lasso regression can provide faster and more accurate data-driven models for a variety of organic synthesis datasets.
Abstract: Background: Colorectal cancer (CRC) is a leading cause of cancer mortality globally. This study aims to develop a prognostic model based on disulfidptosis-related genes to assess survival outcomes in CRC, highlighting the role of the tumor microenvironment. Methods: The principle of traditional Chinese medicine syndrome differentiation and treatment runs through the whole study. We analyzed CRC tissue data from The Cancer Genome Atlas and the Gene Expression Omnibus using single-sample gene set enrichment and weighted gene correlation network analyses to identify prognostic markers and evaluate immune infiltration. We also investigated predicted drug sensitivities. Results: We identified seven disulfidptosis-related markers that significantly influence prognosis: complement C1q A chain (C1QA), solute carrier family 11 member 1 (SLC11A1), cluster of differentiation 36 (CD36), cluster of differentiation 6 (CD6), interleukin 1 receptor associated kinase 3 (IRAK3), S100 calcium binding protein A8 (S100A8), and CD8 subunit alpha (CD8A). Patients classified in the low-risk group demonstrated improved overall survival compared with those in the high-risk group in both the training (P=0.0026) and validation (P=0.032) cohorts. Differential gene expression was significant in the high-risk group (P<0.001), and prevalent mutations included APC regulator of WNT signaling pathway (APC), tumor protein P53 (TP53), Titin (TTN), and Kirsten rat sarcoma viral oncogene (KRAS). The risk score correlated linearly with tumor microenvironment attributes. The drug analysis showed that some traditional drugs may have anticancer effects through the vertical action of disulfidptosis. Conclusion: Our prognostic model, integrating seven disulfidptosis-related genes, categorizes CRC patients by survival probability and underscores these genes as potential biomarkers linked to the tumor microenvironment. These findings support their use in refining therapeutic strategies for CRC.
Funding: Supported by the National Key Research and Development Program of China (2021YFD1201103-01-05).
Abstract: Soybean frogeye leaf spot (FLS) is a global disease affecting soybean yield, especially in the soybean-growing area of Heilongjiang Province. To realize genomic selection breeding for FLS resistance in soybean, least absolute shrinkage and selection operator (LASSO) regression and stepwise regression were combined, and a genomic selection model was established for 40 002 SNP markers covering the soybean genome and the relative lesion area of soybean FLS. As a result, 68 molecular markers controlling soybean FLS were detected accurately, and the phenotypic contribution rate of these markers reached 82.45%. The model established in this study could be used directly to evaluate the FLS resistance of soybean and to select excellent offspring. This research method could also provide ideas and methods for disease-resistance breeding in other plants.
Abstract: Background: Depression is an emotional disorder caused by a variety of factors. With the accelerating pace of life, people face increasing competitive pressure in life and work, and the incidence of depression is rising year by year, so in-depth study of the pathogenesis of depression and the development of depression risk prediction models are becoming increasingly important. Method: The data for this study were derived from the 2017–2018 follow-up data of the National Health and Nutrition Examination Survey database, a publicly available database that uses a multi-stage, stratified, clustered, probability sampling design to obtain a nationally representative sample of non-institutionalized US civilians. Participants completed home interviews, laboratory measurements, and a physical examination. Details of the survey design have been published previously. This study evaluated risk factors for the occurrence of depression from multiple variables such as age, sex, and combined complications. Four machine learning algorithms (logistic regression, Lasso regression, support vector machine, random forest) were used to establish predictive classification models, which were compared by area under the receiver operating characteristic curve and accuracy. The dataset was validated using 10-fold cross-validation. Result: After invalid samples were excluded, 815 samples were included, of which 570 cases were assigned to the validation set and 245 cases to the training set. The area under the curve (AUC) of the nomogram for depression risk based on logistic regression was 0.73. Among the three machine learning models, the Lasso regression-based model had an AUC of 0.548, the support vector machine model a mean AUC of 0.695, and the random forest model an AUC of 0.613. The support vector machine-based model showed the best predictive performance among the machine learning models. Conclusion: Random forest-based prediction models are able to assist clinicians by providing decision support when it is difficult to give an exact diagnosis. The model has good clinical utility and helps clinicians identify high-risk patients and perform individualized treatment. The four established models (logistic regression, Lasso regression, support vector machine, and random forest) all have good predictive power.
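The models in this abstract are compared by AUC. As a minimal sketch of what that number measures (not the study's code), the function below computes the AUC as the fraction of positive/negative pairs in which the positive case receives the higher score, counting ties as half. The name `roc_auc` and the toy labels and scores are assumptions for illustration.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    labels: iterable of 0/1 class labels; scores: predicted risk scores.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0  # positive ranked above negative
            elif p == q:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data: four cases, two with the outcome (label 1).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUC of 0.5 corresponds to random ranking, which is why a value such as 0.548 indicates only weak discrimination.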
Funding: Supported by the National Natural Science Foundation of China (No. 11975227) and Hefei Science Center, CAS (No. 2019HSC-KPRD003).
Abstract: The theory of tune feedback correction and the principle of a feedback algorithm based on machine learning are introduced, with a focus on the application of lasso regression to tune feedback correction. Simulation verification and online feedback correction results are presented. The results show that, after applying machine learning, the feedback accuracy of the tune feedback system was higher and the betatron tune stability was further improved.
Funding: This research was funded by the National Natural Science Foundation of China (No. 61304208), the Scientific Research Fund of Hunan Province Education Department (18C0003), the Research Project on Teaching Reform in Colleges and Universities of Hunan Province Education Department (20190147), the Changsha City Science and Technology Plan Program (K1501013-11), and Hunan Normal University University-Industry Cooperation. This work was carried out at the 2011 Collaborative Innovation Center for Development and Utilization of Finance and Economics Big Data Property, Universities of Hunan Province, open project, grant number 20181901CRP04.
Abstract: Fiscal revenue has many influencing factors, and traditional forecasting methods cannot handle the feature dimensions well, which leads to serious over-fitting of the forecast results and an inability to estimate the true future trend well. The grey neural network model fused with Lasso regression is a comprehensive prediction model that combines the grey prediction model and the BP neural network model after dimensionality reduction using Lasso. It can reduce the dimensionality of the original data, make separate predictions for each explanatory variable, and then use neural networks to make multivariate predictions, thereby making up for the insufficient prediction accuracy of traditional methods. In this paper, we took the fiscal revenue data of China's Hunan Province from 2005 to 2019 as the object of analysis. First, we used Lasso regression to reduce the dimensionality of the data. Then, because the grey prediction model performs well on small data volumes, we chose it to obtain the predicted values of all explanatory variables for 2020 and 2021 from the 2005–2019 data. Finally, considering that fiscal revenue is affected by many factors, we applied the BP neural network, which handles multiple inputs well, to make the final forecast of fiscal revenue. The experimental results show that the combined model performs well in fiscal revenue forecasting.
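The grey prediction step in this abstract can be sketched with the standard GM(1,1) model, which is an assumption here since the abstract does not name the variant used. The function `gm11_forecast` and the toy series are hypothetical; the code fits the development coefficient `a` and grey input `b` by least squares on the accumulated series and then extrapolates.

```python
import math


def gm11_forecast(x0, steps):
    """Forecast `steps` future values of a short positive series with GM(1,1).

    Assumes the fitted development coefficient `a` is nonzero.
    """
    n = len(x0)
    # 1-AGO: accumulated generating sequence x1.
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)
    # Background values: means of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    # Least-squares fit of x0[k] = -a * z[k] + b (simple linear regression).
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(u * v for u, v in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    a = -slope
    b = (sy - slope * sz) / m

    def x1_hat(k):
        # Time-response function of the whitened equation dx1/dt + a*x1 = b.
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    # Inverse AGO: difference the accumulated forecasts.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]


# Toy series growing about 10% per period; forecast two more periods.
history = [100 * 1.1 ** k for k in range(5)]
print(gm11_forecast(history, 2))
```

In the pipeline the abstract describes, one such forecast would be made for each explanatory variable retained by Lasso, and the forecasts would then be fed to the neural network.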
Funding: Financially supported by the Department of Agricultural, Food, Environmental and Animal Sciences, University of Udine, Italy.
Abstract: To protect and promote the originality and authenticity of mountain foodstuffs, the European Union set Regulation No 1151/2012 to create the optional quality term "mountain product". Our research aimed at exploring the attractiveness of the mountain product label for consumers, considering both attitude towards the label itself and purchase intentions. We propose a model to investigate relationships between four latent constructs (mountain attractiveness, mountain food attractiveness, attitude towards the mountain product label, and purchase intention). The relationships were tested, and their statistical relevance was confirmed. All 47 items selected for describing the latent constructs are suitable for this purpose. Ridge and LASSO results also show that 17 items of the first three constructs are relevant in explaining purchase intentions. Some contextual variables, such as age, income, geographical origin of consumers, and knowledge of mountain products and of mountains for tourism purposes, can positively influence consumers' behavior. These findings could support the design of mountain development strategies, in particular marketing actions for both the product and the territory.
Funding: This research is supported by a seed money grant provided by Maulana Azad National Institute of Technology, Bhopal, under Grant No. DeanR&C/1420.
Abstract: This study investigates the factors determining COVID-19 deaths across countries during the pandemic, employing a rich dataset sourced from 94 countries and updated to 6 February 2022. For the empirical analysis, the study uses a cross-sectional linear regression technique in the first part and, after the required diagnostic tests, a 2SLS regression technique to correct possible endogeneity bias in the second part. The findings indicate that factors such as total reported cases, population size, population over 70 years of age, extreme poverty, and the human development index play a significant role in determining COVID-19-related deaths. Further, to check the robustness of the findings, the study employed LASSO regression. The findings highlight the possibility of government intervention to devise appropriate policies to control COVID-related incidence and death.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11401383, 11301391 and 11271080).
Abstract: We propose a two-step variable selection procedure for censored quantile regression with high-dimensional predictors. To account for censored data in the high-dimensional case, we employ effective dimension reduction and the idea of informative subsets. Under some regularity conditions, we show that our procedure enjoys model selection consistency. A simulation study and real data analysis are conducted to evaluate the finite-sample performance of the proposed approach.
Abstract: This paper provides the first empirical study on bond defaults in the Chinese market. It overcomes the deficiencies of existing methods, which suffer from a lack of actual default data for back testing. With newly available bond default data, we analyze the roles of market variables against accounting variables under various models. While we find that Merton's market-based structural model and KMV's Distance to Default exhibit weak discriminating power compared with hazard models that have carefully constructed predictors, other market variables carry significant information about bond defaults and could help improve on models with only the accounting variables. This implies that the collective intelligence of the market could somehow mitigate the distortion caused by misreported accounting information. Further, model performance can be significantly improved by adding predictor variables that link an individual financial measure to broader market performance, such as the relative margin, a business environment proxy introduced in this study. We not only shed light on the default behavior of the Chinese bond market, but also provide a promising approach to improving the variable selection process.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 81873944 and 81971869) and the Shanghai Science and Technology Commission (Grant No. 20DZ2200500).
Abstract: Background: Novel coronavirus disease 2019 (COVID-19) is an ongoing global pandemic with high mortality. Although several studies have reported different risk factors for mortality in patients based on traditional analytics, few studies have used artificial intelligence (AI) algorithms. This study investigated prognostic factors for COVID-19 patients using AI methods. Methods: COVID-19 patients who were admitted to Wuhan Infectious Diseases Hospital from December 29, 2019 to March 2, 2020 were included. The whole cohort was randomly divided into training and testing sets at a 6:4 ratio. Demographic and clinical data were analyzed to identify predictors of mortality using least absolute shrinkage and selection operator (LASSO) regression and LASSO-based artificial neural network (ANN) models. The predictive performance of the models was evaluated using receiver operating characteristic (ROC) curve analysis. Results: A total of 1145 patients (610 male, 53.3%) were included in the study. Of the 1145 patients, 704 were assigned to the training set and 441 to the testing set. The median age of the patients was 57 years (range: 47-66 years). Severity of illness, age, platelet count, leukocyte count, prealbumin, C-reactive protein (CRP), total bilirubin, Acute Physiology and Chronic Health Evaluation (APACHE) II score, and Sequential Organ Failure Assessment (SOFA) score were identified as independent prognostic factors for mortality. Incorporating these nine factors into the LASSO regression model yielded a correct classification rate of 0.98, with area under the ROC curve (AUC) values of 0.980 and 0.990 in the training and testing cohorts, respectively. Incorporating the same factors into the LASSO-based ANN model yielded a correct classification rate of 0.990, with an AUC of 0.980 in both the training and testing cohorts. Conclusions: Both the LASSO regression and LASSO-based ANN models accurately predicted the clinical outcome of patients with COVID-19. Severity of illness, age, platelet count, leukocyte count, prealbumin, CRP, total bilirubin, APACHE II score, and SOFA score were identified as prognostic factors for mortality in patients with COVID-19.