The complex sand-casting process, combined with the interactions between process parameters, makes it difficult to control casting quality, resulting in a high scrap rate. A strategy based on a data-driven model was proposed to reduce casting defects and improve production efficiency; it comprises a random forest (RF) classification model, feature importance analysis, and process parameter optimization with Monte Carlo simulation. The collected data, covering four types of defects and the corresponding process parameters, were used to construct the RF model. Classification results show a recall above 90% for all categories. The Gini index was used to assess the importance of the process parameters in the formation of the various defects in the RF model. Finally, the classification model was applied to different production conditions for quality prediction. In the case of process parameter optimization for gas porosity defects, the model serves as the experimental process in the Monte Carlo method to estimate a better temperature distribution. When applied in the factory, the prediction model greatly improved the efficiency of defect detection. Results show that the scrap rate decreased from 10.16% to 6.68%.
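The Gini-based importance mentioned above rests on how much each split reduces Gini impurity. The sketch below is a minimal illustration, not the authors' code: it computes Gini impurity and the weighted impurity decrease of one split. Summing such decreases over every node that splits on a feature, and averaging over trees, gives the usual mean-decrease-in-impurity importance (the kind reported by scikit-learn's `feature_importances_` for random forests). The defect labels below are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a collection of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def impurity_decrease(parent, left, right):
    """Weighted Gini decrease obtained by splitting `parent` into `left`/`right`."""
    n = len(parent)
    return (gini(parent)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))

# Hypothetical node: defect labels before a split on, e.g., pouring temperature
parent = ["porosity", "porosity", "sand", "sand"]
print(gini(parent))                                        # 0.5
print(impurity_decrease(parent, parent[:2], parent[2:]))   # 0.5 (a pure split)
```

A feature that repeatedly produces large decreases near the top of the trees ends up with a high importance score.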
Efficient water quality monitoring and safeguarding of drinking water by government agencies cannot be overemphasized in areas where the resource is constantly depleted by anthropogenic or natural factors. This holds particularly for West Texas, specifically Midland and Odessa. Two machine learning regression algorithms (Random Forest and XGBoost) were employed to develop models predicting total dissolved solids (TDS) and the sodium adsorption ratio (SAR) for efficient water quality monitoring of two vital aquifers: the Edwards-Trinity (Plateau) and Ogallala aquifers. These two aquifers supply water for a wide range of uses, including domestic, agricultural, and industrial use. The data were obtained from the Texas Water Development Board (TWDB). The XGBoost and Random Forest models gave accurate predictions of the observed data (TDS and SAR) for both the Edwards-Trinity (Plateau) and Ogallala aquifers, with R² values consistently greater than 0.83. The Random Forest model gave better predictions of TDS and SAR, with an average R, MAE, RMSE and MSE of 0.977, 0.015, 0.029 and 0.00, respectively. For XGBoost, an average R, MAE, RMSE and MSE of 0.953, 0.016, 0.037 and 0.00, respectively, were achieved. Overall, both models performed well. The study indicates that Random Forest and XGBoost are appropriate for water quality prediction and monitoring in areas of intensive hydrocarbon activity such as Midland, Odessa, and West Texas at large.
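The four metrics reported above can be computed as follows. This is a minimal stdlib sketch of the standard definitions, not the study's evaluation code. Note that an MSE reported as 0.00 is consistent with the RMSE values, since 0.029² ≈ 0.00084 and 0.037² ≈ 0.0014 both round to 0.00 at two decimals.

```python
import math

def regression_metrics(observed, predicted):
    """Return Pearson R, MAE, RMSE and MSE for paired observations."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)  # Pearson correlation coefficient
    return {"R": r, "MAE": mae, "RMSE": math.sqrt(mse), "MSE": mse}

# Hypothetical scaled TDS values and model predictions
m = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

Because MAE, RMSE and MSE depend on the scale of the target, the small values in the abstract suggest the targets were normalized before fitting; the abstract does not say so explicitly.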
Forest fires are natural disasters that can occur suddenly and be very damaging, burning thousands of square kilometers. Prevention is better than suppression, so prediction models of forest fire occurrence were developed using logistic regression, geographically weighted logistic regression, Lasso regression, random forest, and support vector machine models, based on historical forest fire data from 2000 to 2019 in Jilin Province. The models, along with a distribution map, are presented in this paper to provide a theoretical basis for forest fire management in this area. The results show that the prediction accuracies of the two machine learning models are higher than those of the three generalized linear regression models. The accuracies of the random forest, support vector machine, geographically weighted logistic regression, Lasso regression, and logistic regression models were 88.7%, 87.7%, 86.0%, 85.0% and 84.6%, respectively. Weather is the main factor affecting forest fires, while topographic, human, and socio-economic factors had similar impacts on fire occurrence.
This is an erratum to an already published paper titled “Establishment of a prediction model for prehospital return of spontaneous circulation in out-of-hospital patients with cardiac arrest”. We found errors in the authors' affiliated institutions. We apologize for this unintentional mistake. Please note that these changes do not affect our results.
Remaining useful life (RUL) prediction is one of the most crucial elements of prognostics and health management (PHM). To address imperfect prior information, this paper proposes an RUL prediction method based on a nonlinear random coefficient regression (RCR) model that fuses failure time data. First, some interesting properties of parameter estimation based on the nonlinear RCR model are given. Based on these properties, the failure time data can be reasonably fused as prior information. Specifically, the fixed parameters are calculated from the field degradation data of the evaluated equipment, and the prior information of the random coefficient is estimated by fusing the failure time data of congeneric equipment. Then, the prior information of the random coefficient is updated online under the Bayesian framework, and the probability density function (PDF) of the RUL is derived, taking the failure threshold into account. Finally, two case studies are used for experimental verification. Compared with the traditional Bayesian method, the proposed method can effectively reduce the influence of imperfect prior information and improve the accuracy of RUL prediction.
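The abstract does not give the exact model, so the following is only a sketch under stated assumptions of what an online Bayesian update of a random coefficient can look like. It assumes a hypothetical degradation model x(t_k) = φ + θ·g(t_k) + ε_k with a known nonlinear time function g (for example t^b with b fixed), a Gaussian prior θ ~ N(μ0, σ0²) (which, in the paper's scheme, would come from fused failure-time data), and independent Gaussian measurement noise. Under these assumptions the posterior of θ is Gaussian and has the closed-form conjugate update below.

```python
def update_random_coefficient(mu0, var0, phi, g_values, x_values, noise_var):
    """Conjugate normal update for theta in x_k = phi + theta * g(t_k) + eps_k.

    Hypothetical model for illustration only:
    mu0, var0 : prior mean/variance of the random coefficient theta
    phi       : fixed parameter, assumed already estimated from field data
    g_values  : g(t_k) at the inspection times (known nonlinear time function)
    x_values  : observed degradation measurements x_k
    noise_var : variance of the i.i.d. Gaussian measurement error eps_k
    Returns the posterior mean and variance of theta.
    """
    precision = 1.0 / var0
    weighted = mu0 / var0
    for g, x in zip(g_values, x_values):
        precision += g * g / noise_var       # each observation adds information
        weighted += g * (x - phi) / noise_var
    return weighted / precision, 1.0 / precision

# Hypothetical example with g(t) = t ** 1.5
times = [1.0, 2.0, 3.0]
g_vals = [t ** 1.5 for t in times]
```

Each new measurement can be folded in as it arrives, which is what makes the update usable online; the PDF of the RUL would then follow from the posterior of θ and the failure threshold.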
Machine learning (ML) models provide great opportunities to accelerate novel material development, offering a virtual alternative to laborious and resource-intensive empirical methods. In this work, the second of a two-part study, an ML approach is presented that offers accelerated digital design of Mg alloys. Four ML regression algorithms were systematically evaluated to rationalise the complex relationships in Mg-alloy data and to capture composition-processing-property patterns. Cross-validation and hold-out set validation were used for unbiased estimation of model performance. Using atomic and thermodynamic properties of the alloys, feature augmentation was examined to define the most descriptive representation spaces for the alloy data. Additionally, a graphical user interface (GUI) web tool was developed to facilitate use of the proposed models in predicting the mechanical properties of new Mg alloys. The results demonstrate that the random forest regression model and the neural network are robust models for predicting the ultimate tensile strength and ductility of Mg alloys, with accuracies of ~80% and 70%, respectively. The models developed in this work are a step towards high-throughput screening of novel candidates for target mechanical properties and provide ML-guided alloy design.
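As a minimal sketch of the cross-validation step mentioned above (not the study's code), the generator below produces shuffled k-fold train/test index splits. A hold-out set, as also used in the study, would be set aside before this loop so that it never enters any fold. The function name and seed are illustrative.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute the remainder so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
```

Every sample appears in exactly one test fold, so averaging the per-fold scores gives a less optimistic estimate than scoring on the training data.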
Birch has long suffered from a lack of active forest management, leading many researchers to use material without a detailed management history. Data collected from three birch (Betula pendula Roth, B. pubescens Ehrh.) sites in southern Sweden were analyzed using regression analysis to detect any trends or differences in wood properties that could be explained by stand history, tree age and stem form. All sites were genetics trials established in the same way. Estimates of acoustic velocity (AV) from non-destructive testing (NDT) and predicted AV were more highly correlated when data were pooled across sites and other stem form factors were considered. For a subsample of stems, radial profiles of X-ray wood density and ring width by year were created, and wood density was related to ring number from the pith and to ring width. Wood density appeared to be negatively related to ring width for both birch species. Linear models improved slightly when site and species were included, but only the youngest site, with trees aged 15, had both birch species. This paper indicates that NDT values need to be considered separately, and that any predictive models will likely be improved if they are specific to the site and birch species measured.
The change processes and trends of shorelines and tidal flats forced by human activities are essential issues for the sustainability of coastal areas, and are also of great significance for understanding coastal ecological environment changes and even global changes. Based on field measurements, combined with a Linear Regression (LR) model and the Inverse Distance Weighting (IDW) method, this paper presents a detailed analysis of the change history and trend of the shoreline and tidal flat in Bohai Bay. The shoreline faces a high chance of erosion under natural factors, while the tidal flat shows different erosion and deposition patterns in Bohai Bay due to the impact of human activities. The implications of these changes for ecological protection and recovery are also discussed; measures should be taken to protect the coastal ecological environment. The models used in this paper show a high correlation coefficient between observed and modeled data, which means that the method can be used to predict the changing trend of the shoreline and tidal flat. The results of the present study can provide scientific support for future coastal protection and management.
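Inverse Distance Weighting, one of the two methods named above, estimates a value at an unsampled point as a distance-weighted average of measured points. The sketch below shows the standard formulation; the power parameter of 2 is a common default and an assumption here, since the abstract does not state the value used, and the sample coordinates are hypothetical.

```python
import math

def idw(x, y, samples, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples: iterable of (xi, yi, zi) measured points, e.g. hypothetical
    tidal-flat elevations; power=2.0 is an assumed, commonly used default.
    """
    num = 0.0
    den = 0.0
    for xi, yi, zi in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return zi  # exact hit: return the measured value unchanged
        w = 1.0 / d ** power
        num += w * zi
        den += w
    return num / den

samples = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0)]
print(idw(1.0, 0.0, samples))  # 2.0 (equidistant points get equal weight)
```

Larger powers make the surface follow the nearest measurement more closely; smaller powers smooth it.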
BACKGROUND The spread of the severe acute respiratory syndrome coronavirus 2 outbreak worldwide has caused concern regarding the mortality rate of the infection. The determinants of mortality on a global scale cannot be fully understood due to a lack of information. AIM To identify key factors that may explain the variability in case lethality across countries. METHODS We identified 21 potential risk factors for the coronavirus disease 2019 (COVID-19) case fatality rate (CFR) for all countries with available data. We examined the univariate relationship of each independent variable with CFR to identify candidate variables for the final multiple model. Multiple regression analysis was used to assess the strength of the relationships. RESULTS The mean COVID-19 mortality was 1.52±1.72%. There was a statistically significant inverse correlation of health expenditure and the number of computed tomography scanners per 1 million with CFR, and a significant direct correlation of literacy and air pollution with CFR. The final model can explain approximately 97% of the variation in CFR. CONCLUSION The current study proposes several new predictors of the mortality rate. Thus, it could help decision-makers develop health policies to fight COVID-19.
BACKGROUND Out-of-hospital cardiac arrest (OHCA) is a leading cause of death worldwide. AIM To explore factors influencing prehospital return of spontaneous circulation (P-ROSC) in patients with OHCA and develop a nomogram prediction model. METHODS Clinical data of patients with OHCA in Shenzhen, China, from January 2012 to December 2019 were retrospectively analyzed. Least absolute shrinkage and selection operator (LASSO) regression and multivariate logistic regression were applied to select the optimal factors predicting P-ROSC in patients with OHCA. A nomogram prediction model was established based on these influencing factors. Discrimination and calibration were assessed using receiver operating characteristic (ROC) and calibration curves. Decision curve analysis (DCA) was used to evaluate the model’s clinical utility. RESULTS Among the 2685 included patients with OHCA, the P-ROSC incidence was 5.8%. LASSO and multivariate logistic regression analyses showed that age, bystander cardiopulmonary resuscitation (CPR), initial rhythm, CPR duration, ventilation mode, and pathogenesis were independent factors influencing P-ROSC in these patients. The area under the ROC curve was 0.963. The calibration plot demonstrated that the predicted P-ROSC was concordant with the actual P-ROSC. The good clinical usability of the prediction model was confirmed using DCA. CONCLUSION The nomogram prediction model could effectively predict the probability of P-ROSC in patients with OHCA.
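The area under the ROC curve reported above has a direct probabilistic reading: it is the chance that a randomly chosen patient with P-ROSC receives a higher predicted score than a randomly chosen patient without it. The sketch below computes AUC that way (equivalent to the Mann-Whitney U statistic); it is illustrative, quadratic in time, and the scores are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as P(score of random positive > score of random negative).

    Ties count as one half, matching the Mann-Whitney U formulation.
    Requires at least one positive (label 1) and one negative (label 0).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical nomogram probabilities and observed outcomes
print(roc_auc([0.9, 0.2, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```

Because AUC depends only on the ranking of scores, it is unaffected by the low event rate (5.8%) in the way accuracy would be; calibration and decision-curve analysis are still needed to judge the predicted probabilities themselves.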
The increasing penetration of electric kickboard vehicles has been driven primarily by their clean and efficient features. Electric kickboards are gradually growing in popularity in tourist and education-centric localities. With the upcoming arrival of electric kickboard vehicles, deploying a customer rental service is essential. Owing to its free-floating nature, the shared electric kickboard is a common and practical means of transportation. Relocation plans for shared electric kickboards are required to increase the quality of service, and forecasting demand for their use in a specific region is crucial. Predicting demand accurately with small data is troublesome, since extensive data are necessary to train machine learning algorithms for effective prediction. Data generation is a method for expanding the amount of data available for training. In this work, we propose a model that takes time-series customer electric kickboard demand data as input, pre-processes it, and generates synthetic data following the original data distribution using generative adversarial networks (GAN). The electric kickboard mobility demand prediction error was reduced when we combined synthetic data with the original data. We propose Tabular-GAN-Modified-WGAN-GP for generating synthetic data for better prediction results. We modified the Wasserstein GAN with gradient penalty (WGAN-GP) to use the RMSprop optimizer and then employed Spectral Normalization (SN) to improve training stability and achieve faster convergence. Finally, we applied a regression-based blending ensemble technique to improve demand prediction performance. We used various evaluation criteria and visual representations to compare our proposed model’s performance. Synthetic data generated by our suggested GAN model were also evaluated. The TGAN-Modified-WGAN-GP model mitigates the overfitting and mode collapse problems, and it also converges faster than previous GAN models for synthetic data creation. The presented model’s performance is compared to existing ensemble and baseline models. The experimental findings imply that combining synthetic and actual data can significantly reduce prediction error, achieving a mean absolute percentage error (MAPE) of 4.476, and increase prediction accuracy.
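For reference, the standard objectives that the modified model above builds on can be written as follows. These are the published WGAN-GP critic and generator losses (Gulrajani et al., 2017, who used λ = 10) and the spectral normalization of a weight matrix (Miyato et al., 2018), where σ(W) is the largest singular value of W. The abstract does not give the authors' modified loss terms, so this shows only the baseline forms.

```latex
L_D = \mathbb{E}_{\tilde{x}\sim P_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x\sim P_r}\big[D(x)\big]
    + \lambda\,\mathbb{E}_{\hat{x}}\big[(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1)^2\big],
\qquad \hat{x} = \epsilon x + (1-\epsilon)\tilde{x},\quad \epsilon \sim U[0,1]

L_G = -\,\mathbb{E}_{\tilde{x}\sim P_g}\big[D(\tilde{x})\big],
\qquad W_{\mathrm{SN}} = \frac{W}{\sigma(W)}
```

The gradient penalty pushes the critic toward unit gradient norm on points interpolated between real and synthetic samples, while spectral normalization bounds each layer's Lipschitz constant directly; using both is one way to stabilize critic training.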
BACKGROUND Hepatocellular carcinoma (HCC) is difficult to diagnose, responds poorly to treatment, has a high recurrence rate and has a low survival rate. The survival of patients with HCC is closely related to the stage at diagnosis. At present, there is no specific serological indicator or method to predict HCC, and early diagnosis of HCC remains a challenge, especially in China, where the situation is more severe. AIM To identify risk factors associated with HCC and establish a risk prediction model based on clinical characteristics and liver-related indicators. METHODS Using a retrospective design, the clinical data of patients in the Affiliated Hospital of North Sichuan Medical College from 2016 to 2020 were collected. The results of needle biopsy or surgical pathology were used as the criteria for assigning patients to the experimental and control groups. Based on the time of admission, the cases were divided into a training cohort (n=1739) and a validation cohort (n=467). With HCC as the dependent variable, the study indicators were entered into logistic univariate and multivariate analyses. An HCC risk prediction model, called the NSMC-HCC model, was then established in the training cohort and verified in the validation cohort. RESULTS Logistic univariate analysis showed that gender, age, alpha-fetoprotein, protein induced by vitamin K absence or antagonist-II, gamma-glutamyl transferase, aspartate aminotransferase and hepatitis B surface antigen were risk factors for HCC, whereas alanine aminotransferase, total bilirubin and total bile acid were protective factors. With a joint-prediction cut-off value of 0.22, the area under the receiver operating characteristic curve (AUC) of the NSMC-HCC model in HCC diagnosis was 0.960, with a sensitivity of 94.40% and specificity of 95.35% in the training cohort, and the AUC was 0.966, with a sensitivity of 90.00% and specificity of 94.20% in the validation cohort. In early-stage HCC diagnosis, the AUC of the NSMC-HCC model was 0.946, with a sensitivity of 85.93% and specificity of 93.62% in the training cohort, and the AUC was 0.947, with a sensitivity of 89.10% and specificity of 98.49% in the validation cohort. CONCLUSION The newly developed NSMC-HCC model is an effective risk prediction model for HCC and early-stage HCC diagnosis.
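Sensitivity and specificity at a cut-off, as reported above for 0.22, follow from the confusion matrix obtained by thresholding predicted probabilities. The sketch below assumes that a probability at or above the cut-off is called positive; the abstract does not state whether the comparison is inclusive, and the data are hypothetical.

```python
def sens_spec(probabilities, labels, cutoff=0.22):
    """Sensitivity and specificity when probability >= cutoff is called positive.

    The inclusive comparison (>=) is an assumption for illustration.
    """
    tp = fn = tn = fp = 0
    for p, y in zip(probabilities, labels):
        predicted = p >= cutoff
        if y == 1 and predicted:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical predicted HCC probabilities and pathology results (1 = HCC)
sens, spec = sens_spec([0.9, 0.3, 0.1, 0.25, 0.05], [1, 1, 1, 0, 0])
```

Lowering the cut-off trades specificity for sensitivity; a fairly low value such as 0.22 favors catching cases, which suits a screening setting.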
Intelligent healthcare networks represent a significant component of digital applications, with requirements for quality-of-service (QoS) reliability and privacy protection. This paper addresses these requirements through the integration of enabling paradigms, including federated learning (FL), cloud/edge computing, software-defined/virtualized networking infrastructure, and converged prediction algorithms. The study focuses on achieving reliability and efficiency in real-time prediction models, which depend on the interaction flows and network topology. In response to these challenges, we introduce a modified version of federated logistic regression (FLR) that takes into account convergence latencies and the accuracy of the final FL model within healthcare networks. To establish the FLR framework for mission-critical healthcare applications, we provide a comprehensive workflow covering framework setup, iterative communication rounds, and model evaluation/deployment. Our optimization process addresses the formulation of loss functions and gradients in federated optimization, and concludes with the generation of service experience batches for model deployment. To assess the practicality of our approach, we conducted experiments using a hypertension prediction model with data sourced from the 2019 annual dataset (Version 2.0.1) of the Korea Medical Panel Survey. Performance metrics, including end-to-end execution delays, model drop/delivery ratios, and final model accuracies, were captured and compared between the proposed FLR framework and other baseline schemes. Our study offers an FLR framework setup for enhancing real-time prediction modeling within intelligent healthcare networks, addressing the critical demands of QoS reliability and privacy preservation.
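The paper's modified FLR is not specified in the abstract, so the following is a sketch of plain federated averaging (FedAvg) applied to logistic regression, the baseline that such frameworks typically extend. Each client runs a few gradient-descent epochs on its own data, and the server averages the returned weights in proportion to client sample counts; raw records never leave the client. The learning rate, epoch count, and toy client data are hypothetical, and no latency modeling or secure aggregation is included.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_update(weights, data, lr=0.5, epochs=5):
    """Gradient descent on one client's mean logistic loss.

    data: list of (features, label); features include a leading 1.0 bias term.
    lr and epochs are illustrative, hypothetical hyperparameters.
    """
    w = list(weights)
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w

def federated_round(global_w, clients):
    """One FedAvg round: local training, then sample-size-weighted averaging."""
    total = sum(len(d) for d in clients)
    new_w = [0.0] * len(global_w)
    for data in clients:
        local_w = local_update(global_w, data)
        for j, wj in enumerate(local_w):
            new_w[j] += wj * len(data) / total
    return new_w

# Two hypothetical clients (e.g. hospitals) with one standardized feature each
clients = [
    [([1.0, -2.0], 0), ([1.0, 2.0], 1)],
    [([1.0, -1.0], 0), ([1.0, 1.0], 1)],
]
w = [0.0, 0.0]
for _ in range(10):
    w = federated_round(w, clients)
```

Communication cost scales with the number of rounds, so more local epochs per round reduce rounds at the risk of client drift when client data differ strongly.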
BACKGROUND Owing to academic pressure, social relations, and the need to adapt to independent life, college students are under high levels of pressure. It is therefore very important to study the mental health problems of college students. Developing a predictive model that can detect early warning signals of college students’ mental health risks can support early intervention and improve overall well-being. AIM To investigate college students’ present psychological well-being, identify the factors contributing to its decline, and construct a predictive nomogram model. METHODS Using online questionnaires and random sampling, we analyzed the psychological health status of 40874 university students at selected universities in Hubei Province, China, from March 1 to 15, 2022. Factors influencing their mental health were analyzed using logistic regression, and R 4.2.3 software was used to develop a nomogram model for risk prediction. RESULTS We randomly selected 918 valid records and found that 11.3% of college students had psychological problems. The general data survey showed that mental health problems were more prominent among doctoral students than among junior college students, and that students from rural areas were more likely than urban students to show abnormal mental health. In addition, students who had experienced significant life events or whose parents were divorced were more likely to have an abnormal status. The abnormal group had significantly higher Patient Health Questionnaire-9 (PHQ-9) and Generalized Anxiety Disorder-7 scores than the healthy group (P<0.05). The nomogram prediction model derived from multivariate analysis included six predictors: place of origin, whether they were only children, whether there were significant life events, parents’ marital status, regular exercise, intimate friends, and the PHQ-9 score. The training set had an area under the receiver operating characteristic (ROC) curve (AUC) of 0.972 [95% confidence interval (CI): 0.947-0.997], a specificity of 0.888 and a sensitivity of 0.972. Similarly, the validation set had a ROC AUC of 0.979 (95% CI: 0.955-1.000), with a specificity of 0.942 and a sensitivity of 0.939. The Hosmer-Lemeshow (H-L) test result was χ²=32.476, P=0.000007, suggesting that the model calibration was good. CONCLUSION In this study, nearly 11.3% of contemporary college students had psychological problems; the risk factors include coming from a rural area, having divorced parents, not being an only child, infrequent exercise, and significant life events.
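The Hosmer-Lemeshow statistic cited above compares observed and expected event counts within groups of patients sorted by predicted risk. The sketch below follows the usual construction (near-equal-sized groups, compared with a χ² distribution on groups − 2 degrees of freedom); the grouping scheme is an assumption, as the abstract does not describe it. Conventionally, a large statistic with a small p-value is read as evidence of miscalibration, while a non-significant result is taken as consistent with adequate calibration.

```python
def hosmer_lemeshow(probabilities, labels, groups=10):
    """Hosmer-Lemeshow chi-square statistic (compare with chi2, df = groups - 2).

    Observations are sorted by predicted probability and split into
    near-equal-sized groups. Groups whose mean probability is exactly
    0 or 1 would divide by zero; this sketch does not guard against that.
    """
    pairs = sorted(zip(probabilities, labels))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        mean_p = expected / size
        stat += (observed - expected) ** 2 / (size * mean_p * (1 - mean_p))
    return stat

# Hypothetical, deliberately miscalibrated predictions
stat = hosmer_lemeshow([0.2, 0.2, 0.8, 0.8], [1, 1, 0, 0], groups=2)
```

If SciPy is available, the p-value is `scipy.stats.chi2.sf(stat, groups - 2)`.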
In regression, although both aim to estimate the Mean Squared Prediction Error (MSPE), Akaike’s Final Prediction Error (FPE) and the Generalized Cross Validation (GCV) selection criteria are usually derived from two quite different perspectives. Here, settling on the most commonly accepted definition of the MSPE as the expectation of the squared prediction error loss, we provide theoretical expressions for it, valid for any linear model (LM) fitter, under either random or non-random designs. Specializing these MSPE expressions, we derive closed formulas of the MSPE for some of the most popular LM fitters: Ordinary Least Squares (OLS), with or without a full-column-rank design matrix; and Ordinary and Generalized Ridge regression, the latter embedding smoothing spline fitting. For each of these LM fitters, we then deduce a computable estimate of the MSPE which turns out to coincide with Akaike’s FPE. Using a slight variation, we similarly obtain a class of MSPE estimates coinciding with the classical GCV formula for those same LM fitters.
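For orientation, the classical textbook forms of the two criteria named above are given below (not the paper's derivations). Here n is the sample size, p the number of columns of a full-column-rank design matrix X for OLS, RSS the residual sum of squares, and H_λ the hat matrix of a linear smoother such as ridge regression.

```latex
\mathrm{FPE} = \frac{\mathrm{RSS}}{n}\cdot\frac{n+p}{n-p},
\qquad
\mathrm{GCV}(\lambda) = \frac{\tfrac{1}{n}\,\lVert y - H_\lambda y \rVert_2^2}{\bigl(1 - \operatorname{tr}(H_\lambda)/n\bigr)^2},
\qquad
H_\lambda = X\,(X^\top X + \lambda I)^{-1} X^\top
```

For OLS, H_0 = X(XᵀX)⁻¹Xᵀ has trace p. When p ≪ n, both correction factors expand to 1 + 2p/n + O(p²/n²), which is why the two criteria typically agree closely despite their different derivations.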
Glass is precious material evidence of trade along the early Silk Road. Ancient glass is easily affected by the environment and by weathering, and the resulting change in composition ratios hampers correct identification of its category. In this paper, mathematical models and methods such as the Chi-square test, the weighted average method, principal component analysis, cluster analysis, a binary classification model and grey correlation analysis were used comprehensively to analyze the data of sample glass products together with their categories. The results showed that the weathered high-potassium glass could be divided into 12, 9, 10 and 27, 7, 22 and so on.
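Grey correlation (grey relational) analysis, one of the methods listed above, ranks comparison sequences by how closely they track a reference sequence. The sketch below uses the standard grey relational coefficient with the conventional distinguishing coefficient ρ = 0.5; the value of ρ, the pre-normalized input, and the composition sequences are assumptions for illustration, not details from the paper.

```python
def grey_relational_grades(reference, comparisons, rho=0.5):
    """Grey relational grade of each comparison sequence w.r.t. the reference.

    Sequences are assumed to be pre-normalised (e.g. oxide fractions scaled
    to a common range). rho is the distinguishing coefficient; 0.5 is the
    conventional choice. Fails if every sequence equals the reference.
    """
    deltas = [[abs(r - c) for r, c in zip(reference, seq)] for seq in comparisons]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))  # grade = mean coefficient
    return grades

# Hypothetical normalized composition profiles
grades = grey_relational_grades([1.0, 2.0, 3.0], [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
```

A grade of 1 means the sequence coincides with the reference; lower grades indicate weaker association.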
This paper presents a case study on the IPUMS NHIS database, which provides data from censuses and surveys on the health of the U.S. population, including data related to COVID-19. Addressing gaps in previous studies, we propose a machine learning approach to train predictive models for identifying and measuring factors that affect the severity of COVID-19 symptoms. Our experiments focus on four groups of factors: demographic, socio-economic, health condition, and COVID-19 vaccination. By analysing the sensitivity of the variables used to train the models and applying VEC (variable effect characteristics) analysis to the variable values, we identify and measure the importance of various factors that influence the severity of COVID-19 symptoms.
Background: Evaluating the growth performance of pigs in real time is laborious and expensive, so mathematical models based on easily accessible variables are developed. Multiple regression (MR) is the most widely used tool for building prediction models in swine nutrition, while the artificial neural network (ANN) model is reported to be more accurate than the MR model in prediction. Therefore, the potential of ANN models to predict the growth performance of pigs was evaluated and compared with MR models in this study. Results: Body weight (BW), net energy (NE) intake, standardized ileal digestible lysine (SID Lys) intake, and their quadratic terms were selected from 10 candidate variables as input variables to predict average daily gain (ADG) and the feed-to-gain ratio (F/G). In the training phase, MR models showed high accuracy in both ADG and F/G prediction (R² for ADG = 0.929, R² for F/G = 0.886), while ANN models with 4 and 6 neurons and a radial basis activation function yielded the best performance in ADG and F/G prediction (R² for ADG = 0.964, R² for F/G = 0.932). In the testing phase, these ANN models showed better accuracy than the MR models in ADG prediction (CCC: 0.976 vs. 0.861, R²: 0.951 vs. 0.584) and F/G prediction (CCC: 0.952 vs. 0.900, R²: 0.905 vs. 0.821). Meanwhile, over-fitting occurred in the MR models but not in the ANN models. On validation data from the animal trial, ANN models outperformed MR models in both ADG and F/G prediction (P<0.01). Moreover, growth stage had a significant effect on the prediction accuracy of the models. Conclusion: Body weight, NE intake and SID Lys intake can be used as input variables to predict the growth performance of growing-finishing pigs, with trained ANN models being more flexible and accurate than MR models. Therefore, ANN models are promising for related swine nutrition studies in the future.
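The concordance correlation coefficient (CCC) used above differs from R² in that it penalizes systematic bias as well as scatter: predictions that are perfectly correlated with observations but shifted by a constant still lose concordance. The sketch below implements Lin's CCC with population (1/n) moments; the example data are hypothetical.

```python
def concordance_cc(x, y):
    """Lin's concordance correlation coefficient using population moments."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)

observed = [1.0, 2.0, 3.0]
shifted = [2.0, 3.0, 4.0]  # perfectly correlated, but biased by +1
ccc = concordance_cc(observed, shifted)  # 4/7, even though Pearson r = 1
```

This is why a model can show a reasonable R² yet a noticeably lower CCC when its predictions are consistently too high or too low.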
Based on groundwater level monitoring data from the Shuping landslide in the Three Gorges Reservoir area, and on the response of groundwater level to influential factors such as rainfall and reservoir level, the influential factors of groundwater level were selected. A classification and regression tree (CART) model was then constructed using this subset of factors and used to predict the groundwater level. Verification showed that the predicted results for the test sample were consistent with the measured values, with a mean absolute error and relative error of 0.28 m and 1.15%, respectively. By comparison, for a support vector machine (SVM) model constructed using the same set of factors, the mean absolute error and relative error of the predicted results were 1.53 m and 6.11%, respectively. This indicates that the CART model not only has better fitting and generalization ability, but also has strong advantages in analyzing the dynamic characteristics of landslide groundwater and screening important variables. It is an effective method for predicting the groundwater level in landslides.
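A regression tree such as CART grows by repeatedly choosing the split that most reduces the squared error of the target. The sketch below finds the best threshold on a single feature, the core step of tree induction; a full CART model repeats it over all features and recursively, then prunes. The feature (e.g. a reservoir level) and the target values are hypothetical.

```python
def best_split(x, y):
    """Best threshold on one feature for a regression-tree split.

    Minimises the summed squared error of the two child nodes, which is
    equivalent to maximising variance reduction. Returns (threshold, cost).
    """
    def sse(values):
        if not values:
            return 0.0
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    pairs = sorted(zip(x, y))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cost, threshold)
    return best[1], best[0]

# Hypothetical: feature = reservoir level index, target = groundwater level
threshold, cost = best_split([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 5.0, 5.0])
```

Because splits are chosen greedily by error reduction, features that are never selected (or selected only deep in the tree) contribute little, which is what makes trees useful for screening important variables.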
To ensure the safety of buildings surrounding foundation pits, a settlement monitoring and trend prediction method was studied. A statistical testing method for analyzing the stability of a settlement monitoring datum is discussed. Based on a comprehensive survey, data from 16 stages at the operating control point were verified by a standard t test to determine the stability of the operating control point. A stationary auto-regression model, AR(p), for predicting observation point settlement was investigated. Given the 16 stages of settlement data at an observation point, the applicability of this model was analyzed. Settlement of the last four stages was predicted using the stationary auto-regression model AR(1); the maximum difference between predicted and measured values was 0.6 mm, indicating good prediction results. Hence, this model can be applied to settlement prediction for buildings surrounding foundation pits.
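The two statistical steps above can be sketched as follows. The abstract does not say which quantity the t test was applied to, so the one-sample t statistic here is a hypothetical reading (e.g. testing whether stage-to-stage elevation differences of the control point have zero mean). The AR(1) model is fitted by ordinary least squares with an intercept, x_t = c + φ·x_{t−1}; Yule-Walker or maximum likelihood are common alternatives, and the paper's estimator is not stated.

```python
import math

def one_sample_t(values, mu0=0.0):
    """t statistic for H0: mean(values) == mu0 (compare with t, df = n - 1)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return (mean - mu0) / (s / math.sqrt(n))

def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1}; returns (c, phi)."""
    prev = series[:-1]
    curr = series[1:]
    n = len(prev)
    mp = sum(prev) / n
    mc = sum(curr) / n
    phi = (sum((a - mp) * (b - mc) for a, b in zip(prev, curr))
           / sum((a - mp) ** 2 for a in prev))
    return mc - phi * mp, phi

def forecast_ar1(series, steps):
    """Iterated multi-step forecasts from the fitted AR(1) model."""
    c, phi = fit_ar1(series)
    last = series[-1]
    out = []
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out

# Hypothetical cumulative settlement (mm) following x_t = 1 + 0.5 * x_{t-1}
settlement = [0.0, 1.0, 1.5, 1.75, 1.875]
```

Mirroring the study's design, one would fit on the first 12 stages and compare `forecast_ar1(series[:12], 4)` against the last four measured stages; stationarity (|φ| < 1) should be checked before trusting multi-step forecasts.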
Funding: financially supported by the National Key Research and Development Program of China (2022YFB3706800, 2020YFB1710100) and the National Natural Science Foundation of China (51821001, 52090042, 52074183).
Abstract: The complex sand-casting process, combined with the interactions between process parameters, makes casting quality difficult to control and results in a high scrap rate. A strategy based on a data-driven model was proposed to reduce casting defects and improve production efficiency; it comprises a random forest (RF) classification model, feature importance analysis, and process parameter optimization with Monte Carlo simulation. Collected data covering four types of defects and the corresponding process parameters were used to construct the RF model. Classification results show a recall above 90% for all categories. The Gini index was used to assess the importance of the process parameters in the formation of the various defects in the RF model. Finally, the classification model was applied to different production conditions for quality prediction. In optimizing process parameters for gas porosity defects, the model serves as the experimental step in the Monte Carlo method to estimate a better temperature distribution. When applied in the factory, the prediction model greatly improved the efficiency of defect detection, and the scrap rate decreased from 10.16% to 6.68%.
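The Gini-based importance mentioned above can be illustrated with a minimal sketch: the Gini impurity of a tree node, and the weighted impurity decrease credited to the feature used for a split. In a random forest, a feature's Gini importance is the sum of such decreases over all splits on that feature, averaged across trees. The function names and toy labels are hypothetical and are not the authors' code.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_decrease(parent, left, right, n_total):
    """Weighted Gini decrease of one split, as credited to the split feature."""
    n = len(parent)
    child = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return (n / n_total) * (gini(parent) - child)

# Toy node: two defect types separated perfectly by one process parameter
parent = ["gas", "gas", "sand", "sand"]
print(gini(parent))
print(impurity_decrease(parent, parent[:2], parent[2:], n_total=4))
```

Summing `impurity_decrease` per feature over every split in every tree, then normalizing, gives the usual mean-decrease-in-impurity ranking.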
Abstract: Efficient water quality monitoring and ensuring the safety of drinking water by government agencies cannot be overemphasized in areas where the resource is constantly depleted by anthropogenic or natural factors. This is particularly true of West Texas, notably Midland and Odessa. Two machine learning regression algorithms (Random Forest and XGBoost) were employed to develop models predicting total dissolved solids (TDS) and the sodium adsorption ratio (SAR) for efficient water quality monitoring of two vital aquifers: the Edwards-Trinity (Plateau) and Ogallala aquifers. These two aquifers supply water for a range of uses, including domestic, agricultural and industrial use. The data were obtained from the Texas Water Development Board (TWDB). The XGBoost and Random Forest models gave accurate predictions of the observed TDS and SAR for both aquifers, with R² values consistently greater than 0.83. The Random Forest model predicted TDS and SAR better, with an average R, MAE, RMSE and MSE of 0.977, 0.015, 0.029 and 0.00, respectively; XGBoost achieved an average R, MAE, RMSE and MSE of 0.953, 0.016, 0.037 and 0.00, respectively. Overall, both models performed well. The results indicate that Random Forest and XGBoost are appropriate for water quality prediction and monitoring in areas of intense hydrocarbon activity such as Midland, Odessa and West Texas at large.
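The four reported metrics can be computed as in the minimal sketch below, which uses made-up numbers rather than the study's data. Because MSE is the square of RMSE, an RMSE of 0.029 corresponds to an MSE of about 0.0008, which explains why the MSE rounds to 0.00 at two decimals.

```python
import math

def regression_metrics(observed, predicted):
    """Return Pearson R, MAE, RMSE and MSE for paired observations and predictions."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return {"R": cov / (so * sp), "MAE": mae, "RMSE": rmse, "MSE": mse}

# Hypothetical scaled TDS observations and model predictions
m = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print({k: round(v, 4) for k, v in m.items()})
```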
Funding: This research was funded by the National Natural Science Foundation of China (grant no. 32271881).
Abstract: Forest fires are natural disasters that can occur suddenly and be very damaging, burning thousands of square kilometers. Because prevention is better than suppression, forest fire occurrence prediction models were developed using logistic regression, geographically weighted logistic regression, Lasso regression, random forest, and support vector machine models, based on historical forest fire data from 2000 to 2019 in Jilin Province. The models, along with a distribution map, are presented in this paper to provide a theoretical basis for forest fire management in this area. The results show that the prediction accuracies of the two machine learning models are higher than those of the three generalized linear regression models. The accuracies of the random forest, support vector machine, geographically weighted logistic regression, Lasso regression, and logistic regression models were 88.7%, 87.7%, 86.0%, 85.0% and 84.6%, respectively. Weather is the main factor affecting forest fires, while topographic, human and socio-economic factors had similar impacts on fire occurrence.
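As a minimal sketch of the simplest of the five models, the snippet below fits a logistic regression by batch gradient descent on toy data and reports classification accuracy, the metric used above to compare the models. The single feature, learning rate and epoch count are illustrative assumptions; the study's predictors (weather, topography, human and socio-economic factors) and its fitting procedure are not reproduced.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (bias, weights)."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        gb, gw = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
    return b, w

def accuracy(X, y, b, w, threshold=0.5):
    """Share of cases whose thresholded predicted probability matches the label."""
    preds = [int(sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= threshold) for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

# Hypothetical standardized dryness index; 1 = fire occurred, 0 = no fire
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
b, w = fit_logistic(X, y)
print(accuracy(X, y, b, w))
```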
Abstract: This is an erratum to a previously published paper entitled “Establishment of a prediction model for prehospital return of spontaneous circulation in out-of-hospital patients with cardiac arrest”. Errors were found in the authors' institutional affiliations. We apologize for this unintentional mistake. Please note that these changes do not affect our results.
Funding: Supported by the National Natural Science Foundation of China (61703410, 61873175, 62073336, 61873273, 61773386, 61922089).
Abstract: Remaining useful life (RUL) prediction is one of the most crucial elements of prognostics and health management (PHM). To address imperfect prior information, this paper proposes an RUL prediction method based on a nonlinear random coefficient regression (RCR) model that fuses failure time data. First, some useful properties of parameter estimation based on the nonlinear RCR model are given. Based on these properties, failure time data can be reasonably fused as prior information. Specifically, the fixed parameters are calculated from the field degradation data of the evaluated equipment, and the prior information of the random coefficient is estimated by fusing the failure time data of congeneric equipment. The prior information of the random coefficient is then updated online under the Bayesian framework, and the probability density function (PDF) of the RUL is derived taking the failure threshold into account. Finally, two case studies are used for experimental verification. Compared with the traditional Bayesian method, the proposed method can effectively reduce the influence of imperfect prior information and improve the accuracy of RUL prediction.
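The Bayesian updating step can be illustrated under simplifying assumptions that are not taken from the paper: a degradation path X(t) = θ·t^b + noise with a known exponent b, Gaussian measurement noise of known variance, and a normal prior on the random coefficient θ (the place where failure-time information from congeneric equipment could enter). With this conjugate setup the posterior of θ is normal in closed form, and a plug-in RUL is the time for θ·t^b to reach the failure threshold, minus the current time. The paper's RUL PDF, which accounts for the threshold and parameter uncertainty, is not reproduced; all names and numbers are hypothetical.

```python
def posterior_theta(ts, ys, b, mu0, var0, noise_var):
    """Normal posterior (mean, variance) of theta in y_i = theta * t_i**b + noise."""
    precision = 1.0 / var0          # prior precision
    weighted = mu0 / var0           # prior precision-weighted mean
    for t, y in zip(ts, ys):
        g = t ** b                  # regressor multiplying the random coefficient
        precision += g * g / noise_var
        weighted += g * y / noise_var
    var = 1.0 / precision
    return weighted * var, var

def plug_in_rul(theta, b, threshold, t_now):
    """Remaining time until theta * t**b reaches the failure threshold."""
    t_fail = (threshold / theta) ** (1.0 / b)
    return max(t_fail - t_now, 0.0)

# Hypothetical prior (e.g. informed by failure times of similar units) and field data
mean, var = posterior_theta([1.0, 2.0, 3.0], [0.9, 2.1, 3.0], b=1.0,
                            mu0=1.2, var0=0.04, noise_var=0.01)
print(mean, var, plug_in_rul(mean, 1.0, threshold=10.0, t_now=3.0))
```

Calling `posterior_theta` again as each new measurement arrives, with the previous posterior as the prior, gives the online update.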
Funding: Supported by the Monash-IITB Academy Scholarship and the Australian Research Council (DP190103592).
Abstract: Machine learning (ML) models provide great opportunities to accelerate novel material development, offering a virtual alternative to laborious and resource-intensive empirical methods. In this work, the second of a two-part study, an ML approach is presented that offers accelerated digital design of Mg alloys. Four ML regression algorithms were systematically evaluated to rationalise the complex relationships in Mg-alloy data and to capture the composition-processing-property patterns. Cross-validation and hold-out set validation were used for unbiased estimation of model performance. Using atomic and thermodynamic properties of the alloys, feature augmentation was examined to define the most descriptive representation spaces for the alloy data. Additionally, a graphical user interface (GUI) webtool was developed to facilitate use of the proposed models in predicting the mechanical properties of new Mg alloys. The results demonstrate that the random forest regression and neural network models are robust for predicting the ultimate tensile strength and ductility of Mg alloys, with accuracies of ~80% and 70%, respectively. The models developed in this work are a step towards high-throughput screening of novel candidates for target mechanical properties and provide ML-guided alloy design.
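Cross-validation as referred to above can be sketched as follows: the data are split into k folds, each fold serves once as the test set, and the test errors are averaged. A one-variable least-squares line stands in for the paper's four regression algorithms, and the toy data are hypothetical.

```python
def kfold_indices(n, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def fit_line(xs, ys):
    """Intercept and slope of a least-squares line (stand-in regressor)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def cv_mae(xs, ys, k=5):
    """Mean absolute error averaged over the k held-out folds."""
    scores = []
    for train, test in kfold_indices(len(xs), k):
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(sum(abs(ys[i] - (a + b * xs[i])) for i in test) / len(test))
    return sum(scores) / len(scores)

xs = [float(i) for i in range(10)]
ys = [3.0 + 2.0 * x for x in xs]        # noise-free toy relationship
print(cv_mae(xs, ys, k=5))
```

In practice the rows would be shuffled before splitting, and a separate hold-out set, as used in the study, would be kept out of the folds entirely.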
Funding: Financed by the research program FRAS - The Future Silviculture in Southern Sweden.
Abstract: Birch has long suffered from a lack of active forest management, leading many researchers to use material without a detailed management history. Data collected from three birch (Betula pendula Roth, B. pubescens Ehrh.) sites in southern Sweden were analyzed using regression analysis to detect any trends or differences in wood properties that could be explained by stand history, tree age and stem form. All sites were genetics trials established in the same way. Estimates of acoustic velocity (AV) from non-destructive testing (NDT) and predicted AV were more highly correlated when data were pooled across sites and other stem form factors were considered. For a subsample of stems, radial profiles of X-ray wood density and ring width by year were created, and wood density was related to ring number from the pith and to ring width. Wood density appeared to be negatively related to ring width in both birch species. Linear models improved slightly when site and species were included, but only the youngest site, with trees at age 15, had both birch species. This paper indicates that NDT values need to be considered separately, and that any predictive models will likely be improved if they are specific to the site and birch species measured.
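The wood density and ring width relationship described above is the kind of trend a simple least-squares line captures: a negative slope means wider rings go with lower density. The numbers below are invented for illustration and are not the study's measurements.

```python
def ols_line(x, y):
    """Intercept and slope of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

ring_width_mm = [1.0, 2.0, 3.0, 4.0, 5.0]
density_kg_m3 = [640.0, 625.0, 612.0, 598.0, 585.0]   # hypothetical values
a, b = ols_line(ring_width_mm, density_kg_m3)
print(f"density = {a:.1f} + ({b:.2f}) * ring width")
```

Adding site and species, as the study did, would turn this into a multiple regression with indicator variables for each group.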
Funding: Supported by the National Natural Science Foundation of China (41602205, 42293261), the China Geological Survey Program (DD20189506, DD20211301), the Special Investigation Project on Science and Technology Basic Resources of the Ministry of Science and Technology (2021FY101003), the Central Guidance for Local Scientific and Technological Development Fund of 2023, and the Project of Hebei University of Environmental Engineering (GCY202301).
Abstract: The change processes and trends of shorelines and tidal flats driven by human activities are essential issues for the sustainability of coastal areas, and are also of great significance for understanding changes in the coastal ecological environment and even global change. Based on field measurements, combined with a Linear Regression (LR) model and the Inverse Distance Weighting (IDW) method, this paper presents a detailed analysis of the change history and trend of the shoreline and tidal flat in Bohai Bay. The shoreline faces a high risk of erosion under natural factors, while the tidal flat shows different erosion and deposition patterns in Bohai Bay owing to the impact of human activities. The implications of these change patterns for ecological protection and recovery are also discussed. Measures should be taken to protect the coastal ecological environment. The models used in this paper show a high correlation coefficient between observed and modeled data, which means that this method can be used to predict the changing trend of the shoreline and tidal flat. The results of this study can provide scientific support for future coastal protection and management.
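Inverse Distance Weighting, used above to interpolate between measured points, estimates a value at an unsampled location as a distance-weighted average of the observations, with weights 1/d^p. A minimal sketch follows; the power p = 2 and the sample coordinates are assumptions, not values from the paper.

```python
import math

def idw(points, query, power=2.0):
    """Inverse-distance-weighted estimate at `query` from (x, y, value) samples."""
    num = den = 0.0
    for x, y, value in points:
        d = math.hypot(x - query[0], y - query[1])
        if d == 0.0:
            return value              # exact hit: return the sample itself
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Hypothetical tidal-flat elevation changes (m) at three survey points
samples = [(0.0, 0.0, -0.20), (100.0, 0.0, 0.10), (0.0, 100.0, 0.05)]
print(idw(samples, (40.0, 30.0)))
```

A larger power makes the estimate more local; p = 2 is a common default rather than a rule.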
Abstract: BACKGROUND The worldwide spread of the severe acute respiratory syndrome coronavirus 2 outbreak has caused concern regarding the mortality rate of the infection. The determinants of mortality on a global scale cannot be fully understood owing to a lack of information. AIM To identify key factors that may explain the variability in case lethality across countries. METHODS We identified 21 potential risk factors for the coronavirus disease 2019 (COVID-19) case fatality rate (CFR) for all countries with available data. We examined the univariate relationship of each variable with the CFR, and the relationships among all independent variables, to identify candidate variables for the final multiple model. Multiple regression analysis was used to assess the strength of the relationships. RESULTS The mean COVID-19 mortality was 1.52% ± 1.72%. Health expenditure and the number of computed tomography scanners per 1 million were significantly inversely correlated with CFR, whereas literacy and air pollution were significantly directly correlated with CFR. The final model explains approximately 97% of the variation in CFR. CONCLUSION The current study proposes some new predictors that affect the mortality rate. It could therefore help decision-makers develop health policies to fight COVID-19.
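The multiple regression step, and the "approximately 97%" figure (a coefficient of determination, R²), can be sketched with the normal equations. The solver below is plain Gaussian elimination with partial pivoting; the two toy predictors are hypothetical stand-ins, not the study's country-level factors.

```python
def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_multiple(X, y):
    """OLS coefficients [intercept, b1, ..., bp] from the normal equations."""
    D = [[1.0] + list(row) for row in X]
    p = len(D[0])
    xtx = [[sum(d[i] * d[j] for d in D) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(D, y)) for i in range(p)]
    return solve(xtx, xty)

def r_squared(X, y, beta):
    """Share of the variance in y explained by the fitted model."""
    pred = [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]
    my = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]   # two hypothetical predictors
y = [1, 3, 4, 6, 8]                            # generated as 1 + 2*x1 + 3*x2
beta = fit_multiple(X, y)
print(beta, r_squared(X, y, beta))
```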
Abstract: BACKGROUND Out-of-hospital cardiac arrest (OHCA) is a leading cause of death worldwide. AIM To explore factors influencing prehospital return of spontaneous circulation (P-ROSC) in patients with OHCA and to develop a nomogram prediction model. METHODS Clinical data of patients with OHCA in Shenzhen, China, from January 2012 to December 2019 were retrospectively analyzed. Least absolute shrinkage and selection operator (LASSO) regression and multivariate logistic regression were applied to select the optimal factors predicting P-ROSC in patients with OHCA. A nomogram prediction model was established based on these factors. Discrimination and calibration were assessed using receiver operating characteristic (ROC) and calibration curves. Decision curve analysis (DCA) was used to evaluate the model's clinical utility. RESULTS Among the 2685 included patients with OHCA, the P-ROSC incidence was 5.8%. LASSO and multivariate logistic regression analyses showed that age, bystander cardiopulmonary resuscitation (CPR), initial rhythm, CPR duration, ventilation mode, and pathogenesis were independent factors influencing P-ROSC in these patients. The area under the ROC curve was 0.963. The calibration plot showed that the P-ROSC predicted by the model was concordant with the actual P-ROSC. DCA confirmed the good clinical usability of the prediction model. CONCLUSION The nomogram prediction model could effectively predict the probability of P-ROSC in patients with OHCA.
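Discrimination, summarized above by the area under the ROC curve, can be computed without drawing the curve: the AUC equals the probability that a randomly chosen positive case (here, a patient achieving P-ROSC) receives a higher predicted probability than a randomly chosen negative one, with ties counted as one half. A minimal sketch with made-up scores:

```python
def roc_auc(scores, labels):
    """AUC as the Mann-Whitney probability that positives outscore negatives."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))   # 0.75
```

The pairwise loop costs O(n_pos × n_neg); for thousands of patients a rank-based formula gives the same value faster.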
Funding: This work was supported by a Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0016977, The Establishment Project of Industry-University Fusion District).
Abstract: The increasing penetration of electric kickboards has been driven primarily by their clean and efficient features. Electric kickboards are gradually growing in popularity in tourist and education-centric localities. With the upcoming arrival of electric kickboard vehicles, deploying a customer rental service is essential. Owing to its free-floating nature, the shared electric kickboard is a common and practical means of transportation. Relocation plans for shared electric kickboards are required to increase the quality of service, and forecasting demand for their use in a specific region is crucial. Predicting demand accurately with small data is difficult, because extensive data are necessary to train machine learning algorithms for effective prediction. Data generation is a method for expanding the amount of data available for training. In this work, we propose a model that takes time-series customer electric kickboard demand data as input, pre-processes it, and generates synthetic data according to the original data distribution using generative adversarial networks (GAN). Combining the synthetic data with the original data reduced the electric kickboard mobility demand prediction error. We propose Tabular-GAN-Modified-WGAN-GP for generating synthetic data for better prediction results. We modified the Wasserstein GAN with gradient penalty (WGAN-GP) to use the RMSprop optimizer and then employed spectral normalization (SN) to improve training stability and achieve faster convergence. Finally, we applied a regression-based blending ensemble technique to improve the performance of demand prediction. We used various evaluation criteria and visual representations to compare our proposed model's performance, and we also evaluated the synthetic data generated by our GAN model. The TGAN-Modified-WGAN-GP model mitigates the overfitting and mode collapse problems, and it converges faster than previous GAN models for synthetic data creation. The presented model's performance is compared to existing ensemble and baseline models. The experimental findings imply that combining synthetic and actual data can significantly reduce the prediction error, reaching a mean absolute percentage error (MAPE) of 4.476, and increase prediction accuracy.
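For reference, the critic objective of the standard WGAN-GP formulation, which the model above modifies, is shown below. The modifications described in the abstract (the RMSprop optimizer and spectral normalization) change how this loss is optimized and how the critic is constrained, and are not captured by the formula itself. Here D is the critic, P_r the real-data distribution, P_g the generator distribution, and λ the penalty weight (commonly set to 10).

```latex
L_D = \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim \mathbb{P}_r}\big[D(x)\big]
    + \lambda\, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad \hat{x} = \epsilon x + (1 - \epsilon)\,\tilde{x},\quad \epsilon \sim U[0, 1]
```

The penalty term pushes the critic's gradient norm toward 1 on points interpolated between real and generated samples, a soft version of the 1-Lipschitz constraint the Wasserstein formulation requires.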
Abstract: BACKGROUND Hepatocellular carcinoma (HCC) is difficult to diagnose, responds poorly to treatment, and has a high recurrence rate and a low survival rate. The survival of patients with HCC is closely related to the stage at diagnosis. At present, there is no specific serological indicator or method to predict HCC, and early diagnosis of HCC remains a challenge, especially in China, where the situation is more severe. AIM To identify risk factors associated with HCC and establish a risk prediction model based on clinical characteristics and liver-related indicators. METHODS Using a retrospective design, we collected the clinical data of patients at the Affiliated Hospital of North Sichuan Medical College from 2016 to 2020. The results of needle biopsy or surgical pathology were used as the criteria for assigning cases to the experimental and control groups. Based on the time of admission, the cases were divided into a training cohort (n=1739) and a validation cohort (n=467). With HCC as the dependent variable, the study indicators were entered into univariate and multivariate logistic analyses. An HCC risk prediction model, named the NSMC-HCC model, was then established in the training cohort and verified in the validation cohort. RESULTS Univariate logistic analysis showed that gender, age, alpha-fetoprotein, protein induced by vitamin K absence or antagonist-II, gamma-glutamyl transferase, aspartate aminotransferase and hepatitis B surface antigen were risk factors for HCC, whereas alanine aminotransferase, total bilirubin and total bile acid were protective factors. With a cut-off value of 0.22 for the joint prediction of the NSMC-HCC model, the area under the receiver operating characteristic curve (AUC) for HCC diagnosis was 0.960, with a sensitivity of 94.40% and a specificity of 95.35% in the training cohort, and 0.966, with a sensitivity of 90.00% and a specificity of 94.20%, in the validation cohort. For early-stage HCC diagnosis, the AUC of the NSMC-HCC model was 0.946, with a sensitivity of 85.93% and a specificity of 93.62% in the training cohort, and 0.947, with a sensitivity of 89.10% and a specificity of 98.49% in the validation cohort. CONCLUSION The newly developed NSMC-HCC model is an effective risk prediction model for HCC and early-stage HCC diagnosis.
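Sensitivity and specificity at a fixed cut-off, such as the 0.22 threshold reported above, follow directly from the confusion matrix. A minimal sketch, assuming that a predicted probability at or above the cut-off is classified as HCC; the probabilities and labels are invented.

```python
def sens_spec(probs, labels, cutoff):
    """Sensitivity and specificity when prob >= cutoff is called positive."""
    tp = fn = tn = fp = 0
    for p, y in zip(probs, labels):
        pred = p >= cutoff
        if y == 1 and pred:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

probs = [0.05, 0.30, 0.18, 0.90, 0.25, 0.10]   # hypothetical model outputs
labels = [0, 1, 0, 1, 0, 1]                    # 1 = pathologically confirmed HCC
print(sens_spec(probs, labels, 0.22))
```

Lowering the cut-off trades specificity for sensitivity; sweeping it over all values traces out the ROC curve.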
Funding: Supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS2022-00167197, Development of Intelligent 5G/6G Infrastructure Technology for the Smart City), in part by the National Research Foundation of Korea (NRF), Ministry of Education, through the Basic Science Research Program under Grant NRF-2020R1I1A3066543, in part by BK21 FOUR (Fostering Outstanding Universities for Research) under Grant 5199990914048, and in part by the Soonchunhyang University Research Fund.
Abstract: Intelligent healthcare networks are a significant component of digital applications, with key requirements of quality-of-service (QoS) reliability and privacy protection. This paper addresses these requirements by integrating enabling paradigms, including federated learning (FL), cloud/edge computing, software-defined/virtualized networking infrastructure, and converged prediction algorithms. The study focuses on achieving reliability and efficiency in real-time prediction models, which depend on the interaction flows and network topology. In response to these challenges, we introduce a modified version of federated logistic regression (FLR) that takes into account convergence latencies and the accuracy of the final FL model within healthcare networks. To establish the FLR framework for mission-critical healthcare applications, we provide a comprehensive workflow covering framework setup, iterative communication rounds, and model evaluation/deployment. Our optimization process formulates the loss functions and gradients for federated optimization and concludes with the generation of service experience batches for model deployment. To assess the practicality of our approach, we conducted experiments using a hypertension prediction model with data sourced from the 2019 annual dataset (Version 2.0.1) of the Korea Medical Panel Survey. Performance metrics, including end-to-end execution delays, model drop/delivery ratios, and final model accuracies, are captured and compared between the proposed FLR framework and other baseline schemes. Our study offers an FLR framework setup that enhances real-time prediction modeling within intelligent healthcare networks, addressing the critical demands of QoS reliability and privacy preservation.
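The round-based workflow described above (local training on each client, then aggregation on a server) can be sketched with plain federated averaging of logistic-regression weights, where each client's update is weighted by its sample count. This is a generic FedAvg sketch under stated assumptions (local SGD, synchronous rounds, a bias stored as `w[0]`); it does not model the paper's latency-aware modifications, and all names and data are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def local_sgd(w, data, lr=0.1, epochs=5):
    """Client-side SGD on the logistic log-loss; w[0] is the bias."""
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            xb = [1.0] + list(x)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, xb)))
            w = [wi - lr * (p - y) * xi for wi, xi in zip(w, xb)]
    return w

def fedavg(client_weights, client_sizes):
    """Server step: average client models weighted by local sample counts."""
    total = sum(client_sizes)
    return [sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
            for j in range(len(client_weights[0]))]

def federated_round(global_w, clients, lr=0.1, epochs=5):
    """One synchronous round: every client trains locally, the server averages."""
    updates = [local_sgd(global_w, data, lr, epochs) for data in clients]
    return fedavg(updates, [len(data) for data in clients])

# Two hypothetical sites, each holding one standardized feature; only weights are shared
clients = [[([-1.0], 0), ([-2.0], 0)], [([1.0], 1), ([2.0], 1)]]
w = [0.0, 0.0]
for _ in range(20):
    w = federated_round(w, clients)
print(w)
```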
Funding: Supported by the Hubei Province Education Science Planning Project, No. 2020GB132.
Abstract: BACKGROUND Owing to academic pressure, social relations, and the adjustment to independent life, college students experience high levels of stress. It is therefore very important to study the mental health problems of college students. Developing a predictive model that can detect early warning signals of mental health risks in college students can support early intervention and improve overall well-being. AIM To investigate college students' current psychological well-being, identify the factors contributing to its decline, and construct a predictive nomogram model. METHODS Using online questionnaires and random sampling, we analyzed the psychological health status of 40874 university students at selected universities in Hubei Province, China, from March 1 to 15, 2022. Factors influencing their mental health were analyzed using logistic regression, and R 4.2.3 software was used to develop a nomogram model for risk prediction. RESULTS We randomly selected 918 valid records and found that 11.3% of college students had psychological problems. The general data survey showed that mental health problems were more prominent among doctoral students than among junior college students, and that students from rural areas were more likely than urban students to have an abnormal mental health status. In addition, students who had experienced significant life events or whose parents were divorced were more likely to have an abnormal status. The abnormal group had significantly higher Patient Health Questionnaire-9 (PHQ-9) and Generalized Anxiety Disorder-7 scores than the healthy group (P<0.05). The nomogram prediction model derived from multivariate analysis included six predictors: the place of origin, whether they were only children, whether there were significant life events, parents' marital status, regular exercise, intimate friends, and the PHQ-9 score. The training set showed an area under the receiver operating characteristic (ROC) curve (AUC) of 0.972 [95% confidence interval (CI): 0.947-0.997], a specificity of 0.888 and a sensitivity of 0.972. Similarly, the validation set had a ROC AUC of 0.979 (95% CI: 0.955-1.000), with a specificity of 0.942 and a sensitivity of 0.939. The Hosmer-Lemeshow (H-L) test gave χ²=32.476, P=0.000007, suggesting that the model calibration was good. CONCLUSION In this study, nearly 11.3% of contemporary college students had psychological problems; the risk factors include coming from a rural area, divorced parents, not being an only child, infrequent exercise, and significant life events.
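The Hosmer-Lemeshow statistic quoted above compares observed and expected event counts within groups of subjects ranked by predicted risk. A minimal sketch follows, assuming equal-size groups and g - 2 degrees of freedom; for even degrees of freedom the chi-square upper-tail probability has a closed form, used here to avoid external libraries. By the usual convention, a large statistic with a small P-value indicates a significant discrepancy between predicted and observed risk. The data are hypothetical.

```python
import math

def hosmer_lemeshow(probs, labels, groups=10):
    """H-L statistic and degrees of freedom over `groups` risk-ranked groups."""
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed = sum(y for _, y in chunk)     # observed events in the group
        expected = sum(p for p, _ in chunk)     # expected events = sum of risks
        mean_p = expected / len(chunk)
        denom = expected * (1.0 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat, groups - 2

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, valid only for even df."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

probs = [0.2] * 5 + [0.8] * 5                 # hypothetical predicted risks
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]       # observed outcomes
stat, df = hosmer_lemeshow(probs, labels, groups=2)
print(stat)                                   # near 0: observed events match expected
print(chi2_sf_even_df(15.5, 8))               # tail probability for a hypothetical statistic
```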
Abstract: In regression, although Akaike's Final Prediction Error (FPE) and the Generalized Cross Validation (GCV) selection criteria both aim to estimate the Mean Squared Prediction Error (MSPE), they are usually derived from two quite different perspectives. Here, settling on the most commonly accepted definition of the MSPE as the expectation of the squared prediction error loss, we provide theoretical expressions for it that are valid for any linear model (LM) fitter, under either random or non-random designs. Specializing these MSPE expressions for each of them, we derive closed formulas of the MSPE for some of the most popular LM fitters: Ordinary Least Squares (OLS), with or without a full column rank design matrix; and Ordinary and Generalized Ridge regression, the latter embedding smoothing spline fitting. For each of these LM fitters, we then deduce a computable estimate of the MSPE, which turns out to coincide with Akaike's FPE. Using a slight variation, we similarly obtain a class of MSPE estimates coinciding with the classical GCV formula for those same LM fitters.
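For a linear fitter with residual sum of squares RSS, sample size n and effective degrees of freedom p (the trace of the hat matrix for ridge or smoothing-spline fits), the classical textbook forms of the two criteria discussed above are FPE = (RSS/n)(n + p)/(n - p) and GCV = (RSS/n)/(1 - p/n)². The sketch below simply evaluates them; it illustrates the standard formulas and is not the paper's derivation.

```python
def fpe(rss, n, p):
    """Akaike's Final Prediction Error for a linear fit with p effective parameters."""
    return (rss / n) * (n + p) / (n - p)

def gcv(rss, n, p):
    """Generalized Cross Validation score, with p = trace of the hat matrix."""
    return (rss / n) / (1.0 - p / n) ** 2

for p in (2, 5, 20):
    print(p, round(fpe(10.0, 100, p), 5), round(gcv(10.0, 100, p), 5))
```

Both behave like (RSS/n)(1 + 2p/n) when p is small relative to n, which is one way to see why they agree closely; their ratio GCV/FPE = 1/(1 - (p/n)²) shows that GCV is always slightly larger for 0 < p < n.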
Abstract: Glass is precious material evidence of trade along the early Silk Road. Ancient glass is easily affected by the environment and by weathering, and the resulting changes in composition ratios hinder correct identification of its category. In this paper, mathematical models and methods such as the Chi-square test, the weighted average method, principal component analysis, cluster analysis, a binary classification model and grey correlation analysis were used comprehensively to analyze the composition data of sample glass products together with their categories. The results showed that the weathered high-potassium glass could be divided into 12, 9, 10 and 27, 7, 22 and so on.
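The Chi-square test listed above, for example to check whether surface weathering is associated with glass type, compares observed contingency-table counts with the counts expected under independence. A minimal sketch of Pearson's statistic; the counts are invented, and the P-value would then be read from a chi-square distribution with (rows - 1)(columns - 1) degrees of freedom.

```python
def pearson_chi2(table):
    """Pearson chi-square statistic for a contingency table of counts."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total   # count under independence
            stat += (obs - expected) ** 2 / expected
    return stat

# Rows: glass type A, glass type B (hypothetical); columns: weathered, unweathered
print(pearson_chi2([[10, 20], [30, 40]]))
```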
Abstract: This paper presents a case study on the IPUMS NHIS database, which provides census and survey data on the health of the U.S. population, including data related to COVID-19. To address gaps in previous studies, we propose a machine learning approach to train predictive models for identifying and measuring factors that affect the severity of COVID-19 symptoms. Our experiments focus on four groups of factors: demographic, socio-economic, health condition, and COVID-19 vaccination. By analysing the sensitivity of the variables used to train the models and applying VEC (variable effect characteristic) analysis to the variable values, we identify and measure the importance of various factors that influence the severity of COVID-19 symptoms.
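One-at-a-time sensitivity analysis, the idea behind variable effect characteristic (VEC) curves, can be sketched as follows: vary one input over a set of levels while holding the others at a baseline, record the model's responses (the VEC-style curve), and use the spread of those responses as a rough importance measure. The toy scoring function and inputs are hypothetical; the study's models and its exact sensitivity method may differ.

```python
def response_curve(model, baseline, index, levels):
    """Model outputs as input `index` sweeps `levels`, others held at baseline."""
    curve = []
    for level in levels:
        x = list(baseline)
        x[index] = level
        curve.append(model(x))
    return curve

def sensitivity_ranking(model, baseline, levels_per_input):
    """Rank inputs by the range of the model response over their levels."""
    spans = {}
    for i, levels in enumerate(levels_per_input):
        curve = response_curve(model, baseline, i, levels)
        spans[i] = max(curve) - min(curve)
    return sorted(spans.items(), key=lambda kv: kv[1], reverse=True)

def severity(x):
    # Hypothetical score from (age_scaled, vaccinated, chronic_conditions)
    return 0.6 * x[0] - 0.8 * x[1] + 0.3 * x[2]

print(sensitivity_ranking(severity, [0.5, 0.0, 1.0], [[0, 1], [0, 1], [0, 1, 2]]))
```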
Funding: Funded by the National Natural Science Foundation of China (32072764, 31702121), the 2115 Talent Development Program of China Agricultural University, and the National Key Research and Development Program of China (2019YFD1002605).
Abstract: Background: Evaluating the growth performance of pigs in real time is laborious and expensive, so mathematical models based on easily accessible variables have been developed. Multiple regression (MR) is the most widely used tool for building prediction models in swine nutrition, while artificial neural network (ANN) models are reported to be more accurate than MR models in prediction performance. This study therefore evaluated the potential of ANN models to predict the growth performance of pigs and compared them with MR models. Results: From 10 candidate variables, body weight (BW), net energy (NE) intake, standardized ileal digestible lysine (SID Lys) intake, and their quadratic terms were selected as input variables to predict average daily gain (ADG) and the feed-to-gain ratio (F/G). In the training phase, MR models showed high accuracy in both ADG and F/G prediction (R²(ADG)=0.929, R²(F/G)=0.886), while ANN models with 4 and 6 neurons and a radial basis activation function gave the best performance in ADG and F/G prediction (R²(ADG)=0.964, R²(F/G)=0.932). In the testing phase, these ANN models were more accurate than the MR models in ADG prediction (concordance correlation coefficient, CCC: 0.976 vs. 0.861; R²: 0.951 vs. 0.584) and in F/G prediction (CCC: 0.952 vs. 0.900; R²: 0.905 vs. 0.821). Meanwhile, over-fitting occurred in the MR models but not in the ANN models. On validation data from the animal trial, the ANN models outperformed the MR models in both ADG and F/G prediction (P<0.01). Moreover, growth stage had a significant effect on the prediction accuracy of the models. Conclusion: Body weight, NE intake and SID Lys intake can be used as input variables to predict the growth performance of growing-finishing pigs, and trained ANN models are more flexible and accurate than MR models. It is therefore promising to use ANN models in future swine nutrition studies.
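Lin's concordance correlation coefficient (CCC), used above alongside R², measures agreement with the 1:1 line rather than mere linear association, so it penalizes predictions that are systematically shifted or scaled. A minimal sketch using population (1/n) moments, with invented values:

```python
def ccc(observed, predicted):
    """Lin's concordance correlation coefficient with 1/n moments."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    vo = sum((o - mo) ** 2 for o in observed) / n
    vp = sum((p - mp) ** 2 for p in predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted)) / n
    return 2.0 * cov / (vo + vp + (mo - mp) ** 2)

obs = [1.0, 2.0, 3.0]                     # hypothetical observed ADG values
print(ccc(obs, obs), ccc(obs, [2.0, 3.0, 4.0]))   # perfect agreement vs. a constant shift of +1
```

Both prediction sets have a Pearson correlation of 1 with the observations, but only the first lies on the 1:1 line, so only it gets a CCC of 1.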
Funding: Supported by the China Earthquake Administration, Institute of Seismology Foundation (IS201526246).
Abstract: Based on groundwater level monitoring data from the Shuping landslide in the Three Gorges Reservoir area, and on the response of groundwater level changes to influential factors such as rainfall and reservoir level, the influential factors of groundwater level were selected. A classification and regression tree (CART) model was then constructed from this subset of factors and used to predict the groundwater level. In validation, the predicted results for the test sample were consistent with the measured values, with a mean absolute error and relative error of 0.28 m and 1.15%, respectively. For comparison, a support vector machine (SVM) model constructed using the same set of factors gave a mean absolute error and relative error of 1.53 m and 6.11%, respectively. This indicates that the CART model not only has better fitting and generalization ability, but also has strong advantages in analyzing the dynamic characteristics of landslide groundwater and in screening important variables. It is an effective method for predicting groundwater levels in landslides.
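A regression tree such as CART grows by repeatedly choosing the split that most reduces the squared error of a node. The sketch below finds the best single threshold on one input, for example reservoir level, using hypothetical data; a full tree applies this search recursively over all candidate factors and stops according to a depth or node-size rule.

```python
def sse(values):
    """Sum of squared deviations from the mean (node impurity for regression)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y):
    """Threshold on x minimizing the children's total SSE; returns (threshold, sse)."""
    pairs = sorted(zip(x, y))
    best = (None, sse(y))             # no split yet: impurity of the whole node
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                  # no threshold between equal x values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2.0, total)
    return best

reservoir_level = [145.0, 150.0, 160.0, 170.0, 175.0]   # hypothetical (m)
groundwater = [20.1, 20.3, 23.8, 24.2, 24.0]            # hypothetical (m)
print(best_split(reservoir_level, groundwater))
```

Each leaf of the finished tree predicts the mean groundwater level of the training cases that fall into it.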
Funding: Project 50279005 supported by the National Natural Science Foundation of China.
Abstract: To ensure the safety of buildings surrounding foundation pits, a settlement monitoring and trend prediction method was studied. A statistical testing method for analyzing the stability of the settlement monitoring datum is discussed. Based on a comprehensive survey, data from 16 stages at the operating control point were checked with a standard t test to determine the stability of the operating control point. A stationary auto-regression model, AR(p), for predicting observation point settlement is investigated. Given 16 stages of settlement data at an observation point, the applicability of this model was analyzed. Settlement for the last four stages was predicted using the stationary auto-regression model AR(1); the maximum difference between predicted and measured values was 0.6 mm, indicating good prediction results. Hence, this model can be applied to settlement prediction for buildings surrounding foundation pits.
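The AR(1) prediction step can be sketched as follows: estimate the mean μ and the lag-1 coefficient φ from a stationary series by least squares, then forecast recursively, which for AR(1) gives x̂(t+h) = μ + φ^h (x(t) − μ). This is a minimal sketch under that textbook estimator, applied to settlement increments on the assumption that they form a stationary series; the paper's exact estimation procedure and its t test of the control point are not reproduced, and the series is invented.

```python
def fit_ar1(series):
    """Mean and lag-1 least-squares coefficient of a mean-centred AR(1) model."""
    mu = sum(series) / len(series)
    d = [v - mu for v in series]
    num = sum(d[t] * d[t - 1] for t in range(1, len(d)))
    den = sum(v * v for v in d[:-1])
    phi = num / den if den > 0 else 0.0   # constant series: no autocorrelation to fit
    return mu, phi

def forecast_ar1(series, steps):
    """Recursive h-step forecasts mu + phi**h * (last - mu)."""
    mu, phi = fit_ar1(series)
    last = series[-1]
    return [mu + phi ** h * (last - mu) for h in range(1, steps + 1)]

# Hypothetical settlement increments (mm) for 12 stages; forecast the next 4
increments = [1.2, 0.9, 1.1, 0.8, 1.0, 0.7, 0.9, 0.8, 0.6, 0.8, 0.7, 0.6]
print(forecast_ar1(increments, 4))
```

Comparing such forecasts with held-back measurements, as the study did for its last four stages, gives the maximum prediction error.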