Given the challenge of estimating or calculating quantities of waste electrical and electronic equipment (WEEE) in developing countries, this article focuses on predicting the WEEE generated by Cameroonian small and medium enterprises (SMEs) that are engaged in ISO 14001:2015 initiatives and consume electrical and electronic equipment (EEE) to enhance their performance and profitability. The methodology employed an exploratory approach involving the application of general equilibrium theory (GET) to contextualize the study and generate relevant parameters for deploying the random forest regression learning algorithm for predictions. Machine learning was applied to 80% of the samples for training, while simulation was conducted on the remaining 20% of samples based on quantities of EEE utilized over a specific period, utilization rates, repair rates, and average lifespans. The results demonstrate that the model’s predicted values are close to the actual quantities of generated WEEE; the model’s performance was evaluated using the mean squared error (MSE) and yielded satisfactory results. Based on this model, both companies and stakeholders can set realistic objectives for managing companies’ WEEE, fostering sustainable socio-environmental practices.
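The 80%/20% split and the MSE evaluation described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the function names, the fixed seed, and the toy sample ids are assumptions, and the random forest regressor itself is omitted.

```python
import random


def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and split it into train and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed (assumption) for repeatability
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy data: ten sample ids standing in for enterprise records.
train, test = train_test_split(range(10))
```

A trained regressor would be fitted on `train` and scored by passing its predictions for `test` to `mean_squared_error`.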
With the development of UAV technology, UAV aeromagnetic surveying plays an important role in airborne geophysical prospecting. In an aeromagnetic survey, the magnetic interference generated by the magnetic components of the aircraft greatly affects the accuracy of the survey results. Therefore, aeromagnetic compensation technology is needed to eliminate the interfering magnetic field. So far, the aeromagnetic compensation methods in use are mainly linear regression methods based on the T-L equation. Least squares is one of the most commonly used methods for solving multiple linear regressions. However, because correlation between the data may make the algorithm unstable, we use the ridge regression algorithm to address the multicollinearity problem in the T-L equation. This method is then applied to aeromagnetic survey data, and the standard deviation is selected as the index for evaluating the compensation effect and verifying the effectiveness of the method.
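Why ridge regression helps with multicollinearity can be seen in a toy two-feature case. The sketch below is an illustration under stated assumptions, not the T-L compensation model: the function name, the penalty `lam`, and the perfectly collinear toy data are hypothetical. With `lam = 0` the normal equations are singular; a small positive penalty makes them solvable.

```python
def ridge_fit_2d(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y for two features, no intercept."""
    a = sum(r[0] * r[0] for r in X) + lam
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + lam
    g0 = sum(r[0] * t for r, t in zip(X, y))
    g1 = sum(r[1] * t for r, t in zip(X, y))
    det = a * d - b * b
    if det == 0:
        raise ValueError("singular system; increase lam")
    # Closed-form inverse of the 2x2 regularized Gram matrix.
    return ((d * g0 - b * g1) / det, (a * g1 - b * g0) / det)


# Hypothetical data: the two columns are identical, so plain least squares fails.
X = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
y = [2.0, 4.0, 6.0]
w = ridge_fit_2d(X, y, lam=1.0)  # both weights equal 28/29
```

The penalty splits the weight evenly between the two identical columns instead of leaving it undetermined.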
The safety factor is a crucial quantitative index for evaluating slope stability. However, the traditional calculation methods suffer from unreasonable assumptions, complex soil composition, and inadequate consideration of the influencing factors, leading to large errors in their calculations. Therefore, a stacking ensemble learning model (stacking-SSAOP) based on multi-layer regression algorithm fusion and optimized by the sparrow search algorithm is proposed for predicting the slope safety factor. In this method, the density, cohesion, friction angle, slope angle, slope height, and pore pressure ratio are selected as characteristic parameters from the 210 sets of established slope sample data. Random Forest, Extra Trees, AdaBoost, Bagging, and Support Vector regression are used as the base models (inner loop) to construct the first-level regression algorithm layer, and XGBoost is used as the meta-model (outer loop) to construct the second-level regression algorithm layer and complete the stacked learning model, improving the prediction accuracy. The sparrow search algorithm is used to optimize the hyperparameters of the above six regression models and to correct the overfitting and underfitting problems of the single regression model, further improving the prediction accuracy. The mean square error (MSE) of the predicted and true values and the fitting of the data are compared and analyzed. The MSE of the stacking-SSAOP model was found to be smaller than that of the single regression model (MSE=0.03917). Therefore, the former has a higher prediction accuracy and better data fitting. This study innovatively applies the sparrow search algorithm to predict the slope safety factor, showcasing its advantages over traditional methods. Additionally, the proposed stacking-SSAOP model integrates multiple regression algorithms to enhance prediction accuracy. This model not only refines the prediction accuracy of the slope safety factor but also offers a fresh approach to handling the intricate soil composition and other influencing factors, making it a precise and reliable method for slope stability evaluation. This research is important for the modernization and digitalization of slope safety assessments.
This research introduces a novel approach to improve and optimize the prediction of consumer purchase behaviors on e-commerce platforms. The study introduces the fundamental concepts of the logistic regression algorithm and analyzes user data obtained from an e-commerce platform. The original data were preprocessed, and a consumer purchase prediction model was developed for the e-commerce platform using the logistic regression method. The comparison study used the classic random forest approach, further enhanced with K-fold cross-validation. The accuracy of the model’s classification was evaluated using performance indicators that included the accuracy rate, the precision rate, the recall rate, and the F1 score. A visual examination determined the significance of the findings. The findings suggest that employing the logistic regression algorithm to forecast customer purchase behaviors on e-commerce platforms can improve the efficacy of the approach and yield more accurate predictions. This study serves as a valuable resource for improving the precision of forecasting customers’ purchase behaviors on e-commerce platforms, and it has significant practical implications for optimizing the operational efficiency of e-commerce platforms.
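The four indicators named above follow directly from the confusion-matrix counts. The sketch below shows the standard definitions for a binary purchase/no-purchase label; the function name and the toy labels are assumptions for illustration, not data from the study.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = purchase)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
metrics = classification_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
```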
In view of the difficulty of calculating the atomic structure parameters of high-Z elements, the Hartree–Fock with relativistic corrections (HFR) theory combined with the ridge regression (RR) algorithm, rather than the least squares fitting (LSF) method of the Cowan code, is proposed and applied. By analyzing the energy level structure parameters of the HFR theory and extrapolating from fitted experimental energy levels, some excited-state energy levels of the Yb I (Z=70) atom, including the 4f open shell, are calculated. The advantages of the ridge regression algorithm are demonstrated by comparing it with the Cowan code’s LSF results. In addition, the results obtained by the new method are compared with experimental results and other theoretical results to demonstrate the reliability and accuracy of the approach.
In computing research, the growth of data resulting from societal progress has been remarkable, and the management of this data and the analysis of related businesses have grown in popularity. The ability to extract key features from secondary property data and use them to forecast home prices has numerous practical uses. The most popular way to examine pricing information is to use machine learning regression methods to segment the data set, examine the major factors affecting prices, and forecast home prices. Generating precise forecasts is challenging because many of the regression models currently used in research cannot efficiently capture the distinctive features that correlate strongly with house price movements. Ensemble learning is a prevalent and popular methodology in today’s forecasting studies. However, regression ensembles over large housing datasets can consume substantial computing resources and computation time, since integrating diverse models calls for more machine support. The Average Model (AM) proposed in this paper uses the concept of fusion to produce integrated analysis results from several models, combining the benefits of the separate models. The Average Model is widely applicable in regression prediction and significantly increases computational efficiency. The technique is also easy to replicate and effective in regression studies. This work takes the novel step of using the AM algorithm to average different regression models before applying regression processing techniques. By evaluating essential models with 90% accuracy, this technique significantly increases the accuracy of house price predictions. The experimental results show that the proposed AM algorithm has a lower prediction error and a markedly higher prediction accuracy than the comparison algorithms, and that it performs well in house price prediction.
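The idea of averaging several regression models can be sketched as below. The abstract does not say how the models are weighted, so the unweighted mean used here is an assumption, and the three base predictors (mapping floor area to price) are hypothetical stand-ins for trained regressors.

```python
def average_model(models):
    """Return a predictor that averages the outputs of the given predictors."""
    if not models:
        raise ValueError("at least one base model is required")

    def predict(x):
        # Unweighted mean of the base predictions (assumption: equal weights).
        return sum(m(x) for m in models) / len(models)

    return predict


# Hypothetical base predictors: floor area in square metres -> price.
base_models = [
    lambda area: 1000 * area,
    lambda area: 900 * area + 5000,
    lambda area: 1100 * area - 2000,
]
am = average_model(base_models)
price = am(100)  # (100000 + 95000 + 108000) / 3
```

Averaging only requires one pass over the base predictions, which is consistent with the efficiency argument made in the abstract.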
Recently, many regression models have been presented for predicting the mechanical parameters of rocks from rock index properties. Although statistical analysis is a common method for developing regression models, selecting a suitable transformation of the independent variables in a regression model remains difficult. In this paper, a genetic algorithm (GA) is employed as a heuristic search method to select the best transformation of the independent variables (some index properties of rocks) in regression models for predicting uniaxial compressive strength (UCS) and modulus of elasticity (E). First, multiple linear regression (MLR) analysis was performed on a data set to establish predictive models. Then, two GA models were developed in which the root mean squared error (RMSE) was defined as the fitness function. The results show that the GA models are more precise than the MLR models and can explain the relation between the intrinsic strength/elasticity properties and the index properties of rocks with a simple formulation and acceptable accuracy.
There are various analytical, empirical and numerical methods for calculating groundwater inflow into tunnels excavated in rocky media. Analytical methods have been widely applied to predict groundwater inflow into tunnels because of their simplicity and practical theoretical basis. Investigations show that the real amount of water infiltrating into jointed tunnels is much less than the amount calculated using analytical methods, and that the results depend strongly on the tunnel's geometry and environmental conditions. In this study, a new empirical model for estimating groundwater seepage into circular tunnels is introduced using multiple regression analysis. The data were acquired from field surveys and laboratory analysis of core samples. New regression variables were defined after examining single- and two-variable relationships between groundwater seepage and the other variables. Finally, an appropriate model for estimating leakage was obtained using the stepwise algorithm. Statistics such as R, R2 and R2e and the histogram of residual values indicate that the model is reliable and fits well for estimating groundwater seepage into tunnels. The new empirical model was applied to the test data and the results were satisfactory. Therefore, multiple regression analysis is an effective and efficient way to estimate groundwater seepage into tunnels.
Pruning algorithms for the sparse least squares support vector regression machine are common and easily comprehensible, but their computational burden in the training phase is heavy because of the retraining performed during pruning, which hinders their application. To this end, an improved scheme is proposed to accelerate the sparse least squares support vector regression machine. A major advantage of the new scheme is that it is based on an iterative methodology, which reuses the previous training results instead of retraining, and its feasibility is rigorously verified theoretically. Finally, experiments on benchmark data sets confirm a significant saving in training time with the same number of support vectors and the same predictive accuracy as the original pruning algorithms. The speedup scheme is also extended to classification problems.
“Breeding by design” for pure lines may be achieved by constructing an additive QTL-allele matrix in a germplasm panel or breeding population, but this option is not available for hybrids, where both additive and dominance QTL-allele matrices must be constructed. In this study, a hybrid-QTL identification approach, designated PLSRGA, is described for additive and dominance QTL-allele detection in a diallel hybrid population (DHP). It uses partial least squares regression (PLSR) for model fitting, integrated with a genetic algorithm (GA) for variable selection, based on a multi-locus, multi-allele model. PLSRGA was shown by simulation experiments to be superior to single-marker analysis and was then used for QTL-allele identification in a soybean DHP yield experiment with eight parents. Twenty-eight main-effect QTL with 138 alleles and nine QTL × environment QTL with 46 alleles were identified, with respective contributions of 61.8% and 23.5% of phenotypic variation. Main-effect additive and dominance QTL-allele matrices were established as a compact form of the DHP genetic structure. The mechanism of heterosis superior-to-parents (or superior-to-parents heterosis, SPH) was explored and might be explained by a complementary locus-set composed of OD+ (showing positive over-dominance, most often), PD+ (showing positive partial-to-complete dominance, less often) and HA+ (showing positive homozygous additivity, occasionally) loci, depending on the parental materials. Any locus type, whether OD+, PD+ or HA+, could be the best genotype of a locus. All hybrids showed various numbers of better or best genotypes at many but not necessarily all loci, indicating room for further SPH improvement. Based on the additive/dominance QTL-allele matrices, the best hybrid genotype was predicted, and a hybrid improvement approach is suggested. PLSRGA is powerful for hybrid QTL-allele detection and cross-SPH improvement.
Based on the monitoring and discovery service 4 (MDS4) model, a monitoring model for a data grid that supports reliable storage and intrusion tolerance is designed. The load characteristics and indicators of computing resources in the monitoring model are analyzed. Then, a time-series autoregressive prediction model is devised, and an autoregressive support vector regression (ARSVR) monitoring method is put forward to predict the node load of the data grid. Finally, a model for the sequence of historical observations is set up using the autoregressive (AR) model, and the model order is determined. The support vector regression (SVR) model is trained using historical data, and the regression function is obtained. Simulation results show that the ARSVR method can effectively predict the node load.
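One way to connect an AR model of a given order to a regression learner is to turn the load series into lagged feature vectors. The sketch below shows only that data-preparation step under stated assumptions: the function name, the order of 2, and the toy load values are hypothetical, and the SVR training itself is not shown.

```python
def make_lagged_dataset(series, order):
    """Turn a 1-D series into (features, target) pairs using `order` past values."""
    series = list(series)
    if order < 1 or order >= len(series):
        raise ValueError("order must satisfy 1 <= order < len(series)")
    # Each feature row holds the `order` observations preceding the target.
    features = [series[i - order:i] for i in range(order, len(series))]
    targets = [series[i] for i in range(order, len(series))]
    return features, targets


# Hypothetical node-load observations (fraction of capacity).
loads = [0.2, 0.4, 0.5, 0.3, 0.6]
X, y = make_lagged_dataset(loads, order=2)
```

Each row of `X` would serve as the input to a regressor whose target is the corresponding entry of `y`.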
Existing methods struggle to obtain an expression for the system reliability of most practical complex systems, which have a large number of components and possible states. This paper presents a new regression algorithm based on the lower and upper bounds, which can obtain the system reliability analytically without regard to the structure of the complex system. The method has been applied to a real system, and the reliability results are compared with those obtained by the classical method and the parametric method. The effectiveness and accuracy of the proposed method have been verified.
One-dimensional synthetic aperture microwave radiometers have higher spatial resolution and record measurements at multiple incidence angles. In this paper, we propose a multiple linear regression method to retrieve sea surface wind speed at incidence angles between 0° and 65°. We assume that a one-dimensional synthetic aperture microwave radiometer operates at frequencies of 6.9, 10.65, 18.7, 23.8 and 36.5 GHz. The microwave radiative transfer forward model is then used to simulate the measured brightness temperatures, and the sensitivity of the brightness temperatures at 0°–65° to the sea surface wind speed is calculated. Next, vertical polarization channels (VR), horizontal polarization channels (HR) and all channels (AR) are used to retrieve the sea surface wind speed via a multiple linear regression algorithm at 0°–65°, and the relationship between the retrieval error and incidence angle is obtained. The results are as follows: (1) The sensitivity of the vertical polarization brightness temperature to the sea surface wind speed is smaller than that of the horizontal polarization. (2) The retrieval error increases with Gaussian noise. The retrieval error of VR first increases and then decreases with increasing incidence angle, the retrieval error of HR gradually decreases with increasing incidence angle, and the retrieval error of AR first decreases and then increases with increasing incidence angle. (3) The retrieval error of AR is the lowest, and it is necessary to retrieve the sea surface wind speed at a larger incidence angle for AR.
This paper presents the application of a low-complexity, low-order robust regression algorithm to channel estimation with 16QAM over a fading channel for DS-CDMA. After initial channel estimation with classical methods, the estimated channel gains are filtered by a linear or conic regression algorithm within a given regression length. Simulation results show that this method offers up to 0.3 dB gain in a DS-CDMA system. The length and order of the regression algorithm are two key parameters that significantly affect system performance, and their optimal values depend on the speed of the mobile station. It is demonstrated that, with appropriate regression parameters for a given channel environment, this improved method can track the fading channel accurately and substantially outperforms classical methods.
The seasonal and inter-annual variations of Arctic cyclones are investigated. An automatic cyclone tracking algorithm developed by the University of Reading was applied to the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim mean sea level pressure field at 6 h intervals over a 34-year period. The number of Arctic cyclones is highest in winter and lowest in spring, not in summer. About 50% of Arctic cyclones in summer are generated south of 70°N and move into the Arctic. The number of Arctic cyclones has large inter-annual and seasonal variability, but no significant linear trend is detected for the period 1979–2012. The spatial distribution and linear trends of the Arctic cyclone track density show that the extent of cyclone activity is widest in summer, with a significant increasing trend in the CRU (central Russia) subregion, and that the track density is largest in winter, with a decreasing trend in the same subregion. The linear regressions between the cyclone track density and large-scale indices for the same period, and sea ice area indices for the preceding period, show that Arctic cyclone activity is closely linked to large-scale atmospheric circulation patterns such as the Arctic Oscillation (AO), the North Atlantic Oscillation (NAO) and the Pacific-North American Pattern (PNA). Moreover, the sea ice area in the preceding period is significantly associated with cyclone activity in some regions.
Background: Computed tomography images are easy to misjudge because of their complexity, especially images of solitary pulmonary nodules, whose diagnosis as benign or malignant is extremely important in lung cancer treatment. Therefore, there is an urgent need for a more effective strategy in lung cancer diagnosis. In our study, we aimed to externally validate and revise the Mayo model, and a new model was established. Methods: A total of 1450 patients from three centers with solitary pulmonary nodules who underwent surgery were included in the study and were divided into training, internal validation, and external validation sets (n=849, 365, and 236, respectively). External verification and recalibration of the Mayo model and establishment of a new logistic regression model were performed on the training set. The overall performance of each model was evaluated using the area under the receiver operating characteristic curve (AUC). Finally, model validation was completed on the validation data sets. Results: The AUC of the Mayo model on the training set was 0.653 (95% confidence interval [CI]: 0.613–0.694). After re-estimation of the coefficients of all covariates included in the original Mayo model, the revised Mayo model achieved an AUC of 0.671 (95% CI: 0.635–0.706). We then developed a new model that achieved a higher AUC of 0.891 (95% CI: 0.865–0.917). It had an AUC of 0.888 (95% CI: 0.842–0.934) on the internal validation set, which was significantly higher than that of the revised Mayo model (AUC: 0.577, 95% CI: 0.509–0.646) and the Mayo model (AUC: 0.609, 95% CI: 0.544–0.675) (P<0.001). The AUC of the new model was 0.876 (95% CI: 0.831–0.920) on the external verification set, which was higher than the corresponding values of the Mayo model (AUC: 0.705, 95% CI: 0.639–0.772) and the revised Mayo model (AUC: 0.706, 95% CI: 0.640–0.772) (P<0.001). The prediction model was then presented as a nomogram, which is easier to generalize. Conclusions: After external verification and recalibration, the results show that the Mayo model and its revised version are not suitable for predicting malignant pulmonary nodules in the Chinese population. Therefore, a new model was established by a backward stepwise process. The new model was constructed to rapidly discriminate benign from malignant pulmonary nodules, which could enable accurate diagnosis of potential patients with lung cancer.
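The AUC used to compare the models above can be read as the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign one. The sketch below computes it by direct pairwise comparison; the function name and the toy labels and scores are assumptions for illustration, not data from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = malignant) and model scores.
value = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])  # 3 of 4 pairs ranked correctly
```

The pairwise form is quadratic in the number of cases, which is acceptable for a sketch; a rank-based computation would be preferred for large cohorts.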
In this work, two chemometric methods are applied to the modeling and prediction of the electrophoretic mobilities of some organic and inorganic compounds. The successive projections algorithm (SPA), a feature selection strategy, is used for descriptor selection and model development. Then, the support vector machine (SVM) and multiple linear regression (MLR) models are used to construct the non-linear and linear quantitative structure-property relationship (QSPR) models. Comparing the results obtained using the SVM model with those obtained using MLR reveals that the SVM model has much better predictive value than the MLR one. The root-mean-square errors for the training set and the test set were 0.1911 and 0.2569, respectively, for the SVM model, and 0.4908 and 0.6494, respectively, for the MLR model. The results show that the SVM model drastically enhances predictive ability in QSPR studies and is superior to the MLR model.
Hot components operate in a high-temperature and high-pressure environment. The occurrence of a fault in hot components leads to high economic losses. In general, exhaust gas temperature (EGT) is used to monitor the performance of hot components. However, during the early stages of a failure, the fault information is weak and is simultaneously affected by various types of interference, such as complex working conditions, ambient conditions, gradual performance degradation of the compressors and turbines, and noise. Additionally, the lack of effective information about the gas turbine also restricts the establishment of the detection model. To solve these problems, this paper proposes an anomaly detection method based on frequent pattern extraction. A frequent pattern model (FPM) is applied to indicate the inherent regularity of the change in EGT caused by different types of interference. In this study, the relationship model between the EGT and interference was tentatively built based on a genetic algorithm and support vector machine regression. The modeling accuracy was then further improved through the selection of the kernel function and training data. Experiments indicate that the optimal kernel function is linear and that the optimal training data should be balanced in addition to covering the appropriate range of operating conditions and ambient temperature. Furthermore, thresholds based on the Pauta criterion, which are obtained automatically during the modeling process, are used to determine whether hot components are operating abnormally. Moreover, the FPM is compared with the similarity theory, which demonstrates that the FPM can better suppress the effects of component performance degradation and fuel heat value fluctuation. Finally, the effectiveness of the proposed method is validated on seven months of actual data obtained from a Titan130 gas turbine on an offshore oil platform. The results indicate that the proposed method can sensitively detect malfunctions in hot components during the early stages of a fault and is robust to various types of interference.
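The Pauta criterion mentioned above is the three-sigma rule: a value is flagged when it lies more than three standard deviations from the mean. The sketch below applies it to model residuals; the function names, the use of the sample standard deviation, and the toy residuals are assumptions, not details taken from the paper.

```python
import statistics


def pauta_thresholds(residuals):
    """Return the (low, high) three-sigma bounds for a list of residuals."""
    mu = statistics.mean(residuals)
    sigma = statistics.stdev(residuals)  # sample standard deviation (assumption)
    return mu - 3 * sigma, mu + 3 * sigma


def is_abnormal(value, thresholds):
    """Flag a residual that falls outside the three-sigma bounds."""
    low, high = thresholds
    return value < low or value > high


# Hypothetical EGT residuals (measured minus modeled) from normal operation.
normal_residuals = [-1.0, 0.0, 1.0, 0.0, -1.0, 1.0]
bounds = pauta_thresholds(normal_residuals)
```

With these toy residuals the bounds are roughly ±2.68, so a residual of 3.0 would be flagged and a residual of 2.0 would not.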
Neoadjuvant chemotherapy is a necessary treatment for breast cancer patients with large tumors. After this treatment, patients who achieve a pathologic complete response (pCR) usually have a more favorable prognosis than those who do not. Therefore, pCR is now considered the best prognosticator for patients receiving neoadjuvant chemotherapy. However, not all patients benefit from this treatment, so a way is needed to predict which patients will achieve pCR. Various gene signatures of chemosensitivity in breast cancer have been identified, from which such predictors can be built. Nevertheless, many of them have a prediction accuracy of around 80%. Identifying gene signatures that can be used to build high-accuracy predictors is therefore a prerequisite for their clinical testing and application. Furthermore, elucidating the importance of each individual gene in a signature is another pressing need before such a signature can be tested in clinical settings. In this study, a Genetic Algorithm (GA) and Sparse Logistic Regression (SLR), along with the t-test, were employed to identify one signature. It had 28 probe sets selected by GA from the top 65 probe sets that were highly overexpressed between pCR and Residual Disease (RD) and was used to build an SLR predictor of pCR (SLR-28). This predictor, tested on a training set (n = 81) and a validation set (n = 52), gave very precise predictions as measured by accuracy, specificity, sensitivity, positive predictive value, and negative predictive value, with their corresponding P values all zero. Furthermore, this predictor discovered 12 important genes in the 28-probe-set signature. Our findings also demonstrated that the most discriminative genes measured by SLR as a group selected by GA were not necessarily those with the smallest P values by t-test as individual genes, highlighting the ability of GA, as a multivariate technique, to capture interacting genes in pCR prediction. Our gene signature outperformed a signature found in one previous study, with a prediction accuracy of 92% vs 76%, demonstrating the potential of GA and SLR for identifying robust gene signatures for chemotherapy response prediction in breast cancer.
文摘Given the challenge of estimating or calculating quantities of waste electrical and electronic equipment(WEEE)in developing countries,this article focuses on predicting the WEEE generated by Cameroonian small and medium enterprises(SMEs)that are engaged in ISO 14001:2015 initiatives and consume electrical and electronic equipment(EEE)to enhance their performance and profitability.The methodology employed an exploratory approach involving the application of general equilibrium theory(GET)to contextualize the study and generate relevant parameters for deploying the random forest regression learning algorithm for predictions.Machine learning was applied to 80%of the samples for training,while simulation was conducted on the remaining 20%of samples based on quantities of EEE utilized over a specific period,utilization rates,repair rates,and average lifespans.The results demonstrate that the model’s predicted values are significantly close to the actual quantities of generated WEEE,and the model’s performance was evaluated using the mean squared error(MSE)and yielding satisfactory results.Based on this model,both companies and stakeholders can set realistic objectives for managing companies’WEEE,fostering sustainable socio-environmental practices.
Abstract: With the development of UAV technology, UAV aerial magnetic survey plays an important role in airborne geophysical prospecting. In an aeromagnetic survey, the magnetic field interference generated by the magnetic components on the aircraft greatly affects the accuracy of the survey results. Therefore, it is necessary to use aeromagnetic compensation technology to eliminate the interfering magnetic field. So far, the aeromagnetic compensation methods used are mainly linear regression compensation methods based on the T-L equation. Least squares is one of the most commonly used methods for solving multiple linear regressions. However, considering that the correlation between data may lead to instability of the algorithm, we use the ridge regression algorithm to solve the multicollinearity problem in the T-L equation. Subsequently, this method is applied to aeromagnetic survey data, and the standard deviation is selected as the index for evaluating the compensation effect and verifying the effectiveness of the method.
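To illustrate why ridge regression helps when regressors are strongly correlated, the following minimal sketch solves the regularized normal equations (X^T X + lam I) w = X^T y in pure Python. It does not model the T-L equation; the function names, the toy data, and the penalty value `lam` are assumptions made for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations update both sides together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ridge_fit(x_rows, y, lam):
    """Return w minimizing ||y - X w||^2 + lam * ||w||^2."""
    p = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(x_rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Toy data with two perfectly collinear columns (hypothetical values).
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
y = [2.0, 4.0, 6.0]
w = ridge_fit(X, y, lam=1.0)
```

With `lam=0.0` the same call raises `ValueError`, because ordinary least squares has no unique solution for collinear columns; any positive `lam` makes the system solvable, at the cost of shrinking the coefficients.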
Funding: Supported by the Basic Research Special Plan of Yunnan Provincial Department of Science and Technology-General Project (Grant No. 202101AT070094).
Abstract: The safety factor is a crucial quantitative index for evaluating slope stability. However, traditional calculation methods suffer from unreasonable assumptions, complex soil composition, and inadequate consideration of the influencing factors, leading to large errors in their calculations. Therefore, a stacking ensemble learning model (stacking-SSAOP) based on multi-layer regression algorithm fusion and optimized by the sparrow search algorithm is proposed for predicting the slope safety factor. In this method, the density, cohesion, friction angle, slope angle, slope height, and pore pressure ratio are selected as characteristic parameters from the 210 sets of established slope sample data. Random Forest, Extra Trees, AdaBoost, Bagging, and Support Vector regression are used as the base models (inner loop) to construct the first-level regression algorithm layer, and XGBoost is used as the meta-model (outer loop) to construct the second-level regression algorithm layer, completing the stacked learning model and improving the model prediction accuracy. The sparrow search algorithm is used to optimize the hyperparameters of the above six regression models and correct the over- and underfitting problems of the single regression model to further improve the prediction accuracy. The mean square error (MSE) of the predicted and true values and the fitting of the data are compared and analyzed. The MSE of the stacking-SSAOP model was found to be smaller than that of the single regression model (MSE=0.03917). Therefore, the former has a higher prediction accuracy and better data fitting. This study innovatively applies the sparrow search algorithm to predict the slope safety factor, showcasing its advantages over traditional methods. Additionally, our proposed stacking-SSAOP model integrates multiple regression algorithms to enhance prediction accuracy. This model not only refines the prediction accuracy of the slope safety factor but also offers a fresh approach to handling the intricate soil composition and other influencing factors, making it a precise and reliable method for slope stability evaluation. This research holds importance for the modernization and digitalization of slope safety assessments.
Abstract: This research introduces a novel approach to improve and optimize the prediction of consumer purchase behaviors on e-commerce platforms. The study first introduces the fundamental concepts of the logistic regression algorithm and then analyzes user data obtained from an e-commerce platform. The original data were preprocessed, and a consumer purchase prediction model was developed for the e-commerce platform using the logistic regression method. The comparison study used the classic random forest approach, further enhanced by including the K-fold cross-validation method. The accuracy of the model’s classification was evaluated using performance indicators that included the accuracy rate, the precision rate, the recall rate, and the F1 score. A visual examination determined the significance of the findings. The findings suggest that employing the logistic regression algorithm to forecast customer purchase behaviors on e-commerce platforms can improve the efficacy of the approach and yield more accurate predictions. This study serves as a valuable resource for improving the precision of forecasting customers’ purchase behaviors on e-commerce platforms. It has significant practical implications for optimizing the operational efficiency of e-commerce platforms.
Funding: The Fundamental Research Funds for the Central Universities (Grant No. 10822041A2038).
Abstract: In view of the difficulty in calculating the atomic structure parameters of high-Z elements, the Hartree–Fock with relativistic corrections (HFR) theory in combination with the ridge regression (RR) algorithm, rather than the Cowan code’s least squares fitting (LSF) method, is proposed and applied. By analyzing the energy level structure parameters of the HFR theory and using the fitting experimental energy level extrapolation method, some excited state energy levels of the Yb I (Z=70) atom including the 4f open shell are calculated. The advantages of the ridge regression algorithm are demonstrated by comparing it with the Cowan code’s LSF results. In addition, the results obtained by the new method are compared with the experimental results and other theoretical results to demonstrate the reliability and accuracy of our approach.
Funding: This work was supported in part by the Sichuan Science and Technology Program (Grant No. 2022YFG0174) and in part by the Sichuan Gas Turbine Research Institute stability support project of China Aero Engine Group Co., Ltd (Grant No. GJCZ-0034-19).
Abstract: In the field of computer research, the growth of data resulting from societal progress has been remarkable, and the management of this data and the analysis of linked businesses have grown in popularity. There are numerous practical uses for the capability to extract key characteristics from secondary property data and utilize these characteristics to forecast home prices. Using regression methods in machine learning to segment the data set, examine the major factors affecting it, and forecast home prices is the most popular method for examining pricing information. It is challenging to generate precise forecasts, since many of the regression models currently used in research are unable to efficiently capture the distinctive features that correlate strongly with house price movement. In today’s forecasting studies, ensemble learning is a very prevalent and well-liked methodology. The regression integration computation of large housing datasets can use a lot of computer resources as well as computation time, and ensemble learning uses more resources and calls for more machine support in integrating diverse models. The Average Model (AM) suggested in this paper uses the concept of fusion to produce integrated analysis findings from several models, combining the benefits of the separate models. The Average Model has strong applicability in the field of regression prediction and significantly increases computational efficiency. The technique is also easier to replicate and very effective in regression investigations. Before applying regression processing techniques, this work uses the AM algorithm in a novel way to create an average of different regression models. By evaluating essential models with 90% accuracy, this technique significantly increases the accuracy of house price predictions. The experimental results show that the AM algorithm proposed in this paper has a lower prediction error than the comparison algorithms, greatly improves prediction accuracy, and performs well in house price prediction.
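The fusion idea behind an averaging ensemble can be sketched as follows. The abstract does not give the AM algorithm's details, so this assumes the simplest possible reading, an unweighted mean of the predictions from several already-fitted regression models; the function name and the example prices are hypothetical.

```python
def average_predictions(prediction_lists):
    """Average per-sample predictions from several already-fitted models."""
    if not prediction_lists:
        raise ValueError("need at least one model's predictions")
    n = len(prediction_lists[0])
    if any(len(p) != n for p in prediction_lists):
        raise ValueError("all models must predict the same number of samples")
    # zip(*...) groups the predictions that refer to the same sample.
    return [sum(col) / len(prediction_lists) for col in zip(*prediction_lists)]

# Hypothetical price predictions from three models for two houses.
fused = average_predictions([[100, 200], [110, 190], [90, 210]])
```

Averaging only needs the individual models' outputs, which is one reason such a scheme can be cheaper than training an additional meta-model on top of them.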
Abstract: Recently, many regression models have been presented for predicting the mechanical parameters of rocks from rock index properties. Although statistical analysis is a common method for developing regression models, selecting a suitable transformation of the independent variables in a regression model is still difficult. In this paper, a genetic algorithm (GA) has been employed as a heuristic search method for selecting the best transformation of the independent variables (some index properties of rocks) in regression models for prediction of uniaxial compressive strength (UCS) and modulus of elasticity (E). Firstly, multiple linear regression (MLR) analysis was performed on a data set to establish predictive models. Then, two GA models were developed in which the root mean squared error (RMSE) was defined as the fitness function. The results show that the GA models are more precise than the MLR models and are able to explain the relation between the intrinsic strength/elasticity properties and index properties of rocks with a simple formulation and acceptable accuracy.
Abstract: There are various analytical, empirical and numerical methods to calculate groundwater inflow into tunnels excavated in rocky media. Analytical methods have been widely applied in the prediction of groundwater inflow to tunnels due to their simplicity and practical base theory. Investigations show that the real amount of water infiltrating into jointed tunnels is much less than the amount calculated using analytical methods, and the results obtained are very dependent on the tunnel's geometry and environmental conditions. In this study, using multiple regression analysis, a new empirical model for estimation of groundwater seepage into circular tunnels was introduced. Our data were acquired from field surveys and laboratory analysis of core samples. New regression variables were defined after examining single-variable and two-variable relationships between groundwater seepage and other variables. Finally, an appropriate model for estimation of leakage was obtained using the stepwise algorithm. Statistics such as R, R2, R2e and the histogram of residual values indicate that the model fits well for estimating groundwater seepage into tunnels. The new empirical model was applied to the test data and the results were satisfactory. Therefore, multiple regression analysis is an effective and efficient way to estimate groundwater seepage into tunnels.
Funding: Supported by the National Natural Science Foundation of China (50576033).
Abstract: Pruning algorithms for the sparse least squares support vector regression machine are common and easily comprehensible, but the computational burden in the training phase is heavy because the pruning process requires retraining, which limits their application. To this end, an improved scheme is proposed to accelerate the sparse least squares support vector regression machine. A major advantage of this new scheme is that it is based on an iterative methodology, which uses the previous training results instead of retraining, and its feasibility is strictly verified theoretically. Finally, experiments on benchmark data sets corroborate a significant saving of training time with the same number of support vectors and the same predictive accuracy as the original pruning algorithms. This speedup scheme is also extended to classification problems.
Funding: Supported by the National Key Research and Development Program of China (2021YFF1001204, 2017YFD0101500), the MOE Program of Introducing Talents of Discipline to Universities (“111” Project, B08025), the MOE Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT_17R55), the MARA CARS-04 Program, the Jiangsu Higher Education PAPD Program, the Fundamental Research Funds for the Central Universities (KYZZ201901), and the Jiangsu JCICMCP Program.
Abstract: “Breeding by design” for pure lines may be achieved by construction of an additive QTL-allele matrix in a germplasm panel or breeding population, but this option is not available for hybrids, where both additive and dominance QTL-allele matrices must be constructed. In this study, a hybrid-QTL identification approach, designated PLSRGA, using partial least squares regression (PLSR) for model fitting integrated with a genetic algorithm (GA) for variable selection based on a multi-locus, multi-allele model is described for additive and dominance QTL-allele detection in a diallel hybrid population (DHP). The PLSRGA was shown by simulation experiments to be superior to single-marker analysis and was then used for QTL-allele identification in a soybean DHP yield experiment with eight parents. Twenty-eight main-effect QTL with 138 alleles and nine QTL × environment QTL with 46 alleles were identified, with respective contributions of 61.8% and 23.5% of phenotypic variation. Main-effect additive and dominance QTL-allele matrices were established as a compact form of the DHP genetic structure. The mechanism of heterosis superior-to-parents (or superior-to-parents heterosis, SPH) was explored and might be explained by a complementary locus-set composed of OD+ (showing positive over-dominance, most often), PD+ (showing positive partial-to-complete dominance, less often) and HA+ (showing positive homozygous additivity, occasionally) loci, depending on the parental materials. Any locus-type, whether OD+, PD+ or HA+, could be the best genotype of a locus. All hybrids showed various numbers of better or best genotypes at many but not necessarily all loci, indicating room for further SPH improvement. Based on the additive/dominance QTL-allele matrices, the best hybrid genotype was predicted, and a hybrid improvement approach is suggested. PLSRGA is powerful for hybrid QTL-allele detection and cross-SPH improvement.
Funding: The National High Technology Research and Development Program of China (863 Program) (No. 2007AA01Z404).
Abstract: Based on the monitoring and discovery service 4 (MDS4) model, a monitoring model for a data grid which supports reliable storage and intrusion tolerance is designed. The load characteristics and indicators of computing resources in the monitoring model are analyzed. Then, a time-series autoregressive prediction model is devised, and an autoregressive support vector regression (ARSVR) monitoring method is put forward to predict the node load of the data grid. Finally, a model for historical observation sequences is set up using the autoregressive (AR) model and the model order is determined. The support vector regression (SVR) model is trained using historical data and the regression function is obtained. Simulation results show that the ARSVR method can effectively predict the node load.
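One plausible way to connect an AR model order to a regression learner is to turn the load history into lagged feature vectors. The sketch below shows only that step and assumes, without confirmation from the abstract, that the previous `order` observations are used as the inputs for predicting the next value; the function name and the toy series are hypothetical.

```python
def lagged_samples(series, order):
    """Turn a series into (features, targets) using `order` past values."""
    if order < 1 or order >= len(series):
        raise ValueError("order must be between 1 and len(series) - 1")
    # Each feature vector holds the `order` values preceding time t.
    features = [series[t - order:t] for t in range(order, len(series))]
    targets = [series[t] for t in range(order, len(series))]
    return features, targets

# Hypothetical node-load readings.
load = [1, 2, 3, 4, 5]
features, targets = lagged_samples(load, order=2)
```

The resulting pairs could then be passed to any regression learner, with the chosen order controlling how much history each prediction sees.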
Funding: National Natural Science Foundation of China (No. 40927001).
Abstract: It is hard for existing methods to obtain an expression for the system reliability of most practical complex systems with a large number of components and possible states. A new regression algorithm based on the lower and upper bounds is presented in this paper, which can obtain the system reliability analytically regardless of the structure of the complex system. The method has been applied to a real system, and the reliability results are compared with those acquired by the classical method and the parametric method. The effectiveness and accuracy of the proposed method have been verified.
Funding: National Natural Science Foundation of China (41475019, 41631072).
Abstract: One-dimensional synthetic aperture microwave radiometers have higher spatial resolution and record measurements at multiple incidence angles. In this paper, we propose a multiple linear regression method to retrieve sea surface wind speed at incidence angles between 0° and 65°. We assume that a one-dimensional synthetic aperture microwave radiometer operates at frequencies of 6.9, 10.65, 18.7, 23.8 and 36.5 GHz. Then, the microwave radiative transfer forward model is used to simulate the measured brightness temperatures. The sensitivity of the brightness temperatures at 0°–65° to the sea surface wind speed is calculated. Then, vertical polarization channels (VR), horizontal polarization channels (HR) and all channels (AR) are used to retrieve the sea surface wind speed via a multiple linear regression algorithm at 0°–65°, and the relationship between the retrieval error and incidence angle is obtained. The results are as follows: (1) The sensitivity of the vertical polarization brightness temperature to the sea surface wind speed is smaller than that of the horizontal polarization. (2) The retrieval error increases with Gaussian noise. The retrieval error of VR first increases and then decreases with increasing incidence angle, the retrieval error of HR gradually decreases with increasing incidence angle, and the retrieval error of AR first decreases and then increases with increasing incidence angle. (3) The retrieval error of AR is the lowest, and it is necessary to retrieve the sea surface wind speed at a larger incidence angle for AR.
Abstract: The application of a low-complexity, low-order robust regression algorithm to channel estimation with 16QAM over a fading channel for DS-CDMA is presented in this paper. After initial channel estimation with classical methods, the estimated channel gains are filtered by a linear or conic regression algorithm within a given regression length. Simulation results show that this method offers up to 0.3 dB gain in a DS-CDMA system. The length and order of the regression algorithm are two key parameters that affect the system performance significantly, and their optimal values depend on the speed of the mobile station. It is demonstrated that, with appropriate regression parameters for a given channel environment, this improved method can track the fading channel accurately and substantially outperforms classical methods.
Funding: The Chinese Polar Environment Comprehensive Investigation and Assessment Programmes under contract No. 2016-04-03; the National Key Research and Development Program of China under contract No. 2016YFC1402701.
Abstract: The seasonal and inter-annual variations of Arctic cyclones are investigated. An automatic cyclone tracking algorithm developed by the University of Reading was applied to the European Center for Medium-range Weather Forecasts (ECMWF) ERA-Interim mean sea level pressure field at 6 h intervals for a 34-year period. The maximum number of Arctic cyclones occurs in winter, and the minimum occurs in spring rather than in summer. About 50% of Arctic cyclones in summer are generated south of 70°N and move into the Arctic. The number of Arctic cyclones has large inter-annual and seasonal variability, but no significant linear trend is detected for the period 1979–2012. The spatial distribution and linear trends of the Arctic cyclone track density show that the extent of cyclone activity is widest in summer, with a significant increasing trend in the CRU (central Russia) subregion, and the largest track density is in winter, with a decreasing trend in the same subregion. The linear regressions between the cyclone track density and large-scale indices for the same period, and with pre-period sea ice area indices, show that Arctic cyclone activities are closely linked to large-scale atmospheric circulations, such as the Arctic Oscillation (AO), North Atlantic Oscillation (NAO) and Pacific-North American Pattern (PNA). Moreover, the pre-period sea ice area is significantly associated with cyclone activities in some regions.
Funding: The National Natural Science Foundation of China (No. 81670091) and the Zhongyuan Science and Technology Innovation Leading Talent Project (No. 194200510).
Abstract: Background: Computed tomography images are easy to misjudge because of their complexity, especially images of solitary pulmonary nodules, whose diagnosis as benign or malignant is extremely important in lung cancer treatment. Therefore, there is an urgent need for a more effective strategy in lung cancer diagnosis. In our study, we aimed to externally validate and revise the Mayo model, and a new model was established. Methods: A total of 1450 patients from three centers with solitary pulmonary nodules who underwent surgery were included in the study and were divided into training, internal validation, and external validation sets (n=849, 365, and 236, respectively). External verification and recalibration of the Mayo model and establishment of a new logistic regression model were performed on the training set. The overall performance of each model was evaluated using the area under the receiver operating characteristic curve (AUC). Finally, model validation was completed on the validation data sets. Results: The AUC of the Mayo model on the training set was 0.653 (95% confidence interval [CI]: 0.613–0.694). After re-estimation of the coefficients of all covariates included in the original Mayo model, the revised Mayo model achieved an AUC of 0.671 (95% CI: 0.635–0.706). We then developed a new model that achieved a higher AUC of 0.891 (95% CI: 0.865–0.917). It had an AUC of 0.888 (95% CI: 0.842–0.934) on the internal validation set, which was significantly higher than that of the revised Mayo model (AUC: 0.577, 95% CI: 0.509–0.646) and the Mayo model (AUC: 0.609, 95% CI: 0.544–0.675) (P<0.001). The AUC of the new model was 0.876 (95% CI: 0.831–0.920) on the external verification set, which was higher than the corresponding value of the Mayo model (AUC: 0.705, 95% CI: 0.639–0.772) and the revised Mayo model (AUC: 0.706, 95% CI: 0.640–0.772) (P<0.001). The prediction model was then presented as a nomogram, which is easier to generalize. Conclusions: External verification and recalibration show that the Mayo model and the revised Mayo model are not suitable for the prediction of malignant pulmonary nodules in the Chinese population. Therefore, a new model was established by a backward stepwise process. The new model was constructed to rapidly discriminate benign from malignant pulmonary nodules, which could achieve accurate diagnosis of potential patients with lung cancer.
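The AUC used above to compare the models can be read as the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign one. The following pairwise-comparison sketch computes that quantity directly; the function name and the labels and scores are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = malignant) and model scores.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values for the Mayo, revised Mayo, and new models are compared.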
Abstract: In this work, two chemometrics methods are applied to the modeling and prediction of electrophoretic mobilities of some organic and inorganic compounds. The successive projection algorithm (SPA), a feature selection strategy, is used as the descriptor selection and model development method. Then, the support vector machine (SVM) and multiple linear regression (MLR) models are utilized to construct the non-linear and linear quantitative structure-property relationship (QSPR) models. Comparing the results obtained using the SVM model with those obtained using MLR reveals that the SVM model has much better predictive value than the MLR one. The root-mean-square errors for the training set and the test set were 0.1911 and 0.2569, respectively, for the SVM model, and 0.4908 and 0.6494, respectively, for the MLR model. The results show that the SVM model drastically enhances the predictive ability in QSPR studies and is superior to the MLR model.
Abstract: Hot components operate in a high-temperature and high-pressure environment. The occurrence of a fault in hot components leads to high economic losses. In general, exhaust gas temperature (EGT) is used to monitor the performance of hot components. However, during the early stages of a failure, the fault information is weak and is simultaneously affected by various types of interference, such as complex working conditions, ambient conditions, gradual performance degradation of the compressors and turbines, and noise. Additionally, a lack of effective information about the gas turbine also restricts the establishment of the detection model. To solve the above problems, this paper proposes an anomaly detection method based on frequent pattern extraction. A frequent pattern model (FPM) is applied to indicate the inherent regularity of change in EGT arising from different types of interference. In this study, based on a genetic algorithm and support vector machine regression, the relationship model between the EGT and interference was tentatively built. The modeling accuracy was then further improved through the selection of the kernel function and training data. Experiments indicate that the optimal kernel function is linear and that the optimal training data should be balanced in addition to covering the appropriate range of operating conditions and ambient temperature. Furthermore, the thresholds based on the Pauta criterion, which are automatically obtained during the modeling process, are used to determine whether hot components are operating abnormally. Moreover, the FPM is compared with the similarity theory, which demonstrates that the FPM can better suppress the effect of component performance degradation and fuel heat value fluctuation. Finally, the effectiveness of the proposed method is validated on seven months of actual data obtained from a Titan130 gas turbine on an offshore oil platform. The results indicate that the proposed method can sensitively detect malfunctions in hot components during the early stages of a fault and is robust to various types of interference.
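The Pauta criterion mentioned in this abstract is the three-sigma rule: a value is flagged when it lies more than three standard deviations from the mean. The sketch below applies it to model residuals. Treating the residual as measured EGT minus predicted EGT, using the sample standard deviation, and the toy numbers are all assumptions for illustration, not details taken from the paper.

```python
import statistics

def pauta_limits(residuals):
    """Return (lower, upper) limits at mean +/- 3 sample standard deviations."""
    mean = statistics.mean(residuals)
    sigma = statistics.stdev(residuals)
    return mean - 3 * sigma, mean + 3 * sigma

def is_abnormal(value, limits):
    """Flag a residual that falls outside the three-sigma band."""
    lower, upper = limits
    return value < lower or value > upper

# Hypothetical residuals from healthy operation.
healthy = [-1, 0, 1, 0, -1, 1, 0, 0]
limits = pauta_limits(healthy)
```

Because the limits are computed from the residuals of the fitted model, they can be obtained as a by-product of modeling, which matches the abstract's remark that the thresholds are obtained automatically.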
Abstract: Neoadjuvant chemotherapy is a necessary treatment for breast cancer patients with large tumors. After this treatment, patients who achieve a pathologic complete response (pCR) usually have a more favorable prognosis than those who do not. Therefore, pCR is now considered the best prognosticator for patients receiving neoadjuvant chemotherapy. However, not all patients benefit from this treatment. As a result, we need a way to predict which patients can achieve pCR. Various gene signatures of chemosensitivity in breast cancer have been identified, from which such predictors can be built. Nevertheless, many of them have a prediction accuracy of around 80%. As such, identifying gene signatures that can be used to build high-accuracy predictors is a prerequisite for their clinical testing and application. Furthermore, elucidating the importance of each individual gene in a signature is another pressing need before such a signature can be tested in clinical settings. In this study, a Genetic Algorithm (GA) and Sparse Logistic Regression (SLR), along with the t-test, were employed to identify one signature. It had 28 probe sets selected by GA from the top 65 probe sets that were highly overexpressed between pCR and Residual Disease (RD) and was used to build an SLR predictor of pCR (SLR-28). This predictor, tested on a training set (n = 81) and a validation set (n = 52), gave very precise predictions as measured by accuracy, specificity, sensitivity, positive predictive value, and negative predictive value, with the corresponding P values all reported as zero. Furthermore, this predictor identified 12 important genes in the 28-probe-set signature. Our findings also demonstrated that the most discriminative genes measured by SLR as a group selected by GA were not necessarily those with the smallest P values by t-test as individual genes, highlighting the ability of GA, as a multivariate technique, to capture interacting genes in pCR prediction. Our gene signature produced superior performance over a signature found in one previous study, with a prediction accuracy of 92% vs 76%, demonstrating the potential of GA and SLR in identifying robust gene signatures for chemo response prediction in breast cancer.