Cyber losses, in terms of the number of records breached in cyber incidents, commonly feature a significant portion of zeros and distinct characteristics in mid-range and large losses, which make it hard to model the whole range of losses with a standard loss distribution. We tackle this modeling problem by proposing a three-component spliced regression model that simultaneously models zero, moderate, and large losses and accommodates heterogeneous effects in the mixture components. To apply the proposed model to the Privacy Rights Clearinghouse (PRC) data breach chronology, we segment geographical groups using unsupervised cluster analysis, and utilize a covariate-dependent probability for zero losses, finite mixture distributions for the moderate body, and an extreme value distribution for large losses, capturing the heavy-tailed nature of the loss data. Parameters and coefficients are estimated with the Expectation-Maximization (EM) algorithm. Combined with our frequency model (a generalized linear mixed model) for data breaches, aggregate loss distributions are investigated, and applications to cyber insurance pricing and risk management are discussed.
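The EM algorithm used above alternates an expectation step (computing component responsibilities) with a maximization step (reweighted parameter updates). As an illustration only, and not the paper's three-component spliced model, the sketch below runs EM for a plain two-component univariate Gaussian mixture; the data and initial values are invented for the example.

```python
import numpy as np

def em_two_normals(x, iters=100):
    """EM for a two-component univariate Gaussian mixture (toy sketch)."""
    # crude deterministic initialisation: anchor the means at the extremes
    mu = np.array([x.min(), x.max()], dtype=float)
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        dens = np.stack([
            pi[k] * np.exp(-0.5 * ((x - mu[k]) / sigma[k]) ** 2) / sigma[k]
            for k in range(2)
        ])
        r = dens / dens.sum(axis=0)
        # M-step: weighted means, standard deviations, mixing proportions
        n_k = r.sum(axis=1)
        mu = (r * x).sum(axis=1) / n_k
        sigma = np.sqrt((r * (x - mu[:, None]) ** 2).sum(axis=1) / n_k)
        pi = n_k / len(x)
    return mu, sigma, pi
```

On well-separated components, the iteration recovers the component means; the paper's model adds a zero mass point and an extreme-value tail on top of such a mixture body.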
For the composition analysis and identification of ancient glass products, L1 regularization, K-Means cluster analysis, the elbow rule, and other methods were used to build logistic regression, cluster analysis, and hyper-parameter test models, and SPSS, Python, and other tools were used to obtain the classification rules of glass products under different fluxes, sub-classification under different chemical compositions, a test of the hyper-parameter K value, and a rationality analysis. The research can provide theoretical support for the protection and restoration of ancient glass relics.
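K-Means with the elbow rule, as used above, picks the number of clusters k at the point where the within-cluster sum of squares (inertia) stops dropping sharply. A minimal self-contained sketch, not the paper's SPSS/Python pipeline; the deterministic initialisation is an assumption made for reproducibility:

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain Lloyd's k-means; returns labels and inertia (within-cluster SSE).
    Naive deterministic init: k points spread evenly through the array."""
    step = max(1, len(X) // k)
    centers = X[::step][:k].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each point to its nearest centre
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute centres, keeping the old one if a cluster empties
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    inertia = float(((X - centers[labels]) ** 2).sum())
    return labels, inertia
```

The elbow rule then runs `kmeans` for a range of k values and plots the inertias, choosing k where the curve flattens.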
The soil water status was investigated under soil surface mulching techniques and two drip line depths from the soil surface (DL). The techniques were black plastic film (BPF), palm tree waste (PTW), and no mulching (NM) as the control treatment. The DL were 15 cm and 25 cm, with surface drip irrigation used as the control. The results indicated that both BPF and PTW mulching enhanced the soil water retention capacity, and there was about 6% water saving in subsurface drip irrigation compared with NM. Furthermore, the water savings at a DL of 25 cm were lower (15-20 mm) than those at a DL of 15 cm (19-24 mm), whereas surface drip irrigation consumed more water. The distribution of soil water content (θv) for BPF and PTW was more favorable than for NM. Hence, mulching the soil with PTW, combined with a DL of 15 cm, is recommended due to its lower cost. The θv values were estimated using multiple linear regression (MLR) and multiple nonlinear regression (MNLR) models. Multiple regression analysis revealed the superiority of the MLR model over the MNLR model: in the training and testing processes it had coefficients of correlation of 0.86 and 0.88, root mean square errors of 0.37 and 0.35, and indices of agreement of 0.99 and 0.93, respectively. Moreover, DL and spacing from the drip line had a significant effect on the estimation of θv.
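A multiple linear regression such as the MLR model above can be fitted by ordinary least squares. The sketch below uses invented synthetic predictors, not the paper's soil data (where the inputs would be quantities like DL and spacing from the drip line):

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept, via numpy's lstsq."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta  # [intercept, slope_1, slope_2, ...]

def predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta
```

With the fitted coefficients, model quality is then summarised by the correlation, RMSE, and index-of-agreement statistics reported in the abstract.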
The global pandemic, coronavirus disease 2019 (COVID-19), has significantly affected tourism, especially in Spain, which was among the first countries affected by the pandemic and is among the world's biggest tourist destinations. Stock market values respond to the evolution of the pandemic, especially for tourism companies. Being able to quantify this relationship therefore allows us to predict the effect of the pandemic on shares in the tourism sector, improving the response to the crisis by policymakers and investors. Accordingly, a dynamic regression model was developed to predict the behavior of shares in the Spanish tourism sector according to the evolution of the COVID-19 pandemic in the medium term. It was confirmed that both the number of deaths and the number of cases are good predictors of abnormal stock prices in the tourism sector.
Remaining useful life (RUL) prediction is one of the most crucial elements in prognostics and health management (PHM). To address imperfect prior information, this paper proposes an RUL prediction method based on a nonlinear random coefficient regression (RCR) model that fuses failure time data. Firstly, some useful properties of parameter estimation in the nonlinear RCR model are derived. Based on these properties, failure time data can reasonably be fused as prior information. Specifically, the fixed parameters are calculated from the field degradation data of the evaluated equipment, and the prior information of the random coefficient is estimated by fusing the failure time data of congeneric equipment. Then, the prior information of the random coefficient is updated online under the Bayesian framework, and the probability density function (PDF) of the RUL is derived, accounting for the limitation of the failure threshold. Finally, two case studies are used for experimental verification. Compared with the traditional Bayesian method, the proposed method can effectively reduce the influence of imperfect prior information and improve the accuracy of RUL prediction.
Machine learning (ML) models provide great opportunities to accelerate novel material development, offering a virtual alternative to laborious and resource-intensive empirical methods. In this work, the second of a two-part study, an ML approach is presented that offers accelerated digital design of Mg alloys. A systematic evaluation of four ML regression algorithms was carried out to rationalise the complex relationships in Mg-alloy data and to capture the composition-processing-property patterns. Cross-validation and hold-out set validation techniques were utilised for unbiased estimation of model performance. Using atomic and thermodynamic properties of the alloys, feature augmentation was examined to define the most descriptive representation spaces for the alloy data. Additionally, a graphical user interface (GUI) webtool was developed to facilitate the use of the proposed models in predicting the mechanical properties of new Mg alloys. The results demonstrate that the random forest regression model and the neural network are robust models for predicting the ultimate tensile strength and ductility of Mg alloys, with accuracies of ~80% and ~70%, respectively. The models developed in this work are a step towards high-throughput screening of novel candidates for target mechanical properties and provide ML-guided alloy design.
In the era of big data, traditional regression models cannot deal with uncertain big data efficiently and accurately. To make up for this deficiency, this paper proposes a quantum fuzzy regression model, which uses fuzzy theory to describe the uncertainty in big data sets and uses quantum computing to exponentially improve the efficiency of data set preprocessing and parameter estimation. Data envelopment analysis (DEA) is used to calculate the degree of importance of each data point, while the Harrow-Hassidim-Lloyd (HHL) algorithm and quantum swap circuits are used to improve the efficiency of high-dimensional data matrix calculation. Applying the quantum fuzzy regression model to small-scale financial data shows that its accuracy is greatly improved compared with the quantum regression model. Moreover, owing to the introduction of quantum computing, the speed of handling high-dimensional data matrices improves exponentially compared with the fuzzy regression model. The proposed model combines the advantages of fuzzy theory and quantum computing: it can efficiently process high-dimensional data matrices and complete parameter estimation using quantum computing while retaining the uncertainty in big data. Thus, it is a new model for efficient and accurate big data processing in uncertain environments.
This paper proposes a new robust estimator for Poisson regression models based on weighted maximum likelihood estimators, which are regarded as Mallows-type estimators. We perform a Monte Carlo simulation study to assess the performance of the suggested estimator compared to the maximum likelihood estimator and some robust methods. The results show that, in general, all robust methods in this paper perform better than the classical maximum likelihood estimator when the model contains outliers, and the proposed estimator showed the best performance among the robust estimators.
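A weighted maximum likelihood Poisson fit of the kind described can be computed by Newton/IRLS iterations. In the sketch below the observation weights are simply supplied as fixed numbers; choosing them (e.g. by downweighting high-leverage points, as Mallows-type estimators do) is the robust-estimation question and is not implemented here.

```python
import numpy as np

def poisson_irls(X, y, w=None, iters=50):
    """Weighted Poisson regression (log link) by Newton/IRLS.
    w are fixed per-observation weights; w=None gives the ordinary MLE."""
    X = np.column_stack([np.ones(len(X)), X])  # add intercept
    w = np.ones(len(y)) if w is None else np.asarray(w, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ beta)                # mean under the log link
        grad = X.T @ (w * (y - mu))          # weighted score
        hess = X.T @ (X * (w * mu)[:, None]) # weighted information
        beta = beta + np.linalg.solve(hess, grad)
    return beta
```

Setting an observation's weight to zero removes its influence entirely, which is the mechanism robust weighting exploits against outliers.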
The aim of this study was to model the undrained shear strength (USS) of soil found in the coastal region of the Niger Delta in Nigeria using some soil properties. The USS is a key parameter needed for most geotechnical/structural designs. Accurate determination of the USS of soft clays can be challenging in the laboratory due to the difficulty of remoulding the clay to its in-situ conditions before testing, and more accurate tests such as the cone penetration test (CPT) can be quite expensive. This study was carried out at the Escravos site, located in Delta State, Nigeria. Three boreholes were drilled and soil samples were collected at 0.75 m intervals up to a depth of 45 m. Laboratory tests were used to obtain the moisture content, bulk unit weight, and liquid and plastic limits, while CPT was used to obtain the undrained shear strength. The soil samples were classified using the Unified Soil Classification System, and various models relating the USS to the soil properties were developed. The results showed that most of the soils at the Escravos site were predominantly inorganic clays of high plasticity, which are problematic due to their expanding and shrinking nature. The model that gave the best fit related the USS to the moisture content and effective stress of the soil, with a coefficient of determination (R<sup>2</sup>) of 0.805 and a root mean square error (RMSE) of 6.37 kN/m<sup>2</sup>.
Social networks are the mainstream medium of current information dissemination, and accurately predicting their propagation dynamics is particularly important. In this paper, we introduce a social network propagation model integrating multiple linear regression and an infectious disease model. Firstly, we propose features that affect social network communication along three dimensions. Then, we predict node influence via multiple linear regression. Lastly, we use the node influence as the state transition rate of the infectious disease model to predict the trend of information dissemination in social networks. Experimental results on a real social network dataset show that the predictions of the model are consistent with the actual information dissemination trends.
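The model couples a regression-predicted node influence with an infectious-disease state transition. As a sketch, the block below implements only the epidemic half: a discrete-time SIR update whose transmission rate `beta` is assumed to come from the upstream regression (that link is not implemented here).

```python
import numpy as np

def sir_step(S, I, R, beta, gamma):
    """One discrete-time SIR update. beta would come from the
    regression-predicted node influence; gamma is the recovery rate."""
    N = S + I + R
    new_inf = beta * S * I / N   # susceptibles becoming "infected" (informed)
    new_rec = gamma * I          # infected nodes losing interest
    return S - new_inf, I + new_inf - new_rec, R + new_rec

def simulate(S0, I0, R0, beta, gamma, steps):
    S, I, R = S0, I0, R0
    traj = [(S, I, R)]
    for _ in range(steps):
        S, I, R = sir_step(S, I, R, beta, gamma)
        traj.append((S, I, R))
    return traj
```

Each step conserves the total population, and with beta > gamma the informed group initially grows, mimicking the spread phase of a message.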
The analysis of numerous experimental equations published in the literature reveals a wide scatter in the predictions for the static recrystallization kinetics of steels. The powers of the deformation variables, strain and strain rate, as well as the power of the grain size, vary across these equations. These differences are highlighted, and the typical values are compared between torsion and compression tests. Potential errors in physical simulation testing are discussed.
The change processes and trends of shorelines and tidal flats forced by human activities are essential issues for the sustainability of coastal areas, and are also of great significance for understanding coastal ecological environment changes and even global changes. Based on field measurements, combined with a Linear Regression (LR) model and the Inverse Distance Weighting (IDW) method, this paper presents a detailed analysis of the change history and trend of the shoreline and tidal flat in Bohai Bay. The shoreline faces a high risk of erosion under the action of natural factors, while the tidal flat in Bohai Bay exhibits different erosion and deposition patterns due to the impact of human activities. The implications of these change rules for ecological protection and recovery are also discussed; measures should be taken to protect the coastal ecological environment. The models used in this paper show a high correlation coefficient between observed and modeled data, which means the method can be used to predict the changing trend of the shoreline and tidal flat. The results of the present study can provide scientific support for future coastal protection and management.
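The IDW method used above estimates a value at an unsampled location as a distance-weighted mean of nearby observations, with weights 1/d^p. A minimal sketch; the sample points and values are invented, not the Bohai Bay measurements:

```python
import numpy as np

def idw(points, values, query, power=2, eps=1e-12):
    """Inverse Distance Weighting: weighted mean with weights 1/d^power.
    Returns the exact sample value when the query hits a data point."""
    d = np.linalg.norm(points - query, axis=1)
    if d.min() < eps:                 # query coincides with a sample
        return values[d.argmin()]
    w = 1.0 / d ** power
    return float(w @ values / w.sum())
```

Higher `power` makes the interpolation more local; `power=2` is the common default for spatial fields like tidal-flat elevation.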
Forest fires are natural disasters that can occur suddenly and can be very damaging, burning thousands of square kilometers. Since prevention is better than suppression, prediction models of forest fire occurrence were developed, including a logistic regression model, a geographically weighted logistic regression model, a Lasso regression model, a random forest model, and a support vector machine model, based on historical forest fire data from 2000 to 2019 in Jilin Province. The models, along with a distribution map, are presented in this paper to provide a theoretical basis for forest fire management in this area. The results show that the prediction accuracies of the two machine learning models are higher than those of the three generalized linear regression models: the accuracies of the random forest model, the support vector machine model, the geographically weighted logistic regression model, the Lasso regression model, and the logistic model were 88.7%, 87.7%, 86.0%, 85.0%, and 84.6%, respectively. Weather is the main factor affecting forest fires, while the impacts of topographic, human, and socio-economic factors on fire occurrence were similar.
Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data, i.e., data that sums to a constant, like 100%. The statistical linear model is the most widely used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data is present. The linear regression model is a commonly used statistical modeling technique applied in various fields to find relationships between variables of interest. When estimating linear regression parameters, which are useful for tasks such as prediction and the analysis of partial effects of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, and recovering the data can be costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved variables. Using the current parameter estimate, the expectation (E) step constructs the expected log-likelihood function; the maximization (M) step then finds the parameters that maximize the expected log-likelihood determined in the E step. This study evaluated the performance of the EM algorithm on a simulated compositional dataset with missing observations, using both robust least squares and ordinary least squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
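To make the E/M loop for missing data concrete, the toy sketch below applies EM to a linear model in which some responses are missing, a far simpler setting than the compositional data above: the E-step imputes missing responses from the current fit and the M-step refits OLS on the completed data. In this particular setting the fixed point coincides with OLS on the observed rows alone.

```python
import numpy as np

def ols(A, y):
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def em_missing_y(X, y, miss, iters=20):
    """EM for a linear model with some responses missing (toy sketch).
    miss is a boolean mask of missing responses in y."""
    A = np.column_stack([np.ones(len(X)), X])
    beta = ols(A[~miss], y[~miss])        # initialise from observed rows
    y_hat = y.astype(float).copy()
    for _ in range(iters):
        y_hat[miss] = A[miss] @ beta      # E-step: impute missing responses
        beta = ols(A, y_hat)              # M-step: refit on completed data
    return beta
```

Mean-imputing the missing responses instead would pull the fit toward the sample mean, which is the kind of distortion the study's comparison against mean imputation quantifies.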
A literature review indicates that most studies on pavement management have focused on reconstruction and rehabilitation rather than on maintenance, which includes routine, corrective, and preventive maintenance. This study developed linear regression models to estimate the total maintenance cost and the component costs for labor, materials, equipment, and stockpile. The data used in the model development were extracted from the pavement and maintenance management systems of the Nevada Department of Transportation (NDOT). The life-cycle maintenance strategies adopted by NDOT for five maintenance prioritization categories were used as the basis for developing the regression models, which are specified for each stage of the life-cycle maintenance strategies. The models indicate that age, traffic flow, elevation, type of maintenance, maintenance schedule, life-cycle stage, and the district where maintenance is performed are all important factors influencing the magnitude of the costs. Because these models embed road conditions via the life-cycle stage and the type of maintenance performed, they can easily be integrated into existing pavement management systems for implementation.
In this paper, based on the theory of parameter estimation, we propose a selection method and argue that it is reasonable in the sense of a desirable property of the parameter estimates. Moreover, we provide a method for calculating the selection statistic and an applied example.
In this paper we apply nonlinear time series analysis to small-time-scale traffic measurement data. A prediction-based method is used to determine the embedding dimension of the traffic data. Based on the reconstructed phase space, the local support vector machine prediction method is used to predict the traffic measurement data, and a BIC-based neighbouring point selection method is used to choose the number of nearest neighbouring points for the local support vector machine regression model. The experimental results show that the local support vector machine prediction method with optimized neighbouring points can effectively predict small-time-scale traffic measurement data and can reproduce the statistical features of real traffic measurements.
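Phase-space reconstruction embeds a scalar series into delay vectors; a local predictor then forecasts from the nearest neighbours of the current state. The sketch below substitutes a simple k-nearest-neighbour mean for the paper's local support vector machine regressor, so it illustrates the embedding-plus-local-model idea rather than the exact method:

```python
import numpy as np

def delay_embed(x, dim, tau=1):
    """Takens-style delay embedding of a scalar series into dim-vectors."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])

def local_predict(x, dim=3, k=5):
    """Predict the value following the last state of x using the mean
    successor of its k nearest delay vectors (stand-in for a local SVM)."""
    emb = delay_embed(x, dim)        # states v_t = (x[t], ..., x[t+dim-1])
    query = emb[-1]                  # most recent state
    train, nxt = emb[:-1], x[dim:]   # training pairs v_t -> x[t + dim]
    d = np.linalg.norm(train - query, axis=1)
    idx = np.argsort(d)[:k]
    return float(nxt[idx].mean())
```

Replacing the neighbour mean with an SVM regression fitted on the k neighbours recovers the local-SVM scheme, with k chosen by the BIC-based rule described above.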
This paper studies how the price movements of pork, chicken, and eggs respond to those of related cost factors in the short term in the Chinese market. We employ a linear quantile approach not only to explore potential heteroscedasticity in the data but also to generate confidence bands for the study of price stability. We then evaluate our models by comparing the prediction intervals generated from the quantile regression models with in-sample and out-of-sample forecasts. Using monthly data from January 2000 to October 2010, we observe the following: (i) price changes of cost factors asymmetrically and unequally influence those of the livestock products across different quantiles; (ii) the performance of our models is robust and consistent for both in-sample and out-of-sample forecasts; and (iii) the confidence intervals generated from the 0.05th and 0.95th quantile regression models are a good way to forecast livestock price fluctuations.
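As a rough stand-in for the 0.05th/0.95th quantile regression intervals, one can build a 90% band from the empirical 5th and 95th percentiles of ordinary regression residuals; true quantile regression instead minimizes the pinball loss at each quantile and lets the band width vary with the covariates. A synthetic-data sketch of the simpler band:

```python
import numpy as np

def ols_fit(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def prediction_band(X, y, alpha=0.05):
    """90% band: fitted values shifted by the empirical 5th/95th
    residual percentiles (a crude stand-in for quantile regression)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta = ols_fit(X, y)
    fitted = A @ beta
    res = y - fitted
    lo, hi = np.quantile(res, [alpha, 1 - alpha])
    return fitted + lo, fitted + hi
```

By construction about 90% of the training observations fall inside the band; the quantile-regression version additionally captures the asymmetric, quantile-varying cost effects the paper reports.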
Background: Evaluating the growth performance of pigs in real time is laborious and expensive, so mathematical models based on easily accessible variables are developed. Multiple regression (MR) is the most widely used tool for building prediction models in swine nutrition, while artificial neural network (ANN) models are reported to be more accurate than MR models in prediction performance. Therefore, the potential of ANN models in predicting the growth performance of pigs was evaluated and compared with MR models in this study. Results: Body weight (BW), net energy (NE) intake, standardized ileal digestible lysine (SID Lys) intake, and their quadratic terms were selected from 10 candidate variables as inputs to predict ADG and F/G. In the training phase, MR models showed high accuracy in both ADG and F/G prediction (R^(2)_(ADG)=0.929, R^(2)_(F/G)=0.886), while ANN models with 4 and 6 neurons and a radial basis activation function yielded the best performance in ADG and F/G prediction (R^(2)_(ADG)=0.964, R^(2)_(F/G)=0.932). In the testing phase, these ANN models showed better accuracy in ADG prediction (CCC: 0.976 vs. 0.861, R^(2): 0.951 vs. 0.584) and F/G prediction (CCC: 0.952 vs. 0.900, R^(2): 0.905 vs. 0.821) compared with the MR models. Meanwhile, over-fitting occurred in the MR models but not in the ANN models. On validation data from the animal trial, ANN models exhibited superiority over MR models in both ADG and F/G prediction (P<0.01). Moreover, growth stage has a significant effect on the prediction accuracy of the models. Conclusion: Body weight, NE intake, and SID Lys intake can be used as input variables to predict the growth performance of growing-finishing pigs, and trained ANN models are more flexible and accurate than MR models. Therefore, the use of ANN models in related swine nutrition studies is promising.
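The R^(2) and CCC figures reported above can be computed as follows; Lin's concordance correlation coefficient penalizes both location and scale shifts between predictions and observations, unlike plain correlation. A small sketch of the two metrics:

```python
import numpy as np

def r2(y, yhat):
    """Coefficient of determination: 1 - SSE / SST."""
    ss_res = ((y - yhat) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return float(1 - ss_res / ss_tot)

def ccc(y, yhat):
    """Lin's concordance correlation coefficient:
    2*cov / (var_y + var_yhat + (mean_y - mean_yhat)^2)."""
    my, mh = y.mean(), yhat.mean()
    vy, vh = y.var(), yhat.var()
    cov = ((y - my) * (yhat - mh)).mean()
    return float(2 * cov / (vy + vh + (my - mh) ** 2))
```

A systematically biased predictor can still achieve high correlation but is penalized by CCC, which is why both metrics appear in the model comparison above.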
In this article, a procedure is defined for estimating the coefficient functions of functional-coefficient regression models with different smoothing variables in different coefficient functions. In the first step, initial estimates of the coefficient functions are obtained by the local linear technique and the averaging method. In the second step, based on the initial estimates, efficient estimates of the coefficient functions are obtained by a one-step back-fitting procedure. The efficient estimators share the same asymptotic normality as the local linear estimators for functional-coefficient models with a single smoothing variable in different functions. Two simulated examples show that the procedure is effective.
Funding note for the soil water study: the authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through research group No. RG-1440-022.
Funding note for the RUL prediction study: supported by the National Natural Science Foundation of China (61703410, 61873175, 62073336, 61873273, 61773386, 61922089).
Funding note for the Mg-alloy ML study: the authors acknowledge the support of the Monash-IITB Academy Scholarship and the Australian Research Council for funding the present research (DP190103592).
Funding: This work is supported by the National Natural Science Foundation of China (No. 62076042); the Key Research and Development Project of Sichuan Province (Nos. 2021YFSY0012, 2020YFG0307, 2021YFG0332); the Science and Technology Innovation Project of Sichuan (No. 2020017); the Key Research and Development Project of Chengdu (No. 2019-YF05-02028-GX); the Innovation Team of Quantum Security Communication of Sichuan Province (No. 17TD0009); and the Academic and Technical Leaders Training Funding Support Projects of Sichuan Province (No. 2016120080102643).
Abstract: In the era of big data, traditional regression models cannot deal with uncertain big data efficiently and accurately. To make up for this deficiency, this paper proposes a quantum fuzzy regression model, which uses fuzzy theory to describe the uncertainty in big data sets and uses quantum computing to exponentially improve the efficiency of data set preprocessing and parameter estimation. Data envelopment analysis (DEA) is used to calculate the degree of importance of each data point, while the Harrow-Hassidim-Lloyd (HHL) algorithm and quantum swap circuits are used to improve the efficiency of high-dimensional data matrix calculation. Applying the quantum fuzzy regression model to small-scale financial data shows that its accuracy is greatly improved compared with the quantum regression model; moreover, due to the introduction of quantum computing, the speed of handling high-dimensional data matrices is exponentially improved compared with the fuzzy regression model. The proposed model thus combines the advantages of fuzzy theory and quantum computing: it can efficiently calculate high-dimensional data matrices and complete parameter estimation while retaining the uncertainty in big data, making it a new model for efficient and accurate big data processing in uncertain environments.
Abstract: The present paper proposes a new robust estimator for Poisson regression models. We use weighted maximum likelihood estimators, which are regarded as Mallows-type estimators. We perform a Monte Carlo simulation study to assess the performance of the suggested estimator compared with the maximum likelihood estimator and some existing robust methods. The results show that, in general, all robust methods in this paper perform better than the classical maximum likelihood estimator when the model contains outliers, and the proposed estimator shows the best performance among the robust estimators compared.
Abstract: The aim of this study was to model the Undrained Shear Strength (USS) of soil found in the coastal region of the Niger Delta in Nigeria using some basic soil properties. The USS is a key parameter needed for most geotechnical/structural designs. Accurate laboratory determination of the USS of soft clays can be challenging due to the difficulty of remoulding the clay to its in-situ conditions before testing, while more accurate tests such as the Cone Penetration Test (CPT) can be quite expensive. This study was carried out at the Escravos site in Delta State, Nigeria. Three boreholes were drilled and soil samples were collected at 0.75 m intervals up to a depth of 45 m. Laboratory tests were used to obtain the moisture content, bulk unit weight, and liquid and plastic limits, while CPT was used to obtain the undrained shear strength. The soil samples were classified using the Unified Soil Classification System, and various models relating the USS to the soil properties were developed. The results showed that most of the soils at the Escravos site were predominantly inorganic clays of high plasticity, which are problematic due to their expansion and shrinkage behaviour. The best-fitting model related the USS to the moisture content and effective stress of the soil, with a coefficient of determination (R²) of 0.805 and a root mean square error (RMSE) of 6.37 kN/m².
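The fitting and scoring step for such a two-predictor model can be sketched with ordinary least squares on synthetic data (the coefficients and value ranges below are invented for illustration, not the Escravos measurements):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
moisture = rng.uniform(20, 80, n)        # moisture content, %
eff_stress = rng.uniform(10, 300, n)     # effective stress, kN/m^2
# Hypothetical ground truth: USS decreasing in moisture, increasing in stress
uss = 5.0 - 0.2*moisture + 0.25*eff_stress + rng.normal(0, 5, n)

# OLS fit of USS on [1, moisture, effective stress] via least squares
A = np.column_stack([np.ones(n), moisture, eff_stress])
beta, *_ = np.linalg.lstsq(A, uss, rcond=None)
pred = A @ beta
rmse = float(np.sqrt(np.mean((uss - pred)**2)))
r2 = float(1 - np.sum((uss - pred)**2) / np.sum((uss - uss.mean())**2))
```

R² and RMSE computed this way are the same goodness-of-fit measures the abstract reports (0.805 and 6.37 kN/m² for the real data).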
Funding: This work was supported by the 2021 Project of the "14th Five-Year Plan" of Shaanxi Education Science, "Research on the Application of Educational Data Mining in Applied Undergraduate Teaching: Taking the Course of 'Computer Application Technology' as an Example" (SGH21Y0403); the 2022 Teaching Reform and Research Project for Practical Teaching, "Research on Practical Teaching of Applied Undergraduate Projects Based on the 'Combination of Courses and Certificates': Taking Computer Application Technology Courses as an Example" (SJJG02012); and the 11th batch of Teaching Reform Research Projects of Xi'an Jiaotong University City College, "Project-Driven Cultivation and Research on Information Literacy of Applied Undergraduate Students in the Information Age: Taking Computer Application Technology Course Teaching as an Example" (111001).
Abstract: Social networks are the mainstream medium of current information dissemination, and it is particularly important to accurately predict their propagation laws. In this paper, we introduce a social network propagation model integrating multiple linear regression and an infectious disease model. Firstly, we propose features that affect social network communication along three dimensions. Then, we predict node influence via multiple linear regression. Lastly, we use the node influence as the state-transition parameter of the infectious disease model to predict the trend of information dissemination in social networks. Experimental results on a real social network dataset show that the predictions of the model are consistent with the actual information dissemination trends.
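The two-stage idea (regression-predicted influence feeding the epidemic state transitions) can be sketched as below; the node features, weights, and the mean-field SIR-style update are all hypothetical simplifications, not the paper's dataset or exact model:

```python
import numpy as np

rng = np.random.default_rng(7)
n_nodes = 200
# Hypothetical node features (e.g. followers, activity, engagement), scaled to [0, 1]
feats = rng.uniform(0, 1, (n_nodes, 3))
true_w = np.array([0.5, 0.3, 0.2])
influence = feats @ true_w + rng.normal(0, 0.02, n_nodes)

# Step 1: multiple linear regression predicting node influence from the features
A = np.column_stack([np.ones(n_nodes), feats])
w, *_ = np.linalg.lstsq(A, influence, rcond=None)
beta_node = np.clip(A @ w, 0.0, 1.0)   # predicted influence, used as infection propensity

# Step 2: SIR-style spread where a node's predicted influence scales its infection chance
state = np.zeros(n_nodes, dtype=int)   # 0 = susceptible, 1 = infected, 2 = recovered
state[:5] = 1                          # seed nodes
gamma = 0.1                            # recovery probability per step
reached = []                           # cumulative nodes ever informed (I + R)
for _ in range(30):
    infected = np.flatnonzero(state == 1)
    frac_inf = infected.size / n_nodes
    for i in np.flatnonzero(state == 0):
        if rng.random() < frac_inf * beta_node[i]:
            state[i] = 1
    recover = rng.random(infected.size) < gamma
    state[infected[recover]] = 2
    reached.append(int((state == 1).sum() + (state == 2).sum()))
```

The cumulative "reached" curve is the quantity one would compare against the observed dissemination trend.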
Abstract: The analysis of numerous experimental equations published in the literature reveals a wide scatter in the predictions for the static recrystallization kinetics of steels. The powers of the deformation variables, strain and strain rate, as well as the power of the grain size, vary across these equations. These differences are highlighted and the typical values are compared between torsion and compression tests. Potential errors in physical simulation testing are discussed.
Funding: Supported by the National Natural Science Foundation of China (41602205, 42293261); the China Geological Survey Program (DD20189506, DD20211301); the Special Investigation Project on Science and Technology Basic Resources of the Ministry of Science and Technology (2021FY101003); the Central Guidance for Local Scientific and Technological Development Fund of 2023; and the Project of Hebei University of Environmental Engineering (GCY202301).
Abstract: The change processes and trends of shorelines and tidal flats forced by human activities are essential issues for the sustainability of coastal areas, and are of great significance for understanding coastal ecological environment change and even global change. Based on field measurements, combined with a Linear Regression (LR) model and the Inverse Distance Weighting (IDW) method, this paper presents a detailed analysis of the change history and trends of the shoreline and tidal flat in Bohai Bay. The shoreline faces a high chance of erosion under the action of natural factors, while the tidal flat exhibits different erosion and deposition patterns in Bohai Bay due to the impact of human activities. The implications of these change rules for ecological protection and recovery are also discussed, and measures should be taken to protect the coastal ecological environment. The models used in this paper show a high correlation coefficient between observed and modelled data, which means the method can be used to predict the changing trends of shorelines and tidal flats. The results of the present study can provide scientific support for future coastal protection and management.
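Both building blocks named in the abstract are standard and can be sketched together; the sample points, accretion rates, and shoreline positions below are invented for illustration, not Bohai Bay measurements:

```python
import numpy as np

def idw(xy_known, z_known, xy_query, power=2.0):
    """Inverse Distance Weighting interpolation (exact at the sample points)."""
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    out = np.empty(len(xy_query))
    for i, di in enumerate(d):
        j = di.argmin()
        if di[j] < 1e-12:
            out[i] = z_known[j]          # query coincides with a sample point
        else:
            w = 1.0 / di**power
            out[i] = np.sum(w * z_known) / np.sum(w)
    return out

# Hypothetical surveyed points with an accretion/erosion rate (m/yr)
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
rate = np.array([2.0, -1.0, 0.5, 1.5])
grid = idw(pts, rate, np.array([[0.5, 0.5], [0.0, 0.0]]))

# Linear-regression trend of shoreline position along one transect (hypothetical data)
years = np.array([2000.0, 2005.0, 2010.0, 2015.0, 2020.0])
pos = np.array([0.0, 4.8, 10.1, 15.2, 19.9])      # seaward distance, m
slope_m_per_yr = np.polyfit(years, pos, 1)[0]
```

The fitted slope gives the transect's change rate, and IDW spreads pointwise rates into a continuous erosion/deposition map.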
Funding: This research was funded by the National Natural Science Foundation of China (grant no. 32271881).
Abstract: Forest fires are natural disasters that can occur suddenly and can be very damaging, burning thousands of square kilometers. Since prevention is better than suppression, prediction models of forest fire occurrence were developed using the logistic regression model, the geographically weighted logistic regression model, the Lasso regression model, the random forest model, and the support vector machine model, based on historical forest fire data from 2000 to 2019 in Jilin Province. The models, along with a distribution map, are presented in this paper to provide a theoretical basis for forest fire management in this area. The results show that the prediction accuracies of the two machine learning models are higher than those of the three generalized linear regression models: the accuracies of the random forest model, the support vector machine model, the geographically weighted logistic regression model, the Lasso regression model, and the logistic model were 88.7%, 87.7%, 86.0%, 85.0% and 84.6%, respectively. Weather is the main factor affecting forest fires, while the impacts of topographic factors and human and socio-economic factors on fire occurrence were similar.
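The core comparison (a generalized linear model versus a machine learning classifier on fire-occurrence labels) can be sketched on synthetic weather covariates; the features, coefficients, and accuracies here are made up and are not the Jilin Province data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 1000
temp = rng.normal(20, 8, n)          # temperature, C (hypothetical)
humid = rng.uniform(10, 90, n)       # relative humidity, %
wind = rng.uniform(0, 15, n)         # wind speed, m/s

# Synthetic fire-occurrence probability with a hot-and-dry interaction term
logit = 0.15*temp - 0.08*humid + 0.2*wind + 1.5*((temp > 28) & (humid < 40))
p = 1.0 / (1.0 + np.exp(-logit))
y = (rng.random(n) < p).astype(int)
X = np.column_stack([temp, humid, wind])

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
acc_lr = LogisticRegression(max_iter=1000).fit(Xtr, ytr).score(Xte, yte)
acc_rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xtr, ytr).score(Xte, yte)
```

With a nonlinear interaction in the data-generating process, tree ensembles tend to recover the threshold effect that a plain logistic model can only approximate.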
Abstract: Compositional data, such as relative information, is a crucial aspect of machine learning and related fields. It is typically recorded as closed data, i.e., data summing to a constant such as 100%. The linear regression model is the most widely used statistical technique for identifying relationships between variables of interest, supporting tasks such as prediction and partial-effects analysis of the independent variables; its parameters are usually estimated by maximum likelihood estimation (MLE). However, data quality is a significant challenge in machine learning, especially when observations are missing, and recovering missing data can be costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested: it iteratively finds maximum likelihood (or maximum a posteriori, MAP) estimates of parameters in statistical models that depend on unobserved variables. Using the current parameter estimate, the expectation (E) step constructs the expected log-likelihood function; the maximization (M) step then finds the parameters that maximize this expected log-likelihood. This study evaluated how well the EM algorithm performed on a simulated compositional dataset with missing observations, using both ordinary least squares and robust least squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
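The E-step/M-step alternation can be made concrete with the simplest relevant case: a bivariate Gaussian (x, y) where some x values are missing at random, the E-step imputes each missing x by its conditional mean given y, and the M-step refits the moments (adding the conditional variance). This is a minimal sketch, not the paper's compositional setup; the data and missingness rate are synthetic:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
x_full = rng.normal(0, 1, n)
y = 2.0 + 1.5*x_full + rng.normal(0, 0.5, n)
miss = rng.random(n) < 0.3               # ~30% of x missing at random
x_obs = np.where(miss, np.nan, x_full)

# EM for a bivariate Gaussian on (x, y); regression slope = cov_xy / var_x
mu = np.array([0.0, y.mean()])
S = np.array([[1.0, 0.0], [0.0, y.var()]])
for _ in range(100):
    # E-step: replace missing x by its conditional mean given y,
    # and keep the conditional variance for the M-step correction
    cvar = S[0, 0] - S[0, 1]**2 / S[1, 1]
    xe = np.where(miss, mu[0] + S[0, 1]/S[1, 1]*(y - mu[1]), x_obs)
    # M-step: update mean and covariance (missing rows contribute cvar)
    mu = np.array([xe.mean(), y.mean()])
    dx, dy = xe - mu[0], y - mu[1]
    S = np.array([[np.mean(dx**2) + miss.mean()*cvar, np.mean(dx*dy)],
                  [np.mean(dx*dy), np.mean(dy**2)]])

slope = S[0, 1] / S[0, 0]
intercept = mu[1] - slope*mu[0]
```

The recovered slope and intercept approach the generating values (1.5 and 2.0) despite a third of the regressor being unobserved, which is the property the study measures against k-NN and mean imputation.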
Abstract: A literature review indicates that most studies on pavement management have focused on reconstruction and rehabilitation rather than on maintenance, which includes routine, corrective and preventive maintenance. This study developed linear regression models to estimate the total maintenance cost and the component costs for labor, materials, equipment, and stockpile. The data used in the model development were extracted from the pavement and maintenance management systems of the Nevada Department of Transportation (NDOT). The life-cycle maintenance strategies adopted by NDOT for five maintenance prioritization categories were used as the basis for developing the regression models, which are specified for each stage of the life-cycle maintenance strategies. The models indicate that age, traffic flow, elevation, type of maintenance, maintenance schedule, life-cycle stage, and the district where maintenance is performed are all important factors influencing the magnitude of the costs. Because these models embed the road conditions in the life-cycle stage and the type of maintenance performed, they can be easily integrated into existing pavement management systems for implementation.
Funding: Supported by the Natural Science Foundation of the Anhui Education Committee.
Abstract: In this paper, based on the theory of parameter estimation, we give a selection method and argue that it is reasonable in the sense of a desirable property of parameter estimation. Moreover, we offer a method for calculating the selection statistic and give an applied example.
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 60573065); the Natural Science Foundation of Shandong Province, China (Grant No. Y2007G33); and the Key Subject Research Foundation of Shandong Province, China (Grant No. XTD0708).
Abstract: In this paper, we apply nonlinear time series analysis to small-time-scale traffic measurement data. A prediction-based method is used to determine the embedding dimension of the traffic data. Based on the reconstructed phase space, the local support vector machine prediction method is used to predict the traffic measurement data, and a BIC-based neighbouring-point selection method is used to choose the number of nearest neighbouring points for the local support vector machine regression model. The experimental results show that the local support vector machine prediction method, with its neighbouring points optimized, can effectively predict small-time-scale traffic measurement data and can reproduce the statistical features of real traffic measurements.
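The pipeline (delay embedding, then an SVR trained only on the nearest phase-space neighbours of the query point) can be sketched as below; the "traffic" series is a quasi-periodic synthetic stand-in, and the embedding parameters are simply assumed rather than selected by the paper's prediction-error/BIC criteria:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(9)
t = np.arange(1200)
# Stand-in "traffic" series: quasi-periodic signal plus noise (not real measurements)
s = np.sin(0.07*t) + 0.5*np.sin(0.023*t) + 0.05*rng.normal(size=t.size)

m, tau = 4, 3   # embedding dimension and delay (assumed; normally chosen by prediction error)

def embed(s, m, tau):
    """Time-delay embedding: row i = [s_i, s_(i+tau), ..., s_(i+(m-1)tau)]."""
    N = len(s) - (m - 1)*tau
    return np.column_stack([s[j*tau: j*tau + N] for j in range(m)])

X = embed(s, m, tau)[:-1]
y = s[(m - 1)*tau + 1:]          # one-step-ahead targets aligned with the rows of X

def local_svr_predict(X, y, q, k=30):
    """Fit an SVR only on the k nearest phase-space neighbours of query q."""
    idx = np.argsort(np.linalg.norm(X - q, axis=1))[:k]
    return SVR(C=10.0, epsilon=0.01, gamma="scale").fit(X[idx], y[idx]).predict(q[None, :])[0]

pred = local_svr_predict(X[:-1], y[:-1], X[-1])   # predict the last point
err = abs(pred - y[-1])
```

Fitting locally keeps the regression surface simple around each query, which is the point of the local (rather than global) SVM approach.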
Funding: Supported by the Key Project of the National Key Technology R&D Program of China (2009BADA9B01).
Abstract: This paper studies how the price movements of pork, chicken and eggs respond to those of related cost factors over short horizons in the Chinese market. We employ a linear quantile regression approach not only to explore potential heteroscedasticity in the data but also to generate confidence bands for the study of price stability. We then evaluate our models by comparing the prediction intervals generated from the quantile regression models with in-sample and out-of-sample forecasts. Using monthly data from January 2000 to October 2010, we find that: (i) the price changes of cost factors asymmetrically and unequally influence those of the livestock products across different quantiles; (ii) the performance of our models is robust and consistent for both in-sample and out-of-sample forecasts; and (iii) the confidence intervals generated from the 0.05th and 0.95th quantile regression models are a good method for forecasting livestock price fluctuations.
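The 0.05/0.95 interval construction can be sketched with a small self-contained quantile regression (linear model fitted by subgradient descent on the pinball loss); the cost factor and price series are simulated, not the paper's monthly data:

```python
import numpy as np

def fit_quantile(x, y, q, lr=0.1, iters=10000):
    """Linear quantile regression via subgradient descent on the pinball loss."""
    A = np.column_stack([np.ones(len(x)), x])
    b = np.zeros(2)
    for _ in range(iters):
        r = y - A @ b
        # subgradient of the pinball loss: weight q above the line, q-1 below
        g = -A.T @ np.where(r > 0, q, q - 1.0) / len(y)
        b -= lr * g
    return b

rng = np.random.default_rng(11)
corn = rng.uniform(1.5, 3.0, 500)                    # hypothetical feed-cost series
pork = 4.0 + 3.0*corn + rng.normal(0, 0.8, 500)      # simulated pork price response

lo = fit_quantile(corn, pork, 0.05)
hi = fit_quantile(corn, pork, 0.95)
A = np.column_stack([np.ones(500), corn])
coverage = np.mean((pork >= A @ lo) & (pork <= A @ hi))  # should be near 0.90
```

The band between the two fitted lines is exactly the kind of prediction interval the paper evaluates in and out of sample.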
Funding: Funded by the National Natural Science Foundation of China (32072764, 31702121); the 2115 Talent Development Program of China Agricultural University; and the National Key Research and Development Program of China (2019YFD1002605).
Abstract: Background: Evaluating the growth performance of pigs in real time is laborious and expensive, so mathematical models based on easily accessible variables have been developed. Multiple regression (MR) is the most widely used tool for building prediction models in swine nutrition, while artificial neural network (ANN) models are reported to be more accurate than MR models in prediction performance. Therefore, the potential of ANN models for predicting the growth performance of pigs was evaluated and compared with MR models in this study. Results: Body weight (BW), net energy (NE) intake, standardized ileal digestible lysine (SID Lys) intake, and their quadratic terms were selected as input variables to predict ADG and F/G from among 10 candidate variables. In the training phase, MR models showed high accuracy in both ADG and F/G prediction (R²_ADG = 0.929, R²_F/G = 0.886), while ANN models with 4 and 6 neurons and a radial basis activation function yielded the best performance in ADG and F/G prediction (R²_ADG = 0.964, R²_F/G = 0.932). In the testing phase, these ANN models showed better accuracy in ADG prediction (CCC: 0.976 vs. 0.861, R²: 0.951 vs. 0.584) and F/G prediction (CCC: 0.952 vs. 0.900, R²: 0.905 vs. 0.821) compared with the MR models. Meanwhile, over-fitting occurred in the MR models but not in the ANN models. On validation data from the animal trial, ANN models were superior to MR models in both ADG and F/G prediction (P < 0.01). Moreover, growth stage has a significant effect on the prediction accuracy of the models. Conclusion: Body weight, NE intake and SID Lys intake can be used as input variables to predict the growth performance of growing-finishing pigs, with trained ANN models being more flexible and accurate than MR models. It is therefore promising to use ANN models in related swine nutrition studies in the future.
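The MR-versus-ANN comparison can be sketched on synthetic data with the same three inputs; the response surface for ADG below is invented for illustration (not the paper's fitted equation), and scikit-learn's MLP (with a sigmoid-like architecture rather than the paper's radial basis network) stands in for the ANN:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(21)
n = 500
bw = rng.uniform(25, 110, n)      # body weight, kg
ne = rng.uniform(15, 40, n)       # net energy intake, MJ/d
lys = rng.uniform(8, 25, n)       # SID Lys intake, g/d
# Hypothetical ADG response (g/d) with a quadratic body-weight effect plus noise
adg = 200 + 8*bw - 0.04*bw**2 + 12*ne + 10*lys + rng.normal(0, 25, n)

X = np.column_stack([bw, ne, lys])
Xtr, Xte, ytr, yte = train_test_split(X, adg, test_size=0.2, random_state=0)

# MR model with linear + quadratic terms, mirroring the paper's input specification
quad = lambda Z: np.hstack([Z, Z**2])
r2_mr = LinearRegression().fit(quad(Xtr), ytr).score(quad(Xte), yte)

# Small ANN; inputs and target are standardized for stable training
xs = StandardScaler().fit(Xtr)
ym, ysd = ytr.mean(), ytr.std()
ann = MLPRegressor(hidden_layer_sizes=(6,), max_iter=5000, random_state=0)
ann.fit(xs.transform(Xtr), (ytr - ym)/ysd)
pred = ann.predict(xs.transform(Xte))*ysd + ym
r2_ann = 1 - np.sum((yte - pred)**2)/np.sum((yte - yte.mean())**2)
```

Scoring both models on the same held-out split is the fair-comparison step; hold-out R² (rather than training R²) is also what reveals the over-fitting the abstract mentions.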
Abstract: In this article, a procedure is defined for estimating the coefficient functions in functional-coefficient regression models with different smoothing variables in different coefficient functions. In the first step, initial estimates of the coefficient functions are obtained by the local linear technique and the averaging method. In the second step, based on the initial estimates, efficient estimates of the coefficient functions are proposed via a one-step back-fitting procedure. The efficient estimators share the same asymptotic normality as the local linear estimators for functional-coefficient models with a single smoothing variable in all coefficient functions. Two simulated examples show that the procedure is effective.
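The local linear building block used in the first step can be sketched for the one-coefficient case y = a(u)x + ε; the coefficient function, bandwidth, and data below are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(13)
n = 500
u = rng.uniform(0, 1, n)              # smoothing variable
x = rng.normal(0, 1, n)               # regressor
a_true = np.sin(2*np.pi*u)            # coefficient function (hypothetical)
y = a_true*x + rng.normal(0, 0.2, n)

def local_linear_a(u0, u, x, y, h=0.05):
    """Local linear estimate of a(u0) in y = a(u)*x + eps:
    weighted least squares of y on [x, x*(u-u0)] with Gaussian kernel weights,
    so the first coefficient is a(u0) and the second its local slope."""
    w = np.exp(-0.5*((u - u0)/h)**2)
    Z = np.column_stack([x, x*(u - u0)])
    WZ = Z * w[:, None]
    beta = np.linalg.solve(Z.T @ WZ, WZ.T @ y)
    return beta[0]

a_hat = local_linear_a(0.25, u, x, y)     # true value: sin(pi/2) = 1
```

Evaluating this estimator on a grid of u0 values recovers the whole coefficient curve, which then serves as the initial estimate for the back-fitting step.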