Purpose: The purpose of this study is to develop and compare model choice strategies in the context of logistic regression. Model choice means the choice of the covariates to be included in the model. Design/methodology/approach: The study is based on Monte Carlo simulations. The methods are compared in terms of three measures of accuracy: specificity and two kinds of sensitivity. A loss function combining sensitivity and specificity is introduced and used for a final comparison. Findings: The choice of method depends on how much the user emphasizes sensitivity over specificity. It also depends on the sample size. For a typical logistic regression setting with a moderate sample size and a small to moderate effect size, either BIC, BICc or Lasso seems to be optimal. Research limitations: Numerical simulations cannot cover the whole range of data-generating processes occurring with real-world data. Thus, more simulations are needed. Practical implications: Researchers can refer to these results if they believe that their data-generating process is somewhat similar to some of the scenarios presented in this paper. Alternatively, they could run their own simulations and calculate the loss function. Originality/value: This is a systematic comparison of model choice algorithms and heuristics in the context of logistic regression. The distinction between two types of sensitivity and a comparison based on a loss function are methodological novelties.
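The accuracy measures named in the abstract above can be illustrated with a minimal sketch. The abstract does not define its two kinds of sensitivity or the form of its loss function, so the definitions below (a single generic sensitivity, a specificity, and a weighted loss with a hypothetical `weight` parameter) are assumptions for illustration only, not the authors' method.

```python
def selection_accuracy(true_covariates, selected, all_covariates):
    """Score a covariate selection against a known data-generating model.

    Assumed definitions (not taken from the paper):
    sensitivity = share of truly relevant covariates that were selected,
    specificity = share of irrelevant covariates that were left out.
    """
    true_set = set(true_covariates)
    chosen = set(selected)
    noise = set(all_covariates) - true_set
    sensitivity = len(true_set & chosen) / len(true_set) if true_set else 1.0
    specificity = len(noise - chosen) / len(noise) if noise else 1.0
    return sensitivity, specificity


def weighted_loss(sensitivity, specificity, weight=0.5):
    """Hypothetical loss: `weight` is the emphasis placed on sensitivity."""
    return weight * (1.0 - sensitivity) + (1.0 - weight) * (1.0 - specificity)


# Made-up example: five candidate covariates, two of which truly matter.
candidates = ["x1", "x2", "x3", "x4", "x5"]
sens, spec = selection_accuracy(["x1", "x2"], ["x1", "x3"], candidates)
print(sens, spec, weighted_loss(sens, spec, weight=0.5))
```

In a simulation study of this kind, such a loss would be averaged over many Monte Carlo replications for each selection method, and the `weight` would express how strongly a user favours sensitivity over specificity.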
BACKGROUND The worldwide spread of the severe acute respiratory syndrome coronavirus 2 outbreak has caused concern regarding the mortality rate caused by the infection. The determinants of mortality on a global scale cannot be fully understood due to lack of information. AIM To identify key factors that may explain the variability in case lethality across countries. METHODS We identified 21 potential risk factors for the coronavirus disease 2019 (COVID-19) case fatality rate for all the countries with available data. We examined univariate relationships of each variable with the case fatality rate (CFR), and all independent variables, to identify candidate variables for our final multiple model. Multiple regression analysis was used to assess the strength of the relationships. RESULTS The mean COVID-19 mortality was 1.52 ± 1.72%. There was a statistically significant inverse correlation of health expenditure and of the number of computed tomography scanners per 1 million with CFR, and a significant direct correlation of literacy and of air pollution with CFR. The final model can predict approximately 97% of the changes in CFR. CONCLUSION The current study recommends some new predictors that may explain the mortality rate. Thus, it could help decision-makers develop health policies to fight COVID-19.
In this paper, a logistic regression (LR) statistical analysis is presented for a set of variables used in experimental measurements of the so-called “slinky mode” (SM) in reversed field pinch (RFP) machines, which is observed to travel around the torus in the Madison Symmetric Torus (MST). The LR analysis is applied to the modified Sine-Gordon dynamic equation model to predict with high confidence whether the slinky mode will lock or not lock when compared to the experimentally measured motion of the slinky mode. It is observed that under certain conditions, the slinky mode “locks” at or near the intersection of poloidal and/or toroidal gaps in MST. However, a locked mode ceases to travel around the torus, while an unlocked mode keeps traveling without a change in energy, making it hard to determine an exact set of conditions to predict locking/unlocking behaviour. The significant key model parameters determined by LR analysis are shown to improve the Sine-Gordon model’s ability to determine the locking/unlocking of magnetohydrodynamic (MHD) modes. The LR analysis of measured variables provides high confidence in anticipating locking versus unlocking of the slinky mode, as shown by comparisons between simulations and the experimentally measured motion of the slinky mode in MST.
For the composition analysis and identification of ancient glass products, L1 regularization, K-Means cluster analysis, the elbow method and other methods were used together to build logistic regression, cluster analysis, hyper-parameter test and other models. SPSS, Python and other tools were used to obtain the classification rules of glass products under different fluxes, the subclassification under different chemical compositions, and the test and rationality analysis of the hyper-parameter K. The research can provide theoretical support for the protection and restoration of ancient glass relics.
This paper considers the convergence rates for nonparametric estimators of the error distribution in semi-parametric regression models. By establishing some general laws of the iterated logarithm, it shows that the rates of convergence of either the empirical distribution or a smoothed version of the empirical distribution function match exactly the rates obtained for an independent sample from the error distribution.
The global pandemic, coronavirus disease 2019 (COVID-19), has significantly affected tourism, especially in Spain, as it was among the first countries to be affected by the pandemic and is among the world’s biggest tourist destinations. Stock market values are responding to the evolution of the pandemic, especially in the case of tourist companies. Therefore, being able to quantify this relationship allows us to predict the effect of the pandemic on shares in the tourism sector, thereby improving the response to the crisis by policymakers and investors. Accordingly, a dynamic regression model was developed to predict the behavior of shares in the Spanish tourism sector according to the evolution of the COVID-19 pandemic in the medium term. It has been confirmed that both the number of deaths and the number of cases are good predictors of abnormal stock prices in the tourism sector.
Remaining useful life (RUL) prediction is one of the most crucial elements in prognostics and health management (PHM). To address imperfect prior information, this paper proposes an RUL prediction method based on a nonlinear random coefficient regression (RCR) model that fuses failure time data. Firstly, some useful properties of parameter estimation based on the nonlinear RCR model are given. Based on these properties, the failure time data can reasonably be fused as prior information. Specifically, the fixed parameters are calculated from the field degradation data of the evaluated equipment, and the prior information of the random coefficient is estimated by fusing the failure time data of congeneric equipment. Then, the prior information of the random coefficient is updated online under the Bayesian framework, and the probability density function (PDF) of the RUL is derived while taking the limitation of the failure threshold into account. Finally, two case studies are used for experimental verification. Compared with the traditional Bayesian method, the proposed method can effectively reduce the influence of imperfect prior information and improve the accuracy of RUL prediction.
In the era of big data, traditional regression models cannot deal with uncertain big data efficiently and accurately. In order to make up for this deficiency, this paper proposes a quantum fuzzy regression model, which uses fuzzy theory to describe the uncertainty in big data sets and uses quantum computing to exponentially improve the efficiency of data set preprocessing and parameter estimation. In this paper, data envelopment analysis (DEA) is used to calculate the degree of importance of each data point. Meanwhile, the Harrow, Hassidim and Lloyd (HHL) algorithm and quantum swap circuits are used to improve the efficiency of high-dimensional data matrix calculation. The application of the quantum fuzzy regression model to small-scale financial data shows that its accuracy is greatly improved compared with the quantum regression model. Moreover, due to the introduction of quantum computing, the speed of dealing with high-dimensional data matrices is improved exponentially compared with the fuzzy regression model. The quantum fuzzy regression model proposed in this paper combines the advantages of fuzzy theory and quantum computing: it can efficiently calculate high-dimensional data matrices and complete parameter estimation using quantum computing while retaining the uncertainty in big data. Thus, it is a new model for efficient and accurate big data processing in uncertain environments.
Cyber losses, measured as the number of records breached in cyber incidents, commonly feature a significant portion of zeros as well as distinct characteristics for mid-range and large losses, which makes it hard to model the whole range of losses using a standard loss distribution. We tackle this modeling problem by proposing a three-component spliced regression model that can simultaneously model zeros, moderate losses and large losses, and can account for heterogeneous effects in the mixture components. To apply our proposed model to the Privacy Rights Clearinghouse (PRC) data breach chronology, we segment geographical groups using unsupervised cluster analysis, and use a covariate-dependent probability to model zero losses, finite mixture distributions for the moderate body, and an extreme value distribution for large losses to capture the heavy-tailed nature of the loss data. Parameters and coefficients are estimated using the Expectation-Maximization (EM) algorithm. Combined with our frequency model (a generalized linear mixed model) for data breaches, aggregate loss distributions are investigated, and applications to cyber insurance pricing and risk management are discussed.
The rapidly spreading COVID-19 virus and its variants, especially in metropolitan areas around the world, became a major public health concern. Statistical modelling of the trend of the COVID-19 pandemic represents an urgent challenge in the United States for which there are few solutions. In this paper, we demonstrate combining Fourier terms for capturing seasonality with ARIMA errors and other dynamics in the data. To this end, we analyzed a 156-week COVID-19 dataset at the national level, covering 2020 to 2023, using a dynamic harmonic regression model, including simulation analysis and accuracy improvement. Most importantly, we provide new advanced pathways which may serve as targets for developing new solutions and approaches.
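The Fourier terms mentioned in the abstract above can be sketched as follows. This is a minimal illustration that only builds the harmonic regressors; the ARIMA error component is not modelled here. The 52-week seasonal period and the choice of two harmonics are assumptions, since the abstract gives neither.

```python
import math


def fourier_terms(t, period, n_harmonics):
    """Return [sin_1, cos_1, ..., sin_K, cos_K] for time index t.

    Each harmonic k contributes sin(2*pi*k*t/period) and cos(2*pi*k*t/period).
    """
    terms = []
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * math.pi * k * t / period
        terms.append(math.sin(angle))
        terms.append(math.cos(angle))
    return terms


# Regressor matrix for 156 weekly observations.
# Assumed settings: 52-week seasonality, K = 2 harmonics.
design = [fourier_terms(t, period=52, n_harmonics=2) for t in range(156)]
print(len(design), len(design[0]))
```

In a dynamic harmonic regression, rows like these would serve as exogenous regressors while the remaining serial correlation is left to the ARIMA error term; the number of harmonics is commonly chosen with an information criterion.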
In this paper, three smoothed empirical log-likelihood ratio functions for the parameters of nonlinear models with missing response are suggested. Under some regularity conditions, the corresponding Wilks phenomena are obtained and the confidence regions for the parameter can be constructed easily.
Social networks are the mainstream medium of current information dissemination, and it is particularly important to accurately predict their propagation law. In this paper, we introduce a social network propagation model integrating multiple linear regression and an infectious disease model. Firstly, we propose features that affect social network communication from three dimensions. Then, we predict the node influence via multiple linear regression. Lastly, we use the node influence as the state transition of the infectious disease model to predict the trend of information dissemination in social networks. The experimental results on a real social network dataset show that the prediction results of the model are consistent with the actual information dissemination trends.
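The two-stage idea in the abstract above (a regression-predicted influence feeding the transition of an epidemic model) can be sketched as below. The abstract does not name the epidemic model or the features, so the discrete-time SIR form, the clipping of influence to [0, 1], and all feature and coefficient values are hypothetical choices for illustration.

```python
def predicted_influence(features, coefficients, intercept):
    """Multiple linear regression predictor, clipped to [0, 1] (assumption)."""
    value = intercept + sum(c * x for c, x in zip(coefficients, features))
    return max(0.0, min(1.0, value))


def simulate_sir(n_nodes, initial_spreaders, spread_rate, recovery_rate, steps):
    """Discrete-time SIR dynamics; returns a list of (S, I, R) tuples.

    S: users who have not seen the message, I: users spreading it,
    R: users who stopped spreading. recovery_rate must lie in [0, 1].
    """
    s = float(n_nodes - initial_spreaders)
    i = float(initial_spreaders)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(steps):
        new_spreaders = min(s, spread_rate * s * i / n_nodes)
        new_recovered = recovery_rate * i
        s -= new_spreaders
        i += new_spreaders - new_recovered
        r += new_recovered
        history.append((s, i, r))
    return history


# Hypothetical node features and coefficients, for illustration only.
influence = predicted_influence([0.5, 0.2], [0.6, 0.5], intercept=0.0)
trend = simulate_sir(n_nodes=1000, initial_spreaders=10,
                     spread_rate=influence, recovery_rate=0.1, steps=50)
print(trend[-1])
```

Using the predicted influence directly as the spread rate is the simplest possible coupling; a per-node version would need the network structure, which the abstract does not describe.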
This paper presents a case study on the IPUMS NHIS database, which provides data from censuses and surveys on the health of the U.S. population, including data related to COVID-19. By addressing gaps in previous studies, we propose a machine learning approach to train predictive models for identifying and measuring factors that affect the severity of COVID-19 symptoms. Our experiments focus on four groups of factors: demographic, socio-economic, health condition, and COVID-19 vaccination. By analysing the sensitivity of the variables used to train the models and applying VEC (variable effect characteristics) analysis to the variable values, we identify and measure the importance of various factors that influence the severity of COVID-19 symptoms.
In this paper, based on the theory of parameter estimation, we give a selection method that we consider very reasonable in the sense of the good properties of the resulting parameter estimation. Moreover, we offer a method for calculating the selection statistic and an applied example.
Wavelets are applied to detect the jumps in a heteroscedastic regression model. It is shown that the wavelet coefficients of the data have significantly large absolute values across fine scale levels near the jump points. Then a procedure is developed to estimate the jumps and jump heights. All estimators are proved to be consistent.
A geometric framework is proposed for semiparametric nonlinear regression models based on the concept of least favorable curve, introduced by Severini and Wong (1992). The authors use this framework to derive three kinds of improved approximate confidence regions for the parameter and parameter subset in terms of curvatures. The results obtained by Hamilton et al. (1982), Hamilton (1986) and Wei (1994) are extended to semiparametric nonlinear regression models.
Background: Evaluating the growth performance of pigs in real time is laborious and expensive, thus mathematical models based on easily accessible variables are developed. Multiple regression (MR) is the most widely used tool to build prediction models in swine nutrition, while the artificial neural network (ANN) model is reported to be more accurate than the MR model in prediction performance. Therefore, the potential of ANN models in predicting the growth performance of pigs was evaluated and compared with MR models in this study. Results: Body weight (BW), net energy (NE) intake, standardized ileal digestible lysine (SID Lys) intake, and their quadratic terms were selected from 10 candidate variables as input variables to predict ADG and F/G. In the training phase, MR models showed high accuracy in both ADG and F/G prediction (R^(2)_(ADG) = 0.929, R^(2)_(F/G) = 0.886), while ANN models with 4 and 6 neurons and a radial basis activation function yielded the best performance in ADG and F/G prediction (R^(2)_(ADG) = 0.964, R^(2)_(F/G) = 0.932). In the testing phase, these ANN models showed better accuracy in ADG prediction (CCC: 0.976 vs. 0.861, R^(2): 0.951 vs. 0.584) and F/G prediction (CCC: 0.952 vs. 0.900, R^(2): 0.905 vs. 0.821) compared with the MR models. Meanwhile, “over-fitting” occurred in MR models but not in ANN models. On validation data from the animal trial, ANN models exhibited superiority over MR models in both ADG and F/G prediction (P < 0.01). Moreover, the growth stage has a significant effect on the prediction accuracy of the models. Conclusion: Body weight, NE intake and SID Lys intake can be used as input variables to predict the growth performance of growing-finishing pigs, and trained ANN models are more flexible and accurate than MR models. Therefore, it is promising to use ANN models in related swine nutrition studies in the future.
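The two accuracy measures reported in the abstract above, CCC and R^(2), can be computed as in the following minimal sketch. CCC is taken here to be Lin's concordance correlation coefficient, and R^(2) is computed as one minus the ratio of residual to total sum of squares; both readings are assumptions, since the abstract does not give its formulas, and the data values are made up.

```python
def concordance_correlation(observed, predicted):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up daily gain values (kg/d), for illustration only.
observed_adg = [0.82, 0.91, 0.77, 1.02, 0.95]
predicted_adg = [0.80, 0.93, 0.79, 0.99, 0.97]
print(concordance_correlation(observed_adg, predicted_adg))
print(r_squared(observed_adg, predicted_adg))
```

Unlike a plain correlation, CCC penalizes systematic bias: predictions that are shifted by a constant can correlate perfectly with the observations and still have a CCC well below 1.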
Regression is a widely used econometric tool in research. In observational studies, based on a number of assumptions, regression-based statistical control methods attempt to analyze the causation between treatment and outcome by adding control variables. However, this approach may not produce reliable estimates of causal effects. In addition to the shortcomings of the method, this lack of confidence is mainly related to ambiguous formulations in econometrics, such as the definition of selection bias, selection of core control variables, and method of testing for robustness. Within the framework of the causal models, we clarify the assumption of causal inference using regression-based statistical controls, as described in econometrics, and discuss how to select core control variables to satisfy this assumption and conduct robustness tests for regression estimates.
Recently, many regression models have been presented for predicting the mechanical parameters of rocks from rock index properties. Although statistical analysis is a common method for developing regression models, selecting a suitable transformation of the independent variables in a regression model is still difficult. In this paper, a genetic algorithm (GA) is employed as a heuristic search method for selecting the best transformation of the independent variables (some index properties of rocks) in regression models for predicting uniaxial compressive strength (UCS) and modulus of elasticity (E). Firstly, multiple linear regression (MLR) analysis was performed on a data set to establish predictive models. Then, two GA models were developed in which the root mean squared error (RMSE) was defined as the fitness function. The results show that GA models are more precise than MLR models and are able to explain the relation between the intrinsic strength/elasticity properties and the index properties of rocks with a simple formulation and acceptable accuracy.
The purpose of this paper is to study the theory of conservative estimating functions in the nonlinear regression model with aggregated data. In this model, a quasi-score function with aggregated data is defined. When this function happens to be conservative, it is the projection of the true score function onto a class of estimating functions. By construction, the potential function for the projected score with aggregated data is obtained, which has some properties of a log-likelihood function.
Funding (paper on convergence rates for estimators of the error distribution in semi-parametric regression models): Supported by the National Science Foundation of China under Grant Nos. 11201422, 11301481, and 11371321; the Zhejiang Provincial Natural Science Foundation of China under Grant Nos. Y6110639, Y6110110, LQ12A01018, and LQ12A01017; the National Statistical Science Research Project of China under Grant No. 2012LY174; the Foundation for Young Talents of ZJGSU under Grant No. 1020XJ1314019; and the Zhejiang Provincial Key Research Base for Humanities and Social Science Research (Statistics).
Funding (paper on RUL prediction with a nonlinear random coefficient regression model): Supported by the National Natural Science Foundation of China (61703410, 61873175, 62073336, 61873273, 61773386, 61922089).
Funding (paper on the quantum fuzzy regression model): This work is supported by the National Natural Science Foundation of China (No. 62076042); the Key Research and Development Project of Sichuan Province (Nos. 2021YFSY0012, 2020YFG0307, 2021YFG0332); the Science and Technology Innovation Project of Sichuan (No. 2020017); the Key Research and Development Project of Chengdu (No. 2019-YF05-02028-GX); the Innovation Team of Quantum Security Communication of Sichuan Province (No. 17TD0009); and the Academic and Technical Leaders Training Funding Support Projects of Sichuan Province (No. 2016120080102643).
文摘In the era of big data,traditional regression models cannot deal with uncertain big data efficiently and accurately.In order to make up for this deficiency,this paper proposes a quantum fuzzy regression model,which uses fuzzy theory to describe the uncertainty in big data sets and uses quantum computing to exponentially improve the efficiency of data set preprocessing and parameter estimation.In this paper,data envelopment analysis(DEA)is used to calculate the degree of importance of each data point.Meanwhile,Harrow,Hassidim and Lloyd(HHL)algorithm and quantum swap circuits are used to improve the efficiency of high-dimensional data matrix calculation.The application of the quantum fuzzy regression model to smallscale financial data proves that its accuracy is greatly improved compared with the quantum regression model.Moreover,due to the introduction of quantum computing,the speed of dealing with high-dimensional data matrix has an exponential improvement compared with the fuzzy regression model.The quantum fuzzy regression model proposed in this paper combines the advantages of fuzzy theory and quantum computing which can efficiently calculate high-dimensional data matrix and complete parameter estimation using quantum computing while retaining the uncertainty in big data.Thus,it is a new model for efficient and accurate big data processing in uncertain environments.
文摘Cyber losses in terms of number of records breached under cyber incidents commonly feature a significant portion of zeros, specific characteristics of mid-range losses and large losses, which make it hard to model the whole range of the losses using a standard loss distribution. We tackle this modeling problem by proposing a three-component spliced regression model that can simultaneously model zeros, moderate and large losses and consider heterogeneous effects in mixture components. To apply our proposed model to Privacy Right Clearinghouse (PRC) data breach chronology, we segment geographical groups using unsupervised cluster analysis, and utilize a covariate-dependent probability to model zero losses, finite mixture distributions for moderate body and an extreme value distribution for large losses capturing the heavy-tailed nature of the loss data. Parameters and coefficients are estimated using the Expectation-Maximization (EM) algorithm. Combining with our frequency model (generalized linear mixed model) for data breaches, aggregate loss distributions are investigated and applications on cyber insurance pricing and risk management are discussed.
Abstract: The rapid spread of the COVID-19 virus and its variants, especially in metropolitan areas around the world, has become a major public health concern. Statistical modelling of the trend of the COVID-19 pandemic is an urgent challenge in the United States, for which there are few solutions. In this paper, we demonstrate combining Fourier terms, which capture seasonality, with ARIMA errors, which capture other dynamics in the data. We analyze a 156-week national-level COVID-19 dataset from 2020 to 2023 using a dynamic harmonic regression model, including simulation analysis and accuracy improvement. Most importantly, we provide new pathways that may serve as targets for developing new solutions and approaches.
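The Fourier-term part of a dynamic harmonic regression can be sketched as follows. This is a minimal illustration, not the authors' implementation: the series is synthetic, the names `fourier_terms`, `period` and `K` are hypothetical, a 52-week seasonal period is assumed, and the ARIMA error model is omitted (the residuals are simply left for a later ARIMA fit).

```python
import numpy as np

def fourier_terms(n_obs, period, K):
    """Return an (n_obs, 2K) matrix of sine/cosine seasonal regressors."""
    t = np.arange(n_obs)
    cols = []
    for k in range(1, K + 1):
        angle = 2.0 * np.pi * k * t / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)

# Synthetic weekly series (assumption): 156 weeks with a 52-week cycle.
rng = np.random.default_rng(0)
n_obs, period, K = 156, 52, 2
t = np.arange(n_obs)
y = 10.0 + 3.0 * np.sin(2.0 * np.pi * t / period) + rng.normal(0.0, 0.5, n_obs)

# Design matrix: intercept plus K pairs of Fourier terms.
X = np.column_stack([np.ones(n_obs), fourier_terms(n_obs, period, K)])

# Ordinary least squares for the harmonic part only.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# In a full dynamic harmonic regression these residuals would be
# modelled with an ARIMA process instead of being treated as white noise.
residuals = y - X @ beta
```

Increasing `K` lets the seasonal pattern take a more flexible shape at the cost of more parameters; in practice `K` is usually chosen with an information criterion.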
Abstract: In this paper, three smoothed empirical log-likelihood ratio functions are suggested for the parameters of nonlinear models with missing responses. Under some regularity conditions, the corresponding Wilks phenomena are obtained, so confidence regions for the parameters can be constructed easily.
Funding: This work was supported by the 2021 Project of the “14th Five-Year Plan” of Shaanxi Education Science, “Research on the Application of Educational Data Mining in Applied Undergraduate Teaching-Taking the Course of ‘Computer Application Technology’ as an Example” (SGH21Y0403); the Teaching Reform and Research Projects for Practical Teaching in 2022, “Research on Practical Teaching of Applied Undergraduate Projects Based on ‘Combination of Courses and Certificates’-Taking Computer Application Technology Courses as an Example” (SJJG02012); and the 11th batch of Teaching Reform Research Project of Xi’an Jiaotong University City College, “Project-Driven Cultivation and Research on Information Literacy of Applied Undergraduate Students in the Information Times-Taking Computer Application Technology Course Teaching as an Example” (111001).
Abstract: Social networks are the mainstream medium of current information dissemination, so accurately predicting how information propagates through them is particularly important. In this paper, we introduce a social network propagation model that integrates multiple linear regression with an infectious disease model. First, we propose features that affect social network communication along three dimensions. Then, we predict node influence via multiple linear regression. Finally, we use the node influence as the state transition of the infectious disease model to predict the trend of information dissemination in social networks. Experimental results on a real social network dataset show that the model's predictions are consistent with the actual information dissemination trends.
Abstract: This paper presents a case study on the IPUMS NHIS database, which provides data from censuses and surveys on the health of the U.S. population, including data related to COVID-19. To address gaps in previous studies, we propose a machine learning approach to train predictive models for identifying and measuring factors that affect the severity of COVID-19 symptoms. Our experiments focus on four groups of factors: demographic, socio-economic, health condition, and COVID-19 vaccination. By analysing the sensitivity of the variables used to train the models and applying VEC (variable effect characteristics) analysis to the variable values, we identify and measure the importance of various factors that influence the severity of COVID-19 symptoms.
Funding: Supported by the Natural Science Foundation of Anhui Education Committee
Abstract: In this paper, based on the theory of parameter estimation, we give a selection method that we consider very reasonable in the sense of good properties of the parameter estimates. Moreover, we offer a method for calculating the selection statistic and an applied example.
Abstract: Wavelets are applied to detect jumps in a heteroscedastic regression model. It is shown that the wavelet coefficients of the data have significantly large absolute values across fine scale levels near the jump points. A procedure is then developed to estimate the jump locations and jump heights. All estimators are proved to be consistent.
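The idea that fine-scale wavelet coefficients are large near a jump can be illustrated with a Haar-type local contrast. This is a rough sketch under stated assumptions, not the procedure developed in the paper: the data are synthetic, a single jump is assumed, and `haar_details` and `half_width` are hypothetical names.

```python
import numpy as np

def haar_details(y, half_width):
    """Haar-type detail coefficient at each index: scaled difference
    between the mean of the next and the previous `half_width` points."""
    y = np.asarray(y, dtype=float)
    n = y.size
    d = np.zeros(n)
    for i in range(half_width, n - half_width + 1):
        left = y[i - half_width:i].mean()
        right = y[i:i + half_width].mean()
        d[i] = (right - left) * np.sqrt(half_width / 2.0)
    return d

# Synthetic heteroscedastic data (assumption): smooth trend, one jump of
# height 2 at index 100, and noise whose scale grows with x.
rng = np.random.default_rng(1)
n = 200
x = np.arange(n) / n
signal = np.sin(2.0 * np.pi * x) + 2.0 * (x >= 0.5)
y = signal + rng.normal(0.0, 1.0, n) * 0.1 * (1.0 + x)

half_width = 8
d = haar_details(y, half_width)

# The largest absolute coefficient locates the jump; undoing the
# scaling gives a crude estimate of the jump height.
jump_index = int(np.argmax(np.abs(d)))
jump_height = d[jump_index] / np.sqrt(half_width / 2.0)
```

The height estimate is biased by the local slope of the smooth trend, which is one reason a formal procedure works across several fine scales instead of a single window width.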
Abstract: A geometric framework is proposed for semiparametric nonlinear regression models based on the concept of the least favorable curve, introduced by Severini and Wong (1992). The authors use this framework to derive three kinds of improved approximate confidence regions for the parameter and parameter subset in terms of curvatures. The results obtained by Hamilton et al. (1982), Hamilton (1986) and Wei (1994) are extended to semiparametric nonlinear regression models.
Funding: Funded by the National Natural Science Foundation of China (32072764, 31702121), the 2115 Talent Development Program of China Agricultural University, and the National Key Research and Development Program of China (2019YFD1002605).
Abstract: Background: Evaluating the growth performance of pigs in real time is laborious and expensive, so mathematical models based on easily accessible variables have been developed. Multiple regression (MR) is the most widely used tool for building prediction models in swine nutrition, while artificial neural network (ANN) models are reported to be more accurate than MR models in prediction performance. Therefore, the potential of ANN models for predicting the growth performance of pigs was evaluated and compared with MR models in this study. Results: Body weight (BW), net energy (NE) intake, standardized ileal digestible lysine (SID Lys) intake, and their quadratic terms were selected from 10 candidate variables as input variables to predict ADG and F/G. In the training phase, MR models showed high accuracy in both ADG and F/G prediction (R² for ADG = 0.929, R² for F/G = 0.886), while ANN models with 4 and 6 neurons and a radial basis activation function yielded the best performance in ADG and F/G prediction (R² for ADG = 0.964, R² for F/G = 0.932). In the testing phase, these ANN models showed better accuracy than the MR models in ADG prediction (CCC: 0.976 vs. 0.861, R²: 0.951 vs. 0.584) and F/G prediction (CCC: 0.952 vs. 0.900, R²: 0.905 vs. 0.821). Meanwhile, over-fitting occurred in the MR models but not in the ANN models. On validation data from the animal trial, ANN models were superior to MR models in both ADG and F/G prediction (P < 0.01). Moreover, the growth stage had a significant effect on the prediction accuracy of the models. Conclusion: Body weight, NE intake and SID Lys intake can be used as input variables to predict the growth performance of growing-finishing pigs, and trained ANN models are more flexible and accurate than MR models. Therefore, it is promising to use ANN models in related swine nutrition studies in the future.
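The two accuracy measures reported above can be computed as in the sketch below. It assumes that CCC denotes Lin's concordance correlation coefficient and that R² is the coefficient of determination of predictions against observations; the abstract does not state which definitions the authors used, and the function names are hypothetical.

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def concordance_cc(observed, predicted):
    """Lin's concordance correlation coefficient (population moments)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mean_o, mean_p = observed.mean(), predicted.mean()
    var_o, var_p = observed.var(), predicted.var()
    cov = np.mean((observed - mean_o) * (predicted - mean_p))
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)

# Made-up values for illustration only: predictions that track the
# observations perfectly in shape but are shifted by a constant bias.
observed = [1.0, 2.0, 3.0, 4.0]
biased = [2.0, 3.0, 4.0, 5.0]
r2_biased = r_squared(observed, biased)
ccc_biased = concordance_cc(observed, biased)
```

Unlike the Pearson correlation, which would equal 1 for the shifted predictions, both measures fall below 1 here because they penalize systematic bias as well as scatter.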
Funding: This research was funded by the National Natural Science Foundation of China (Grant No. 72074060).
Abstract: Regression is a widely used econometric tool in research. In observational studies, regression-based statistical control methods rely on a number of assumptions and attempt to analyze the causal relationship between treatment and outcome by adding control variables. However, this approach may not produce reliable estimates of causal effects. Beyond the shortcomings of the method itself, this lack of confidence is mainly related to ambiguous formulations in econometrics, such as the definition of selection bias, the selection of core control variables, and the method of testing for robustness. Within the framework of causal models, we clarify the assumption underlying causal inference with regression-based statistical controls, as described in econometrics, and discuss how to select core control variables to satisfy this assumption and how to conduct robustness tests for regression estimates.
Abstract: Recently, many regression models have been presented for predicting the mechanical parameters of rocks from rock index properties. Although statistical analysis is a common method for developing regression models, selecting a suitable transformation of the independent variables in a regression model remains difficult. In this paper, a genetic algorithm (GA) is employed as a heuristic search method for selecting the best transformation of the independent variables (some index properties of rocks) in regression models for predicting uniaxial compressive strength (UCS) and modulus of elasticity (E). First, multiple linear regression (MLR) analysis was performed on a data set to establish predictive models. Then, two GA models were developed in which the root mean squared error (RMSE) was defined as the fitness function. The results show that the GA models are more precise than the MLR models and are able to explain the relation between the intrinsic strength/elasticity properties and the index properties of rocks with a simple formulation and acceptable accuracy.
Abstract: The purpose of this paper is to study the theory of conservative estimating functions in a nonlinear regression model with aggregated data. In this model, a quasi-score function with aggregated data is defined. When this function happens to be conservative, it is the projection of the true score function onto a class of estimating functions. By construction, a potential function for the projected score with aggregated data is obtained, which has some properties of a log-likelihood function.