In car insurance claim assessment, the claim rate follows a highly skewed probability distribution that is typically modeled with a Tweedie distribution. The traditional approach trains the Tweedie regression model on a centralized dataset; when the data are held by multiple parties, training a privacy-preserving Tweedie regression model without exchanging raw data becomes a challenge. To address this issue, this study introduces a novel vertical federated learning-based Tweedie regression algorithm for multi-party auto insurance rate setting in data silos. The algorithm keeps sensitive data local and uses privacy-preserving techniques to compute the intersection of the entities held by the two parties. After determining which entities are shared, the participants train locally on the shared entity data to obtain the intermediate parameters of the local generalized linear model. Homomorphic encryption is then used to exchange and update these intermediate parameters, so that the parties jointly complete the training of the car insurance rate-setting model. Performance tests on two publicly available datasets show that the proposed federated Tweedie regression algorithm can effectively generate Tweedie regression models that leverage the value of data from both parties without exchanging data. The assessment results of the scheme approach those of the Tweedie regression model learned from centralized data, and outperform the Tweedie regression model learned independently by a single party.
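The abstract does not state which loss the federated model optimizes. As a minimal sketch, assuming the standard Tweedie unit deviance with a power parameter between 1 and 2 (the compound Poisson-gamma case commonly used for claim data), the quantity a Tweedie regression minimizes can be written as follows. The function name and the default `p=1.5` are illustrative assumptions, not values from the study.

```python
import numpy as np


def tweedie_deviance(y, mu, p=1.5):
    """Mean Tweedie unit deviance for 1 < p < 2.

    y  : observed values (non-negative; exact zeros are allowed)
    mu : predicted means (strictly positive)
    p  : Tweedie power parameter (assumed value, not from the study)
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    if not 1.0 < p < 2.0:
        raise ValueError("this sketch only covers 1 < p < 2")
    if np.any(y < 0) or np.any(mu <= 0):
        raise ValueError("need y >= 0 and mu > 0")
    # Unit deviance: zero when y == mu, positive otherwise.
    dev = 2.0 * (
        np.power(y, 2.0 - p) / ((1.0 - p) * (2.0 - p))
        - y * np.power(mu, 1.0 - p) / (1.0 - p)
        + np.power(mu, 2.0 - p) / (2.0 - p)
    )
    return float(np.mean(dev))
```

A lower value means the predicted means fit the observed claims better, which makes the deviance a natural metric for comparing a federated model with centralized and single-party baselines.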
Sunshine duration (S) based empirical equations have been employed in this study to estimate the daily global solar radiation on a horizontal surface (G) for six meteorological stations in Burundi. Those equations include the Ångström-Prescott linear model and four of its derivatives, i.e. logarithmic, exponential, power and quadratic functions. Monthly mean values of daily global solar radiation and sunshine duration covering 20 to 23 years, from the Geographical Institute of Burundi (IGEBU), have been used. For each of the six stations, ten single or double linear regressions have been developed from these five functions to relate, in terms of monthly mean values, the daily clearness index (G/G<sub>0</sub>) to each of two kinds of relative sunshine duration (RSD): S/S<sub>0</sub> and the ratio of S to the modified day length. In those ratios, G<sub>0</sub> is the extraterrestrial daily solar radiation on a horizontal surface, S<sub>0</sub> is the day length, and the modified day length takes into account the site's natural horizon. According to the calculated mean values of the clearness index and the RSD, each station experiences a high number of fairly clear (or partially cloudy) days. Estimated values of the dependent variable (y) in each developed linear regression have been compared with measured values in terms of the coefficients of correlation (R) and of determination (R<sup>2</sup>), the mean bias error (MBE), the root mean square error (RMSE) and the t-statistic. Mean values of these statistical indicators have been used to rank, in order of decreasing performance, first the ten equations developed per station across all six stations, and second the six stations across all ten equations. Nevertheless, for all sixty developed equations the obtained values of those indicators lay within the following ranges: […]. These results lead to the assertion that any of the sixty developed linear regressions (and thus the corresponding equations in terms of the clearness index and RSD) fits the measured data very adequately, and can be used to estimate monthly average daily global solar radiation from sunshine duration for the relevant station. It is also found that using […] as RSD is slightly more advantageous than using […] for estimating the monthly average daily clearness index. Moreover, the values of the statistical indicators in this study agree well with those from other works on the same kinds of empirical equations.
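The Ångström-Prescott linear model relates the clearness index to relative sunshine duration as G/G<sub>0</sub> = a + b·(S/S<sub>0</sub>). The sketch below shows one way such a regression and two of the named indicators (MBE and RMSE) could be computed with ordinary least squares. The function names are hypothetical, the sample numbers are synthetic placeholders rather than IGEBU data, and the fitting procedure used in the study may differ.

```python
import numpy as np


def fit_angstrom_prescott(rsd, kt):
    """Least-squares fit of kt = a + b * rsd; returns (a, b)."""
    rsd = np.asarray(rsd, dtype=float)
    kt = np.asarray(kt, dtype=float)
    b, a = np.polyfit(rsd, kt, 1)  # polyfit returns the slope first
    return float(a), float(b)


def mbe(estimated, measured):
    """Mean bias error: positive means overestimation on average."""
    return float(np.mean(np.asarray(estimated) - np.asarray(measured)))


def rmse(estimated, measured):
    """Root mean square error."""
    diff = np.asarray(estimated) - np.asarray(measured)
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic monthly means, for illustration only (not measured data).
rsd_demo = np.array([0.30, 0.40, 0.50, 0.60, 0.70])
kt_demo = np.array([0.41, 0.44, 0.51, 0.54, 0.61])

a, b = fit_angstrom_prescott(rsd_demo, kt_demo)
kt_hat = a + b * rsd_demo
print(f"a={a:.3f}, b={b:.3f}, MBE={mbe(kt_hat, kt_demo):.4f}, "
      f"RMSE={rmse(kt_hat, kt_demo):.4f}")
```

The estimated radiation then follows as G = G<sub>0</sub>·(a + b·RSD). The logarithmic, exponential, power and quadratic variants can be fitted the same way after transforming RSD or adding a squared term.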
Adaptive fractional polynomial modeling of general correlated outcomes is formulated to address nonlinearity in means, variances/dispersions, and correlations. Means and variances/dispersions are modeled using generalized linear models in fixed effects/coefficients. Correlations are modeled using random effects/coefficients. Nonlinearity is addressed using power transforms of primary (untransformed) predictors. Parameter estimation is based on extended linear mixed modeling generalizing both generalized estimating equations and linear mixed modeling. Models are evaluated using likelihood cross-validation (LCV) scores and are generated adaptively using a heuristic search controlled by LCV scores. Cases covered include linear, Poisson, logistic, exponential, and discrete regression of correlated continuous, count/rate, dichotomous, positive continuous, and discrete numeric outcomes treated as normally, Poisson, Bernoulli, exponentially, and discrete numerically distributed, respectively. Example analyses are also generated for these five cases to compare adaptive random effects/coefficients modeling of correlated outcomes to previously developed adaptive modeling based on directly specified covariance structures. Adaptive random effects/coefficients modeling substantially outperforms direct covariance modeling in the linear, exponential, and discrete regression example analyses. It generates equivalent results in the logistic regression example analyses and is substantially outperformed in the Poisson regression case. Random effects/coefficients modeling of correlated outcomes can provide substantial improvements in model selection compared to directly specified covariance modeling. However, directly specified covariance modeling can generate competitive or substantially better results in some cases while usually requiring less computation time.
Compositional data, which carry relative information, are a crucial aspect of machine learning and related fields. They are typically recorded as closed data, i.e. data that sum to a constant such as 100%. The statistical linear model, and linear regression in particular, is the most widely used technique for identifying hidden relationships between variables of interest across applications. However, data quality is a significant challenge in machine learning, especially when missing data are present. When estimating linear regression parameters, which are useful for tasks such as future prediction and partial effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, and recovering them can be costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved variables or data. Using the current parameter estimate, the expectation (E) step constructs the expected log-likelihood function; the maximization (M) step then finds the parameters that maximize it. This study examined how well the EM algorithm performed on a simulated compositional dataset with missing observations, using both ordinary least squares and robust least squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
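The Aitchison distance used for the comparison is the Euclidean distance between centered log-ratio (clr) transforms of two compositions. A minimal sketch of that standard definition follows; the function names are illustrative, and the study's own implementation is not described in the abstract.

```python
import numpy as np


def clr(x):
    """Centered log-ratio transform of a composition with positive parts."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("clr requires strictly positive parts")
    log_x = np.log(x)
    # Subtract the mean log part (log of the geometric mean).
    return log_x - log_x.mean(axis=-1, keepdims=True)


def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of x and y."""
    return float(np.linalg.norm(clr(x) - clr(y)))
```

Because the clr transform removes the overall scale, a composition expressed in proportions and the same composition expressed in percentages are at distance zero, which is why this metric suits closed data better than the plain Euclidean distance.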
In this paper, the nonlinear free vibration behaviors of a piezoelectric semiconductor (PS) doubly-curved shell resting on a Pasternak foundation are studied within the framework of the nonlinear drift-diffusion (NLDD) model and the first-order shear deformation theory. The nonlinear constitutive relations are presented, and the strain energy, kinetic energy, and virtual work of the PS doubly-curved shell are derived. Based on Hamilton's principle and the condition of charge continuity, the nonlinear governing equations are obtained and then solved by means of an efficient iteration method. Several numerical examples show the effects of the nonlinear drift current, the elastic foundation parameters, and the geometric parameters on the nonlinear vibration frequency and the damping characteristics of the PS doubly-curved shell. The main innovations of the manuscript are that the difference between the linearized drift-diffusion (LDD) model and the NLDD model is revealed, and that an effective method is proposed for selecting a proper initial electron concentration for the LDD model.
Soil information is the basis of soil management and precise variable-rate fertilization. The traditional method of obtaining soil information through laboratory chemical analysis is costly and slow, and therefore struggles to meet the needs of digital forestry, soil monitoring and real-time nutrient management. Taking the red soil of a Eucalyptus plantation in northern Guangxi as the research object, the spectra of samples with different soil available potassium contents were measured, their spectral characteristics were analyzed, and an inversion model was established using the PLS method. The results showed that the bands sensitive to available potassium content in the red soil of the region were mainly concentrated around 400-600, 1450 and 2200 nm. The first derivative transformation significantly reduced the redundant information in the original spectral data and improved the correlation between spectral indexes and soil available potassium content. The full-band modeling results for the original spectra (R) and the first-derivative spectra (FDR) were better than those based on the significant bands. The optimal model was full-band-FDR-PLS, with R<sup>2</sup> = 0.862 and RMSE = 2.718. The results of this study can support near-earth remote sensing applications in Guangxi, such as digital soil mapping, precise variable-rate fertilization and real-time monitoring of soil available potassium.
Social networks are the mainstream medium of current information dissemination, and it is particularly important to predict their propagation patterns accurately. In this paper, we introduce a social network propagation model that integrates multiple linear regression with an infectious disease model. First, we propose features that affect social network communication along three dimensions. Then, we predict node influence via multiple linear regression. Lastly, we use the node influence as the state transition of the infectious disease model to predict the trend of information dissemination in social networks. Experimental results on a real social network dataset show that the predictions of the model are consistent with the actual information dissemination trends.
Forest fires are natural disasters that can occur suddenly and can be very damaging, burning thousands of square kilometers. Because prevention is better than suppression, prediction models of forest fire occurrence were developed using the logistic regression model, the geographically weighted logistic regression model, the Lasso regression model, the random forest model, and the support vector machine model, based on historical forest fire data from 2000 to 2019 in Jilin Province. The models, along with a distribution map, are presented in this paper to provide a theoretical basis for forest fire management in this area. The results show that the prediction accuracies of the two machine learning models are higher than those of the three generalized linear regression models. The accuracies of the random forest model, the support vector machine model, the geographically weighted logistic regression model, the Lasso regression model, and the logistic model were 88.7%, 87.7%, 86.0%, 85.0% and 84.6%, respectively. Weather is the main factor affecting forest fires, while the impacts of topographic factors and of human and socio-economic factors on fire occurrence were similar.
The change processes and trends of shorelines and tidal flats under human activities are essential issues for the sustainability of coastal areas, and are also of great significance for understanding changes in the coastal ecological environment and even global change. Based on field measurements, combined with a Linear Regression (LR) model and the Inverse Distance Weighting (IDW) method, this paper presents a detailed analysis of the change history and trend of the shoreline and tidal flat in Bohai Bay. The shoreline faces a high chance of erosion under the action of natural factors, while the tidal flat shows differing erosion and deposition patterns across Bohai Bay due to the impact of human activities. The implications of these change patterns for ecological protection and recovery are also discussed. Measures should be taken to protect the coastal ecological environment. The models used in this paper show a high correlation coefficient between observed and modeled data, which means that this method can be used to predict the changing trend of the shoreline and tidal flat. The results of the present study can provide scientific support for future coastal protection and management.
Hydraulic-electric rock fragmentation (HERF) plays a significant role in improving the efficiency of high-voltage pulse rock breaking. However, the underlying mechanism of HERF remains unclear. In this study, considering the heterogeneity of the rock, microscopic thermodynamic properties, and shockwave time-domain waveforms, and based on the shockwave model, digital imaging technology and the discrete element method, cyclic loading numerical simulations of HERF are performed by coupling electrical, thermal, and solid mechanics under different formation temperatures, confining pressures, initial peak voltages, electrode bit diameters, and numbers of loading cycles. Meanwhile, the HERF discharge system is used to conduct laboratory experiments with various electrical parameters, and the resulting broken pits are numerically reconstructed to obtain their geometric parameters. The results show that the completely broken area consists of powdery rock debris. In the pre-broken zone, the mineral cementation of the rock determines the transition of type CⅠ cracks to type CⅡ and type CⅢ cracks. Furthermore, the peak pressure of the shockwave increased with initial peak voltage but decreased with electrode bit diameter, while the wavefront time decreased. Moreover, as well depth increases, formation temperature and confining pressure respectively augment and inhibit HERF; however, once confining pressure surpasses a threshold of 60 MPa for 152.40, 215.90, and 228.60 mm electrode bits, and 40 MPa for 309.88 mm electrode bits, HERF is promoted. Additionally, for the same kind of rock, the volume and width of the broken pit increase with higher initial peak voltage, and rock fissures promote HERF. Finally, the electrode drill bit with a 215.90 mm diameter is found to be more suitable for drilling pink granite. This research contributes to a better microscopic understanding of HERF and provides valuable insights for electrode bit selection, as well as for the optimization of circuit parameters for HERF technology.
In this paper, based on the theory of parameter estimation, we propose a selection method that we consider very reasonable, in the sense that it yields parameter estimates with good properties. Moreover, we give a method for calculating the selection statistic, together with an applied example.
The impact of nonlinear stability and instability on the validity of the tangent linear model (TLM) is investigated by comparing its results with those produced by the nonlinear model (NLM) with identical initial perturbations. The evolutions of different initial perturbations superposed on nonlinearly stable and unstable basic flows are examined using two-dimensional quasi-geostrophic models with double-periodic and rigid boundary conditions. The results indicate that the valid time period of the TLM, during which the TLM can be used to approximate the NLM with a given accuracy, varies with the magnitude of the perturbations and with the nonlinear stability or instability of the basic flow. The larger the magnitude of the perturbation, the shorter the valid time period. The more nonlinearly unstable the basic flow, the shorter the valid time period of the TLM. With the double-periodic condition, the valid period of the TLM is shorter than with the rigid boundary condition. Key words: nonlinear stability and instability; tangent linear model (TLM); validity. This work was supported by the National Key Basic Research Project “Research on the Formation Mechanism and Prediction Theory of Severe Synoptic Disasters in China” (No. G1998040910) and the National Natural Science Foundation of China (No. 49775262 and No. 49823002).
Convenient and effective methods to determine seasonal changes in individual leaf area (LA) and leaf mass (LM) of plants are useful in research on plant physiology and forest ecology. However, practical methods for estimating LA and LM of elm (Ulmus japonica) leaves in different periods have rarely been reported. We collected sample elm leaves in June, July and September. Then, we developed allometric models relating LA and LM to leaf parameters, such as leaf length (L) and width (W) or the product of L and W (LW). Our objective was to find optimal allometric models for conveniently and effectively estimating LA and LM of elm leaves in different periods. LA and LM were significantly correlated with leaf parameters (P < 0.05), and allometric models with LW as the independent variable were best for estimating LA and LM in each period. A linear model was developed separately to predict LA of elm leaves in June, July and September, and it yielded high accuracies of 93, 96 and 96%, respectively. Similarly, a specific allometric model for predicting LM was developed separately for each of the three periods; the optimal model form in both June and July was a power model, whereas the linear model was optimal for September. The accuracies of the allometric models in predicting LM were 88, 83 and 84% for June, July and September, respectively. The errors caused by ignoring seasonal variation of the allometric models when predicting LA and LM in the three periods were 1-4% and 16-59%, respectively.
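The two model forms named in the abstract, a linear model y = k·LW + c and a power model y = a·LW<sup>b</sup>, can be sketched as follows. Fitting the power model by linear regression on log-transformed data is an assumption made for illustration (the abstract does not say how the models were fitted), and the function names are hypothetical.

```python
import numpy as np


def fit_linear(lw, y):
    """Fit y = k * lw + c by least squares; returns (k, c)."""
    k, c = np.polyfit(np.asarray(lw, float), np.asarray(y, float), 1)
    return float(k), float(c)


def fit_power(lw, y):
    """Fit y = a * lw**b via a straight-line fit in log-log space.

    Assumes all lw and y values are strictly positive.
    Returns (a, b).
    """
    b, log_a = np.polyfit(np.log(np.asarray(lw, float)),
                          np.log(np.asarray(y, float)), 1)
    return float(np.exp(log_a)), float(b)
```

Fitting each month separately and comparing the resulting coefficients is one simple way to see the seasonal variation that the abstract reports as a source of error when a single model is used for all periods.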
In this article, empirical Bayes (EB) estimators are constructed for the estimable functions of the parameters in the partitioned normal linear model. The superiority of the EB estimators over the ordinary least-squares (LS) estimator is investigated under the mean square error matrix (MSEM) criterion.
The 3-hour-interval prediction of ground-level temperature from +00 h out to +45 h in South Korea (38 stations) is performed using the DLM (dynamic linear model) in order to eliminate the systematic error of numerical model forecasts. Numerical model forecasts and observations are used as input values of the DLM. A comparison of the DLM forecasts with the KFM (Kalman filter model) forecasts in terms of RMSE and bias shows that the DLM is useful for improving the accuracy of prediction.
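The abstract does not specify the DLM's structure. As a rough illustration of how a dynamic linear model can track systematic forecast error, the sketch below uses the simplest case, a local-level model in which the bias follows a random walk and each observed forecast error is the bias plus noise. The variances `q` and `r`, the initial values, and the function name are all assumptions.

```python
def local_level_bias(errors, q=0.01, r=1.0, b0=0.0, p0=1.0):
    """Track a slowly varying forecast bias with a local-level DLM.

    errors : sequence of (forecast - observation) values
    q      : assumed variance of the random-walk step of the bias
    r      : assumed variance of the observation noise
    Returns the filtered bias estimate after each observation.
    """
    bias, var = b0, p0
    estimates = []
    for e in errors:
        var = var + q                     # predict: bias is a random walk
        gain = var / (var + r)            # weight given to the new error
        bias = bias + gain * (e - bias)   # update the bias estimate
        var = (1.0 - gain) * var          # update its variance
        estimates.append(bias)
    return estimates
```

A corrected forecast would then be the raw numerical forecast minus the latest bias estimate for the same station and lead time.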
Background: Leaf Area Index (LAI) is an important parameter used in monitoring and modeling of forest ecosystems. The aim of this study was to evaluate the performance of artificial neural network (ANN) models in predicting LAI by comparing them with regression analysis, the classical method, in pure and even-aged Crimean pine forest stands. Methods: One hundred and eight temporary sample plots were sampled in Crimean pine forest stands to estimate stand parameters. Each sample plot was imaged with hemispherical photographs to determine the LAI. Partial correlation analysis was used to assess the relationships between the stand LAI values and stand parameters, and multivariate linear regression analysis was used to predict the LAI from stand parameters. Different artificial neural network models, comprising different numbers of neurons and different transfer functions, were trained and used to predict the LAI of forest stands. Results: The correlation coefficients between LAI and stand parameters (stand number of trees, basal area, quadratic mean diameter, stand density and stand age) were significant at the 0.01 level. Stand age, number of trees, site index, and basal area were the independent parameters in the most successful regression model predicting LAI from stand parameters (adjusted R<sup>2</sup> = 0.5431). As the alternative method for modeling the relationships between stand LAI values and stand parameters, the neural network architecture RBF 4-19-1, with a Gaussian activation function in the hidden layer and an identity activation function in the output layer, performed better in predicting LAI (SSE = 12.1040, MSE = 0.1223, RMSE = 0.3497, AIC = 0.1040, BIC = -77.7310 and R<sup>2</sup> = 0.6392) than the other techniques studied. Conclusion: The ANN outperformed the multivariate regression techniques in predicting LAI from stand parameters. The ANN models developed in this study may aid forest management planning in the studied forest stands.
In this paper, empirical likelihood confidence regions for the regression coefficient in a linear model are constructed under m-dependent errors. It is shown that the blockwise empirical likelihood is a good way to deal with dependent samples.
In this article, a partially linear single-index model for longitudinal data is investigated. Generalized penalized spline least squares estimates of the unknown parameters are proposed. All parameters can be estimated simultaneously by the proposed method while the features of longitudinal data are taken into account. The existence, strong consistency and asymptotic normality of the estimators are proved under suitable conditions. A simulation study is conducted to investigate the finite sample performance of the proposed method. Our approach can also be used to study the pure single-index model for longitudinal data.
This paper considers local linear regression estimators for the partially linear model with censored data. The estimators have some nice large-sample behaviors and are easy to implement. Through many simulation runs, the author also found that the estimators perform remarkably well in the small-sample case.
In the hierarchical random effect linear model, the Bayes estimator of the random parameters not only depends on a specific prior distribution but is also difficult to calculate in most cases. This paper derives the distribution-free optimal linear estimator of the random parameters in the model by means of the credibility theory method. The estimators derived by the authors can be applied in a wider range of practical scenarios, since they depend only on the first two moments of the prior parameter rather than on a specific prior distribution. Finally, the results are compared with some classical models, and a numerical example is given to show the effectiveness of the estimators.
Funding: This research was funded by the National Natural Science Foundation of China (No. 62272124), the National Key Research and Development Program of China (No. 2022YFB2701401), the Guizhou Province Science and Technology Plan Project (Grant No. Qiankehe Platform Talent [2020]5017), the Research Project of Guizhou University for Talent Introduction (No. [2020]61), the Cultivation Project of Guizhou University (No. [2019]56), and the Open Fund of the Key Laboratory of Advanced Manufacturing Technology, Ministry of Education (GZUAMT2021KF[01]).
文摘In the assessment of car insurance claims,the claim rate for car insurance presents a highly skewed probability distribution,which is typically modeled using Tweedie distribution.The traditional approach to obtaining the Tweedie regression model involves training on a centralized dataset,when the data is provided by multiple parties,training a privacy-preserving Tweedie regression model without exchanging raw data becomes a challenge.To address this issue,this study introduces a novel vertical federated learning-based Tweedie regression algorithm for multi-party auto insurance rate setting in data silos.The algorithm can keep sensitive data locally and uses privacy-preserving techniques to achieve intersection operations between the two parties holding the data.After determining which entities are shared,the participants train the model locally using the shared entity data to obtain the local generalized linear model intermediate parameters.The homomorphic encryption algorithms are introduced to interact with and update the model intermediate parameters to collaboratively complete the joint training of the car insurance rate-setting model.Performance tests on two publicly available datasets show that the proposed federated Tweedie regression algorithm can effectively generate Tweedie regression models that leverage the value of data fromboth partieswithout exchanging data.The assessment results of the scheme approach those of the Tweedie regressionmodel learned fromcentralized data,and outperformthe Tweedie regressionmodel learned independently by a single party.
文摘Sunshine duration (S) based empirical equations have been employed in this study to estimate the daily global solar radiation on a horizontal surface (G) for six meteorological stations in Burundi. Those equations include the Ångström-Prescott linear model and four amongst its derivatives, i.e. logarithmic, exponential, power and quadratic functions. Monthly mean values of daily global solar radiation and sunshine duration data for a period of 20 to 23 years, from the Geographical Institute of Burundi (IGEBU), have been used. For any of the six stations, ten single or double linear regressions have been developed from the above-said five functions, to relate in terms of monthly mean values, the daily clearness index () to each of the next two kinds of relative sunshine duration (RSD): and . In those ratios, G<sub>0</sub>, S<sub>0 </sub>and stand for the extraterrestrial daily solar radiation on a horizontal surface, the day length and the modified day length taking into account the natural site’s horizon, respectively. According to the calculated mean values of the clearness index and the RSD, each station experiences a high number of fairly clear (or partially cloudy) days. Estimated values of the dependent variable (y) in each developed linear regression, have been compared to measured values in terms of the coefficients of correlation (R) and of determination (R<sub>2</sub>), the mean bias error (MBE), the root mean square error (RMSE) and the t-statistics. Mean values of these statistical indicators have been used to rank, according to decreasing performance level, firstly the ten developed equations per station on account of the overall six stations, secondly the six stations on account of the overall ten equations. Nevertheless, the obtained values of those indicators lay in the next ranges for all the developed sixty equations:;;;, with . 
These results lead to assert that any of the sixty developed linear regressions (and thus equations in terms of and ), fits very adequately measured data, and should be used to estimate monthly average daily global solar radiation with sunshine duration for the relevant station. It is also found that using as RSD, is slightly more advantageous than using for estimating the monthly average daily clearness index, . Moreover, values of statistical indicators of this study match adequately data from other works on the same kinds of empirical equations.
Abstract: Adaptive fractional polynomial modeling of general correlated outcomes is formulated to address nonlinearity in means, variances/dispersions, and correlations. Means and variances/dispersions are modeled using generalized linear models in fixed effects/coefficients. Correlations are modeled using random effects/coefficients. Nonlinearity is addressed using power transforms of primary (untransformed) predictors. Parameter estimation is based on extended linear mixed modeling generalizing both generalized estimating equations and linear mixed modeling. Models are evaluated using likelihood cross-validation (LCV) scores and are generated adaptively using a heuristic search controlled by LCV scores. Cases covered include linear, Poisson, logistic, exponential, and discrete regression of correlated continuous, count/rate, dichotomous, positive continuous, and discrete numeric outcomes treated as normally, Poisson, Bernoulli, exponentially, and discrete numerically distributed, respectively. Example analyses are also generated for these five cases to compare adaptive random effects/coefficients modeling of correlated outcomes to previously developed adaptive modeling based on directly specified covariance structures. Adaptive random effects/coefficients modeling substantially outperforms direct covariance modeling in the linear, exponential, and discrete regression example analyses. It generates equivalent results in the logistic regression example analyses and it is substantially outperformed in the Poisson regression case. Random effects/coefficients modeling of correlated outcomes can provide substantial improvements in model selection compared to directly specified covariance modeling. However, directly specified covariance modeling can generate competitive or substantially better results in some cases while usually requiring less computation time.
Abstract: Compositional data, which carry relative information, are a crucial aspect of machine learning and related fields. They are typically recorded as closed data, i.e. data that sum to a constant such as 100%. The statistical linear model is the most widely used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data are present. When estimating linear regression parameters, which are useful for tasks such as future prediction and partial effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, and recovering them can be costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested as a solution for situations involving missing data. The EM algorithm iteratively finds maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved variables or data. Using the current estimate as input, the expectation (E) step constructs the expected log-likelihood function. The maximization (M) step then finds the parameters that maximize the expected log-likelihood determined in the E step. This study examined how well the EM algorithm performed on a simulated compositional dataset with missing observations, using both robust least squares and ordinary least squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
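The Aitchison distance used above as a comparison criterion is, by its standard definition, the Euclidean distance between the centered log-ratio (clr) transforms of two compositions. The sketch below illustrates that definition only; the helper names are hypothetical and it does not reproduce the study's EM, k-NN or mean imputation procedures.

```python
import numpy as np

def closure(x, total=1.0):
    """Rescale a vector of positive parts so that it sums to `total`."""
    x = np.asarray(x, dtype=float)
    return total * x / x.sum()

def clr(x):
    """Centered log-ratio transform: log of each part minus the mean log."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean()

def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of two compositions."""
    return float(np.linalg.norm(clr(x) - clr(y)))

# Hypothetical example: an original composition and an imputed version of it.
original = closure([20.0, 30.0, 50.0], total=100.0)
imputed = closure([25.0, 30.0, 45.0], total=100.0)
print(aitchison_distance(original, imputed))
```

Because the clr transform subtracts the mean log, the distance is unchanged when either composition is rescaled, which is why it suits closed data that sum to a constant.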
Funding: Project supported by the National Natural Science Foundation of China (Nos. 12172236, 12202289, and U21A20430) and the Science and Technology Research Project of Hebei Education Department of China (No. QN2022083).
Abstract: In this paper, the nonlinear free vibration behaviors of the piezoelectric semiconductor (PS) doubly-curved shell resting on the Pasternak foundation are studied within the framework of the nonlinear drift-diffusion (NLDD) model and the first-order shear deformation theory. The nonlinear constitutive relations are presented, and the strain energy, kinetic energy, and virtual work of the PS doubly-curved shell are derived. Based on Hamilton's principle as well as the condition of charge continuity, the nonlinear governing equations are derived and then solved by means of an efficient iteration method. Several numerical examples are given to show the effect of the nonlinear drift current, elastic foundation parameters, and geometric parameters on the nonlinear vibration frequency and the damping characteristic of the PS doubly-curved shell. The main innovations of the manuscript are that the difference between the linearized drift-diffusion (LDD) model and the NLDD model is revealed, and that an effective method is proposed to select a proper initial electron concentration for the LDD model.
Funding: Supported by the Autonomous Project of the Key Laboratory for Cultivating Excellent Timber Forest Resources in Guangxi (2020-A-04-01) and the Special Fund of Guangxi Innovation Driven Development (GUIKE AA17204087-11).
Abstract: Soil information is the basis of soil management and precise variable fertilization. The traditional method of obtaining soil information through laboratory chemical analysis is costly and slow, which makes it difficult to meet the needs of digital forestry, soil monitoring and real-time nutrient management. Taking the red soil of a Eucalyptus plantation in northern Guangxi as the research object, the spectral data of samples with different soil available potassium contents were measured, the spectral characteristics were analyzed, and an inversion model was established using the PLS method. The results showed that the spectral bands sensitive to available potassium content in the red soil of the region were mainly concentrated around 400-600, 1450 and 2200 nm. The first derivative transformation significantly reduced the redundant information in the original spectral data and improved the correlation between spectral indexes and soil available potassium content. The full-band modeling results of R and FDR were better than those of the significant bands. The optimal model was full-band-FDR-PLS, with R^2 = 0.862 and RMSE = 2.718. The results of this study can be used for applications of near-earth remote sensing in Guangxi, such as digital soil mapping, precise variable fertilization and real-time monitoring of soil available potassium.
Funding: This work was supported by the 2021 Project of the “14th Five-Year Plan” of Shaanxi Education Science, “Research on the Application of Educational Data Mining in Applied Undergraduate Teaching: Taking the Course of ‘Computer Application Technology’ as an Example” (SGH21Y0403); the Teaching Reform and Research Projects for Practical Teaching in 2022, “Research on Practical Teaching of Applied Undergraduate Projects Based on ‘Combination of Courses and Certificates’: Taking Computer Application Technology Courses as an Example” (SJJG02012); and the 11th batch of the Teaching Reform Research Project of Xi’an Jiaotong University City College, “Project-Driven Cultivation and Research on Information Literacy of Applied Undergraduate Students in the Information Times: Taking Computer Application Technology Course Teaching as an Example” (111001).
Abstract: Social networks are the mainstream medium of current information dissemination, and it is particularly important to accurately predict how information propagates through them. In this paper, we introduce a social network propagation model integrating multiple linear regression and an infectious disease model. First, we propose features that affect social network communication along three dimensions. Then, we predict the node influence via multiple linear regression. Lastly, we use the node influence as the state transition of the infectious disease model to predict the trend of information dissemination in social networks. The experimental results on a real social network dataset show that the predictions of the model are consistent with the actual information dissemination trends.
Funding: This research was funded by the National Natural Science Foundation of China (grant no. 32271881).
Abstract: Forest fires are natural disasters that can occur suddenly and can be very damaging, burning thousands of square kilometers. Prevention is better than suppression, and prediction models of forest fire occurrence were developed using the logistic regression model, the geographically weighted logistic regression model, the Lasso regression model, the random forest model, and the support vector machine model, based on historical forest fire data from 2000 to 2019 in Jilin Province. The models, along with a distribution map, are presented in this paper to provide a theoretical basis for forest fire management in this area. Existing studies show that the prediction accuracies of the two machine learning models are higher than those of the three generalized linear regression models. The accuracies of the random forest model, the support vector machine model, the geographically weighted logistic regression model, the Lasso regression model, and the logistic model were 88.7%, 87.7%, 86.0%, 85.0% and 84.6%, respectively. Weather is the main factor affecting forest fires, while the impacts of topographic factors and of human and socio-economic factors on fire occurrence were similar.
Funding: Supported by the National Natural Science Foundation of China (41602205, 42293261), the China Geological Survey Program (DD20189506, DD20211301), the Special Investigation Project on Science and Technology Basic Resources of the Ministry of Science and Technology (2021FY101003), the Central Guidance for Local Scientific and Technological Development Fund of 2023, and the Project of Hebei University of Environmental Engineering (GCY202301).
Abstract: The change processes and trends of shorelines and tidal flats forced by human activities are essential issues for the sustainability of coastal areas, and are also of great significance for understanding changes in the coastal ecological environment and even global change. Based on field measurements, combined with the Linear Regression (LR) model and the Inverse Distance Weighting (IDW) method, this paper presents a detailed analysis of the change history and trend of the shoreline and tidal flat in Bohai Bay. The shoreline faces a high chance of erosion under the action of natural factors, while the tidal flat shows differing erosion and deposition patterns in Bohai Bay due to the impact of human activities. The implications of these change patterns for ecological protection and recovery are also discussed. Measures should be taken to protect the coastal ecological environment. The models used in this paper show a high correlation coefficient between observed and modeled data, which means that this method can be used to predict the changing trend of the shoreline and tidal flat. The results of the present study can provide scientific support for future coastal protection and management.
Funding: Supported by the National Natural Science Foundation of China (Nos. 52034006, 52004229, 52225401, and 52274231), the Regional Innovation Cooperation Project of Sichuan Province (No. 2022YFQ0059), the Science and Technology Cooperation Project of the CNPC-SWPU Innovation Alliance (No. 2020CX040301), the Natural Science Foundation of Sichuan Province (No. 2023NSFSC0431), the Science and Technology Strategic Cooperation Project between Nanchong City and Southwest Petroleum University (No. SXHZ004), and the Research and Innovation Fund for Graduate Students of Southwest Petroleum University (No. 2022KYCX058).
Abstract: Hydraulic-electric rock fragmentation (HERF) plays a significant role in improving the efficiency of high voltage pulse rock breaking. However, the underlying mechanism of HERF remains unclear. In this study, considering the heterogeneity of the rock, microscopic thermodynamic properties, and shockwave time domain waveforms, and based on the shockwave model, digital imaging technology and the discrete element method, cyclic loading numerical simulations of HERF are achieved by coupling electrical, thermal, and solid mechanics under different formation temperatures, confining pressures, initial peak voltages, electrode bit diameters, and loading times. Meanwhile, the HERF discharge system is used to conduct laboratory experiments with various electrical parameters, and the resulting broken pits are numerically reconstructed to obtain their geometric parameters. The results show that the completely broken area consists of powdery rock debris. In the pre-broken zone, the mineral cementation of the rock determines the transition of type CⅠ cracks to type CⅡ and type CⅢ cracks. Furthermore, the peak pressure of the shockwave increased with initial peak voltage but decreased with electrode bit diameter, while the wave front time reduced. Moreover, increasing well depth, formation temperature and confining pressure augment and inhibit HERF, but once confining pressure surpassed the threshold of 60 MPa for 152.40, 215.90, and 228.60 mm electrode bits, and 40 MPa for 309.88 mm electrode bits, HERF is promoted. Additionally, for the same kind of rock, the volume and width of the broken pit increase with higher initial peak voltage, and rock fissures promote HERF. Finally, the electrode drill bit with a 215.90 mm diameter is found to be more suitable for drilling pink granite. This research contributes to a better microscopic understanding of HERF and provides valuable insights for electrode bit selection, as well as the optimization of circuit parameters for HERF technology.
Funding: Supported by the Natural Science Foundation of Anhui Education Committee.
Abstract: In this paper, based on the theory of parameter estimation, we give a selection method that we consider very reasonable in the sense that it yields parameter estimates with good properties. Moreover, we offer a method for calculating the selection statistic and an applied example.
Abstract: The impact of nonlinear stability and instability on the validity of the tangent linear model (TLM) is investigated by comparing its results with those produced by the nonlinear model (NLM) with identical initial perturbations. The evolutions of different initial perturbations superposed on the nonlinearly stable and unstable basic flows are examined using two-dimensional quasi-geostrophic models with a double-periodic boundary condition and a rigid boundary condition. The results indicate that the valid time period of the TLM, during which the TLM can be utilized to approximate the NLM with given accuracy, varies with the magnitudes of the perturbations and the nonlinear stability and instability of the basic flows. The larger the magnitude of the perturbation is, the shorter the valid time period. The more nonlinearly unstable the basic flow is, the shorter the valid time period of the TLM. With the double-periodic condition the valid period of the TLM is shorter than that with the rigid-boundary condition.
Key words: Nonlinear stability and instability; Tangent linear model (TLM); Validity
Funding: This work was supported by the National Key Basic Research Project “Research on the Formation Mechanism and Prediction Theory of Severe Synoptic Disasters in China” (No. G1998040910) and the National Natural Science Foundation of China (No. 49775262 and No. 49823002).
Funding: Financially supported by the National Natural Science Foundation of China (No. 31600587).
Abstract: Convenient and effective methods to determine seasonal changes in individual leaf area (LA) and leaf mass (LM) of plants are useful in research on plant physiology and forest ecology. However, practical methods for estimating LA and LM of elm (Ulmus japonica) leaves in different periods have rarely been reported. We collected sample elm leaves in June, July and September. Then, we developed allometric models relating LA and LM to leaf parameters, such as leaf length (L) and width (W) or the product of L and W (LW). Our objective was to find optimal allometric models for conveniently and effectively estimating LA and LM of elm leaves in different periods. LA and LM were significantly correlated with leaf parameters (P < 0.05), and allometric models with LW as an independent variable were best for estimating LA and LM in each period. A linear model was separately developed to predict LA of elm leaves in June, July and September, and it yielded high accuracies of 93, 96 and 96%, respectively. Similarly, a specific allometric model for predicting LM was developed separately for the three periods; the optimal model form in both June and July was a power model, but the linear model was optimal for September. The accuracies of the allometric models in predicting LM were 88, 83 and 84% for June, July and September, respectively. The errors caused by ignoring seasonal variation of allometric models in predicting LA and LM in the three periods were 1-4 and 16-59%, respectively.
Funding: Supported by the Knowledge Innovation Program of the Chinese Academy of Sciences (KJCX3-SYW-S02) and the Youth Foundation of USTC.
Abstract: In this article, empirical Bayes (EB) estimators are constructed for the estimable functions of the parameters in the partitioned normal linear model. The superiority of the EB estimators over the ordinary least-squares (LS) estimator is investigated under the mean square error matrix (MSEM) criterion.
Abstract: The 3-hour-interval prediction of ground-level temperature from +00 h out to +45 h in South Korea (38 stations) is performed using the DLM (dynamic linear model) in order to eliminate the systematic error of numerical model forecasts. Numerical model forecasts and observations are used as input values of the DLM. A comparison of the DLM forecasts with the KFM (Kalman filter model) forecasts in terms of RMSE and bias shows that the DLM is useful for improving the accuracy of prediction.
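The idea of removing a systematic forecast error recursively can be illustrated with a scalar Kalman filter that tracks the bias as a random walk. This is a generic sketch, not the DLM or KFM configuration used in the study; the function name, the noise variances `q` and `r`, and the temperatures are all assumptions.

```python
def kalman_bias_correction(forecasts, observations, q=0.01, r=1.0):
    """Track forecast bias as a random walk and subtract it from each forecast.

    q: assumed variance of the bias random walk (process noise)
    r: assumed variance of the observed forecast error (measurement noise)
    Returns the corrected forecasts and the final bias estimate.
    """
    bias, p = 0.0, 1.0           # initial bias estimate and its variance
    corrected = []
    for f, o in zip(forecasts, observations):
        corrected.append(f - bias)        # correct with the bias known so far
        p += q                            # predict: uncertainty grows
        k = p / (p + r)                   # Kalman gain
        bias += k * ((f - o) - bias)      # update with the new forecast error
        p *= 1.0 - k
    return corrected, bias

# Synthetic example (assumption): the model runs a constant 2 degrees too warm.
observations = [15.0 + 0.1 * t for t in range(200)]
forecasts = [o + 2.0 for o in observations]
corrected, bias = kalman_bias_correction(forecasts, observations)
print(round(bias, 3), round(corrected[-1] - observations[-1], 3))
```

Each corrected value uses only errors observed before that time step, which matches how such a correction would be applied to a forecast that has not yet verified.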
Funding: Funding from the Scientific and Technological Research Council of Turkey (Project No. 2130026) is gratefully acknowledged.
Abstract: Background: Leaf Area Index (LAI) is an important parameter used in monitoring and modeling of forest ecosystems. The aim of this study was to evaluate the performance of artificial neural network (ANN) models in predicting the LAI, in comparison with regression analysis models as the classical method, in pure and even-aged Crimean pine forest stands. Methods: One hundred eight temporary sample plots were collected from Crimean pine forest stands to estimate stand parameters. Each sample plot was imaged with hemispherical photographs to detect the LAI. Partial correlation analysis was used to assess the relationships between the stand LAI values and stand parameters, and multivariate linear regression analysis was used to predict the LAI from stand parameters. Different artificial neural network models comprising different numbers of neurons and transfer functions were trained and used to predict the LAI of forest stands. Results: The correlation coefficients between LAI and stand parameters (stand number of trees, basal area, quadratic mean diameter, stand density and stand age) were significant at the 0.01 level. The stand age, number of trees, site index, and basal area were the independent parameters in the most successful regression model for predicting LAI values from stand parameters (R^2_adj = 0.5431). As the corresponding method for predicting the interactions between the stand LAI values and stand parameters, the neural network architecture based on the RBF 4-19-1, with a Gaussian activation function in the hidden layer and the identity activation function in the output layer, performed better in predicting LAI (SSE = 12.1040, MSE = 0.1223, RMSE = 0.3497, AIC = 0.1040, BIC = -77.7310 and R^2 = 0.6392) than the other studied techniques. Conclusion: The ANN outperformed the multivariate regression techniques in predicting LAI from stand parameters. The ANN models developed in this study may aid forest management planning in the studied forest stands.
Abstract: In this paper, empirical likelihood confidence regions for the regression coefficient in a linear model are constructed under m-dependent errors. It is shown that the blockwise empirical likelihood is a good way to deal with dependent samples.
Funding: Supported by the National Natural Science Foundation of China (10571008), the Natural Science Foundation of Henan (092300410149), and the Core Teacher Foundation of Henan (2006141).
Abstract: In this article, a partially linear single-index model for longitudinal data is investigated. The generalized penalized spline least squares estimates of the unknown parameters are suggested. All parameters can be estimated simultaneously by the proposed method while the features of longitudinal data are taken into account. The existence, strong consistency and asymptotic normality of the estimators are proved under suitable conditions. A simulation study is conducted to investigate the finite sample performance of the proposed method. Our approach can also be used to study the pure single-index model for longitudinal data.
Abstract: This paper considers local linear regression estimators for the partially linear model with censored data. These estimators have some nice large-sample properties and are easy to implement. Through many simulation runs, the author also found that the estimators perform remarkably well in the small-sample case.
Funding: Supported by the National Science Foundation of China under Grant Nos. 71361015, 71340010, 71371074, the Jiangxi Provincial Natural Science Foundation under Grant No. 20142BAB201013, the China Postdoctoral Science Foundation under Grant No. 2013M540534, the China Postdoctoral Fund Special Project under Grant No. 2014T70615, and the Jiangxi Postdoctoral Science Foundation under Grant No. 2013KY53.
Abstract: In the hierarchical random effect linear model, the Bayes estimator of the random parameters not only depends on a specific prior distribution but is also difficult to calculate in most cases. This paper derives the distribution-free optimal linear estimator of the random parameters in the model by means of the credibility theory method. The estimators the authors derive can be applied in more extensive practical scenarios since they depend only on the first two moments of the prior parameter rather than on a specific prior distribution. Finally, the results are compared with some classical models and a numerical example is given to show the effectiveness of the estimators.
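The reliance on only the first two moments can be seen in the classical Bühlmann credibility estimator, which weights the observed mean against the prior mean. The sketch below shows that classical special case, not the hierarchical estimator derived in the paper; the function name and the numbers are hypothetical.

```python
def buhlmann_credibility(claims, prior_mean, process_var, prior_var):
    """Classical Buhlmann estimator: Z * sample mean + (1 - Z) * prior mean.

    process_var: expected within-risk variance of the observations
    prior_var:   variance of the risk means across the portfolio
    Only these two moments and the prior mean are needed, not the full prior.
    """
    n = len(claims)
    z = n / (n + process_var / prior_var)   # credibility factor in [0, 1)
    sample_mean = sum(claims) / n
    return z * sample_mean + (1.0 - z) * prior_mean, z

# Hypothetical numbers: three observations of one risk, portfolio mean 8.
estimate, z = buhlmann_credibility([10.0, 12.0, 14.0], prior_mean=8.0,
                                   process_var=6.0, prior_var=2.0)
print(estimate, z)  # 10.0 0.5
```

As more observations accumulate, Z moves toward 1 and the estimate relies more on the risk's own experience than on the prior mean.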