In this paper, we propose a test statistic to check whether the nonparametric function in partially linear models is linear or not. We estimate the nonparametric function under the alternative by using the local linear method, and then estimate the parameters by the two-stage method. The test statistic under the null hypothesis is calculated, and it is shown to be asymptotically normal.
In the assessment of car insurance claims, the claim rate presents a highly skewed probability distribution, which is typically modeled using the Tweedie distribution. The traditional approach to obtaining a Tweedie regression model involves training on a centralized dataset; when the data are held by multiple parties, training a privacy-preserving Tweedie regression model without exchanging raw data becomes a challenge. To address this issue, this study introduces a novel vertical federated learning-based Tweedie regression algorithm for multi-party auto insurance rate setting in data silos. The algorithm keeps sensitive data local and uses privacy-preserving techniques to compute the intersection of the entities held by the two parties. After determining which entities are shared, the participants train the model locally on the shared entity data to obtain the intermediate parameters of the local generalized linear model. Homomorphic encryption is then used to exchange and update these intermediate parameters, so that the parties jointly complete the training of the car insurance rate-setting model. Performance tests on two publicly available datasets show that the proposed federated Tweedie regression algorithm can effectively generate Tweedie regression models that leverage the value of data from both parties without exchanging data. The assessment results of the scheme approach those of the Tweedie regression model learned from centralized data, and outperform the Tweedie regression model learned independently by a single party.
Stable water isotopes are natural tracers for quantifying the contribution of moisture recycling to local precipitation, i.e., the moisture recycling ratio, but different isotope-based models usually lead to different results, which affects the accuracy of local moisture recycling estimates. In this study, a total of 18 stations from four typical areas in China were selected to compare the performance of isotope-based linear and Bayesian mixing models and to determine the local moisture recycling ratio. Among the three vapor sources (advection, transpiration, and surface evaporation), advection vapor usually played a dominant role, and the contribution of surface evaporation was less than that of transpiration. When abnormal values were ignored, the arithmetic averages of the differences between the isotope-based linear and Bayesian mixing models were 0.9% for transpiration, 0.2% for surface evaporation, and -1.1% for advection, and the medians were 0.5%, 0.2%, and -0.8%, respectively. The importance of transpiration was slightly less in most cases when the Bayesian mixing model was applied, and the contribution of advection was relatively larger. The Bayesian mixing model was found to perform better in determining a feasible solution, since the linear model sometimes resulted in negative contribution ratios. A sensitivity test with two isotope scenarios indicated that the Bayesian model had a relatively low sensitivity to changes in isotope input, and that it was important to accurately estimate the isotopes in precipitation vapor. Generally, the Bayesian mixing model should be recommended instead of a linear model. The findings are useful for understanding the performance of isotope-based linear and Bayesian mixing models under various climate backgrounds.
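The remark that a linear mixing model can return negative contribution ratios can be illustrated with a small sketch. It assumes the common three-source, two-tracer formulation (δ18O and δ2H balances plus a mass balance); the abstract does not give the exact equations, and the function name and all isotope values below are hypothetical.

```python
import numpy as np

def three_source_fractions(sources, mixture):
    """Solve a linear mixing system for three source fractions.

    sources: 3x2 array, one (d18O, d2H) pair per vapor source.
    mixture: (d18O, d2H) of the precipitation vapor.
    Returns fractions that sum to 1 but are NOT constrained to [0, 1].
    """
    sources = np.asarray(sources, dtype=float)
    # Rows: d18O balance, d2H balance, mass balance (fractions sum to 1).
    A = np.vstack([sources.T, np.ones(3)])
    rhs = np.append(np.asarray(mixture, dtype=float), 1.0)
    return np.linalg.solve(A, rhs)

# Hypothetical end members: advection, transpiration, surface evaporation.
sources = [(-15.0, -110.0), (-8.0, -60.0), (-25.0, -170.0)]

# A mixture built from fractions (0.80, 0.15, 0.05) is recovered exactly.
inside = three_source_fractions(sources, (-14.45, -105.5))

# A mixture outside the triangle spanned by the end members
# yields negative fractions, even though they still sum to 1.
outside = three_source_fractions(sources, (-12.0, -100.0))
```

Because the system is solved without bounds, any measurement error that pushes the mixture outside the end-member triangle produces negative fractions; a Bayesian mixing model avoids this by placing a prior on the simplex.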
The virtuality and openness of online social platforms make networks a hotbed for the rapid propagation of various rumors. To block the outbreak of a rumor, one of the most effective containment measures is spreading positive information to counterbalance its diffusion. The spreading mechanism of rumors and effective suppression strategies are significant and challenging research issues. Firstly, in order to simulate the dissemination of multiple types of information, we propose a competitive linear threshold model with state transition (CLTST) to describe the spreading process of rumor and anti-rumor in the same network. Subsequently, we put forward a community-based rumor blocking (CRB) algorithm based on influence maximization theory in social networks. Its crucial step is to identify a set of influential seeds that propagate anti-rumor information to other nodes, which includes community detection, selection of candidate anti-rumor seeds, and generation of the anti-rumor seed set. Under the CLTST model, the CRB algorithm has been compared with six state-of-the-art algorithms on nine online social networks to verify its performance. Experimental results show that the proposed model can better reflect the process of rumor propagation and reveal the propagation mechanism of rumor and anti-rumor in online social networks. Moreover, the proposed CRB algorithm is better at weakening rumor dissemination: it selects anti-rumor seeds more accurately and achieves better performance in influence spread, sensitivity analysis, seed distribution, and running time.
Sunshine duration (S) based empirical equations have been employed in this study to estimate the daily global solar radiation on a horizontal surface (G) for six meteorological stations in Burundi. Those equations include the Ångström-Prescott linear model and four of its derivatives, i.e. logarithmic, exponential, power and quadratic functions. Monthly mean values of daily global solar radiation and sunshine duration data for a period of 20 to 23 years, from the Geographical Institute of Burundi (IGEBU), have been used. For each of the six stations, ten single or double linear regressions have been developed from these five functions to relate, in terms of monthly mean values, the daily clearness index ([…]) to each of two kinds of relative sunshine duration (RSD): […] and […]. In those ratios, G₀, S₀ and […] stand for the extraterrestrial daily solar radiation on a horizontal surface, the day length and the modified day length taking into account the natural site's horizon, respectively. According to the calculated mean values of the clearness index and the RSD, each station experiences a high number of fairly clear (or partially cloudy) days. Estimated values of the dependent variable (y) in each developed linear regression have been compared to measured values in terms of the coefficients of correlation (R) and of determination (R²), the mean bias error (MBE), the root mean square error (RMSE) and the t-statistic. Mean values of these statistical indicators have been used to rank, in order of decreasing performance, firstly the ten developed equations per station across all six stations, and secondly the six stations across all ten equations. Nevertheless, the obtained values of those indicators lay in the following ranges for all sixty developed equations: […]. These results indicate that any of the sixty developed linear regressions (and thus equations in terms of […] and […]) fits the measured data very adequately and can be used to estimate monthly average daily global solar radiation from sunshine duration for the relevant station. It is also found that using […] as RSD is slightly more advantageous than using […] for estimating the monthly average daily clearness index. Moreover, the values of the statistical indicators in this study adequately match data from other works on the same kinds of empirical equations.
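As an illustration of the regression and error measures named in this abstract, here is a minimal sketch. It assumes the Ångström-Prescott model in its commonly cited form, clearness index = a + b × relative sunshine duration; the function names and the monthly values are hypothetical and are not data from the study.

```python
import numpy as np

def fit_angstrom_prescott(rsd, kt):
    """Least-squares fit of kt = a + b * rsd; returns (a, b)."""
    b, a = np.polyfit(np.asarray(rsd, float), np.asarray(kt, float), 1)
    return a, b

def mbe(estimated, measured):
    """Mean bias error: average of (estimated - measured)."""
    return float(np.mean(np.asarray(estimated) - np.asarray(measured)))

def rmse(estimated, measured):
    """Root mean square error between estimates and measurements."""
    diff = np.asarray(estimated) - np.asarray(measured)
    return float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical monthly means: relative sunshine duration and clearness index.
rsd = np.array([0.30, 0.40, 0.50, 0.60, 0.70])
kt = 0.25 + 0.50 * rsd          # noise-free synthetic data

a, b = fit_angstrom_prescott(rsd, kt)
kt_est = a + b * rsd
bias, error = mbe(kt_est, kt), rmse(kt_est, kt)
```

With real station data the fit would not be exact, and MBE and RMSE would quantify the systematic and overall deviation of the estimates, respectively.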
Adaptive fractional polynomial modeling of general correlated outcomes is formulated to address nonlinearity in means, variances/dispersions, and correlations. Means and variances/dispersions are modeled using generalized linear models in fixed effects/coefficients. Correlations are modeled using random effects/coefficients. Nonlinearity is addressed using power transforms of primary (untransformed) predictors. Parameter estimation is based on extended linear mixed modeling generalizing both generalized estimating equations and linear mixed modeling. Models are evaluated using likelihood cross-validation (LCV) scores and are generated adaptively using a heuristic search controlled by LCV scores. Cases covered include linear, Poisson, logistic, exponential, and discrete regression of correlated continuous, count/rate, dichotomous, positive continuous, and discrete numeric outcomes treated as normally, Poisson, Bernoulli, exponentially, and discrete numerically distributed, respectively. Example analyses are also generated for these five cases to compare adaptive random effects/coefficients modeling of correlated outcomes to previously developed adaptive modeling based on directly specified covariance structures. Adaptive random effects/coefficients modeling substantially outperforms direct covariance modeling in the linear, exponential, and discrete regression example analyses. It generates equivalent results in the logistic regression example analyses and it is substantially outperformed in the Poisson regression case. Random effects/coefficients modeling of correlated outcomes can provide substantial improvements in model selection compared to directly specified covariance modeling. However, directly specified covariance modeling can generate competitive or substantially better results in some cases while usually requiring less computation time.
Compositional data, such as relative information, are a crucial aspect of machine learning and related fields. They are typically recorded as closed data, i.e. data that sum to a constant such as 100%. The statistical linear model is the most used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data are present. The linear regression model is a commonly used statistical modeling technique applied in various settings to find relationships between variables of interest. When estimating linear regression parameters, which are useful for purposes such as future prediction and partial effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, which can lead to costly and time-consuming data recovery. To address this issue, the expectation-maximization (EM) algorithm has been suggested as a solution for situations involving missing data. The EM algorithm iteratively finds maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on variables or data that have not been observed. Using the current estimate as input, the expectation (E) step constructs the expected log-likelihood function. The maximization (M) step then finds the parameters that maximize the expected log-likelihood determined in the E step. This study examined how well the EM algorithm worked on a synthetic compositional dataset with missing observations. It used both the robust least squares version and ordinary least squares regression techniques. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
Based on the modeling principles of the GM(1,1) model and the linear regression model, a combined prediction model is established to predict equipment faults by fitting the two models. The new prediction model takes full advantage of the prediction information provided by the two models and improves the prediction precision. Finally, this model is used to predict the system fault time according to the output voltages of a certain type of radar transmitter.
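For readers unfamiliar with GM(1,1), the following sketch shows the standard textbook form of the grey model only: accumulate the series, estimate the development coefficient and grey input by least squares, then difference the fitted accumulated series. How the paper combines GM(1,1) with linear regression is not stated in the abstract and is not reproduced here; the function name and the sample series are hypothetical.

```python
import numpy as np

def gm11(x0, horizon=1):
    """Fit a standard GM(1,1) grey model and return fitted values
    for the input series followed by `horizon` forecasts."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    x1 = np.cumsum(x0)                       # accumulated generating series
    z1 = 0.5 * (x1[1:] + x1[:-1])            # background values
    B = np.column_stack([-z1, np.ones(n - 1)])
    # Least-squares estimate of development coefficient a and grey input b
    # in the grey equation x0(k) + a * z1(k) = b.
    (a, b), *_ = np.linalg.lstsq(B, x0[1:], rcond=None)
    k = np.arange(n + horizon)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a   # time response
    x0_hat = np.empty(n + horizon)
    x0_hat[0] = x0[0]
    x0_hat[1:] = np.diff(x1_hat)             # restore by differencing
    return x0_hat

# Hypothetical, roughly exponential series (not data from the paper).
series = 2.0 * 1.1 ** np.arange(6)
pred = gm11(series, horizon=1)   # six fitted values plus one forecast
```

GM(1,1) suits short, near-exponential series; pairing it with a linear regression, as the abstract describes, is one way to cover trends that a single exponential cannot capture.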
The aim of this study was to assay the polyphenols, flavonoids, polyphenol oxidase and phenylalanine ammonia-lyase related to anthocyanin synthesis in purple corn. The optimized multiple linear regression model of anthocyanin synthesis was y = 4.38386 - 0.20545x1 + 5.479638x2 + 0.195575x4. According to standard partial regression coefficient testing, polyphenol content was negatively correlated with anthocyanins, with a relative influence on anthocyanin synthesis of -42.7%; flavonoid content and polyphenol oxidase activity were positively correlated with the anthocyanins of purple corn, with relative influences on anthocyanin synthesis of 71.45% and 73.32%, respectively. There was no positive correlation between the activity of phenylalanine ammonia-lyase and the anthocyanins of purple corn. The multiple linear regression model of anthocyanin synthesis was established to provide a theoretical foundation for producing anthocyanins in the laboratory.
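The reported regression equation can be evaluated directly, as in the short sketch below. The abstract does not state which measured quantity each of x1, x2 and x4 denotes, so the function (a hypothetical name) treats them as positional inputs and makes no assumption about units.

```python
def predict_anthocyanin(x1, x2, x4):
    """Evaluate y = 4.38386 - 0.20545*x1 + 5.479638*x2 + 0.195575*x4.

    Coefficients are copied from the abstract; the mapping of x1, x2, x4
    to measured quantities is not given there and is left open here.
    """
    return 4.38386 - 0.20545 * x1 + 5.479638 * x2 + 0.195575 * x4

# With all predictors at zero the model returns the intercept.
baseline = predict_anthocyanin(0.0, 0.0, 0.0)
```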
Necessary and sufficient conditions are derived for equality between a² y′(I − P_X)y and the minimum norm quadratic unbiased estimator of the variance under the general linear model, where a² is a known positive number. Further, when the Gauss-Markov estimator and the ordinary least squares estimator are identical, a relatively simple equivalent condition is obtained. Finally, this condition is applied to an interesting example.
Plant invasion refers to the phenomenon in which some plants grow too fast because they are far away from their original living environment or predators, thereby affecting the local environment. With the development of tourism and trade, the harm caused by invasive plants will become more and more serious. Therefore, it is necessary to explore an effective method for controlling plant invasion through qualitative and quantitative research. In this paper, models were established for the control of harmful plant invasion at the early and late stages. The heavy computation was completed by computer programming to obtain the optimal solutions of the models. The practical meaning of the optimal solutions was further discussed. Through numerical simulations and discussion, it could be concluded that quantitative research on invasive plant control has a certain application value.
By analyzing the observed phenomena and the data collected in the study, a multi-compartment linear circulation model for targeting drug delivery systems was developed, and the function formulas of drug concentration versus time in blood and in the target organ were derived. The drug concentration-time curve for the target organ can be plotted from the data on drug concentration in blood according to the model. The pharmacokinetic parameters of the drug in the target organ can also be obtained. The practicability of the model was further checked against the drug concentration-time curves in blood and in the target organ (liver) of liver-targeting nanoparticles in animal tests. Based on the liver drug concentration-time curves calculated by the function formula for the target organ, the pharmacokinetic behavior of the drug in the target organ (liver) was analyzed by statistical moments, and its pharmacokinetic parameters in the liver were obtained. It is suggested that the relative targeting index can be used for quantitative evaluation of targeting drug delivery systems.
In this paper, based on the theory of parameter estimation, we give a selection method that we consider very reasonable in the sense that it yields good properties of the parameter estimates. Moreover, we offer a method for calculating the selection statistic and an applied example.
The impact of nonlinear stability and instability on the validity of the tangent linear model (TLM) is investigated by comparing its results with those produced by the nonlinear model (NLM) with identical initial perturbations. The evolutions of different initial perturbations superposed on nonlinearly stable and unstable basic flows are examined using two-dimensional quasi-geostrophic models with the double-periodic boundary condition and the rigid boundary condition. The results indicate that the valid time period of the TLM, during which the TLM can be used to approximate the NLM with a given accuracy, varies with the magnitudes of the perturbations and the nonlinear stability or instability of the basic flows. The larger the magnitude of the perturbation, the shorter the valid time period. The more nonlinearly unstable the basic flow, the shorter the valid time period of the TLM. With the double-periodic condition, the valid period of the TLM is shorter than with the rigid-boundary condition. Key words: nonlinear stability and instability; tangent linear model (TLM); validity. This work was supported by the National Key Basic Research Project "Research on the Formation Mechanism and Prediction Theory of Severe Synoptic Disasters in China" (No. G1998040910) and the National Natural Science Foundation of China (No. 49775262 and No. 49823002).
Convenient and effective methods to determine seasonal changes in individual leaf area (LA) and leaf mass (LM) of plants are useful in research on plant physiology and forest ecology. However, practical methods for estimating LA and LM of elm (Ulmus japonica) leaves in different periods have rarely been reported. We collected sample elm leaves in June, July and September. Then, we developed allometric models relating LA and LM to leaf parameters, such as leaf length (L) and width (W) or the product of L and W (LW). Our objective was to find optimal allometric models for conveniently and effectively estimating LA and LM of elm leaves in different periods. LA and LM were significantly correlated with leaf parameters (P < 0.05), and allometric models with LW as the independent variable were best for estimating LA and LM in each period. A linear model was developed separately to predict LA of elm leaves in June, July and September, and it yielded high accuracies of 93, 96 and 96%, respectively. Similarly, a specific allometric model for predicting LM was developed separately for the three periods; the optimal model form in both June and July was a power model, but the linear model was optimal for September. The accuracies of the allometric models in predicting LM were 88, 83 and 84% for June, July and September, respectively. The errors caused by ignoring seasonal variation of the allometric models in predicting LA and LM in the three periods were 1-4% and 16-59%, respectively.
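The two model forms mentioned in this abstract, a linear model and a power model with LW as the predictor, can be fitted as in the sketch below. It assumes ordinary least squares for the linear form and a log-log least-squares fit for the power form; the abstract does not state the fitting procedure, and the function names and leaf values are hypothetical.

```python
import numpy as np

def fit_linear(x, y):
    """Fit y = a + b * x by least squares; returns (a, b)."""
    b, a = np.polyfit(np.asarray(x, float), np.asarray(y, float), 1)
    return a, b

def fit_power(x, y):
    """Fit y = a * x**b via least squares on log-transformed data;
    requires strictly positive x and y. Returns (a, b)."""
    b, log_a = np.polyfit(np.log(x), np.log(y), 1)
    return float(np.exp(log_a)), float(b)

# Hypothetical leaf data: LW in cm^2, leaf area in cm^2, leaf mass in g.
lw = np.array([10.0, 20.0, 40.0, 80.0])
la = 0.5 + 0.65 * lw            # synthetic linear relation
lm = 0.02 * lw ** 1.2           # synthetic power relation

a_lin, b_lin = fit_linear(lw, la)
a_pow, b_pow = fit_power(lw, lm)
```

Fitting each sampling period separately, as the study does, amounts to calling these functions once per month and comparing the resulting coefficients.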
The strong nonlinearity of boundary layer parameterizations in atmospheric and oceanic models can make it difficult for tangent linear models to approximate nonlinear perturbations as the time integration grows longer. Consequently, the related 4-D variational data assimilation problems can be difficult to solve. A modified tangent linear model is built on the Mellor-Yamada turbulent closure (level 2.5) for 4-D variational data assimilation. For oceanic mixed layer model settings, the modified tangent linear model approximates finite-amplitude nonlinear perturbations better than the full and simplified tangent linear models when the integration time is longer than one day. The corresponding variational data assimilation performance based on the adjoint of the modified tangent linear model is also improved compared with that based on the adjoints of the full and simplified tangent linear models.
A batch-to-batch optimal iterative learning control (ILC) strategy for the tracking control of product quality in batch processes is presented. A linear time-varying perturbation (LTVP) model is built for product quality around the nominal trajectories. To address model-plant mismatches, model prediction errors in the previous batch run are added to the model predictions for the current batch run. Tracking error transition models can then be built, and the ILC law with direct error feedback is obtained explicitly. A rigorous theorem is proposed to prove the convergence of the tracking error under ILC. The proposed methodology is illustrated on a typical batch reactor, and the results show that the performance of trajectory tracking is gradually improved by the ILC.
In this article, empirical Bayes (EB) estimators are constructed for the estimable functions of the parameters in the partitioned normal linear model. The superiority of the EB estimators over the ordinary least-squares (LS) estimator is investigated under the mean square error matrix (MSEM) criterion.
An empirical likelihood approach to estimating the coefficients in a linear model with interval-censored responses is developed in this paper. By constructing an unbiased transformation of interval-censored data, an empirical log-likelihood function with an asymptotic χ² distribution is derived. Confidence regions for the coefficients are constructed. Simulation results indicate that the method performs better than the normal approximation method in terms of coverage accuracy.
The 3-hour-interval prediction of ground-level temperature from +00 h out to +45 h in South Korea (38 stations) is performed using the DLM (dynamic linear model) in order to eliminate the systematic error of numerical model forecasts. Numerical model forecasts and observations are used as input values of the DLM. A comparison of the DLM forecasts with the KFM (Kalman filter model) forecasts in terms of RMSE and bias shows that the DLM is useful for improving the accuracy of prediction.
Funding (federated Tweedie regression study): This research was funded by the National Natural Science Foundation of China (No. 62272124), the National Key Research and Development Program of China (No. 2022YFB2701401), the Guizhou Province Science and Technology Plan Project (Grant No. Qiankehe Platform Talent [2020]5017), the Research Project of Guizhou University for Talent Introduction (No. [2020]61), the Cultivation Project of Guizhou University (No. [2019]56), and the Open Fund of the Key Laboratory of Advanced Manufacturing Technology, Ministry of Education (GZUAMT2021KF[01]).
Funding (moisture recycling isotope study): This study was supported by the National Natural Science Foundation of China (42261008, 41971034) and the Natural Science Foundation of Gansu Province, China (22JR5RA074).
Funding (rumor blocking study): Supported by the National Social Science Fund of China (Grant No. 23BGL270).
Abstract: Sunshine duration (S) based empirical equations have been employed in this study to estimate the daily global solar radiation on a horizontal surface (G) for six meteorological stations in Burundi. Those equations include the Ångström-Prescott linear model and four of its derivatives, i.e., the logarithmic, exponential, power, and quadratic functions. Monthly mean values of daily global solar radiation and sunshine duration data for a period of 20 to 23 years, from the Geographical Institute of Burundi (IGEBU), have been used. For each of the six stations, ten single or double linear regressions have been developed from the five functions above to relate, in terms of monthly mean values, the daily clearness index ($G/G_0$) to each of two kinds of relative sunshine duration (RSD): $S/S_0$ and the ratio of $S$ to the modified day length. In those ratios, $G_0$ stands for the extraterrestrial daily solar radiation on a horizontal surface and $S_0$ for the day length; the modified day length takes into account the natural horizon of the site. According to the calculated mean values of the clearness index and the RSD, each station experiences a high number of fairly clear (or partially cloudy) days. Estimated values of the dependent variable (y) in each developed linear regression have been compared to measured values in terms of the coefficients of correlation (R) and of determination ($R^2$), the mean bias error (MBE), the root mean square error (RMSE), and the t-statistic. Mean values of these statistical indicators have been used to rank, in order of decreasing performance, first the ten developed equations per station across all six stations, and second the six stations across all ten equations. Nevertheless, the obtained values of those indicators lie in the following ranges for all sixty developed equations: [...].
These results indicate that any of the sixty developed linear regressions (and thus the corresponding equations) fits the measured data very adequately and can be used to estimate monthly average daily global solar radiation from sunshine duration for the relevant station. It is also found that one of the two RSD definitions is slightly more advantageous than the other for estimating the monthly average daily clearness index. Moreover, the values of the statistical indicators in this study agree well with data from other works on the same kinds of empirical equations.
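As an illustration of the simplest of the five functions, the sketch below fits the Ångström-Prescott linear form, clearness index = a + b × RSD, by ordinary least squares and reports MBE and RMSE. It is a hedged example: the function names and the monthly values are hypothetical, and the study's own data and coefficients are not reproduced here.

```python
import math


def fit_angstrom_prescott(rsd, clearness):
    """Ordinary least squares for clearness = a + b * rsd."""
    n = len(rsd)
    mean_x = sum(rsd) / n
    mean_y = sum(clearness) / n
    sxx = sum((x - mean_x) ** 2 for x in rsd)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rsd, clearness))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def error_stats(measured, estimated):
    """Mean bias error and root mean square error (estimated minus measured)."""
    n = len(measured)
    diffs = [e - m for m, e in zip(measured, estimated)]
    mbe = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    return mbe, rmse


# Hypothetical monthly means of RSD (S/S0) and clearness index (G/G0).
rsd = [0.42, 0.48, 0.51, 0.55, 0.60, 0.66]
kt = [0.46, 0.49, 0.50, 0.53, 0.55, 0.58]

a, b = fit_angstrom_prescott(rsd, kt)
estimated = [a + b * x for x in rsd]
print(a, b, error_stats(kt, estimated))
```

Multiplying the estimated clearness index by the extraterrestrial radiation G0 for the month gives the estimated global radiation G. The other four functions can be fitted the same way after transforming the predictor (for example, using log(RSD) for the logarithmic form).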
Abstract: Adaptive fractional polynomial modeling of general correlated outcomes is formulated to address nonlinearity in means, variances/dispersions, and correlations. Means and variances/dispersions are modeled using generalized linear models in fixed effects/coefficients. Correlations are modeled using random effects/coefficients. Nonlinearity is addressed using power transforms of primary (untransformed) predictors. Parameter estimation is based on extended linear mixed modeling generalizing both generalized estimating equations and linear mixed modeling. Models are evaluated using likelihood cross-validation (LCV) scores and are generated adaptively using a heuristic search controlled by LCV scores. Cases covered include linear, Poisson, logistic, exponential, and discrete regression of correlated continuous, count/rate, dichotomous, positive continuous, and discrete numeric outcomes treated as normally, Poisson, Bernoulli, exponentially, and discrete numerically distributed, respectively. Example analyses are also generated for these five cases to compare adaptive random effects/coefficients modeling of correlated outcomes to previously developed adaptive modeling based on directly specified covariance structures. Adaptive random effects/coefficients modeling substantially outperforms direct covariance modeling in the linear, exponential, and discrete regression example analyses. It generates equivalent results in the logistic regression example analyses, and it is substantially outperformed in the Poisson regression case. Random effects/coefficients modeling of correlated outcomes can provide substantial improvements in model selection compared to directly specified covariance modeling. However, directly specified covariance modeling can generate competitive or substantially better results in some cases while usually requiring less computation time.
Abstract: Compositional data, which carry relative information, are a crucial aspect of machine learning and other related fields. They are typically recorded as closed data, i.e., data that sum to a constant such as 100%. The linear regression model is the most commonly used statistical technique for identifying hidden relationships between underlying random variables of interest. When estimating linear regression parameters, which are useful for tasks such as future prediction and partial effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, data quality is a significant challenge in machine learning, especially when missing data are present: many datasets contain missing observations, which can lead to costly and time-consuming data recovery. To address this issue, the expectation-maximization (EM) algorithm has been suggested as a solution for situations involving missing data. The EM algorithm iteratively finds maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved variables or data. Using the current estimate as input, the expectation (E) step constructs the expected log-likelihood function. The maximization (M) step then finds the parameters that maximize the expected log-likelihood determined in the E step. This study examined how well the EM algorithm performed on a simulated compositional dataset with missing observations, using both robust least squares and ordinary least squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
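The Aitchison distance used as an evaluation criterion can be sketched as follows: it is the Euclidean distance between the centered log-ratio (clr) transforms of two compositions. This is a minimal illustration of the standard definition, with hypothetical function names and example values; it assumes strictly positive parts and does not reproduce the study's imputation pipeline.

```python
import math


def clr(composition):
    """Centered log-ratio transform of a composition with positive parts."""
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


# Hypothetical example: an original composition and an imputed version.
original = [0.20, 0.30, 0.50]
imputed = [0.25, 0.30, 0.45]
print(aitchison_distance(original, imputed))
```

Because the clr transform removes any common scale factor, the distance is unchanged if a composition is expressed in proportions or in percentages, which is why it is preferred to the ordinary Euclidean distance for closed data.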
Funding: National Natural Science Foundation of China (No. 51175480)
Abstract: Based on the modeling principles of the GM(1,1) model and the linear regression model, a combined prediction model is established to predict equipment faults by fitting the two models. The new prediction model takes full advantage of the prediction information provided by the two models and improves prediction precision. Finally, this model is applied to predict the system fault time from the output voltages of a certain type of radar transmitter.
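For readers unfamiliar with the grey model, the sketch below implements the standard GM(1,1) component only: accumulate the series, fit the development coefficient a and grey input b by least squares on the background values, and difference the fitted accumulated series to forecast. The function names and the voltage-like series are hypothetical, and the abstract does not specify how the GM(1,1) and regression fits are combined, so no combination rule is shown.

```python
import math


def gm11_fit(x0):
    """Estimate GM(1,1) parameters (a, b) from x0(k) = -a * z1(k) + b."""
    x1 = []
    total = 0.0
    for value in x0:  # accumulated generating series
        total += value
        x1.append(total)
    # Background values: means of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]
    y = x0[1:]
    mean_z = sum(z) / len(z)
    mean_y = sum(y) / len(y)
    slope = (sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y))
             / sum((zi - mean_z) ** 2 for zi in z))
    return -slope, mean_y - slope * mean_z


def gm11_predict(x0, a, b, k):
    """Predicted value of the original series at 0-based index k (k >= 1)."""
    def x1_hat(j):
        return (x0[0] - b / a) * math.exp(-a * j) + b / a
    return x1_hat(k) - x1_hat(k - 1)


# Hypothetical slowly growing series (illustration only).
series = [100.0 * 1.1 ** k for k in range(5)]
a, b = gm11_fit(series)
print(a, b, gm11_predict(series, a, b, 5))
```

GM(1,1) suits short, roughly exponential series; a linear regression on the same data captures a linear trend instead, which is presumably why the two are combined in the paper.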
Abstract: The aim of this study was to assay the polyphenols, flavonoids, polyphenol oxidase, and phenylalanine ammonia-lyase related to anthocyanin synthesis in purple corn. The optimized multiple linear regression model of anthocyanin synthesis was $y = 4.38386 - 0.20545x_1 + 5.479638x_2 + 0.195575x_4$. Testing of the standard partial regression coefficients indicated that polyphenol content was negatively correlated with anthocyanins, with a relative influence on anthocyanin synthesis of -42.7%; flavonoid content and polyphenol oxidase activity were positively correlated with anthocyanins in purple corn, with relative influences on anthocyanin synthesis of 71.45% and 73.32%, respectively. There was no positive correlation between phenylalanine ammonia-lyase activity and anthocyanins in purple corn. The multiple linear regression model of anthocyanin synthesis was established to provide a theoretical foundation for producing anthocyanins in the laboratory.
Abstract: Necessary and sufficient conditions are derived for equality between $a^2 y'(I - P_X)y$ and the minimum norm quadratic unbiased estimator of the variance under the general linear model, where $a^2$ is a known positive number. Further, when the Gauss-Markov estimators and the ordinary least squares estimator are identical, a relatively simple equivalent condition is obtained. Finally, this condition is applied to an interesting example.
Abstract: Plant invasion refers to the phenomenon in which some plants grow too fast because they are far from their original environment or natural enemies, thereby affecting the local environment. With the development of tourism and trade, the harm caused by invasive plants will become more and more serious. Therefore, it is necessary to explore an effective method for controlling plant invasion through qualitative and quantitative research. In this paper, models were established for the control of harmful plant invasion in its early and late stages. The extensive computation was carried out by computer programming to obtain the optimal solutions of the models. The practical meaning of the optimal solutions was further discussed. Numerical simulations and discussion showed that quantitative research on invasive plant control has a certain application value.
Abstract: By analyzing the observed phenomena and the data collected in the study, a multi-compartment linear circulation model for targeting drug delivery systems was developed, and formulas for the drug concentration-time functions in blood and in the target organ were derived. According to the model, the drug concentration-time curve for the target organ can be plotted from the drug concentration data in blood, and the pharmacokinetic parameters of the drug in the target organ can also be obtained. The practicability of the model was further checked against the drug concentration-time curves in blood and in the target organ (liver) of liver-targeting nanoparticles in animal tests. Based on the liver drug concentration-time curves calculated from the function formula for the target organ, the pharmacokinetic behavior of the drug in the target organ (liver) was analyzed by statistical moments, and its pharmacokinetic parameters in the liver were obtained. It is suggested that the relative targeting index can be used for quantitative evaluation of targeting drug delivery systems.
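The statistical moment analysis mentioned above can be illustrated with a short sketch that computes the area under the concentration-time curve (AUC, zeroth moment) and the mean residence time (MRT = AUMC/AUC) by the trapezoidal rule. The function names and the sample values are hypothetical, the integration stops at the last sampling time without extrapolation, and the paper's definition of the relative targeting index is not given in the abstract, so it is not computed here.

```python
def trapezoid(t, y):
    """Trapezoidal-rule integral of y over the sampling times t."""
    return sum(0.5 * (y[i] + y[i - 1]) * (t[i] - t[i - 1])
               for i in range(1, len(t)))


def statistical_moments(times, concentrations):
    """Return (AUC, MRT) up to the last sampling time."""
    auc = trapezoid(times, concentrations)
    # First moment curve: time multiplied by concentration.
    aumc = trapezoid(times, [t * c for t, c in zip(times, concentrations)])
    return auc, aumc / auc


# Hypothetical liver concentration-time data (h, ug/g).
times = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
liver = [0.0, 6.2, 9.1, 7.4, 3.8, 0.9]
print(statistical_moments(times, liver))
```

Applying the same function to the blood and target-organ curves gives comparable AUC values for the two compartments, which is the kind of quantity a targeting index is typically built from.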
Funding: Supported by the Natural Science Foundation of Anhui Education Committee
Abstract: In this paper, based on the theory of parameter estimation, we give a selection method that we consider very reasonable in the sense that it preserves good properties of the parameter estimation. Moreover, we offer a method for calculating the selection statistic, together with an applied example.
Funding: Supported by the National Key Basic Research Project "Research on the Formation Mechanism and Prediction Theory of Severe Synoptic Disasters in China" (No. G1998040910) and the National Natural Science Foundation of China (No. 49775262 and No. 49823002).
Abstract: The impact of nonlinear stability and instability on the validity of the tangent linear model (TLM) is investigated by comparing its results with those produced by the nonlinear model (NLM) with identical initial perturbations. The evolutions of different initial perturbations superposed on nonlinearly stable and unstable basic flows are examined using two-dimensional quasi-geostrophic models with double-periodic and rigid boundary conditions. The results indicate that the valid time period of the TLM, during which the TLM can be used to approximate the NLM with a given accuracy, varies with the magnitude of the perturbations and with the nonlinear stability or instability of the basic flows. The larger the magnitude of the perturbation, the shorter the valid time period. The more nonlinearly unstable the basic flow, the shorter the valid time period of the TLM. With the double-periodic condition, the valid period of the TLM is shorter than with the rigid boundary condition.
Keywords: Nonlinear stability and instability; Tangent linear model (TLM); Validity
Funding: Financially supported by the National Natural Science Foundation of China (No. 31600587)
Abstract: Convenient and effective methods to determine seasonal changes in individual leaf area (LA) and leaf mass (LM) of plants are useful in research on plant physiology and forest ecology. However, practical methods for estimating LA and LM of elm (Ulmus japonica) leaves in different periods have rarely been reported. We collected sample elm leaves in June, July, and September. Then, we developed allometric models relating LA and LM to leaf parameters such as leaf length (L), leaf width (W), or the product of L and W (LW). Our objective was to find optimal allometric models for conveniently and effectively estimating LA and LM of elm leaves in different periods. LA and LM were significantly correlated with the leaf parameters (P < 0.05), and allometric models with LW as the independent variable were best for estimating LA and LM in each period. A linear model was developed separately to predict LA of elm leaves in June, July, and September, and it yielded high accuracies of 93%, 96%, and 96%, respectively. Similarly, a specific allometric model for predicting LM was developed separately for the three periods; the optimal model form in both June and July was a power model, but the linear model was optimal for September. The accuracies of the allometric models in predicting LM were 88%, 83%, and 84% for June, July, and September, respectively. The errors caused by ignoring seasonal variation of the allometric models in predicting LA and LM in the three periods were 1-4% and 16-59%, respectively.
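A power-form allometric model such as the one reported for LM in June and July, LM = a × LW^b, is commonly fitted by linear regression on log-transformed values. The sketch below shows that approach under stated assumptions: the function name and the leaf measurements are hypothetical, and the abstract does not say whether the authors fitted the power model in log space or by nonlinear least squares.

```python
import math


def fit_power_model(lw, lm):
    """Fit lm = a * lw**b by least squares on log(lm) versus log(lw)."""
    xs = [math.log(v) for v in lw]
    ys = [math.log(v) for v in lm]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Hypothetical measurements: LW in cm^2, LM in g (illustration only).
lw = [12.0, 18.5, 25.0, 33.0, 41.5]
lm = [0.060, 0.098, 0.135, 0.185, 0.240]

a, b = fit_power_model(lw, lm)
print(a, b, a * 30.0 ** b)  # last value: predicted LM for LW = 30 cm^2
```

Fitting one such model per sampling month, rather than one pooled model, is what avoids the 16-59% error the abstract attributes to ignoring seasonal variation in LM.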
Funding: The authors would like to thank Prof. Z. Yuan for her numerous suggestions in the writing of this paper. This work is supported by the National Natural Science Foundation of China (Grant No. 40176009), the National Key Programme for Devel
Abstract: The strong nonlinearity of boundary layer parameterizations in atmospheric and oceanic models can make it difficult for tangent linear models to approximate nonlinear perturbations as the integration time grows longer. Consequently, the related 4-D variational data assimilation problems can be difficult to solve. A modified tangent linear model is built on the Mellor-Yamada turbulent closure (level 2.5) for 4-D variational data assimilation. For oceanic mixed layer model settings, the modified tangent linear model approximates finite-amplitude nonlinear perturbations better than the full and simplified tangent linear models when the integration time is longer than one day. The corresponding variational data assimilation performance based on the adjoint of the modified tangent linear model is also improved compared with that based on the adjoints of the full and simplified tangent linear models.
Funding: Supported by the National Natural Science Foundation of China (60404012, 60674064), UK EPSRC (GR/N13319 and GR/R10875), the National High Technology Research and Development Program of China (2007AA04Z193), New Star of Science and Technology of Beijing City (2006A62), and IBM China Research Lab 2007 UR-Program.
Abstract: A batch-to-batch optimal iterative learning control (ILC) strategy for the tracking control of product quality in batch processes is presented. A linear time-varying perturbation (LTVP) model is built for product quality around the nominal trajectories. To address model-plant mismatch, the model prediction errors from the previous batch run are added to the model predictions for the current batch run. Tracking error transition models can then be built, and the ILC law with direct error feedback is obtained explicitly. A rigorous theorem is presented to prove the convergence of the tracking error under ILC. The proposed methodology is illustrated on a typical batch reactor, and the results show that trajectory tracking performance is gradually improved by the ILC.
Funding: The Knowledge Innovation Program of the Chinese Academy of Sciences (KJCX3-SYW-S02) and the Youth Foundation of USTC
Abstract: In this article, empirical Bayes (EB) estimators are constructed for the estimable functions of the parameters in the partitioned normal linear model. The superiority of the EB estimators over the ordinary least-squares (LS) estimator is investigated under the mean square error matrix (MSEM) criterion.
Abstract: An empirical likelihood approach to estimating the coefficients in a linear model with interval-censored responses is developed in this paper. By constructing an unbiased transformation of the interval-censored data, an empirical log-likelihood function with an asymptotic $\chi^2$ distribution is derived. Confidence regions for the coefficients are constructed. Simulation results indicate that the method performs better than the normal approximation method in terms of coverage accuracy.
Abstract: Prediction of ground-level temperature at 3-hour intervals from +00 h out to +45 h at 38 stations in South Korea is performed using the DLM (dynamic linear model) in order to eliminate the systematic error of numerical model forecasts. Numerical model forecasts and observations are used as input values of the DLM. A comparison of the DLM forecasts with the KFM (Kalman filter model) forecasts in terms of RMSE and bias shows that the DLM is useful for improving prediction accuracy.
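The general idea of removing systematic forecast error with a recursive filter can be sketched with a scalar Kalman filter that tracks the forecast bias as a random walk. This is an assumption-laden illustration, not the paper's DLM or KFM: the function name, the noise variances q and r, and the temperature values are all hypothetical.

```python
def kalman_bias_correct(forecasts, observations, q=0.01, r=1.0):
    """Correct forecasts with a running estimate of their systematic error.

    State: forecast bias, modeled as a random walk with variance q.
    Measurement: forecast minus observation, with noise variance r.
    Each forecast is corrected using only the bias learned from earlier pairs.
    """
    bias, p = 0.0, 1.0
    corrected = []
    for forecast, observed in zip(forecasts, observations):
        corrected.append(forecast - bias)
        p += q                       # predict: uncertainty grows
        gain = p / (p + r)           # Kalman gain
        bias += gain * ((forecast - observed) - bias)
        p *= 1.0 - gain              # update: uncertainty shrinks
    return corrected, bias


# Hypothetical temperatures (deg C): the model runs about 2 degrees warm.
forecasts = [14.1, 15.3, 13.8, 16.0, 15.2, 14.7]
observed = [12.0, 13.4, 11.9, 13.8, 13.3, 12.6]
print(kalman_bias_correct(forecasts, observed))
```

A larger q lets the bias estimate follow weather-regime changes more quickly at the cost of noisier corrections; in practice a separate filter would be run per station and lead time.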