Stable water isotopes are natural tracers quantifying the contribution of moisture recycling to local precipitation, i.e., the moisture recycling ratio, but various isotope-based models usually lead to different results, which affects the accuracy of local moisture recycling estimates. In this study, a total of 18 stations from four typical areas in China were selected to compare the performance of isotope-based linear and Bayesian mixing models and to determine the local moisture recycling ratio. Among the three vapor sources, including advection, transpiration, and surface evaporation, the advection vapor usually played a dominant role, and the contribution of surface evaporation was less than that of transpiration. When abnormal values were ignored, the arithmetic averages of the differences between the isotope-based linear and Bayesian mixing models were 0.9% for transpiration, 0.2% for surface evaporation, and –1.1% for advection, and the medians were 0.5%, 0.2%, and –0.8%, respectively. The importance of transpiration was slightly less in most cases when the Bayesian mixing model was applied, and the contribution of advection was relatively larger. The Bayesian mixing model was found to perform better in determining an efficient solution, since the linear model sometimes resulted in negative contribution ratios. A sensitivity test with two isotope scenarios indicated that the Bayesian model had a relatively low sensitivity to changes in isotope input, and that it is important to accurately estimate the isotopes in precipitation vapor. Generally, the Bayesian mixing model should be recommended instead of the linear model. These findings are useful for understanding the performance of isotope-based linear and Bayesian mixing models under various climate backgrounds.
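The three-source linear mixing model compared above can be written as two isotope mass balances (δ¹⁸O and δ²H) plus the requirement that the three fractions sum to one. The sketch below solves that 3×3 system directly; all end-member and precipitation-vapor delta values are hypothetical illustrations, not data from the study.

```python
# Hedged sketch of a three-source linear isotope mixing model:
#   f_adv + f_tr + f_ev = 1
#   f_adv*d18O_adv + f_tr*d18O_tr + f_ev*d18O_ev = d18O_p
#   f_adv*d2H_adv  + f_tr*d2H_tr  + f_ev*d2H_ev  = d2H_p
# All delta values below are hypothetical, not values from the study.

def solve3(a, b):
    """Solve a 3x3 linear system a x = b by Gauss-Jordan elimination."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

# hypothetical end-member signatures (per mil) for advection,
# transpiration, and surface evaporation vapor
d18O = {"adv": -12.0, "tr": -6.0, "ev": -16.0}
d2H = {"adv": -90.0, "tr": -40.0, "ev": -120.0}
d18O_p, d2H_p = -11.2, -83.0   # hypothetical precipitation-vapor values

A = [[1.0, 1.0, 1.0],
     [d18O["adv"], d18O["tr"], d18O["ev"]],
     [d2H["adv"], d2H["tr"], d2H["ev"]]]
f_adv, f_tr, f_ev = solve3(A, [1.0, d18O_p, d2H_p])
```

Nothing in the linear algebra forces the fractions into [0, 1]; a negative fraction from this solver is exactly the failure mode that motivates the Bayesian mixing alternative.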
Weighted total least squares (WTLS) has been regarded as the standard tool for the errors-in-variables (EIV) model, in which all the elements in the observation vector and the coefficient matrix are contaminated with random errors. However, in many geodetic applications, some elements are error-free and some random observations appear repeatedly in different positions in the augmented coefficient matrix. This is called the linear structured EIV (LSEIV) model. Two kinds of methods are proposed for the LSEIV model, based on functional and stochastic modifications. On the one hand, the functional part of the LSEIV model is modified into the errors-in-observations (EIO) model. On the other hand, the stochastic model is modified by applying the Moore-Penrose inverse of the cofactor matrix. The algorithms are derived through the Lagrange multiplier method and linear approximation. The estimation principles and iterative formulas of the parameters are proven to be consistent. The first-order approximate variance-covariance matrix (VCM) of the parameters is also derived. A numerical example is given to compare the performance of the three proposed algorithms with the STLS approach. Afterwards, the least squares (LS), total least squares (TLS), and linear structured weighted total least squares (LSWTLS) solutions are compared, and the accuracy evaluation formula is proven to be feasible and effective. Finally, LSWTLS is applied to the field of deformation analysis, where it yields a better result than the traditional LS and TLS estimations.
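The EIV idea behind WTLS can be illustrated in its simplest unweighted, unstructured case: fitting a straight line when both coordinates carry errors, which total least squares does by minimizing squared orthogonal distances. The closed-form sketch below is only that textbook special case, not the paper's structured LSWTLS algorithm.

```python
import math

# Minimal total-least-squares (orthogonal regression) line fit: both x
# and y are treated as noisy, unlike ordinary least squares. The slope
# comes from the eigen-direction of the centered 2x2 scatter matrix.
# Assumes the cross-moment sxy is nonzero.

def tls_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return my - b * mx, b  # intercept, slope

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # exactly y = 1 + 2x
a, b = tls_line(xs, ys)
```

On noise-free data the orthogonal fit coincides with the ordinary fit; the two diverge once errors enter the coefficient side as well, which is the situation the EIV model formalizes.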
Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data that sum to a constant, like 100%. The statistical linear model is the most widely used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data are present. The linear regression model is a commonly used statistical modeling technique applied in various fields to find relationships between variables of interest. When estimating linear regression parameters, which are useful for tasks such as future prediction and partial effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, which can make data recovery costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds the best estimates of parameters in statistical models that depend on unobserved variables or data, under maximum likelihood or maximum a posteriori (MAP) criteria. Using the present estimate as input, the expectation (E) step constructs the expected log-likelihood function; finding the parameters that maximize this expected log-likelihood is the job of the maximization (M) step. This study examined how well the EM algorithm performed on a synthetic compositional dataset with missing observations, using both robust least squares and ordinary least squares regression techniques. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
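The Aitchison distance used here to score imputations is the Euclidean distance between centered log-ratio (clr) transforms of the closed compositions. A small sketch with hypothetical compositions:

```python
import math

# Hedged sketch of the Aitchison distance for compositional data:
# close each composition to unit sum, apply the centered log-ratio
# (clr) transform, and take the Euclidean distance. The compositions
# below are hypothetical.

def closure(parts):
    s = sum(parts)
    return [p / s for p in parts]

def clr(comp):
    g = math.exp(sum(math.log(p) for p in comp) / len(comp))  # geometric mean
    return [math.log(p / g) for p in comp]

def aitchison(u, v):
    cu, cv = clr(closure(u)), clr(closure(v))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cu, cv)))

x = [0.2, 0.3, 0.5]
d_self = aitchison(x, x)                      # identical compositions
d_scaled = aitchison(x, [20.0, 30.0, 50.0])   # same composition, different scale
d_other = aitchison(x, [0.5, 0.3, 0.2])
```

Scale invariance is the point: a composition reported as [20, 30, 50] is the same composition as [0.2, 0.3, 0.5], and the distance between them is zero, so imputations are judged on ratios rather than raw magnitudes.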
In this paper, the nonlinear free vibration behaviors of a piezoelectric semiconductor (PS) doubly-curved shell resting on the Pasternak foundation are studied within the framework of the nonlinear drift-diffusion (NLDD) model and the first-order shear deformation theory. The nonlinear constitutive relations are presented, and the strain energy, kinetic energy, and virtual work of the PS doubly-curved shell are derived. Based on Hamilton's principle as well as the condition of charge continuity, the nonlinear governing equations are obtained, and these equations are then solved by means of an efficient iteration method. Several numerical examples are given to show the effects of the nonlinear drift current, elastic foundation parameters, and geometric parameters on the nonlinear vibration frequency and the damping characteristic of the PS doubly-curved shell. The main innovations of this work are that the difference between the linearized drift-diffusion (LDD) model and the NLDD model is revealed, and that an effective method is proposed to select a proper initial electron concentration for the LDD model.
In the assessment of car insurance claims, the claim rate presents a highly skewed probability distribution, which is typically modeled using the Tweedie distribution. The traditional approach to obtaining a Tweedie regression model involves training on a centralized dataset; when the data are provided by multiple parties, training a privacy-preserving Tweedie regression model without exchanging raw data becomes a challenge. To address this issue, this study introduces a novel vertical federated learning-based Tweedie regression algorithm for multi-party auto insurance rate setting in data silos. The algorithm keeps sensitive data local and uses privacy-preserving techniques to achieve intersection operations between the two parties holding the data. After determining which entities are shared, the participants train the model locally on the shared entity data to obtain the intermediate parameters of the local generalized linear model. Homomorphic encryption algorithms are introduced to exchange and update the intermediate model parameters, so that the parties collaboratively complete the joint training of the car insurance rate-setting model. Performance tests on two publicly available datasets show that the proposed federated Tweedie regression algorithm can effectively generate Tweedie regression models that leverage the value of data from both parties without exchanging data. The assessment results of the scheme approach those of a Tweedie regression model learned from centralized data, and outperform a Tweedie regression model learned independently by a single party.
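The Tweedie family referred to above, with power parameter 1 < p < 2, is a compound Poisson-gamma distribution that tolerates exact zeros (claim-free policies) alongside a skewed continuous tail. The sketch below shows only the unit deviance such a model minimizes; the federated protocol itself (entity alignment, homomorphic encryption) is out of scope here.

```python
# Minimal sketch of the Tweedie unit deviance for power parameter
# 1 < p < 2 (compound Poisson-gamma), the loss a Tweedie GLM
# minimizes. This illustrates the model family only, not the paper's
# federated training algorithm.

def tweedie_deviance(y, mu, p=1.5):
    """Unit deviance d(y, mu) for 1 < p < 2; requires y >= 0, mu > 0."""
    term_y = y ** (2 - p) / ((1 - p) * (2 - p)) if y > 0 else 0.0
    return 2 * (term_y
                - y * mu ** (1 - p) / (1 - p)
                + mu ** (2 - p) / (2 - p))

d_match = tweedie_deviance(3.0, 3.0)   # perfect fit
d_off = tweedie_deviance(3.0, 1.0)     # misfit is penalized
d_zero = tweedie_deviance(0.0, 1.0)    # exact zeros (claim-free policies) are allowed
```

d(y, μ) vanishes only at a perfect fit and stays finite at y = 0, which is what makes the family suitable for claim-rate data.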
Social networks are the mainstream medium of current information dissemination, and it is particularly important to accurately predict their propagation laws. In this paper, we introduce a social network propagation model integrating multiple linear regression and an infectious disease model. Firstly, we propose features that affect social network communication along three dimensions. Then, we predict node influence via multiple linear regression. Lastly, we use node influence as the state-transition parameter of the infectious disease model to predict the trend of information dissemination in social networks. Experimental results on a real social network dataset show that the predictions of the model are consistent with the actual information dissemination trends.
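A minimal sketch of the coupling described above: a node-influence score (which the paper obtains from multiple linear regression) is used as the transmission rate of an SIR-style compartment model of information spread. The influence value and rates below are hypothetical, not fitted coefficients from the study.

```python
# Hedged sketch: use a (here, pre-computed) node-influence score as
# the transmission rate of an SIR-style infectious-disease model of
# information spread. All parameter values are hypothetical.

def simulate_sir(n, beta, gamma, i0=1.0, steps=200, dt=0.1):
    """Discrete-time SIR; returns the (S, I, R) trajectory."""
    s, i, r = n - i0, i0, 0.0
    traj = [(s, i, r)]
    for _ in range(steps):
        new_inf = beta * s * i / n * dt   # influence-weighted spreading
        new_rec = gamma * i * dt          # users losing interest
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        traj.append((s, i, r))
    return traj

influence = 0.6   # hypothetical stand-in for the regression-predicted influence
traj = simulate_sir(n=1000.0, beta=influence, gamma=0.1)
final_s, final_i, final_r = traj[-1]
```

A larger predicted influence raises the effective transmission rate and thus the speed and reach of the simulated cascade, which is the mechanism the paper exploits.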
The diameter distribution function (DDF) is a crucial tool for accurately predicting stand carbon storage (CS). The current key issue, however, is how to construct a high-precision DDF based on stand factors, site quality, and an aridity index to predict stand CS in multi-species mixed forests with complex structures. This study used data from 70 survey plots for mixed broadleaf Populus davidiana and Betula platyphylla forests in the Mulan Rangeland State Forest, Hebei Province, China, to construct the DDF based on maximum likelihood estimation (MLE) and a finite mixture model (FMM). Ordinary least squares (OLS), linear seemingly unrelated regression (LSUR), and a back propagation neural network (BPNN) were used to investigate the influences of stand factors, site quality, and the aridity index on the shape and scale parameters of the DDF and to predict stand CS of mixed broadleaf forests. The results showed that the FMM accurately described the stand-level diameter distribution of the mixed P. davidiana and B. platyphylla forests, whereas the Weibull function constructed by MLE was more accurate in describing species-level diameter distributions. The combined variable of quadratic mean diameter (Dq), stand basal area (BA), and site quality improved the accuracy of the shape parameter models of the FMM; the combined variable of Dq, BA, and the De Martonne aridity index improved the accuracy of the scale parameter models. Compared to OLS and LSUR, the BPNN had higher accuracy in the re-parameterization process of the FMM. OLS, LSUR, and BPNN overestimated the CS of P. davidiana but underestimated the CS of B. platyphylla in the large diameter classes (DBH ≥ 18 cm). The BPNN accurately estimated stand- and species-level CS, but was more suitable for estimating stand-level CS, thereby providing a scientific basis for the optimization of stand structure and the assessment of carbon sequestration capacity in mixed broadleaf forests.
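The species-level Weibull function mentioned above is the standard two-parameter form. A sketch with hypothetical shape and scale values (not the fitted parameters for either species) shows how the probability mass of the large diameter classes (DBH ≥ 18 cm) falls out of the cumulative distribution:

```python
import math

# Minimal sketch of the two-parameter Weibull distribution commonly
# used (as in this study, via MLE) to describe species-level diameter
# distributions. The shape/scale values are hypothetical.

def weibull_pdf(x, shape, scale):
    if x < 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-z ** shape)

def weibull_cdf(x, shape, scale):
    return 1.0 - math.exp(-((x / scale) ** shape)) if x >= 0 else 0.0

shape, scale = 2.3, 14.0   # hypothetical values for a DBH distribution (cm)
# probability mass of the large diameter classes (DBH >= 18 cm)
p_large = 1.0 - weibull_cdf(18.0, shape, scale)
# numerical check that the density integrates to ~1 (trapezoid rule, 0..100 cm)
xs = [k * 0.05 for k in range(2001)]
area = sum((weibull_pdf(u, shape, scale) + weibull_pdf(v, shape, scale)) * 0.025
           for u, v in zip(xs, xs[1:]))
```

In practice the shape and scale parameters would be fitted per species by MLE and then re-parameterized from stand variables, which is the step the study carries out with OLS, LSUR, and BPNN.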
BACKGROUND: In localized brain proton magnetic resonance spectroscopy (¹H-MRS), metabolite levels are often expressed as ratios rather than absolute concentrations. Frequently, the denominator is creatine, which is assumed to be stable in normal, as well as many pathological, states. However, in vivo creatine levels do not remain constant. Therefore, absolute metabolite measurements, which provide the precise concentrations of certain chemical compounds, are superior to metabolite ratios for determining pathological and evolutional changes. OBJECTIVE: To investigate the feasibility of quantitative analysis of brain metabolite changes caused by a central analgesic nasal spray using ¹H-MRS and the linear combination model (LCModel) method. DESIGN, TIME AND SETTING: This neuroimaging, observational, animal study was performed at the Laboratory of the Department of Medical Imaging, Second Affiliated Hospital, Medical College, Shantou University, China, from July to December 2007. MATERIALS: Butorphanol tartrate nasal spray, a mixed agonist-antagonist opioid analgesic, was purchased from Shanghai Hengrui Pharmacy, China. A General Electric Signa 1.5T system (General Electric Medical Systems, Milwaukee, WI, USA) and LCModel software (Stephen Provencher, Oakville, Ontario, Canada) were used in this study. METHODS: MRS images were acquired in ten healthy 2-week-old swine using a single-voxel point-resolved spectroscopic sequence. A region of interest (2 cm × 2 cm × 2 cm) was placed in the image center of maximum brain parenchyma. Repeated MRS scanning was performed 15-20 minutes after intranasal administration of 1 mg of butorphanol tartrate. Three settings of repetition time/echo time were selected before and after nasal spray administration: 3,000 ms/30 ms, 1,500 ms/30 ms, and 3,000 ms/50 ms. Metabolite concentrations were estimated by LCModel software. MAIN OUTCOME MEASURES: ¹H-MRS spectra were obtained using the various repetition time/echo time settings. Concentrations of glutamate compounds (glutamate + glutamine), N-acetyl aspartate, and choline were detected in swine brain prior to and following nasal spray treatment. RESULTS: The glutamate compounds curve was consistent with the original spectra when a repetition time/echo time of 3,000 ms/30 ms was adopted. Concentrations of glutamate compounds, N-acetyl aspartate, and choline decreased following administration. The most significant reduction was observed in glutamate compound concentrations, from (9.28 ± 0.54) mmol/kg to (7.28 ± 0.54) mmol/kg (P < 0.05). CONCLUSION: ¹H-MRS and LCModel software were effectively utilized to quantitatively analyze and measure brain metabolites. Glutamate compounds might be an important neurotransmitter in central analgesia.
Adaptive fractional polynomial modeling of general correlated outcomes is formulated to address nonlinearity in means, variances/dispersions, and correlations. Means and variances/dispersions are modeled using generalized linear models in fixed effects/coefficients. Correlations are modeled using random effects/coefficients. Nonlinearity is addressed using power transforms of primary (untransformed) predictors. Parameter estimation is based on extended linear mixed modeling generalizing both generalized estimating equations and linear mixed modeling. Models are evaluated using likelihood cross-validation (LCV) scores and are generated adaptively using a heuristic search controlled by LCV scores. Cases covered include linear, Poisson, logistic, exponential, and discrete regression of correlated continuous, count/rate, dichotomous, positive continuous, and discrete numeric outcomes treated as normally, Poisson, Bernoulli, exponentially, and discrete numerically distributed, respectively. Example analyses are also generated for these five cases to compare adaptive random effects/coefficients modeling of correlated outcomes to previously developed adaptive modeling based on directly specified covariance structures. Adaptive random effects/coefficients modeling substantially outperforms direct covariance modeling in the linear, exponential, and discrete regression example analyses. It generates equivalent results in the logistic regression example analyses and is substantially outperformed in the Poisson regression case. Random effects/coefficients modeling of correlated outcomes can provide substantial improvements in model selection compared to directly specified covariance modeling. However, directly specified covariance modeling can generate competitive or substantially better results in some cases while usually requiring less computation time.
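The power transforms at the heart of fractional polynomial modeling draw from a small conventional power set, with p = 0 read as log(x). The sketch below generates the first-degree basis only; the adaptive, LCV-driven search over powers is not reproduced.

```python
import math

# Sketch of the power transforms underlying fractional polynomial
# modeling: a predictor x > 0 is replaced by x**p for powers drawn
# from the conventional set, with p = 0 interpreted as log(x).

POWERS = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]   # the conventional FP power set

def fp_term(x, p):
    return math.log(x) if p == 0 else x ** p

def fp_basis(x):
    """All first-degree fractional polynomial transforms of x."""
    return [fp_term(x, p) for p in POWERS]

basis_at_4 = fp_basis(4.0)
```

An adaptive search such as the one described above would then score candidate subsets of these transformed predictors (by LCV in the paper) rather than fixing the powers in advance.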
In this paper, we present a continuous iteratively reweighted least squares (CIRLS) algorithm for solving linear model problems by convex relaxation, and prove the convergence of this algorithm. Under some conditions, we give an error bound for the algorithm. In addition, numerical results show the efficiency of the algorithm.
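The reweighting idea in IRLS can be shown with the classic robust line fit: each pass solves a weighted least-squares problem with weights 1/|residual|, driving the iteration toward the least-absolute-deviations solution. This is a generic IRLS illustration on synthetic points, not the paper's continuous CIRLS algorithm or its convergence analysis.

```python
# Generic iteratively reweighted least squares (IRLS) for a robust
# straight-line fit: weights 1/max(|residual|, eps) push the solution
# toward the least-absolute-deviations line. Illustration only; not
# the paper's CIRLS algorithm.

def weighted_line_fit(xs, ys, ws):
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def irls_line(xs, ys, iters=200, eps=1e-6):
    ws = [1.0] * len(xs)              # first pass is ordinary least squares
    a = b = 0.0
    for _ in range(iters):
        a, b = weighted_line_fit(xs, ys, ws)
        ws = [1.0 / max(abs(y - (a + b * x)), eps) for x, y in zip(xs, ys)]
    return a, b

# points on y = 1 + 2x with one gross outlier
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 30.0]   # last point should be 11
a, b = irls_line(xs, ys)
```

The single gross outlier barely moves the robust fit, whereas the plain least-squares fit of the same data is pulled far off the line.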
Sunshine duration (S) based empirical equations have been employed in this study to estimate the daily global solar radiation on a horizontal surface (G) for six meteorological stations in Burundi. Those equations include the Ångström-Prescott linear model and four of its derivatives, i.e. logarithmic, exponential, power, and quadratic functions. Monthly mean values of daily global solar radiation and sunshine duration data for a period of 20 to 23 years, from the Geographical Institute of Burundi (IGEBU), have been used. For each of the six stations, ten single or double linear regressions have been developed from the above-said five functions to relate, in terms of monthly mean values, the daily clearness index to each of two kinds of relative sunshine duration (RSD). In those ratios, G0 denotes the extraterrestrial daily solar radiation on a horizontal surface and S0 the day length, while a modified day length taking into account the natural site's horizon is also used. According to the calculated mean values of the clearness index and the RSD, each station experiences a high number of fairly clear (or partially cloudy) days. Estimated values of the dependent variable in each developed linear regression have been compared to measured values in terms of the coefficients of correlation (R) and determination (R²), the mean bias error (MBE), the root mean square error (RMSE), and the t-statistic. Mean values of these statistical indicators have been used to rank, in decreasing order of performance, firstly the ten developed equations per station across the six stations, and secondly the six stations across the ten equations. Nevertheless, the obtained values of those indicators lay within the reported ranges for all sixty developed equations. These results lead to the assertion that any of the sixty developed linear regressions fits the measured data very adequately and may be used to estimate the monthly average daily global solar radiation from sunshine duration for the relevant station. It is also found that one of the two RSD formulations is slightly more advantageous than the other for estimating the monthly average daily clearness index. Moreover, the values of the statistical indicators of this study match data from other works on the same kinds of empirical equations adequately.
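The Ångström-Prescott model and the ranking indicators can be sketched in a few lines: an ordinary least-squares fit of the clearness index against relative sunshine duration, scored by MBE and RMSE. The sunshine data below are synthetic stand-ins, not the Burundi station records.

```python
import math

# Sketch of the Angstrom-Prescott linear model, G/G0 = a + b*(S/S0),
# fitted by ordinary least squares, with the MBE and RMSE indicators
# used to rank the regressions. Data are synthetic.

def fit_linear(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def mbe(est, meas):
    return sum(e - m for e, m in zip(est, meas)) / len(meas)

def rmse(est, meas):
    return math.sqrt(sum((e - m) ** 2 for e, m in zip(est, meas)) / len(meas))

rsd = [0.35, 0.45, 0.55, 0.65, 0.75]    # monthly mean S/S0 (synthetic)
kt = [0.25 + 0.5 * s for s in rsd]      # clearness index, exactly linear here
a, b = fit_linear(rsd, kt)
est = [a + b * s for s in rsd]
bias, err = mbe(est, kt), rmse(est, kt)
```

The four derivative forms (logarithmic, exponential, power, quadratic) only change how the RSD enters the regressor; the fitting and the MBE/RMSE scoring proceed the same way.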
The dynamic viscoelastic properties of asphalt AC-20 and its composites with Organic-Montmorillonite clay (OMMt) and SBS were modeled using the empirical Havriliak-Negami (HN) model, based on linear viscoelastic theory (LVE). The HN parameters α, β, G0, G∞, and τHN were determined by solving the HN equation across various temperatures and frequencies. The HN model successfully predicted the rheological behavior of the asphalt and its blends within the temperature range of 25°C-40°C. However, deviations occurred between 40°C-75°C, where the glass transition temperatures Tg of the asphalt components and the SBS polymer are located, rendering the HN model ineffective for predicting the dynamic viscoelastic properties of composites containing OMMt under these conditions. Yet the prediction error of the HN model dropped to 2.28%-2.81% for asphalt and its mixtures at 100°C, a temperature exceeding the Tg values of both polymer and asphalt, where the mixtures exhibited liquid-like behavior. The exponent α and the relaxation time increased with temperature across all systems. Incorporating OMMt clay into the asphalt blends significantly enhanced the relaxation dynamics of the resulting composites.
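One common way to write the Havriliak-Negami form for a dynamic modulus is G*(ω) = G∞ - (G∞ - G0) / (1 + (iωτ)^α)^β, which recovers G0 at low frequency and G∞ at high frequency. The parameter values below are illustrative only, not the fitted AC-20 values, and the sign convention in the paper may differ.

```python
# Hedged sketch of a Havriliak-Negami (HN) complex modulus:
#   G*(w) = Ginf - (Ginf - G0) / (1 + (1j*w*tau)**alpha)**beta
# so G* -> G0 as w -> 0 and G* -> Ginf as w -> infinity.
# Parameter values are illustrative, not fitted asphalt values.

def hn_modulus(w, g0, ginf, tau, alpha, beta):
    return ginf - (ginf - g0) / (1 + (1j * w * tau) ** alpha) ** beta

G0, Ginf, tau, alpha, beta = 1e3, 1e8, 0.01, 0.6, 0.8   # hypothetical

g_low = hn_modulus(1e-12, G0, Ginf, tau, alpha, beta)      # low-frequency plateau
g_high = hn_modulus(1e12, G0, Ginf, tau, alpha, beta)      # glassy limit
g_mid = hn_modulus(1.0 / tau, G0, Ginf, tau, alpha, beta)  # inside the dispersion
```

Fitting the five parameters at each temperature, as the study does, amounts to minimizing the misfit between this complex-valued expression and the measured dynamic moduli across frequency.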
A class of general inverse matrix techniques based on adaptive algorithmic modelling methodologies is derived, yielding iterative methods for solving unsymmetric linear systems of irregular structure arising in complex computational problems in three space dimensions. The proposed class of approximate inverses is chosen as the basis to yield systems on which classic and preconditioned iterative methods are explicitly applied. Optimized versions of the proposed approximate inverse are presented using special storage (k-sweep) techniques, leading to economical forms of the approximate inverses. Application of the adaptive algorithmic methodologies to a characteristic nonlinear boundary value problem is discussed and numerical results are given.
An improved model predictive control algorithm is proposed for Hammerstein-Wiener nonlinear systems. The proposed synthesis algorithm contains two parts: offline design of the polytopic invariant sets, and online solution of the min-max optimization problem. The polytopic invariant set is adopted to replace the traditional ellipsoidal invariant set, and a parameter-correlated nonlinear control law is designed to replace the traditional linear control law. Consequently, the terminal region is enlarged and the control performance is improved. Simulation and experiment are used to verify the validity of the resulting wind tunnel flow field control algorithm.
To address the difficulty of accurately constructing a dynamic model of the subtropical high, and based on the 500 hPa geopotential height field time series from T106 numerical forecast products, the EOF (empirical orthogonal function) temporal-spatial separation technique was applied: the decomposed EOF time coefficient series were regarded as the variables of a dynamical model, and dynamic system retrieval together with a genetic algorithm was introduced to perform an optimization search of the dynamical model parameters, so that a reasonable nonlinear dynamic model of the EOF time coefficients was established. By integrating the dynamic model and reassembling the EOF temporal-spatial components, a mid- to long-term forecast of the subtropical high was carried out. The experimental results show that the forecasts of the dynamic model are superior to those of the general numerical model forecast. A new modeling idea and forecast technique are presented for diagnosing and forecasting complicated weather systems such as the subtropical high.
In this paper, based on the theory of parameter estimation, we give a selection method and argue that, in the sense of a desirable property of the parameter estimation, it is very reasonable. Moreover, we offer a method for calculating the selection statistic and an applied example.
The construction method of the background value is improved in the original multi-variable grey model (MGM(1,m)) by addressing the source of its construction errors. The MGM(1,m) with optimized background value is used to eliminate the random fluctuations or errors in the observational data of all variables, and a combined prediction model together with multiple linear regression is established in order to improve the simulation and prediction accuracy of the combined model. Finally, a combined model of the MGM(1,2) with optimized background value and binary linear regression is constructed through an example. The results show that the model performs well in both simulation and prediction.
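For context, the classic single-variable GM(1,1) construction below shows exactly where the background value z(k) = 0.5·(x1(k) + x1(k-1)) enters; it is this construction step that the paper optimizes (and generalizes to m variables in MGM(1,m)). The demonstration series is synthetic.

```python
import math

# Sketch of the classic GM(1,1) grey model, showing the background
# value z(k) = 0.5*(x1(k) + x1(k-1)) whose construction the paper
# optimizes. Single-variable illustration only, not the MGM(1,m).

def gm11(x0, horizon=3):
    n = len(x0)
    x1 = [sum(x0[:k + 1]) for k in range(n)]              # accumulated series (1-AGO)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]  # background values
    # least squares for x0(k) = -a*z(k) + b
    szz, sz = sum(v * v for v in z), sum(z)
    sy = sum(x0[1:])
    szy = sum(v * y for v, y in zip(z, x0[1:]))
    m = n - 1
    det = m * szz - sz * sz
    a = -(m * szy - sz * sy) / det
    b = (szz * sy - sz * szy) / det
    # time response of the whitenized equation (assumes a != 0),
    # then inverse accumulation to recover the original series
    def x1_hat(k):
        return (x0[0] - b / a) * math.exp(-a * k) + b / a
    return [x1_hat(k) - x1_hat(k - 1) for k in range(1, n + horizon)]

data = [100.0 * 1.05 ** k for k in range(6)]   # smooth 5%-growth series
pred = gm11(data, horizon=2)
rel_err = max(abs(p - d) / d for p, d in zip(pred, data[1:]))
```

On a smooth near-exponential series the plain mean background value already fits well; the optimized background value targets the error this averaging introduces for less regular data.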
The impact of nonlinear stability and instability on the validity of the tangent linear model (TLM) is investigated by comparing its results with those produced by the nonlinear model (NLM) with identical initial perturbations. The evolutions of different initial perturbations superposed on nonlinearly stable and unstable basic flows are examined using two-dimensional quasi-geostrophic models with double periodic-boundary and rigid-boundary conditions. The results indicate that the valid time period of the TLM, during which the TLM can be utilized to approximate the NLM with a given accuracy, varies with the magnitude of the perturbations and the nonlinear stability and instability of the basic flows. The larger the magnitude of the perturbation, the shorter the valid time period; the more nonlinearly unstable the basic flow, the shorter the valid time period of the TLM. With the double-periodic condition, the valid period of the TLM is shorter than with the rigid-boundary condition. Key words: Nonlinear stability and instability; Tangent linear model (TLM); Validity. This work was supported by the National Key Basic Research Project "Research on the Formation Mechanism and Prediction Theory of Severe Synoptic Disasters in China" (No. G1998040910) and the National Natural Science Foundation of China (Nos. 49775262 and 49823002).
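The size-dependence of TLM validity is easy to reproduce with a toy nonlinear model (a logistic map, not the quasi-geostrophic models of the paper): propagate a perturbation both through the nonlinear model and through its tangent linear model, and compare.

```python
# Toy illustration of why TLM validity shrinks with perturbation size:
# run a perturbation through a nonlinear map exactly (NLM) and through
# its tangent linear model (TLM), then compare. The logistic map is a
# stand-in, not the paper's quasi-geostrophic models.

def nlm(x, steps, r=3.7):
    """Nonlinear model: logistic map iterations."""
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

def tlm(x, dx, steps, r=3.7):
    """Tangent linear model: propagate dx with the Jacobian r*(1 - 2x)."""
    for _ in range(steps):
        dx = r * (1 - 2 * x) * dx
        x = r * x * (1 - x)
    return dx

x0, steps = 0.3, 5

def tlm_error(dx):
    true_diff = nlm(x0 + dx, steps) - nlm(x0, steps)
    return abs(tlm(x0, dx, steps) - true_diff)

err_small = tlm_error(1e-6)   # tiny perturbation: TLM tracks NLM closely
err_large = tlm_error(1e-2)   # larger perturbation: linearization degrades
```

The relative error grows with the perturbation magnitude, mirroring the finding above that larger perturbations (and more unstable basic flows) shorten the valid time period of the TLM.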
Funding: This study was supported by the National Natural Science Foundation of China (42261008, 41971034) and the Natural Science Foundation of Gansu Province, China (22JR5RA074).
Funding: The authors acknowledge the financial support of the National Natural Science Foundation of China (Grant Nos. 42074016, 42104025, 42274057, and 41704007), the Hunan Provincial Natural Science Foundation of China (Grant No. 2021JJ30244), and the Scientific Research Fund of Hunan Provincial Education Department (Grant No. 22B0496).
Funding: Project supported by the National Natural Science Foundation of China (Nos. 12172236, 12202289, and U21A20430) and the Science and Technology Research Project of the Hebei Education Department of China (No. QN2022083).
Abstract: In this paper, the nonlinear free vibration behaviors of a piezoelectric semiconductor (PS) doubly-curved shell resting on a Pasternak foundation are studied within the framework of the nonlinear drift-diffusion (NLDD) model and the first-order shear deformation theory. The nonlinear constitutive relations are presented, and the strain energy, kinetic energy, and virtual work of the PS doubly-curved shell are derived. Based on Hamilton's principle as well as the condition of charge continuity, the nonlinear governing equations are obtained, and these equations are then solved by means of an efficient iteration method. Several numerical examples are given to show the effects of the nonlinear drift current, the elastic foundation parameters, and the geometric parameters on the nonlinear vibration frequency and the damping characteristic of the PS doubly-curved shell. The main innovations of this work are that the difference between the linearized drift-diffusion (LDD) model and the NLDD model is revealed, and an effective method is proposed to select a proper initial electron concentration for the LDD model.
Funding: This research was funded by the National Natural Science Foundation of China (No. 62272124); the National Key Research and Development Program of China (No. 2022YFB2701401); the Guizhou Province Science and Technology Plan Project (Qiankehe Platform Talent [2020]5017); the Research Project of Guizhou University for Talent Introduction (No. [2020]61); the Cultivation Project of Guizhou University (No. [2019]56); and the Open Fund of the Key Laboratory of Advanced Manufacturing Technology, Ministry of Education (GZUAMT2021KF[01]).
Abstract: In the assessment of car insurance claims, the claim rate presents a highly skewed probability distribution, which is typically modeled using the Tweedie distribution. The traditional approach to obtaining a Tweedie regression model involves training on a centralized dataset; when the data are provided by multiple parties, training a privacy-preserving Tweedie regression model without exchanging raw data becomes a challenge. To address this issue, this study introduces a novel vertical federated learning-based Tweedie regression algorithm for multi-party auto insurance rate setting across data silos. The algorithm keeps sensitive data local and uses privacy-preserving techniques to achieve intersection operations between the two parties holding the data. After determining which entities are shared, the participants train the model locally using the shared entity data to obtain the intermediate parameters of the local generalized linear model. Homomorphic encryption algorithms are introduced to exchange and update these intermediate parameters, collaboratively completing the joint training of the car insurance rate-setting model. Performance tests on two publicly available datasets show that the proposed federated Tweedie regression algorithm can effectively generate Tweedie regression models that leverage the value of data from both parties without exchanging data. The assessment results of the scheme approach those of a Tweedie regression model learned from centralized data and outperform a Tweedie regression model learned independently by a single party.
Funding: This work was supported by the 2021 Project of the "14th Five-Year Plan" of Shaanxi Education Science, "Research on the Application of Educational Data Mining in Applied Undergraduate Teaching: Taking the Course of 'Computer Application Technology' as an Example" (SGH21Y0403); the Teaching Reform and Research Projects for Practical Teaching in 2022, "Research on Practical Teaching of Applied Undergraduate Projects Based on 'Combination of Courses and Certificates': Taking Computer Application Technology Courses as an Example" (SJJG02012); and the 11th batch of Teaching Reform Research Projects of Xi'an Jiaotong University City College, "Project-Driven Cultivation and Research on Information Literacy of Applied Undergraduate Students in the Information Times: Taking Computer Application Technology Course Teaching as an Example" (111001).
Abstract: Social networks are the mainstream medium of current information dissemination, and accurately predicting their propagation laws is particularly important. In this paper, we introduce a social network propagation model integrating multiple linear regression with an infectious disease model. First, we propose features that affect social network communication along three dimensions. Then, we predict node influence via multiple linear regression. Finally, we use the node influence to drive the state transitions of the infectious disease model and predict the trend of information dissemination in social networks. Experimental results on a real social network dataset show that the predictions of the model are consistent with the actual information dissemination trends.
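The two-stage idea can be sketched as follows: a multiple linear regression predicts a per-node influence score from node features, and that score then scales the infection rate of a simple SI (susceptible-infected) spreading model. The feature names, coefficients, and rate constants are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
features = rng.random((n, 3))           # e.g. degree, activity, centrality (hypothetical)
w_true = np.array([0.5, 0.3, 0.2])
influence_obs = features @ w_true + 0.02 * rng.standard_normal(n)

# Stage 1: multiple linear regression for node influence.
w_hat, *_ = np.linalg.lstsq(features, influence_obs, rcond=None)
influence = features @ w_hat

# Stage 2: SI model under homogeneous mixing, with the infection rate
# scaled by the mean influence of the currently infected set.
infected = np.zeros(n, dtype=bool)
infected[:5] = True                     # seed nodes
curve = []
for _ in range(30):
    rate = 0.5 * influence[infected].mean()   # influence-weighted rate
    p = rate * infected.mean()                # per-susceptible infection prob.
    new = (~infected) & (rng.random(n) < p)
    infected |= new
    curve.append(infected.mean())
```

The recorded `curve` is the predicted dissemination trend; on a real network, homogeneous mixing would be replaced by the actual adjacency structure.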
Funding: Funded by the National Key Research and Development Program of China (No. 2022YFD2200503-02).
Abstract: The diameter distribution function (DDF) is a crucial tool for accurately predicting stand carbon storage (CS). The key open issue, however, is how to construct a high-precision DDF based on stand factors, site quality, and an aridity index to predict stand CS in multi-species mixed forests with complex structures. This study used data from 70 survey plots of mixed broadleaf Populus davidiana and Betula platyphylla forests in the Mulan Rangeland State Forest, Hebei Province, China, to construct DDFs based on maximum likelihood estimation (MLE) and a finite mixture model (FMM). Ordinary least squares (OLS), linear seemingly unrelated regression (LSUR), and a back-propagation neural network (BPNN) were used to investigate the influences of stand factors, site quality, and the aridity index on the shape and scale parameters of the DDF and to predict the stand CS of mixed broadleaf forests. The results showed that the FMM accurately described the stand-level diameter distribution of the mixed P. davidiana and B. platyphylla forests, whereas the Weibull function constructed by MLE was more accurate in describing the species-level diameter distribution. The combined variable of quadratic mean diameter (Dq), stand basal area (BA), and site quality improved the accuracy of the shape parameter models of the FMM; the combined variable of Dq, BA, and the De Martonne aridity index improved the accuracy of the scale parameter models. Compared to OLS and LSUR, the BPNN had higher accuracy in the re-parameterization process of the FMM. OLS, LSUR, and the BPNN overestimated the CS of P. davidiana but underestimated the CS of B. platyphylla in the large diameter classes (DBH ≥ 18 cm). The BPNN accurately estimated stand- and species-level CS, but it was more suitable for estimating stand-level CS than species-level CS, thereby providing a scientific basis for the optimization of stand structure and the assessment of carbon sequestration capacity in mixed broadleaf forests.
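The species-level building block above, an MLE fit of a two-parameter Weibull to diameter-at-breast-height (DBH) data, can be sketched with SciPy on simulated data. The shape and scale values are hypothetical; the study's stand-level FMM combines two such Weibull components, which this sketch omits.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
shape_true, scale_true = 2.5, 15.0            # hypothetical DBH distribution (cm)
dbh = scale_true * rng.weibull(shape_true, size=2000)

# MLE fit; floc=0 fixes the location parameter, giving the usual
# two-parameter Weibull used for diameter distributions.
shape_hat, _, scale_hat = stats.weibull_min.fit(dbh, floc=0)
```

With a couple of thousand simulated trees, the recovered shape and scale should land close to the generating values; on real plots, the fit would be run per species and then re-parameterized from stand variables such as Dq and BA.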
Funding: the National Natural Science Foundation of China, Nos. 30470515 and 30570480.
Abstract: BACKGROUND: In localized brain proton magnetic resonance spectroscopy (^1H-MRS), metabolite levels are often expressed as ratios rather than absolute concentrations. Frequently, the denominator is creatine, which is assumed to be stable in normal as well as many pathological states. However, in vivo creatine levels do not remain constant. Therefore, absolute metabolite measurements, which provide the precise concentrations of certain chemical compounds, are superior to metabolite ratios for determining pathological and evolutional changes. OBJECTIVE: To investigate the feasibility of quantitative analysis of brain metabolite changes caused by a central analgesic nasal spray using ^1H-MRS and the linear combination model (LCModel) method. DESIGN, TIME AND SETTING: This neuroimaging, observational animal study was performed at the Laboratory of the Department of Medical Imaging, Second Affiliated Hospital, Medical College, Shantou University, China, from July to December 2007. MATERIALS: Butorphanol tartrate nasal spray, a mixed agonist-antagonist opioid analgesic, was purchased from Shanghai Hengrui Pharmacy, China. A General Electric Signa 1.5T system (General Electric Medical Systems, Milwaukee, WI, USA) and LCModel software (Stephen Provencher, Oakville, Ontario, Canada) were used in this study. METHODS: MRS images were acquired in ten healthy swine aged 2 weeks using a single-voxel point-resolved spectroscopic sequence. A region of interest (2 cm × 2 cm × 2 cm) was placed at the image center of maximum brain parenchyma. Repeated MRS scanning was performed 15-20 minutes after intranasal administration of 1 mg of butorphanol tartrate. Three repetition time/echo time settings were used before and after nasal spray administration: 3000 ms/30 ms, 1500 ms/30 ms, and 3000 ms/50 ms. Metabolite concentrations were estimated with LCModel software. MAIN OUTCOME MEASURES: ^1H-MRS spectra were obtained using the various repetition time/echo time settings. Concentrations of glutamate compounds (glutamate + glutamine), N-acetyl aspartate, and choline were measured in swine brain prior to and following nasal spray treatment. RESULTS: The glutamate compounds curve was consistent with the original spectra when a repetition time/echo time of 3000 ms/30 ms was adopted. Concentrations of glutamate compounds, N-acetyl aspartate, and choline decreased following administration. The most significant reduction was observed in glutamate compound concentrations, from (9.28 ± 0.54) mmol/kg to (7.28 ± 0.54) mmol/kg (P < 0.05). CONCLUSION: ^1H-MRS and LCModel software were effectively utilized to quantitatively analyze and measure brain metabolites. Glutamate compounds might be important neurotransmitters in central analgesia.
Abstract: Adaptive fractional polynomial modeling of general correlated outcomes is formulated to address nonlinearity in means, variances/dispersions, and correlations. Means and variances/dispersions are modeled using generalized linear models in fixed effects/coefficients; correlations are modeled using random effects/coefficients. Nonlinearity is addressed using power transforms of primary (untransformed) predictors. Parameter estimation is based on extended linear mixed modeling, generalizing both generalized estimating equations and linear mixed modeling. Models are evaluated using likelihood cross-validation (LCV) scores and are generated adaptively using a heuristic search controlled by those scores. The cases covered include linear, Poisson, logistic, exponential, and discrete regression of correlated continuous, count/rate, dichotomous, positive continuous, and discrete numeric outcomes, treated as normally, Poisson, Bernoulli, exponentially, and discrete numerically distributed, respectively. Example analyses are generated for these five cases to compare adaptive random effects/coefficients modeling of correlated outcomes with previously developed adaptive modeling based on directly specified covariance structures. Adaptive random effects/coefficients modeling substantially outperforms direct covariance modeling in the linear, exponential, and discrete regression example analyses; it generates equivalent results in the logistic regression example analyses and is substantially outperformed in the Poisson regression case. Random effects/coefficients modeling of correlated outcomes can thus provide substantial improvements in model selection compared to directly specified covariance modeling. However, directly specified covariance modeling can generate competitive or substantially better results in some cases while usually requiring less computation time.
Abstract: In this paper, we present a continuous iteratively reweighted least squares (CIRLS) algorithm for solving the linear model problem by convex relaxation and prove the convergence of this algorithm. Under certain conditions, we give an error bound for the algorithm. In addition, numerical results show the efficiency of the algorithm.
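The core reweighting loop behind IRLS-type methods can be sketched for robust (L1) linear regression: each iteration solves a weighted least squares problem whose weights down-weight large residuals. This is the plain IRLS idea only; the continuation/convex-relaxation scheme of the CIRLS algorithm is not reproduced here, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = np.linspace(0.0, 1.0, n)
y = 3.0 * x + 2.0 + 0.05 * rng.standard_normal(n)  # hypothetical model
y[::10] += 5.0                                     # 10% gross outliers

A = np.column_stack([x, np.ones(n)])
beta = np.linalg.lstsq(A, y, rcond=None)[0]        # LS start (outlier-biased)
for _ in range(50):
    r = y - A @ beta
    # L1 weights with a small damping floor to avoid division by zero.
    w = 1.0 / np.maximum(np.abs(r), 1e-6)
    Aw = A * w[:, None]                            # diag(w) @ A
    # Weighted normal equations: (A^T W A) beta = A^T W y.
    beta = np.linalg.solve(A.T @ Aw, Aw.T @ y)
```

Unlike the initial least squares fit, the reweighted solution essentially ignores the planted outliers and recovers the slope and intercept of the bulk of the data.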
Abstract: Sunshine duration (S) based empirical equations have been employed in this study to estimate the daily global solar radiation on a horizontal surface (G) for six meteorological stations in Burundi. Those equations include the Ångström-Prescott linear model and four of its derivatives, i.e., logarithmic, exponential, power, and quadratic functions. Monthly mean values of daily global solar radiation and sunshine duration data over a period of 20 to 23 years, from the Geographical Institute of Burundi (IGEBU), have been used. For each of the six stations, ten single or double linear regressions have been developed from the above five functions to relate, in terms of monthly mean values, the daily clearness index (G/G<sub>0</sub>) to each of two kinds of relative sunshine duration (RSD): S/S<sub>0</sub> and S/S<sub>0</sub>′. In those ratios, G<sub>0</sub>, S<sub>0</sub>, and S<sub>0</sub>′ stand for the extraterrestrial daily solar radiation on a horizontal surface, the day length, and the modified day length taking into account the natural site's horizon, respectively. According to the calculated mean values of the clearness index and the RSD, each station experiences a high number of fairly clear (or partially cloudy) days. Estimated values of the dependent variable (y) in each developed linear regression have been compared to measured values in terms of the coefficient of correlation (R), the coefficient of determination (R<sup>2</sup>), the mean bias error (MBE), the root mean square error (RMSE), and the t-statistic. Mean values of these statistical indicators have been used to rank, in decreasing order of performance, first the ten developed equations per station across all six stations, and second the six stations across all ten equations. The values of those indicators obtained for all sixty developed equations lead to the assertion that any of the sixty developed linear regressions fits the measured data very adequately and can be used to estimate the monthly average daily global solar radiation from sunshine duration at the relevant station. It is also found that one of the two RSD formulations is slightly more advantageous than the other for estimating the monthly average daily clearness index. Moreover, the values of the statistical indicators of this study match adequately with data from other works on the same kinds of empirical equations.
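The base Ångström-Prescott regression, K_t = G/G0 = a + b·(S/S0), can be sketched as a single least squares fit. The monthly-mean data here are synthetic and the coefficients a = 0.25, b = 0.50 are illustrative placeholders, not the fitted values for any Burundian station.

```python
import numpy as np

rng = np.random.default_rng(4)
s_ratio = rng.uniform(0.3, 0.8, size=120)        # S/S0, synthetic monthly means
a_true, b_true = 0.25, 0.50                      # illustrative AP coefficients
k_t = a_true + b_true * s_ratio + 0.01 * rng.standard_normal(120)

# Fit K_t = a + b * (S/S0) by ordinary least squares.
A = np.column_stack([np.ones_like(s_ratio), s_ratio])
(a_hat, b_hat), *_ = np.linalg.lstsq(A, k_t, rcond=None)

# Goodness-of-fit indicators of the kind used in the study.
pred = a_hat + b_hat * s_ratio
rmse = float(np.sqrt(np.mean((k_t - pred) ** 2)))
```

The four derivative forms (logarithmic, exponential, power, quadratic) follow the same pattern with a transformed regressor; the station comparison then reduces to comparing R, R², MBE, RMSE, and t-statistics across fits.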
Abstract: The dynamic viscoelastic properties of asphalt AC-20 and its composites with organic montmorillonite clay (OMMt) and SBS were modeled using the empirical Havriliak-Negami (HN) model, based on linear viscoelastic (LVE) theory. The HN parameters α, β, G0, G∞, and τHN were determined by solving the HN equation across various temperatures and frequencies. The HN model successfully predicted the rheological behavior of the asphalt and its blends within the temperature range of 25°C to 40°C. However, deviations occurred between 40°C and 75°C, where the glass transition temperatures Tg of the asphalt components and the SBS polymer are located, rendering the HN model ineffective for predicting the dynamic viscoelastic properties of composites containing OMMt under these conditions. Yet the prediction error of the HN model dropped to 2.28%-2.81% for the asphalt and its mixtures at 100°C, a temperature exceeding the Tg values of both polymer and asphalt, where the mixtures exhibited liquid-like behavior. The exponent α and the relaxation time increased with temperature in all systems. Incorporating OMMt clay into the asphalt blends significantly enhanced the relaxation dynamics of the resulting composites.
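The HN equation referred to above gives the complex modulus as G*(ω) = G∞ + (G0 − G∞) / (1 + (iωτ)^α)^β, where G0 and G∞ are the relaxed (low-frequency) and unrelaxed (high-frequency) moduli. The sketch below evaluates it over a frequency sweep; the parameter values are illustrative, not the fitted values from the study.

```python
import numpy as np

# Illustrative HN parameters (hypothetical, not from the AC-20 fits).
alpha, beta = 0.8, 0.6          # shape exponents, 0 < alpha, alpha*beta <= 1
G0, Ginf, tau = 1.0e4, 1.0e8, 1.0e-2   # Pa, Pa, s

w = np.logspace(-3, 6, 200)     # angular frequency sweep (rad/s)
G_star = Ginf + (G0 - Ginf) / (1.0 + (1j * w * tau) ** alpha) ** beta
G_storage = G_star.real         # storage modulus G'
G_loss = G_star.imag            # loss modulus G''
```

By construction, G* tends to G0 at low frequency and to G∞ at high frequency, with a broadened, asymmetric transition controlled by α and β; fitting the model means adjusting these five parameters to measured G' and G'' master curves at each temperature.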
Abstract: A class of general inverse matrix techniques based on adaptive algorithmic modelling methodologies is derived, yielding iterative methods for solving unsymmetric linear systems of irregular structure that arise in complex computational problems in three space dimensions. The proposed class of approximate inverses is chosen as the basis for systems to which classic and preconditioned iterative methods are explicitly applied. Optimized versions of the proposed approximate inverse are presented using special storage (k-sweep) techniques, leading to economical forms of the approximate inverses. Application of the adaptive algorithmic methodologies to a characteristic nonlinear boundary value problem is discussed, and numerical results are given.
Funding: Project 61074074 supported by the National Natural Science Foundation of China; Project KT2012C01J0401 supported by the Group Innovation Fund, China.
Abstract: An improved model predictive control algorithm is proposed for Hammerstein-Wiener nonlinear systems. The proposed synthesis algorithm contains two parts: offline design of the polytopic invariant sets and online solution of the min-max optimization problem. The polytopic invariant set is adopted to replace the traditional ellipsoidal invariant set, and a parameter-correlated nonlinear control law is designed to replace the traditional linear control law. Consequently, the terminal region is enlarged and the control performance is improved. Simulation and experiment are used to verify the validity of the wind tunnel flow field control algorithm.
Funding: Project supported by the National Natural Science Foundation of China (No. 40375019), the Tropical Marine and Meteorology Science Foundation (No. 200609), and the Jiangsu Key Laboratory of Meteorological Disaster Foundation (No. KLME0507).
Abstract: Aiming at the difficulty of accurately constructing a dynamic model of the subtropical high, and based on the potential height field time series of the 500 hPa layer from T106 numerical forecast products, the EOF (empirical orthogonal function) temporal-spatial separation technique was applied. The decomposed EOF time coefficient series were taken as dynamical model variables, and the idea of dynamic system retrieval together with a genetic algorithm was introduced to carry out an optimization search for the dynamical model parameters; a reasonable nonlinear dynamic model of the EOF time coefficients was thereby established. By integrating the dynamic model and reassembling the EOF temporal-spatial components, a mid- to long-term forecast of the subtropical high was carried out. The experimental results show that the forecasts of the dynamic model are superior to those of the general numerical model. A new modeling idea and forecast technique is presented for diagnosing and forecasting such complicated weather systems as the subtropical high.
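The temporal-spatial separation step, decomposing a space-time field into spatial EOF patterns and their time coefficient series, can be sketched with an SVD on a synthetic field dominated by one annual-cycle mode. The field and its dominant pattern are illustrative stand-ins for the 500 hPa height data.

```python
import numpy as np

rng = np.random.default_rng(5)
nt, nx = 120, 40                        # time steps x grid points (synthetic)
t = np.arange(nt)
pattern = np.sin(np.linspace(0.0, np.pi, nx))        # one dominant spatial mode
field = np.outer(np.sin(2.0 * np.pi * t / 12.0), pattern)  # annual cycle
field += 0.05 * rng.standard_normal((nt, nx))        # noise

anom = field - field.mean(axis=0)                    # remove the time mean
U, s, Vt = np.linalg.svd(anom, full_matrices=False)
eof1 = Vt[0]                  # leading spatial pattern (EOF 1)
pc1 = U[:, 0] * s[0]          # its time coefficient series (PC 1)
var_frac = s[0] ** 2 / np.sum(s ** 2)                # explained variance fraction
```

The study's forecasting step would then fit a nonlinear dynamical model (here, via a genetic-algorithm parameter search) to series like `pc1` and reassemble forecasts as pc(t) x EOF pattern.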
Funding: Supported by the Natural Science Foundation of the Anhui Education Committee.
Abstract: In this paper, based on the theory of parameter estimation, we give a selection method and argue that, in the sense of a desirable property of the parameter estimation, it is very reasonable. Moreover, we offer a method for calculating the selection statistic and give an applied example.
Funding: Supported by the National Natural Science Foundation of China (71071077); the Ministry of Education Key Project of National Educational Science Planning (DFA090215); the China Postdoctoral Science Foundation (20100481137); and the Funding of the Jiangsu Innovation Program for Graduate Education (CXZZ11-0226).
Abstract: The construction method of the background value in the original multi-variable grey model (MGM(1,m)) is improved at its source of construction errors. The MGM(1,m) with optimized background value is used to eliminate the random fluctuations or errors in the observational data of all variables, and a combined prediction model incorporating multiple linear regression is established to improve the simulation and prediction accuracy of the combined model. Finally, a combined model of the MGM(1,2) with optimized background value and binary linear regression is constructed for an example. The results show that the model performs well in both simulation and prediction.
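The background-value construction being optimized above is easiest to see in the single-variable GM(1,1) baseline, where the standard background value is the trapezoidal mean z1(k) = 0.5·(x1(k) + x1(k−1)) of the accumulated series. The sketch below implements that baseline on a hypothetical series; the paper's contribution replaces this construction with an optimized one inside the multivariable MGM(1,m) and then combines it with linear regression, which this sketch does not attempt.

```python
import numpy as np

x0 = np.array([2.87, 3.28, 3.34, 3.72, 3.92, 4.23])  # hypothetical raw series
x1 = np.cumsum(x0)                                   # 1-AGO (accumulated) sequence
z1 = 0.5 * (x1[1:] + x1[:-1])                        # standard background values

# Grey differential equation x0(k) + a*z1(k) = b, solved by least squares
# for the developing coefficient a and grey input b.
B = np.column_stack([-z1, np.ones(len(z1))])
a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]

# Time response of the whitened equation, then back-difference to the
# original series; the extra index gives a one-step-ahead forecast.
k = np.arange(len(x0) + 1)
x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
x0_hat = np.diff(x1_hat)          # fitted x0(2..n) plus the forecast
forecast = x0_hat[-1]
```

On a near-exponential series such as this one, the fitted values track the raw data closely; the background-value choice matters because the trapezoidal mean is only exact when x1 is linear between sample points.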
Abstract: The impact of nonlinear stability and instability on the validity of the tangent linear model (TLM) is investigated by comparing its results with those produced by the nonlinear model (NLM) with identical initial perturbations. The evolutions of different initial perturbations superposed on nonlinearly stable and unstable basic flows are examined using two-dimensional quasi-geostrophic models with doubly periodic and rigid boundary conditions. The results indicate that the valid time period of the TLM, during which the TLM can be used to approximate the NLM with a given accuracy, varies with the magnitude of the perturbations and with the nonlinear stability and instability of the basic flows. The larger the magnitude of the perturbation, the shorter the valid time period; the more nonlinearly unstable the basic flow, the shorter the valid time period of the TLM. With the doubly periodic condition, the valid period of the TLM is shorter than with the rigid boundary condition. Keywords: nonlinear stability and instability; tangent linear model (TLM); validity. Funding: This work was supported by the National Key Basic Research Project "Research on the Formation Mechanism and Prediction Theory of Severe Synoptic Disasters in China" (No. G1998040910) and the National Natural Science Foundation of China (Nos. 49775262 and 49823002).
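The notion of a TLM valid time period can be illustrated on a toy system, here the chaotic logistic map standing in for the quasi-geostrophic models of the study: the tangent linear model propagates a small perturbation with the Jacobian of the unperturbed trajectory, and its mismatch with the true nonlinear perturbation evolution grows with time. All parameter values are illustrative.

```python
# Nonlinear model (NLM): x_{n+1} = r * x_n * (1 - x_n).
# Tangent linear model (TLM): delta_{n+1} = r * (1 - 2*x_n) * delta_n,
# linearized about the unperturbed trajectory x_n.
r = 3.7                        # chaotic regime of the logistic map
x = 0.4                        # unperturbed initial state
delta0 = 1e-6                  # small initial perturbation

x_pert = x + delta0            # perturbed NLM trajectory
delta_tlm = delta0
tlm_error = []
for _ in range(30):
    delta_nl = x_pert - x                      # true (nonlinear) perturbation
    tlm_error.append(abs(delta_tlm - delta_nl))
    delta_tlm = r * (1.0 - 2.0 * x) * delta_tlm  # TLM step about current x
    x = r * x * (1.0 - x)                      # NLM step, reference run
    x_pert = r * x_pert * (1.0 - x_pert)       # NLM step, perturbed run
```

Initially the TLM reproduces the NLM difference exactly; once the perturbation has amplified, the quadratic terms dropped by the linearization dominate and the mismatch exceeds the original perturbation size, marking the end of the valid period.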