Stable water isotopes are natural tracers for quantifying the contribution of moisture recycling to local precipitation, i.e., the moisture recycling ratio, but different isotope-based models usually lead to different results, which affects the accuracy of local moisture recycling estimates. In this study, a total of 18 stations from four typical areas in China were selected to compare the performance of isotope-based linear and Bayesian mixing models and to determine the local moisture recycling ratio. Among the three vapor sources, namely advection, transpiration, and surface evaporation, advected vapor usually played a dominant role, and the contribution of surface evaporation was less than that of transpiration. When abnormal values were ignored, the arithmetic averages of the differences between the isotope-based linear and Bayesian mixing models were 0.9% for transpiration, 0.2% for surface evaporation, and –1.1% for advection, and the medians were 0.5%, 0.2%, and –0.8%, respectively. The importance of transpiration was slightly lower in most cases when the Bayesian mixing model was applied, and the contribution of advection was relatively larger. The Bayesian mixing model was found to perform better in determining a feasible solution, since the linear model sometimes resulted in negative contribution ratios. A sensitivity test with two isotope scenarios indicated that the Bayesian model had a relatively low sensitivity to changes in isotope input, and that it was important to accurately estimate the isotopes in precipitation vapor. Generally, the Bayesian mixing model should be recommended instead of the linear model. These findings are useful for understanding the performance of isotope-based linear and Bayesian mixing models under various climate backgrounds.
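The linear approach reduces to a small mass-balance system: with two tracers (for example δ18O and d-excess, assumed here) plus the constraint that the three fractions sum to one, the contributions of advection, transpiration, and surface evaporation follow from a 3×3 solve. A minimal sketch with invented end-member values (the paper's actual end-member estimation is more involved); note the solution is not constrained to [0, 1], which is exactly why negative ratios can appear:

```python
import numpy as np

def linear_mixing(tracer_precip, tracer_sources):
    """Solve a three-source isotope mixing model.

    tracer_precip  -- (2,) tracer signature of precipitation vapor
    tracer_sources -- (2, 3) signatures of advection, transpiration,
                      and surface evaporation (one column per source)
    """
    A = np.vstack([tracer_sources, np.ones(3)])   # append mass-balance row
    b = np.append(tracer_precip, 1.0)
    return np.linalg.solve(A, b)

# Hypothetical end-members: rows are (d18O, d-excess), columns are sources
sources = np.array([[-12.0, -20.0, -5.0],
                    [ 10.0,  25.0,  4.0]])
f_true = np.array([0.6, 0.3, 0.1])
precip = sources @ f_true          # synthetic mixture with known fractions
fractions = linear_mixing(precip, sources)
```

A Bayesian mixing model replaces this exact solve with a prior on the fractions and a likelihood for the tracers, which keeps the posterior fractions inside the simplex.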
In this paper, the nonlinear free vibration behaviors of a piezoelectric semiconductor (PS) doubly-curved shell resting on a Pasternak foundation are studied within the framework of the nonlinear drift-diffusion (NLDD) model and the first-order shear deformation theory. The nonlinear constitutive relations are presented, and the strain energy, kinetic energy, and virtual work of the PS doubly-curved shell are derived. Based on Hamilton's principle as well as the condition of charge continuity, the nonlinear governing equations are obtained, and these equations are then solved by means of an efficient iteration method. Several numerical examples are given to show the effects of the nonlinear drift current, the elastic foundation parameters, and the geometric parameters on the nonlinear vibration frequency and the damping characteristic of the PS doubly-curved shell. The main innovations of this work are that the difference between the linearized drift-diffusion (LDD) model and the NLDD model is revealed, and that an effective method is proposed for selecting a proper initial electron concentration for the LDD model.
In the assessment of car insurance claims, the claim rate presents a highly skewed probability distribution, which is typically modeled using the Tweedie distribution. The traditional approach to obtaining a Tweedie regression model involves training on a centralized dataset; when the data are provided by multiple parties, training a privacy-preserving Tweedie regression model without exchanging raw data becomes a challenge. To address this issue, this study introduces a novel vertical federated learning-based Tweedie regression algorithm for multi-party auto insurance rate setting across data silos. The algorithm keeps sensitive data local and uses privacy-preserving techniques to compute the intersection of the entities held by the two data-owning parties. After the shared entities are determined, the participants train the model locally on the shared entity data to obtain the intermediate parameters of the local generalized linear model. Homomorphic encryption algorithms are introduced to exchange and update these intermediate parameters so as to collaboratively complete the joint training of the car insurance rate-setting model. Performance tests on two publicly available datasets show that the proposed federated Tweedie regression algorithm can effectively generate Tweedie regression models that leverage the value of data from both parties without exchanging data. The assessment results of the scheme approach those of a Tweedie regression model learned from centralized data, and outperform a Tweedie regression model learned independently by a single party.
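As a plain, non-federated baseline, a Tweedie GLM with a log link can be fit by minimizing the Tweedie deviance directly. The sketch below uses simple gradient descent with the power parameter fixed at p = 1.5 (the encrypted two-party protocol of the paper is not reproduced; data and coefficients are invented), and illustrates the local model each party would train:

```python
import numpy as np

def tweedie_deviance(y, mu, p=1.5):
    """Total Tweedie deviance for 1 < p < 2 (zeros in y are allowed)."""
    return 2.0 * np.sum(y**(2 - p) / ((1 - p) * (2 - p))
                        - y * mu**(1 - p) / (1 - p)
                        + mu**(2 - p) / (2 - p))

def fit_tweedie(X, y, p=1.5, lr=0.05, iters=2000):
    """Gradient descent on the Tweedie deviance with a log link."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ beta)
        # d(deviance)/d(beta) = 2 * X' [ mu^(1-p) * (mu - y) ]
        grad = 2.0 * X.T @ (mu**(1 - p) * (mu - y)) / len(y)
        beta -= lr * grad
    return beta

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
y = np.exp(0.5 + 0.3 * x1) * rng.gamma(2.0, 0.5, size=n)
y[rng.random(n) < 0.2] = 0.0            # zero-inflate, Tweedie-style
beta_hat = fit_tweedie(X, y)
```

In the federated setting, the gradient splits into per-party terms over their feature partitions, which is what gets exchanged under encryption.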
Social networks are the mainstream medium of information dissemination today, and accurately predicting their propagation laws is particularly important. In this paper, we introduce a social network propagation model that integrates multiple linear regression with an infectious disease model. First, we propose features that affect social network communication along three dimensions. Then, we predict node influence via multiple linear regression. Lastly, we use the node influence to drive the state transitions of the infectious disease model and predict the trend of information dissemination in social networks. Experimental results on a real social network dataset show that the predictions of the model are consistent with the actual information dissemination trends.
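One way to read this pipeline: a regression model maps node features to an influence score, and that score modulates the infection rate of a standard SIR update. A minimal sketch under these assumptions (the feature-to-rate coupling rule and all numbers are illustrative, not the paper's):

```python
import numpy as np

def node_influence(features, coef):
    """Multiple linear regression prediction (features: n x k, coef: k)."""
    return features @ coef

def sir_forecast(beta0, gamma, influence, steps=50, i0=0.01):
    """Discrete-time SIR where mean node influence scales infectivity."""
    beta = beta0 * float(np.mean(influence))
    s, i, r = 1.0 - i0, i0, 0.0
    trajectory = [(s, i, r)]
    for _ in range(steps):
        new_inf = beta * s * i          # susceptible -> infected (spreading)
        new_rec = gamma * i             # infected -> recovered (loses interest)
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        trajectory.append((s, i, r))
    return trajectory

# Hypothetical node features and regression coefficients
features = np.abs(np.random.default_rng(0).normal(size=(100, 3)))
coef = np.array([0.5, 0.3, 0.2])
traj = sir_forecast(beta0=0.5, gamma=0.2, influence=node_influence(features, coef))
```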
Weighted total least squares (WTLS) has been regarded as the standard tool for the errors-in-variables (EIV) model, in which all elements of the observation vector and the coefficient matrix are contaminated with random errors. However, in many geodetic applications some elements are error-free, and some random observations appear repeatedly in different positions of the augmented coefficient matrix; this is called the linear structured EIV (LSEIV) model. Two kinds of methods are proposed for the LSEIV model, based on functional and stochastic modifications. On the one hand, the functional part of the LSEIV model is modified into the errors-in-observations (EIO) model. On the other hand, the stochastic model is modified by applying the Moore-Penrose inverse of the cofactor matrix. The algorithms are derived through the Lagrange multipliers method and linear approximation. The estimation principles and iterative formulas of the parameters are proven to be consistent. The first-order approximate variance-covariance matrix (VCM) of the parameters is also derived. A numerical example is given to compare the performance of the three proposed algorithms with the STLS approach. Afterwards, the least squares (LS), total least squares (TLS), and linear structured weighted total least squares (LSWTLS) solutions are compared, and the accuracy evaluation formula is proven to be feasible and effective. Finally, LSWTLS is applied to the field of deformation analysis, where it yields a better result than the traditional LS and TLS estimations.
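For orientation, the unstructured TLS baseline that the LSEIV methods generalize can be computed from the SVD of the augmented matrix [A | b]: the solution comes from the right singular vector belonging to the smallest singular value. A short sketch of this textbook estimator (not the structured or weighted algorithms of the paper):

```python
import numpy as np

def tls_solve(A, b):
    """Total least squares: minimize ||[dA db]||_F s.t. (A+dA)x = b+db."""
    Z = np.column_stack([A, b])
    _, _, Vt = np.linalg.svd(Z)
    v = Vt[-1]                    # right singular vector of smallest sigma
    return -v[:-1] / v[-1]        # recover x from the null direction

rng = np.random.default_rng(1)
A = rng.normal(size=(20, 2))
x_true = np.array([1.0, -2.0])
b = A @ x_true                    # consistent system: TLS recovers x exactly
x_tls = tls_solve(A, b)
```

The structured variants differ precisely in that the corrections dA are forced to respect repeated entries and error-free elements, which the plain SVD solution ignores.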
The virtuality and openness of online social platforms make networks a hotbed for the rapid propagation of various rumors. To block the outbreak of a rumor, one of the most effective containment measures is spreading positive information to counterbalance its diffusion. The spreading mechanism of rumors and effective suppression strategies are therefore significant and challenging research issues. First, to simulate the dissemination of multiple types of information, we propose a competitive linear threshold model with state transition (CLTST) to describe the spreading processes of rumor and anti-rumor in the same network. Subsequently, we put forward a community-based rumor blocking (CRB) algorithm based on influence maximization theory in social networks. Its crucial step is to identify a set of influential seeds that propagate anti-rumor information to other nodes, which includes community detection, selection of candidate anti-rumor seeds, and generation of the anti-rumor seed set. Under the CLTST model, the CRB algorithm has been compared with six state-of-the-art algorithms on nine online social networks to verify its performance. Experimental results show that the proposed model can better reflect the process of rumor propagation and reveal the propagation mechanisms of rumor and anti-rumor in online social networks. Moreover, the proposed CRB algorithm performs better in weakening the rumor dissemination ability: it selects anti-rumor seeds more accurately and achieves better performance in influence spread, sensitivity analysis, seed distribution, and running time.
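The non-competitive core of such models is the classic linear threshold rule: a node activates once the summed weights of its active in-neighbors reach its threshold. A deterministic toy sketch (graph, weights, and thresholds invented; the CLTST adds a second, competing cascade and state transitions on top of this rule):

```python
def lt_spread(in_edges, thresholds, seeds):
    """Linear threshold cascade.

    in_edges[v] -- list of (u, w) pairs: influence weight w from u to v
    thresholds  -- activation threshold per node
    seeds       -- initially active nodes
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, edges in in_edges.items():
            if v in active:
                continue
            if sum(w for u, w in edges if u in active) >= thresholds[v]:
                active.add(v)
                changed = True
    return active

# 4-node toy network: seeding node 0 eventually activates everyone
in_edges = {0: [], 1: [(0, 0.6)], 2: [(0, 0.3), (1, 0.4)], 3: [(2, 0.8)]}
thresholds = {0: 0.5, 1: 0.5, 2: 0.6, 3: 0.5}
reached = lt_spread(in_edges, thresholds, {0})
```

Influence maximization then asks which seed set maximizes |reached| under a budget, which is what the CRB seed-selection step approximates community by community.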
The diameter distribution function (DDF) is a crucial tool for accurately predicting stand carbon storage (CS). The current key issue, however, is how to construct a high-precision DDF based on stand factors, site quality, and an aridity index to predict stand CS in multi-species mixed forests with complex structures. This study used data from 70 survey plots of mixed broadleaf Populus davidiana and Betula platyphylla forests in the Mulan Rangeland State Forest, Hebei Province, China, to construct the DDF based on maximum likelihood estimation (MLE) and a finite mixture model (FMM). Ordinary least squares (OLS), linear seemingly unrelated regression (LSUR), and a back propagation neural network (BPNN) were used to investigate the influences of stand factors, site quality, and the aridity index on the shape and scale parameters of the DDF and to predict the stand CS of mixed broadleaf forests. The results showed that the FMM accurately described the stand-level diameter distribution of the mixed P. davidiana and B. platyphylla forests, whereas the Weibull function constructed by MLE was more accurate in describing the species-level diameter distribution. The combined variable of quadratic mean diameter (Dq), stand basal area (BA), and site quality improved the accuracy of the shape parameter models of the FMM; the combined variable of Dq, BA, and the De Martonne aridity index improved the accuracy of the scale parameter models. Compared to OLS and LSUR, the BPNN had higher accuracy in the re-parameterization process of the FMM. OLS, LSUR, and BPNN overestimated the CS of P. davidiana but underestimated the CS of B. platyphylla in the large diameter classes (DBH ≥ 18 cm). The BPNN accurately estimated stand- and species-level CS, but was more suitable for estimating stand-level CS, thereby providing a scientific basis for the optimization of stand structure and the assessment of carbon sequestration capacity in mixed broadleaf forests.
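The species-level step above fits a two-parameter Weibull to observed diameters by maximum likelihood: the shape parameter k solves a one-dimensional score equation, after which the scale λ follows in closed form. A sketch using bisection on synthetic diameters (no forestry data involved):

```python
import numpy as np

def weibull_mle(x, k_lo=0.05, k_hi=50.0, tol=1e-10):
    """MLE for the 2-parameter Weibull via bisection on the shape score."""
    logx = np.log(x)

    def score(k):                         # increasing in k; root is the MLE
        xk = x**k
        return np.sum(xk * logx) / np.sum(xk) - 1.0 / k - logx.mean()

    for _ in range(200):
        k_mid = 0.5 * (k_lo + k_hi)
        if score(k_mid) > 0:
            k_hi = k_mid
        else:
            k_lo = k_mid
        if k_hi - k_lo < tol:
            break
    k = 0.5 * (k_lo + k_hi)
    lam = (np.mean(x**k))**(1.0 / k)      # closed-form scale given shape
    return k, lam

# Synthetic "diameters": inverse-transform samples from Weibull(k=2, lam=15)
rng = np.random.default_rng(0)
u = rng.random(20000)
d = 15.0 * (-np.log1p(-u))**(1 / 2.0)
k_hat, lam_hat = weibull_mle(d)
```

The FMM step generalizes this to a weighted sum of such Weibull components, one per species, fitted jointly.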
Effort estimation plays a crucial role in software development projects, aiding in resource allocation, project planning, and risk management. Traditional estimation techniques often struggle to provide accurate estimates due to the complex nature of software projects. In recent years, machine learning approaches have shown promise in improving the accuracy of effort estimation models. This study proposes a hybrid model that combines Long Short-Term Memory (LSTM) and Random Forest (RF) algorithms to enhance software effort estimation. The proposed hybrid model takes advantage of the strengths of both algorithms. To evaluate the performance of the hybrid model, an extensive set of software development projects is used as the experimental dataset. The experimental results demonstrate that the proposed hybrid model outperforms traditional estimation techniques in terms of accuracy and reliability. The integration of LSTM and RF enables the model to efficiently capture temporal dependencies and non-linear interactions in the software development data. The hybrid model enhances estimation accuracy, enabling project managers and stakeholders to make more precise predictions of the effort needed for upcoming software projects.
Adaptive fractional polynomial modeling of general correlated outcomes is formulated to address nonlinearity in means, variances/dispersions, and correlations. Means and variances/dispersions are modeled using generalized linear models in fixed effects/coefficients. Correlations are modeled using random effects/coefficients. Nonlinearity is addressed using power transforms of primary (untransformed) predictors. Parameter estimation is based on extended linear mixed modeling, generalizing both generalized estimating equations and linear mixed modeling. Models are evaluated using likelihood cross-validation (LCV) scores and are generated adaptively using a heuristic search controlled by LCV scores. Cases covered include linear, Poisson, logistic, exponential, and discrete regression of correlated continuous, count/rate, dichotomous, positive continuous, and discrete numeric outcomes treated as normally, Poisson, Bernoulli, exponentially, and discrete numerically distributed, respectively. Example analyses are also generated for these five cases to compare adaptive random effects/coefficients modeling of correlated outcomes to previously developed adaptive modeling based on directly specified covariance structures. Adaptive random effects/coefficients modeling substantially outperforms direct covariance modeling in the linear, exponential, and discrete regression example analyses. It generates equivalent results in the logistic regression example analyses, and it is substantially outperformed in the Poisson regression case. Random effects/coefficients modeling of correlated outcomes can provide substantial improvements in model selection compared to directly specified covariance modeling. However, directly specified covariance modeling can generate competitive or substantially better results in some cases, while usually requiring less computation time.
Sunshine duration (S) based empirical equations have been employed in this study to estimate the daily global solar radiation on a horizontal surface (G) for six meteorological stations in Burundi. Those equations include the Ångström-Prescott linear model and four of its derivatives, i.e. logarithmic, exponential, power and quadratic functions. Monthly mean values of daily global solar radiation and sunshine duration data for a period of 20 to 23 years, from the Geographical Institute of Burundi (IGEBU), have been used. For each of the six stations, ten single or double linear regressions have been developed from the above-said five functions, to relate, in terms of monthly mean values, the daily clearness index (G/G<sub>0</sub>) to each of two kinds of relative sunshine duration (RSD): S/S<sub>0</sub> and S/S'<sub>0</sub>. In those ratios, G<sub>0</sub>, S<sub>0</sub> and S'<sub>0</sub> stand for the extraterrestrial daily solar radiation on a horizontal surface, the day length, and the modified day length taking into account the natural site's horizon, respectively. According to the calculated mean values of the clearness index and the RSD, each station experiences a high number of fairly clear (or partially cloudy) days. Estimated values of the dependent variable (y) in each developed linear regression have been compared to measured values in terms of the coefficient of correlation (R), the coefficient of determination (R<sup>2</sup>), the mean bias error (MBE), the root mean square error (RMSE) and the t-statistic. Mean values of these statistical indicators have been used to rank, in decreasing order of performance, firstly the ten developed equations per station over the six stations, and secondly the six stations over the ten equations. For all sixty developed equations, the obtained values of those indicators lead to assert that any of the sixty developed linear regressions fits the measured data very adequately and can be used to estimate the monthly average daily global solar radiation from sunshine duration for the relevant station. It is also found that using S/S'<sub>0</sub> as the RSD is slightly more advantageous than using S/S<sub>0</sub> for estimating the monthly average daily clearness index. Moreover, the values of the statistical indicators of this study match adequately data from other works on the same kinds of empirical equations.
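The Ångström-Prescott step amounts to a simple linear regression of the monthly clearness index on relative sunshine duration, G/G0 = a + b(S/S0), scored with R, RMSE and MBE. A sketch on synthetic monthly values (the coefficients a and b below are invented, not Burundi calibrations):

```python
import numpy as np

def angstrom_prescott_fit(rsd, kt):
    """Fit kt = a + b * rsd by ordinary least squares; return (a, b, stats)."""
    b, a = np.polyfit(rsd, kt, 1)          # polyfit returns [slope, intercept]
    pred = a + b * rsd
    resid = pred - kt
    stats = {
        "R": float(np.corrcoef(rsd, kt)[0, 1]),
        "RMSE": float(np.sqrt(np.mean(resid**2))),
        "MBE": float(np.mean(resid)),
    }
    return a, b, stats

# Twelve hypothetical monthly means: rsd = S/S0, kt = G/G0
rng = np.random.default_rng(3)
rsd = rng.uniform(0.4, 0.8, 12)
kt = 0.25 + 0.45 * rsd + rng.normal(0.0, 0.01, 12)
a, b, stats = angstrom_prescott_fit(rsd, kt)
```

The four derivative forms (logarithmic, exponential, power, quadratic) only change the transform applied to `rsd` or `kt` before the same least-squares fit.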
The dynamic viscoelastic properties of asphalt AC-20 and its composites with Organic-Montmorillonite clay (OMMt) and SBS were modeled using the empirical Havriliak-Negami (HN) model, based on linear viscoelastic (LVE) theory. The HN parameters, α, β, G0, G∞ and τHN, were determined by solving the HN equation across various temperatures and frequencies. The HN model successfully predicted the rheological behavior of the asphalt and its blends within the temperature range of 25˚C - 40˚C. However, deviations occurred between 40˚C - 75˚C, where the glass transition temperatures Tg of the asphalt components and the SBS polymer are located, rendering the HN model ineffective for predicting the dynamic viscoelastic properties of composites containing OMMt under these conditions. Yet the prediction error of the HN model dropped to 2.28% - 2.81% for the asphalt and its mixtures at 100˚C, a temperature exceeding the Tg values of both the polymer and the asphalt, where the mixtures exhibited liquid-like behavior. The exponent α and the relaxation time increased with temperature across all systems. Incorporating OMMt clay into the asphalt blends significantly enhanced the relaxation dynamics of the resulting composites.
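In one common parameterization for a complex modulus (assumed here; the paper may use a variant), the HN model reads G*(ω) = G∞ − (G∞ − G0) / (1 + (iωτ)^α)^β, recovering the relaxed modulus G0 at low frequency and the unrelaxed G∞ at high frequency, with α and β shaping the breadth and asymmetry of the relaxation. A sketch with invented parameter values:

```python
import numpy as np

def hn_modulus(omega, g0, g_inf, tau, alpha, beta):
    """Havriliak-Negami complex modulus (one common form, assumed here)."""
    return g_inf - (g_inf - g0) / (1.0 + (1j * omega * tau)**alpha)**beta

# Invented parameters for illustration (Pa, s)
G0, G_INF, TAU, ALPHA, BETA = 1.0e5, 1.0e9, 1.0e-3, 0.8, 0.5
freqs = np.logspace(-4, 10, 15)            # angular frequency, rad/s
g_star = hn_modulus(freqs, G0, G_INF, TAU, ALPHA, BETA)
storage, loss = g_star.real, g_star.imag   # storage and loss moduli
```

Fitting the five parameters at each temperature, as in the study, is then a nonlinear least-squares problem on measured storage/loss curves.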
Compositional data, such as relative information, are a crucial aspect of machine learning and other related fields. They are typically recorded as closed data that sum to a constant, like 100%. The linear regression model is the most widely used statistical technique for identifying hidden relationships between underlying random variables of interest, and its parameter estimates are useful for tasks such as future prediction and partial-effects analysis of independent variables; maximum likelihood estimation (MLE) is the method of choice for obtaining them. However, data quality is a significant challenge in machine learning, especially when missing data are present, and recovering missing observations can be costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested as a solution for situations involving missing data. The EM algorithm iteratively finds the best estimates of parameters, in the maximum likelihood or maximum a posteriori (MAP) sense, in statistical models that depend on unobserved variables or data. Using the current parameter estimate as input, the expectation (E) step constructs the expected log-likelihood function; finding the parameters that maximize this expected log-likelihood is the job of the maximization (M) step. This study examined how well the EM algorithm performed on a simulated compositional dataset with missing observations, using both robust least squares and ordinary least squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
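To make the E and M steps concrete, the sketch below runs EM on a bivariate normal with the response missing at random: the E-step fills each missing y with its conditional mean given x (and adds the conditional variance to the second moment), and the M-step re-estimates the mean and covariance. This is a generic illustration, not the study's compositional pipeline:

```python
import numpy as np

def em_missing_y(x, y, miss, iters=100):
    """EM for (mean, covariance) of a bivariate normal with y missing."""
    y_fill = np.where(miss, y[~miss].mean(), y)   # crude initial fill
    mu = np.array([x.mean(), y_fill.mean()])
    S = np.cov(np.vstack([x, y_fill]))
    for _ in range(iters):
        # E-step: conditional mean/variance of missing y given observed x
        slope = S[0, 1] / S[0, 0]
        cvar = S[1, 1] - slope * S[0, 1]
        y_fill = np.where(miss, mu[1] + slope * (x - mu[0]), y)
        # M-step: update sufficient statistics from the completed data
        mu = np.array([x.mean(), y_fill.mean()])
        dx, dy = x - mu[0], y_fill - mu[1]
        S = np.array([
            [np.mean(dx * dx), np.mean(dx * dy)],
            [np.mean(dx * dy), np.mean(dy * dy) + miss.mean() * cvar],
        ])
    return mu, S

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=n)
miss = rng.random(n) < 0.3                 # 30% of y missing at random
mu_hat, S_hat = em_missing_y(x, y, miss)
beta_hat = S_hat[0, 1] / S_hat[0, 0]       # implied regression slope
```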
Data organization requires high efficiency for the large amounts of data used in a digital mine system. A new method of storing massive block-model data is proposed to meet the characteristics of the database, including ACID compliance, concurrency support, data sharing, and efficient access. Each block model is organized as a linear octree stored in LMDB (Lightning Memory-Mapped Database). Geological attributes can be queried at any point of 3D space by a comparison algorithm on location codes and a conversion algorithm from the address code of geometry space to the location code of storage. The performance and robustness of querying geological attributes over a 3D spatial region are greatly enhanced by a transformation from 3D to 2D and a 2D grid-scanning method that screens inner and outer points. Experimental results showed that this method can access the massive block-model data while meeting the database characteristics. The method with LMDB is at least 3 times faster than that with etree, especially for reads. In addition, the larger the amount of data processed, the more efficient the method becomes.
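A linear octree addresses each cell by a location code, commonly a Morton code obtained by interleaving the bits of the integer (x, y, z) cell coordinates, so that spatially nearby cells get nearby keys in a key-value store such as LMDB. A minimal sketch of encode/decode (the paper's exact code layout may differ):

```python
def morton_encode(x, y, z, bits=21):
    """Interleave the bits of x, y, z into a single location code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

def morton_decode(code, bits=21):
    """Invert morton_encode: de-interleave a location code into (x, y, z)."""
    x = y = z = 0
    for i in range(bits):
        x |= ((code >> (3 * i)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        z |= ((code >> (3 * i + 2)) & 1) << i
    return x, y, z

key = morton_encode(3, 5, 7)   # location code used as the LMDB key
```

Range queries over a 3D region then reduce to scans over contiguous runs of such keys, which is what makes the memory-mapped B-tree of LMDB a good fit.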
A unified breakdown model of SOI RESURF devices with a uniform, step, or linear drift-region doping profile is proposed for the first time. With the model, the electric field distribution and breakdown voltage are investigated in detail for step numbers from 0 to infinity. The critical electric field as a function of the geometry parameters and doping profile is derived. For a thick-film device, a linear doping profile can be replaced by a single- or two-step doping profile in the drift region, owing to a considerably uniform lateral electric field, an almost ideal breakdown voltage, and simplified design and fabrication. The validity of the proposed model is verified by the good agreement among the analytical results, numerical simulations, and reported experiments.
In this paper, we define a new class of biased linear estimators of the vector of unknown parameters in the deficient-rank linear model, based on the spectral decomposition expression of the best linear minimum bias estimator. Some important properties are discussed. By appropriate choices of the bias parameters, we construct many interesting and useful biased linear estimators, which extend the ordinary biased linear estimators of the full-rank linear model to the deficient-rank linear model. Finally, we give a numerical example in geodetic adjustment.
In order to overcome data quantization, network-induced delay, network packet dropouts, and wrong sequences in nonlinear networked control systems, a novel nonlinear networked control system model is built by the T-S fuzzy method. Two time-varying quantizers are added to the model. The key analysis steps of the method are to construct an improved interval-delay-dependent Lyapunov functional and to introduce a free-weighting matrix. By making use of the parallel distributed compensation technique and the convexity of the matrix function, improved stabilization and stability criteria are obtained. Simulation experiments show that controller and quantizer parameters satisfying a given performance can be obtained by solving a set of LMIs. An application to a nonlinear mass-spring system shows that the proposed method is effective.
Necessary and sufficient conditions are derived for equality between a²y′(I − P_X)y and the minimum norm quadratic unbiased estimator of variance under the general linear model, where a² is a known positive number. Further, when the Gauss-Markov estimators and the ordinary least squares estimator are identical, a relatively simple equivalent condition is obtained. Finally, this condition is applied to an interesting example.
By analyzing the observed phenomena and the data collected in the study, a multi-compartment linear circulation model for a targeting drug delivery system was developed, and the formulas for drug concentration as a function of time in blood and in the target organ were derived. With this model, the drug concentration-time curve for the target organ can be plotted from the measured drug concentrations in blood, and the pharmacokinetic parameters of the drug in the target organ can be obtained. The practicability of the model was further checked against the drug concentration-time curves in blood and in the target organ (liver) of liver-targeting nanoparticles in animal tests. Based on the liver concentration-time curves calculated by the formula for the target organ, the pharmacokinetic behavior of the drug in the target organ (liver) was analyzed by statistical moments, and its pharmacokinetic parameters in the liver were obtained. It is suggested that the "relative targeting index" can be used for quantitative evaluation of targeting drug delivery systems.
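A linear compartment model of this kind is a system of first-order ODEs; the simplest blood/target-organ version below (rate constants invented, forward-Euler integration) shows how the organ curve follows from the blood curve:

```python
import numpy as np

def two_compartment(k12, k21, k10, c_blood0=1.0, dt=0.001, t_end=10.0):
    """Blood <-> target organ exchange with elimination k10 from blood.

    dCb/dt = -(k10 + k12)*Cb + k21*Ct
    dCt/dt =  k12*Cb - k21*Ct
    """
    steps = int(t_end / dt)
    cb, ct = c_blood0, 0.0
    blood, organ = [cb], [ct]
    for _ in range(steps):
        dcb = -(k10 + k12) * cb + k21 * ct
        dct = k12 * cb - k21 * ct
        cb, ct = cb + dt * dcb, ct + dt * dct
        blood.append(cb)
        organ.append(ct)
    return np.array(blood), np.array(organ)

# Invented rate constants (1/h): uptake, efflux, elimination
blood, organ = two_compartment(k12=1.2, k21=0.5, k10=0.3)
```

Statistical-moment parameters (AUC, mean residence time) then come from numerically integrating the `organ` curve, mirroring the paper's analysis of the computed liver curve.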
In order to help solve serious urban transport problems, and building on the proven chaotic characteristics of traffic flow, a nonlinear chaotic model for analyzing traffic flow time series is proposed. The model first reconstructs the traffic flow time series in phase space, richly extracting the correlative information in the traffic flow; on this basis, a prediction equation for the reconstructed information is established using chaos theory, and, to obtain optimal prediction results, the model parameters are identified and optimized by a genetic algorithm. Practical prediction research on urban traffic flow shows that this model has high prediction precision and can provide a reliable reference for urban traffic programming and control.
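Phase-space reconstruction uses time-delay embedding: each state is a vector of lagged samples, and a simple local predictor forecasts the next value from the observed successor of the nearest historical state. A toy sketch on a deterministic series (the genetic-algorithm tuning of the embedding dimension m and delay τ is omitted; m and τ are fixed by hand):

```python
import numpy as np

def embed(x, m, tau):
    """Time-delay embedding: rows are (x[j], x[j+tau], ..., x[j+(m-1)tau])."""
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(m)])

def predict_next(x, m=3, tau=1):
    """Forecast the next value via the nearest neighbor in phase space."""
    E = embed(x, m, tau)
    query, history = E[-1], E[:-1]
    dists = np.linalg.norm(history - query, axis=1)
    j = int(np.argmin(dists))              # closest past state
    return x[j + (m - 1) * tau + 1]        # its observed successor

t = np.arange(500)
series = np.sin(0.1 * t)                   # deterministic stand-in for flow data
forecast = predict_next(series, m=3, tau=2)
```

On real traffic data, m and τ become decision variables of the genetic algorithm, scored by out-of-sample prediction error.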
In this paper, based on the theory of parameter estimation, we give a selection method and show that, in the sense of a desirable property of the parameter estimation, it is very reasonable. Moreover, we offer a method for calculating the selection statistic and an applied example.
Funding: This study was supported by the National Natural Science Foundation of China (42261008, 41971034) and the Natural Science Foundation of Gansu Province, China (22JR5RA074).
Funding: Project supported by the National Natural Science Foundation of China (Nos. 12172236, 12202289, and U21A20430) and the Science and Technology Research Project of the Hebei Education Department of China (No. QN2022083).
Funding: This research was funded by the National Natural Science Foundation of China (No. 62272124), the National Key Research and Development Program of China (No. 2022YFB2701401), the Guizhou Province Science and Technology Plan Project (Grant No. Qiankehe Platform Talent [2020]5017), the Research Project of Guizhou University for Talent Introduction (No. [2020]61), the Cultivation Project of Guizhou University (No. [2019]56), and the Open Fund of the Key Laboratory of Advanced Manufacturing Technology, Ministry of Education (GZUAMT2021KF[01]).
Abstract: In the assessment of car insurance claims, the claim rate presents a highly skewed probability distribution, which is typically modeled using the Tweedie distribution. The traditional approach to obtaining a Tweedie regression model involves training on a centralized dataset; when the data are provided by multiple parties, training a privacy-preserving Tweedie regression model without exchanging raw data becomes a challenge. To address this issue, this study introduces a novel vertical federated learning-based Tweedie regression algorithm for multi-party auto insurance rate setting in data silos. The algorithm keeps sensitive data local and uses privacy-preserving techniques to achieve intersection operations between the two parties holding the data. After determining which entities are shared, the participants train the model locally using the shared entity data to obtain the intermediate parameters of the local generalized linear model. Homomorphic encryption algorithms are introduced to exchange and update these intermediate parameters so as to collaboratively complete the joint training of the car insurance rate-setting model. Performance tests on two publicly available datasets show that the proposed federated Tweedie regression algorithm can effectively generate Tweedie regression models that leverage the value of data from both parties without exchanging data. The assessment results of the scheme approach those of a Tweedie regression model learned from centralized data, and outperform those of a Tweedie regression model learned independently by a single party.
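The loss underlying any Tweedie regression, federated or not, is the Tweedie unit deviance; for 1 < p < 2 it handles the zero-inflated, right-skewed claim amounts the abstract describes. The sketch below only illustrates the deviance (with made-up claim values), not the federated protocol; it checks numerically that for an intercept-only model the deviance is minimized at the sample mean, as theory predicts.

```python
import numpy as np

# Tweedie unit deviance for 1 < p < 2 (compound Poisson-gamma), the loss a
# Tweedie GLM minimizes. Claims data below are illustrative, not from the study.
def tweedie_deviance(y, mu, p=1.5):
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    return 2.0 * (np.power(y, 2 - p) / ((1 - p) * (2 - p))
                  - y * np.power(mu, 1 - p) / (1 - p)
                  + np.power(mu, 2 - p) / (2 - p))

claims = np.array([0.0, 0.0, 120.0, 0.0, 40.0])   # zero-inflated, skewed
mus = np.linspace(5, 100, 500)
total = [tweedie_deviance(claims, m).sum() for m in mus]
best = mus[int(np.argmin(total))]
print(best)  # close to the sample mean, 32.0
```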
Funding: This work was supported by the 2021 Project of the "14th Five-Year Plan" of Shaanxi Education Science, "Research on the Application of Educational Data Mining in Applied Undergraduate Teaching: Taking the Course of 'Computer Application Technology' as an Example" (SGH21Y0403); the Teaching Reform and Research Projects for Practical Teaching in 2022, "Research on Practical Teaching of Applied Undergraduate Projects Based on 'Combination of Courses and Certificates': Taking Computer Application Technology Courses as an Example" (SJJG02012); and the 11th batch of Teaching Reform Research Projects of Xi'an Jiaotong University City College, "Project-Driven Cultivation and Research on Information Literacy of Applied Undergraduate Students in the Information Age: Taking Computer Application Technology Course Teaching as an Example" (111001).
Abstract: Social networks are the mainstream medium of current information dissemination, and accurately predicting their propagation laws is particularly important. In this paper, we introduce a social network propagation model integrating multiple linear regression and an infectious disease model. First, we propose features that affect social network communication from three dimensions. Then, we predict node influence via multiple linear regression. Finally, we use node influence as the state-transition factor of the infectious disease model to predict the trend of information dissemination in social networks. Experimental results on a real social network dataset show that the predictions of the model are consistent with the actual information dissemination trends.
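The two-stage idea above can be sketched minimally: a multiple linear regression maps node features to an influence score, and that score then gates the state transitions of a simple epidemic-style spread on a graph. The features, graph, and threshold below are all illustrative placeholders, not the authors' design.

```python
import numpy as np

# Stage 1: multiple linear regression predicts node influence from features.
rng = np.random.default_rng(0)
X = rng.random((20, 3))                      # 3 illustrative node features
w_true = np.array([0.5, 0.3, 0.2])
y = X @ w_true                               # synthetic influence scores
w, *_ = np.linalg.lstsq(X, y, rcond=None)    # regression recovers the weights
influence = X @ w

# Stage 2: influence gates the transitions of a simple SI-type spread
# (toy 6-node graph; threshold 0.3 is an arbitrary illustrative choice).
adj = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []}
infected = {0}
for _ in range(4):
    for u in list(infected):
        for v in adj[u]:
            if v < len(influence) and influence[v] > 0.3:
                infected.add(v)
print(sorted(infected))
```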
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 42074016, 42104025, 42274057, and 41704007), the Hunan Provincial Natural Science Foundation of China (Grant No. 2021JJ30244), and the Scientific Research Fund of the Hunan Provincial Education Department (Grant No. 22B0496).
Abstract: Weighted total least squares (WTLS) has been regarded as the standard tool for the errors-in-variables (EIV) model, in which all the elements in the observation vector and the coefficient matrix are contaminated with random errors. However, in many geodetic applications, some elements are error-free and some random observations appear repeatedly in different positions in the augmented coefficient matrix; this is called the linear structured EIV (LSEIV) model. Two kinds of methods are proposed for the LSEIV model, based on functional and stochastic modifications. On the one hand, the functional part of the LSEIV model is modified into the errors-in-observations (EIO) model. On the other hand, the stochastic model is modified by applying the Moore-Penrose inverse of the cofactor matrix. The algorithms are derived through the Lagrange multiplier method and linear approximation. The estimation principles and iterative formulas of the parameters are proven to be consistent. The first-order approximate variance-covariance matrix (VCM) of the parameters is also derived. A numerical example is given to compare the performance of the three proposed algorithms with the STLS approach. Afterwards, the least squares (LS), total least squares (TLS), and linear structured weighted total least squares (LSWTLS) solutions are compared, and the accuracy evaluation formula is proven to be feasible and effective. Finally, LSWTLS is applied to the field of deformation analysis, where it yields better results than the traditional LS and TLS estimations.
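The structured and weighted algorithms above are beyond a short sketch, but the classical unweighted TLS baseline they generalize has a closed form via the SVD: the solution comes from the right singular vector of the augmented matrix [A | b] associated with its smallest singular value. A minimal sketch on exactly consistent data:

```python
import numpy as np

# Classical (unweighted) total least squares via SVD: treats errors in both
# the coefficient matrix A and the observation vector b. This is the baseline
# the LSEIV/WTLS methods above generalize, not the paper's own algorithm.
def tls(A, b):
    n = A.shape[1]
    C = np.hstack([A, b.reshape(-1, 1)])   # augmented matrix [A | b]
    _, _, Vt = np.linalg.svd(C)
    v = Vt[-1]                             # right singular vector, smallest sigma
    return -v[:n] / v[n]                   # x = -v_A / v_b

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([2.0, 3.0, 5.0])              # exactly consistent: x = (2, 3)
print(tls(A, b))
```

On noisy data the TLS estimate differs from ordinary LS precisely because the perturbation is distributed over both A and b.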
Funding: This work was supported by the National Social Science Fund of China (Grant No. 23BGL270).
Abstract: The virtuality and openness of online social platforms make networks a hotbed for the rapid propagation of various rumors. One of the most effective containment measures for blocking the outbreak of a rumor is spreading positive information to counterbalance its diffusion, so the spreading mechanism of rumors and effective suppression strategies are significant and challenging research issues. First, in order to simulate the dissemination of multiple types of information, we propose a competitive linear threshold model with state transition (CLTST) to describe the spreading process of rumor and anti-rumor in the same network. Subsequently, we put forward a community-based rumor blocking (CRB) algorithm based on influence maximization theory in social networks. Its crucial step is to identify a set of influential seeds that propagate anti-rumor information to other nodes, which includes community detection, selection of candidate anti-rumor seeds, and generation of the anti-rumor seed set. Under the CLTST model, the CRB algorithm has been compared with six state-of-the-art algorithms on nine online social networks to verify its performance. Experimental results show that the proposed model better reflects the process of rumor propagation and reveals the propagation mechanism of rumor and anti-rumor in online social networks. Moreover, the proposed CRB algorithm performs better in weakening rumor dissemination: it selects anti-rumor seeds more accurately and achieves better performance in influence spread, sensitivity analysis, seed distribution, and running time.
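The building block of the CLTST model is the classical linear threshold (LT) activation rule: a node activates once the summed weights of its active in-neighbors reach its threshold. The sketch below shows that single-information rule on a toy graph (weights and thresholds are illustrative); the paper's competitive, state-transition extension layers rumor and anti-rumor on top of it.

```python
# Deterministic linear threshold (LT) activation, the building block of the
# CLTST model above. Graph, weights, and thresholds are illustrative.
def lt_spread(edges, thresholds, seeds):
    """edges: dict node -> list of (in_neighbor, weight). Returns active set."""
    active = set(seeds)
    changed = True
    while changed:                         # iterate to a fixed point
        changed = False
        for node, in_edges in edges.items():
            if node in active:
                continue
            if sum(w for u, w in in_edges if u in active) >= thresholds[node]:
                active.add(node)
                changed = True
    return active

edges = {0: [], 1: [(0, 0.6)], 2: [(0, 0.3), (1, 0.3)], 3: [(2, 0.2)]}
thr = {0: 0.5, 1: 0.5, 2: 0.5, 3: 0.5}
print(sorted(lt_spread(edges, thr, {0})))  # 0 activates 1, then 2; 3 stays inactive
```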
Funding: This work was funded by the National Key Research and Development Program of China (No. 2022YFD2200503-02).
Abstract: The diameter distribution function (DDF) is a crucial tool for accurately predicting stand carbon storage (CS). The key open issue, however, is how to construct a high-precision DDF based on stand factors, site quality, and an aridity index to predict stand CS in multi-species mixed forests with complex structures. This study used data from 70 survey plots of mixed broadleaf Populus davidiana and Betula platyphylla forests in the Mulan Rangeland State Forest, Hebei Province, China, to construct the DDF based on maximum likelihood estimation (MLE) and a finite mixture model (FMM). Ordinary least squares (OLS), linear seemingly unrelated regression (LSUR), and a back-propagation neural network (BPNN) were used to investigate the influences of stand factors, site quality, and the aridity index on the shape and scale parameters of the DDF and to predict the stand CS of mixed broadleaf forests. The results showed that the FMM accurately described the stand-level diameter distribution of the mixed P. davidiana and B. platyphylla forests, whereas the Weibull function constructed by MLE was more accurate in describing the species-level diameter distribution. The combined variable of quadratic mean diameter (Dq), stand basal area (BA), and site quality improved the accuracy of the shape parameter models of the FMM; the combined variable of Dq, BA, and the De Martonne aridity index improved the accuracy of the scale parameter models. Compared to OLS and LSUR, the BPNN had higher accuracy in the re-parameterization process of the FMM. OLS, LSUR, and BPNN overestimated the CS of P. davidiana but underestimated the CS of B. platyphylla in the large diameter classes (DBH ≥ 18 cm). The BPNN accurately estimated stand- and species-level CS, but was more suitable for estimating stand-level CS, thereby providing a scientific basis for the optimization of stand structure and the assessment of carbon sequestration capacity in mixed broadleaf forests.
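The FMM in this setting is a weighted mixture of species-level Weibull diameter distributions. A minimal sketch of that density (parameters are illustrative, not fitted values from the study) with a numerical check that the mixture integrates to one:

```python
import math

# Weibull density and a two-component finite mixture, the form of DDF used
# above (one component per species). Parameters are illustrative only.
def weibull_pdf(x, shape, scale):
    return (shape / scale) * (x / scale) ** (shape - 1) * math.exp(-(x / scale) ** shape)

def mixture_pdf(x, comps):
    """comps: list of (weight, shape, scale); weights sum to 1."""
    return sum(w * weibull_pdf(x, c, b) for w, c, b in comps)

comps = [(0.6, 2.2, 14.0), (0.4, 3.0, 22.0)]      # e.g. two species' DBH classes
# midpoint-rule integral over 0-60 cm should be ~1 for a proper density
area = sum(mixture_pdf(i * 0.01 + 0.005, comps) * 0.01 for i in range(6000))
print(round(area, 3))  # approximately 1.0
```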
Abstract: Effort estimation plays a crucial role in software development projects, aiding resource allocation, project planning, and risk management. Traditional estimation techniques often struggle to provide accurate estimates due to the complex nature of software projects. In recent years, machine learning approaches have shown promise in improving the accuracy of effort estimation models. This study proposes a hybrid model that combines Long Short-Term Memory (LSTM) and Random Forest (RF) algorithms to enhance software effort estimation, taking advantage of the strengths of both algorithms. To evaluate the performance of the hybrid model, an extensive set of software development projects is used as the experimental dataset. The experimental results demonstrate that the proposed hybrid model outperforms traditional estimation techniques in terms of accuracy and reliability. The integration of LSTM and RF enables the model to efficiently capture temporal dependencies and nonlinear interactions in the software development data, enhancing estimation accuracy and enabling project managers and stakeholders to make more precise predictions of the effort needed for upcoming software projects.
Abstract: Adaptive fractional polynomial modeling of general correlated outcomes is formulated to address nonlinearity in means, variances/dispersions, and correlations. Means and variances/dispersions are modeled using generalized linear models in fixed effects/coefficients, and correlations are modeled using random effects/coefficients. Nonlinearity is addressed using power transforms of primary (untransformed) predictors. Parameter estimation is based on extended linear mixed modeling, generalizing both generalized estimating equations and linear mixed modeling. Models are evaluated using likelihood cross-validation (LCV) scores and are generated adaptively using a heuristic search controlled by LCV scores. Cases covered include linear, Poisson, logistic, exponential, and discrete regression of correlated continuous, count/rate, dichotomous, positive continuous, and discrete numeric outcomes treated as normally, Poisson, Bernoulli, exponentially, and discrete numerically distributed, respectively. Example analyses are also generated for these five cases to compare adaptive random effects/coefficients modeling of correlated outcomes with previously developed adaptive modeling based on directly specified covariance structures. Adaptive random effects/coefficients modeling substantially outperforms direct covariance modeling in the linear, exponential, and discrete regression example analyses; it generates equivalent results in the logistic regression example analyses and is substantially outperformed in the Poisson regression case. Random effects/coefficients modeling of correlated outcomes can thus provide substantial improvements in model selection compared to directly specified covariance modeling, although directly specified covariance modeling can generate competitive or substantially better results in some cases while usually requiring less computation time.
Abstract: Sunshine duration (S) based empirical equations have been employed in this study to estimate the daily global solar radiation on a horizontal surface (G) for six meteorological stations in Burundi. Those equations include the Ångström-Prescott linear model and four of its derivatives, i.e., logarithmic, exponential, power, and quadratic functions. Monthly mean values of daily global solar radiation and sunshine duration data for a period of 20 to 23 years, from the Geographical Institute of Burundi (IGEBU), have been used. For each of the six stations, ten single or double linear regressions have been developed from the above-said five functions to relate, in terms of monthly mean values, the daily clearness index G/G0 to each of two kinds of relative sunshine duration (RSD): S/S0 and S/S'0. In those ratios, G0, S0, and S'0 stand for the extraterrestrial daily solar radiation on a horizontal surface, the day length, and the modified day length taking into account the natural site's horizon, respectively. According to the calculated mean values of the clearness index and the RSD, each station experiences a high number of fairly clear (or partially cloudy) days. Estimated values of the dependent variable in each developed linear regression have been compared to measured values in terms of the coefficients of correlation (R) and determination (R²), the mean bias error (MBE), the root mean square error (RMSE), and the t-statistic. Mean values of these statistical indicators have been used to rank, in decreasing order of performance, first the ten developed equations per station over all six stations, and second the six stations over all ten equations. The obtained values of those indicators lay within acceptable ranges for all sixty developed equations.
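The Ångström-Prescott relation named above is a single linear regression of the clearness index on relative sunshine duration, G/G0 = a + b·(S/S0). A minimal sketch with synthetic monthly means (the coefficients 0.25 and 0.50 are illustrative, not values fitted for Burundi):

```python
import numpy as np

# Least-squares fit of the Angstrom-Prescott relation G/G0 = a + b*(S/S0).
# Synthetic, exactly linear monthly means; a = 0.25, b = 0.50 are illustrative.
s_ratio = np.array([0.45, 0.50, 0.55, 0.60, 0.65, 0.70])  # S/S0
k_t = 0.25 + 0.50 * s_ratio                               # clearness index G/G0
b_coef, a_coef = np.polyfit(s_ratio, k_t, 1)              # slope, intercept
print(round(a_coef, 3), round(b_coef, 3))                 # recovers 0.25, 0.5
```

The study's other nine regressions replace the linear right-hand side with logarithmic, exponential, power, or quadratic functions of the same ratios.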
These results lead to the assertion that any of the sixty developed linear regressions (and thus the corresponding equations in terms of S/S0 and S/S'0) fits the measured data very adequately and can be used to estimate the monthly average daily global solar radiation from sunshine duration for the relevant station. It is also found that using S/S'0 as the RSD is slightly more advantageous than using S/S0 for estimating the monthly average daily clearness index. Moreover, the values of the statistical indicators of this study match adequately those from other works on the same kinds of empirical equations.
Abstract: The dynamic viscoelastic properties of asphalt AC-20 and its composites with organic montmorillonite clay (OMMt) and SBS were modeled using the empirical Havriliak-Negami (HN) model, based on linear viscoelastic (LVE) theory. The HN parameters α, β, G0, G∞, and τHN were determined by solving the HN equation across various temperatures and frequencies. The HN model successfully predicted the rheological behavior of the asphalt and its blends within the temperature range of 25˚C - 40˚C. However, deviations occurred between 40˚C - 75˚C, where the glass transition temperatures Tg of the asphalt components and the SBS polymer are located, rendering the HN model ineffective for predicting the dynamic viscoelastic properties of composites containing OMMt under these conditions. Yet the prediction error of the HN model dropped to 2.28% - 2.81% for the asphalt and its mixtures at 100˚C, a temperature exceeding the Tg values of both polymer and asphalt, where the mixtures exhibited liquid-like behavior. The exponent α and the relaxation time increased with temperature in all systems. Incorporating OMMt clay into the asphalt blends significantly enhanced the relaxation dynamics of the resulting composites.
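The standard Havriliak-Negami form for a complex modulus with the parameters named above is G*(ω) = G∞ + (G0 − G∞) / (1 + (iωτ)^α)^β, recovering G0 at low frequency and G∞ at high frequency. A sketch with illustrative parameter values (not the fitted values from the study) that checks both limits:

```python
# Havriliak-Negami complex modulus, the empirical form fitted above:
# G*(w) = Ginf + (G0 - Ginf) / (1 + (1j*w*tau)**alpha)**beta.
# Parameter values are illustrative, not the paper's fitted values.
def hn_modulus(w, g0, ginf, tau, alpha, beta):
    return ginf + (g0 - ginf) / (1 + (1j * w * tau) ** alpha) ** beta

G0, GINF, TAU, A, B = 1.0, 100.0, 0.01, 0.6, 0.5
g_low = hn_modulus(1e-9, G0, GINF, TAU, A, B)    # low-frequency limit -> G0
g_high = hn_modulus(1e12, G0, GINF, TAU, A, B)   # high-frequency limit -> Ginf
print(abs(g_low), abs(g_high))
```

The exponents α and β broaden and skew the relaxation compared to a single Debye/Maxwell mode, which is why the form can track asphalt blends over a range of temperatures.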
Abstract: Compositional data, such as relative information, is a crucial aspect of machine learning and related fields. It is typically recorded as closed data, i.e., summing to a constant such as 100%. The linear regression model is a commonly used statistical technique for identifying relationships between variables of interest, and when estimating its parameters, which are useful for prediction and for analyzing the partial effects of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, data quality is a significant challenge in machine learning, and many datasets contain missing observations, whose recovery can be costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds maximum likelihood (or maximum a posteriori, MAP) estimates of parameters in statistical models that depend on unobserved variables. Using the current parameter estimate, the expectation (E) step constructs the expected log-likelihood function; the maximization (M) step then finds the parameters that maximize this expected log-likelihood. This study examined how well the EM algorithm performs on a simulated compositional dataset with missing observations, using both robust least squares and ordinary least squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
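The E/M alternation described above can be sketched for the simplest case, a linear regression with some responses missing: the E-step imputes each missing y with its current fitted value, and the M-step refits OLS on the completed data. The data below are illustrative; with responses missing at random, this iteration converges to the OLS fit on the observed rows, which the sketch verifies.

```python
import numpy as np

# EM sketch for simple linear regression with missing responses (toy data).
# E-step: impute missing y with current fitted values.
# M-step: refit OLS on the completed data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, np.nan, 9.8, np.nan])
obs = ~np.isnan(y)

a, b = 0.0, 1.0                                  # initial intercept, slope
for _ in range(50):
    y_full = np.where(obs, y, a + b * x)         # E-step
    b, a = np.polyfit(x, y_full, 1)              # M-step

b_obs, a_obs = np.polyfit(x[obs], y[obs], 1)     # OLS on observed rows only
print(round(a - a_obs, 6), round(b - b_obs, 6))  # both differences near 0
```

For compositional data the same alternation is applied after a log-ratio transform, which is where the Aitchison geometry mentioned above enters.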
Funding: Projects (41572317, 51374242) supported by the National Natural Science Foundation of China; Project (2015CX005) supported by the Innovation-Driven Plan of Central South University, China.
Abstract: Data organization in the digital mine system requires high efficiency for large amounts of data. A new method of storing massive block model data is proposed to meet the required database characteristics, including ACID compliance, concurrency support, data sharing, and efficient access. Each block model is organized by a linear octree and stored in LMDB (Lightning Memory-Mapped Database). Geological attributes can be queried at any point of 3D space through a comparison algorithm on location codes and a conversion algorithm from the address code of geometry space to the location code of storage. The performance and robustness of querying geological attributes over a 3D spatial region are greatly enhanced by the transformation from 3D to 2D and a 2D grid-scanning method to screen inner and outer points. Experimental results showed that this method can access the massive block model data while meeting the database characteristics. The method with LMDB is at least 3 times faster than that with etree, especially for reads. In addition, the larger the amount of data processed, the more efficient the method becomes.
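A linear octree keys each cell by a location code obtained by interleaving the bits of its integer x, y, z coordinates (a Morton code), giving a single sortable integer suitable as a key-value store key. The abstract does not spell out its exact encoding, so the sketch below shows the standard bit-interleaving scheme as an assumption:

```python
# Location (Morton) code for a linear octree: interleave the bits of the
# integer x, y, z cell coordinates into one sortable key, the standard scheme
# for keying octree cells in a key-value store such as LMDB.
def morton3d(x, y, z, bits=10):
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)        # x bits -> positions 0, 3, 6...
        code |= ((y >> i) & 1) << (3 * i + 1)    # y bits -> positions 1, 4, 7...
        code |= ((z >> i) & 1) << (3 * i + 2)    # z bits -> positions 2, 5, 8...
    return code

print(morton3d(1, 0, 0))  # 1
print(morton3d(0, 1, 0))  # 2
print(morton3d(1, 1, 1))  # 7
print(morton3d(2, 0, 0))  # 8
```

Because nearby cells share high-order code bits, range scans over sorted codes touch spatially coherent data, which is what makes the memory-mapped B-tree layout of LMDB fast for these queries.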
Abstract: A unified breakdown model of the SOI RESURF device with a uniform, step, or linear drift-region doping profile is proposed. Using the model, the electric field distribution and breakdown voltage are investigated in detail for step numbers from 0 to infinity. The critical electric field is derived as a function of the geometry parameters and doping profile. For the thick-film device, a linear doping profile can be replaced by a single- or two-step doping profile in the drift region, owing to a considerably uniform lateral electric field, an almost ideal breakdown voltage, and simplified design and fabrication. The validity of the proposed model is verified by the good agreement among the analytical results, numerical simulations, and reported experiments.
Abstract: In this paper, we define a new class of biased linear estimators of the vector of unknown parameters in the deficient-rank linear model, based on the spectral decomposition expression of the best linear minimum-bias estimator. Some important properties are discussed. By appropriate choices of bias parameters, we construct many interesting and useful biased linear estimators, which extend ordinary biased linear estimators from the full-rank linear model to the deficient-rank linear model. Finally, we give a numerical example in geodetic adjustment.
Funding: The National Natural Science Foundation of China (Nos. 60474049 and 60835001) and the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20090092120027).
Abstract: In order to overcome data quantization, network-induced delay, network packet dropouts, and wrong sequences in nonlinear networked control systems, a novel nonlinear networked control system model is built by the T-S fuzzy method, with two time-varying quantizers added to the model. The key analysis steps are constructing an improved interval-delay-dependent Lyapunov functional and introducing free-weighting matrices. By making use of the parallel distributed compensation technique and the convexity of the matrix function, improved stabilization and stability criteria are obtained. Simulation experiments show that controller and quantizer parameters satisfying a given performance can be obtained by solving a set of LMIs. An application to a nonlinear mass-spring system is provided to show that the proposed method is effective.
Abstract: Necessary and sufficient conditions are derived for equality between a²y′(I − P_X)y, where a² is a known positive number, and the minimum norm quadratic unbiased estimator of variance under the general linear model. Further, when the Gauss-Markov estimators and the ordinary least squares estimator are identical, a relatively simple equivalent condition is obtained. Finally, this condition is applied to an interesting example.
Abstract: By analyzing the observed phenomena and the data collected in the study, a multi-compartment linear circulation model for a targeting drug delivery system was developed, and the drug concentration-time functions in blood and in the target organ were derived. The drug concentration-time curve for the target organ can be plotted from the blood drug concentration data according to the model, and the pharmacokinetic parameters of the drug in the target organ can also be obtained. The practicability of the model was further checked against the blood and target organ (liver) drug concentration-time curves of liver-targeting nanoparticles in animal tests. Based on the liver drug concentration-time curves calculated by the target-organ function formula, the pharmacokinetic behavior of the drug in the target organ (liver) was analyzed by statistical moments, and its pharmacokinetic parameters in the liver were obtained. It is suggested that the "relative targeting index" can be used for quantitative evaluation of targeting drug delivery systems.
Abstract: In order to help solve serious urban transport problems, and based on the proven chaotic characteristics of traffic flow, a nonlinear chaotic model for analyzing traffic flow time series is proposed. The model first reconstructs the traffic flow time series in phase space, richly extracting the correlation information in the traffic flow; on this basis, a prediction equation for the reconstructed information is established using chaos theory, and, to obtain optimal prediction results, the model parameters are identified and optimized using a genetic algorithm. Practical prediction of urban traffic flow shows that this model achieves notable prediction precision and can provide an accurate reference for urban traffic planning and control.
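The phase-space reconstruction step above is a delay embedding: each scalar observation is replaced by a vector of lagged values. The delay and embedding dimension below are illustrative choices, not the values optimized by the paper's genetic algorithm:

```python
# Phase-space reconstruction by delay embedding, the first step of the
# chaotic traffic-flow model above. dim and tau are illustrative; the paper
# tunes such parameters with a genetic algorithm.
def delay_embed(series, dim=3, tau=2):
    """Map a scalar series to (dim)-dimensional delay vectors with lag tau."""
    n = len(series) - (dim - 1) * tau
    return [[series[i + j * tau] for j in range(dim)] for i in range(n)]

flow = [10, 12, 15, 14, 13, 16, 18, 17, 15, 14]  # toy traffic counts
vectors = delay_embed(flow)
print(len(vectors), vectors[0])  # 6 vectors; the first is [10, 15, 13]
```

Prediction then amounts to finding near neighbors of the current delay vector and extrapolating their successors, which is where the chaotic-dynamics assumptions enter.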
Funding: Supported by the Natural Science Foundation of the Anhui Education Committee.
Abstract: In this paper, based on the theory of parameter estimation, we give a selection method and, in the sense of a good property of the parameter estimation, we argue that it is very reasonable. Moreover, we offer a method for calculating the selection statistic and an applied example.