In this work, four empirical models of statistical thickness, namely the Harkins and Jura, Halsey, Carbon Black and Jaroniec models, were compared in order to determine the textural properties (external surface and micropore surface) of a clay concrete without molasses and of clay concretes stabilized with 8%, 12% and 16% molasses. The results show that Halsey's model can be used to obtain the external surfaces. However, it does not allow the micropore surface to be obtained, and it is not suitable either for plain clay concrete (without molasses) or for clay concretes stabilized with molasses. The Carbon Black, Jaroniec and Harkins and Jura models can be used for both clay concrete and stabilized clay concrete. Of these, the Carbon Black model is the most relevant for clay concrete and the Harkins and Jura model is the most relevant for molasses-stabilized clay concrete. These last two models augur well for future research.
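As a rough illustration of what a statistical thickness model computes, the sketch below evaluates the Harkins and Jura and Halsey equations in the forms commonly quoted for nitrogen adsorption at 77 K. The constants (13.99, 0.034, 3.54, 5) are those textbook values and are an assumption here; the abstract does not state which parameterization the authors used, and the function names are hypothetical.

```python
import math


def harkins_jura_thickness(p_rel):
    """Statistical thickness t in angstroms for relative pressure p/p0.

    Assumed form (N2 at 77 K): t = sqrt(13.99 / (0.034 - log10(p/p0))).
    """
    return math.sqrt(13.99 / (0.034 - math.log10(p_rel)))


def halsey_thickness(p_rel):
    """Statistical thickness t in angstroms for relative pressure p/p0.

    Assumed form (N2 at 77 K): t = 3.54 * (-5 / ln(p/p0)) ** (1/3).
    """
    return 3.54 * (-5.0 / math.log(p_rel)) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Relative pressures chosen only for illustration.
    for p_rel in (0.2, 0.5, 0.8):
        print(p_rel,
              round(harkins_jura_thickness(p_rel), 2),
              round(halsey_thickness(p_rel), 2))
```

In a t-plot analysis, the adsorbed volume is plotted against such thickness values; the slope of the linear region is then related to the external surface and the intercept to the micropore contribution.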
This study aims to reveal the spatial structural characteristics of 1,652 Ethnic-Minority Villages (EMV) in China and to analyze the mechanisms driving their spatial heterogeneity. EMV are a special type of settlement space that preserve a large number of historical traces of the ethnic culture of ancient China. They are important carriers of China's excellent traditional culture and are key to the implementation of rural revitalization strategies. In this study, 1,652 EMV in China were selected as the research subjects. The Nearest Neighbor Index, kernel density, and spatial autocorrelation index were employed to reveal the spatial structural characteristics of minority villages. Neural network models, spatial lag models, and geographical detectors were used to analyze the formation mechanism of spatial heterogeneity in EMV. The results indicate that: (1) EMV exhibit significant spatial differentiation characterized by "single-core with multiple surrounding sub-centers," "polarization between east and west," "decreasing quantity from southwest to east coast to northeast to northwest," and "large dispersion with small agglomeration." (2) EMV are mainly distributed in areas rich in intangible cultural heritage, with high vegetation coverage and low altitude, far from central cities, and having limited arable land and an underdeveloped economy and transportation, particularly in shaded or riverbank areas. (3) Distance from the nearest river (X3), distance from central cities (X8), national intangible cultural heritage (X9), and NDVI (X10) were the main driving factors affecting the spatial distribution of EMV, whereas elevation (X1) and GDP (X5) had the weakest influence. As EMV are a relatively unique territorial spatial unit, the identification of their spatial heterogeneity characteristics not only deepens the research content of settlement geography, but also bears on the assessment, protection, and development of minority villages, which is of great significance for the inheritance and utilization of excellent ethnic cultures in the current era.
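The Nearest Neighbor Index mentioned in this abstract can be sketched as follows. This is a minimal illustration using the usual Clark-Evans definition (observed mean nearest-neighbor distance divided by the value expected under complete spatial randomness); planar coordinates, the absence of edge correction, and the function name are assumptions, not details taken from the study.

```python
import math


def nearest_neighbor_index(points, area):
    """Ratio of observed to expected mean nearest-neighbor distance.

    points: list of (x, y) tuples in planar coordinates (at least two).
    area:   area of the study region in the same squared units.
    Values below 1 suggest clustering, values above 1 suggest dispersion.
    """
    n = len(points)
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to its closest other point.
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed = total / n
    expected = 0.5 / math.sqrt(n / area)  # expectation for a random pattern
    return observed / expected


if __name__ == "__main__":
    # Four hypothetical villages on the corners of a unit square,
    # placed in a study region of area 4.
    corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    print(nearest_neighbor_index(corners, area=4.0))
```

The brute-force search is quadratic in the number of points, which is adequate for a data set on the order of the 1,652 villages described above.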
Background: Survival from birth to slaughter is an important economic trait in commercial pig production. Increasing survival can improve both economic efficiency and animal welfare. The aim of this study is to explore the impact of genotyping strategies and statistical models on the accuracy of genomic prediction for survival in pigs during the total growing period from birth to slaughter. Results: We simulated pig populations with different direct and maternal heritabilities and used a linear mixed model, a logit model, and a probit model to predict genomic breeding values of pig survival based on individual survival records with binary outcomes (0, 1). The results show that when only live animals have genotype data, unbiased genomic predictions can be achieved by using variances estimated from a pedigree-based model. Models using genomic information achieved up to 59.2% higher accuracy of estimated breeding values than the pedigree-based model, depending on the genotyping scenario. The scenario of genotyping all individuals, both dead and alive, obtained the highest accuracy. When an equal proportion of individuals (80%) was genotyped, genotyping a random sample of individuals achieved higher accuracy than genotyping only live individuals. The linear model, logit model and probit model achieved similar accuracy. Conclusions: Genomic prediction of pig survival is feasible when only live pigs have genotypes, but genomic information on dead individuals can increase the accuracy of genomic prediction by 2.06% to 6.04%.
Predicting the growth of the COVID-19 pandemic is critical in determining how to battle and track the disease progression. COVID-19 data form a time-series dataset that can be projected using different methodologies. Thus, this work aims to gauge the spread and severity of the outbreak over time. Furthermore, data analytics and Machine Learning (ML) techniques are employed to gain a broader understanding of virus infections. We have simulated, adjusted, and fitted several statistical time-series forecasting models, linear ML models, and nonlinear ML models. Examples of these models are Logistic Regression, Lasso, Ridge, ElasticNet, Huber Regressor, Lasso Lars, Passive Aggressive Regressor, K-Neighbors Regressor, Decision Tree Regressor, Extra Trees Regressor, Support Vector Regression (SVR), AdaBoost Regressor, Random Forest Regressor, Bagging Regressor, AutoRegression, Moving Average, Gradient Boosting Regressor, Autoregressive Moving Average (ARMA), Auto-Regressive Integrated Moving Average (ARIMA), SimpleExpSmoothing, Exponential Smoothing, Holt-Winters, Simple Moving Average, Weighted Moving Average, Croston, and naive Bayes. Furthermore, our suggested methodology includes the development and evaluation of ensemble models built on top of the best-performing statistical and ML-based prediction methods. A third stage in the proposed system examines three different implementations to determine which model delivers the best performance. This best method is then used for future forecasts, which allows us to obtain the most accurate and dependable predictions.
The objective of this study is to analyze the sensitivity of statistical models to sample size. The study, carried out in Ivory Coast, is based on annual maximum daily rainfall data collected from 26 stations. The methodological approach is based on the statistical modeling of maximum daily rainfall. Adjustments were made for several sample sizes and several return periods (2, 5, 10, 20, 50 and 100 years). The main results show that the 30-year series (1931-1960; 1961-1990; 1991-2020) are best fitted by the Gumbel (26.92% - 53.85%) and Inverse Gamma (26.92% - 46.15%) models. The 60-year series (1931-1990; 1961-2020) are best fitted by the Inverse Gamma (30.77%), Gamma (15.38% - 46.15%) and Gumbel (15.38% - 42.31%) models. For the full 1931-2020 record (90 years), the Gumbel model clearly dominates (50%) over the Gamma (34.62%) and Inverse Gamma (15.38%) models. The Gumbel model is the most dominant overall, and more particularly in wet periods. The data for periods with normal and dry trends were better fitted by the Gamma and Inverse Gamma models.
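To make the link between a fitted Gumbel model and a return period concrete, here is a minimal sketch. It fits the Gumbel location and scale by the method of moments, which is an assumption (the abstract does not say how the distributions were fitted), and the rainfall values are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(sample):
    """Method-of-moments estimates (location, scale) for a Gumbel distribution."""
    scale = statistics.stdev(sample) * math.sqrt(6.0) / math.pi
    location = statistics.mean(sample) - EULER_GAMMA * scale
    return location, scale


def return_level(location, scale, period_years):
    """Value exceeded on average once every `period_years` years (period > 1)."""
    non_exceedance = 1.0 - 1.0 / period_years
    return location - scale * math.log(-math.log(non_exceedance))


if __name__ == "__main__":
    # Hypothetical annual maximum daily rainfall values in mm.
    annual_maxima = [62.0, 75.5, 81.2, 58.9, 96.4, 70.1, 88.3, 66.7, 103.0, 79.8]
    loc, scale = fit_gumbel_moments(annual_maxima)
    for period in (2, 5, 10, 20, 50, 100):
        print(period, round(return_level(loc, scale, period), 1))
```

Shorter series give noisier estimates of the location and scale, which is one way sample size can change both the fitted return levels and the model that fits best.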
This paper proposes a method to incorporate syntax-based language models in phrase-based statistical machine translation (SMT) systems. The syntax-based language model used in this paper is based on link grammar, which is a highly lexical formalism. In order to apply language models based on link grammar in phrase-based models, the concept of linked phrases, an extension of the traditional concept of phrases in phrase-based models, was introduced. Experiments were conducted, and the results showed that the use of syntax-based language models could greatly improve the performance of the phrase-based models.
Several statistical methods have been developed for analyzing genotype × environment (GE) interactions in crop breeding programs to identify genotypes with high yield and stability performance. Four statistical methods, namely joint regression analysis (JRA), additive mean effects and multiplicative interaction (AMMI) analysis, genotype plus GE interaction (GGE) biplot analysis, and the yield–stability (YSi) statistic, were used to evaluate GE interaction in 20 winter wheat genotypes grown in 24 environments in Iran. The main objective was to evaluate the rank correlations among the four statistical methods in genotype rankings for yield, stability and yield–stability. Three kinds of genotypic ranks (yield ranks, stability ranks, and yield–stability ranks) were determined with each method. The results indicated the presence of GE interaction, suggesting the need for stability analysis. With respect to yield, the genotype rankings by the GGE biplot and AMMI analysis were significantly correlated (P<0.01). For stability ranking, the rank correlations ranged from 0.53 (GGE–YSi; P<0.05) to 0.97 (JRA–YSi; P<0.01). AMMI distance (AMMID) was highly correlated (P<0.01) with variance of regression deviation (S²di) in JRA (r=0.83) and Shukla stability variance (σ²) in YSi (r=0.86), indicating that these stability indices can be used interchangeably. No correlation was found between yield ranks and stability ranks (AMMID, S²di, σ², and GGE stability index), indicating that they measure static stability and accordingly could be used if selection is based primarily on stability. For yield–stability, rank correlation coefficients among the statistical methods varied from 0.64 (JRA–YSi; P<0.01) to 0.89 (AMMI–YSi; P<0.01), indicating that AMMI and YSi were closely associated in the genotype ranking for integrating yield with stability performance. Based on the results, it can be concluded that YSi was closely correlated with (i) JRA in ranking genotypes for stability and (ii) AMMI for integrating yield and stability.
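The comparison of genotype rankings described above rests on rank correlation. The sketch below computes Spearman's coefficient for two rankings without ties; that Spearman's coefficient is the one used in the study is an assumption, and the two rankings are hypothetical.

```python
def spearman_from_ranks(ranks_a, ranks_b):
    """Spearman rank correlation for two rankings of the same items.

    Assumes both inputs are permutations of 1..n with no tied ranks,
    so the shortcut formula 1 - 6*sum(d^2) / (n*(n^2 - 1)) applies.
    """
    if len(ranks_a) != len(ranks_b):
        raise ValueError("rankings must have the same length")
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * sum_d2 / (n * (n * n - 1))


if __name__ == "__main__":
    # Hypothetical stability ranks of six genotypes under two methods.
    ranks_method_1 = [1, 2, 3, 4, 5, 6]
    ranks_method_2 = [2, 1, 3, 4, 6, 5]
    print(spearman_from_ranks(ranks_method_1, ranks_method_2))
```

A coefficient close to 1 means the two methods order the genotypes almost identically, which is the sense in which highly correlated stability indices can be used interchangeably.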
[Objective] The study aimed to compare several statistical analysis models for estimating the genotypic stability of sugarcane (Saccharum spp.). [Method] The data from the 2009 sugarcane regional trials in Guangdong were analyzed with three models, namely the Finlay and Wilkinson model, the additive main effects and multiplicative interaction (AMMI) model and the linear regression-principal components analysis (LR-PCA) model, so as to compare the models. [Result] The Finlay and Wilkinson model was easier to apply, but the analyses given by the other two models were more comprehensive, and there was a slight difference between the AMMI model and the LR-PCA model. [Conclusion] In practice, while the proper statistical method is usually chosen according to the data, the same data should also be analyzed with different statistical methods in order to obtain a more reasonable result by comparison.
The Nadhour-Sisseb-El Alem Basin in Tunisia has semi-arid and arid climatic conditions. This induces excessive pumping of groundwater, which causes drops in water level of about 1-2 m/a. These unfavorable conditions require interventions to rationalize integrated management in decision making. The aim of this study is to determine a water recharge index (WRI), delineate the potential groundwater recharge area and estimate the potential groundwater recharge rate based on the integration of statistical models derived from remote sensing imagery, GIS digital data (e.g., lithology, soil, runoff), measured artificial recharge data, fuzzy set theory and multi-criteria decision making (MCDM) using the analytical hierarchy process (AHP). Eight factors affecting potential groundwater recharge were determined, namely lithology, soil, slope, topography, land cover/use, runoff, drainage and lineaments. The WRI ranges between 1.2 and 3.1 and is classified into five classes of potential groundwater recharge area: poor, weak, moderate, good and very good. The very good and good classes occupy 27% and 44% of the study area, respectively. The potential groundwater recharge rate was 43% of total precipitation. According to the results of the study, river beds are favorable sites for groundwater recharge.
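The AHP step can be sketched as deriving factor weights from a pairwise comparison matrix. The example below uses the row geometric mean approximation rather than the principal eigenvector, which is an assumption made for brevity; the three factors and the comparison values are hypothetical and are not taken from the study.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights via row geometric means.

    pairwise[i][j] states how much more important factor i is than factor j,
    with pairwise[j][i] == 1 / pairwise[i][j]. The weights sum to 1.
    """
    size = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / size) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


if __name__ == "__main__":
    # Hypothetical comparison of three factors: lithology, slope, drainage.
    matrix = [
        [1.0, 2.0, 4.0],
        [0.5, 1.0, 2.0],
        [0.25, 0.5, 1.0],
    ]
    for name, weight in zip(("lithology", "slope", "drainage"), ahp_weights(matrix)):
        print(name, round(weight, 4))
```

For a perfectly consistent matrix such as this one, the geometric mean weights coincide with the eigenvector weights. An index like the WRI could then be formed as a weighted sum of the rated factor layers.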
Forecasting the movement of the stock market is a long-standing topic of interest. This paper implements different statistical learning models to predict the movement of the S&P 500 index. The S&P 500 index is influenced by other important financial indexes across the world, such as commodity prices and financial technical indicators. This paper systematically investigates four supervised learning models, namely Logistic Regression, Gaussian Discriminant Analysis (GDA), Naive Bayes and Support Vector Machine (SVM), in forecasting the S&P 500 index. After several optimization experiments on features and models, especially SVM kernel selection and feature selection for the different models, this paper concludes that an SVM model with a Radial Basis Function (RBF) kernel can achieve an accuracy rate of 62.51% for the future market trend of the S&P 500 index.
The establishment of effective null models can provide reference networks to accurately describe statistical properties of real-life signed networks. At present, two classical null models of signed networks (i.e., sign and full-edge randomized models) shuffle both positive and negative topologies at the same time, so it is difficult to distinguish the effect on network topology of positive edges, negative edges, and the correlation between them. In this study, we construct three refined edge-randomized null models by only randomizing link relationships without changing positive and negative degree distributions. The results of nontrivial statistical indicators of signed networks, such as average degree connectivity and clustering coefficient, show that the position of positive edges has a stronger effect on positive-edge topology, while the signs of negative edges have a greater influence on negative-edge topology. For some specific statistics (e.g., embeddedness), the results indicate that the proposed null models can more accurately describe real-life networks compared with the two existing ones, which can be selected to facilitate a better understanding of complex structures, functions, and dynamical behaviors on signed networks.
The cause-effect relationship is not always possible to trace in GCMs because of the simultaneous inclusion of several highly complex physical processes. Furthermore, the inter-GCM differences are large and there is no simple way to reconcile them. Simple climate models, like statistical-dynamical models (SDMs), therefore appear to be useful in this context. Models of this kind are essentially mechanistic, being directed towards understanding the dependence of a particular mechanism on the other parameters of the problem. In this paper, the utility of SDMs for studies of climate change is discussed in some detail. We show that these models are an indispensable part of the hierarchy of climate models.
The usability of an interface is a fundamental issue to elucidate. Many researchers have argued that many usability results and recommendations lack empirical and experimental data. In this research, the usability of web pages is evaluated using several carefully selected statistical models. University web pages were chosen as subjects for this work for ease of comparison and ease of data collection. A series of experiments was conducted to investigate the usability and design of university web pages. Prototype web pages were developed according to structured methodologies of web page design and usability. The university web pages were evaluated together with the prototype web pages using a questionnaire designed according to Human Computer Interaction (HCI) heuristics. Nine respondent (user) variables and 14 web page variables (items) were studied. Stringent statistical analysis was adopted to extract the required information from the data acquired, followed by augmented interpretation of the statistical results. The analysis of variance (ANOVA) procedure showed significant differences among the university web pages regarding most of the 23 items studied. The Duncan Multiple Range Test (DMRT) showed that the prototype performed significantly better in usability regarding most of the items. The correlation analysis showed significant positive and negative correlations between many items. The regression analysis revealed that the most significant factors (items) contributing to the best model of university web page design and usability were: multimedia in the web pages, the organisation and design of the web page icons (alone), and the attractiveness of the graphics. The results revealed limitations of some heuristics used in conventional interface systems design and suggested some additional heuristics for web page design and usability.
This paper presents a critical analysis of the problems that arise in matching the classical models of statistical and phenomenological thermodynamics. The analysis shows that some concepts of the statistical and phenomenological methods of describing classical systems do not fully agree with each other. In particular, the two methods employ different caloric equations of state for the ideal gas, and the statistical methods do not allow for the possibility, present in thermodynamic cyclic processes, of obtaining the same distributions both through a change in particle concentration and through a change in temperature. The difference in the equations of state is removed when, in the statistical functions corresponding to the canonical Gibbs equations, Planck's constant is replaced by a new scale factor that depends on the parameters of the system and coincides with Planck's constant when the system passes to the degenerate state. Under this approach, the statistical entropy is transformed into one of the forms of heat capacity. In turn, reconciling the two methods on the dependence of the molecular distributions on particle concentration will apparently call for further refinement of the physical model of the ideal gas and of the techniques for its statistical description.
Landslide susceptibility mapping is vital for landslide risk management and urban planning. In this study, we used three statistical models [frequency ratio, certainty factor and index of entropy (IOE)] and a machine learning model [random forest (RF)] for landslide susceptibility mapping in Wanzhou County, China. First, a landslide inventory map was prepared using earlier geotechnical investigation reports, aerial images, and field surveys. Then, the redundant factors were excluded from the initial fourteen landslide causal factors via factor correlation analysis. To determine the most effective causal factors, landslide susceptibility evaluations were performed for four different combinations of factors ("cases"). In the analysis, 465 (70%) landslide locations were randomly selected for model training, and 200 (30%) landslide locations were selected for verification. The results showed that case 3 produced the best performance for the statistical models and that case 2 produced the best performance for the RF model. Finally, the receiver operating characteristic (ROC) curve was used to verify the accuracy of each model's results for its respective optimal case. The ROC curve analysis showed that the machine learning model performed better than the three statistical models, and among the three statistical models, the IOE model with weight coefficients was superior.
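ROC-based verification is commonly summarized by the area under the curve (AUC). The sketch below computes the AUC through its pairwise interpretation: the probability that a randomly chosen landslide location receives a higher susceptibility score than a randomly chosen non-landslide location. The scores are hypothetical, and whether the study reports this particular summary is an assumption.

```python
def roc_auc(scores_positive, scores_negative):
    """Area under the ROC curve from two lists of model scores.

    Counts, over all (positive, negative) pairs, how often the positive
    case scores higher; tied scores count as half a win.
    """
    wins = 0.0
    for pos in scores_positive:
        for neg in scores_negative:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


if __name__ == "__main__":
    # Hypothetical susceptibility scores for verification locations.
    landslide_scores = [0.91, 0.78, 0.66, 0.40]
    stable_scores = [0.55, 0.32, 0.21, 0.10]
    print(roc_auc(landslide_scores, stable_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so models can be compared on the same verification set by this single number.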
The spread of an advantageous mutation through a population is of fundamental interest in population genetics. While the classical Moran model is formulated for a well-mixed population, it has long been recognized that in real-world applications, the population usually has an explicit spatial structure which can significantly influence the dynamics. In the context of cancer initiation in epithelial tissue, several recent works have analyzed the dynamics of advantageous mutant spread on integer lattices, using the biased voter model from particle systems theory. In this spatial version of the Moran model, individuals first reproduce according to their fitness and then replace a neighboring individual. From a biological standpoint, the opposite dynamics, where individuals first die and are then replaced by a neighboring individual according to its fitness, are equally relevant. Here, we investigate this death-birth analogue of the biased voter model. We construct the process mathematically, derive the associated dual process, establish bounds on the survival probability of a single mutant, and prove that the process has an asymptotic shape. We also briefly discuss alternative birth-death and death-birth dynamics, depending on how the mutant fitness advantage affects the dynamics. We show that birth-death and death-birth formulations of the biased voter model are equivalent when fitness affects the former event of each update of the model, whereas the birth-death model is fundamentally different from the death-birth model when fitness affects the latter event.
The performance of six statistical approaches, which can be used for selection of the best model to describe the growth of individual fish, was analyzed using simulated and real length-at-age data. The six approaches include the coefficient of determination (R²), adjusted coefficient of determination (adj.-R²), root mean squared error (RMSE), Akaike's information criterion (AIC), bias-corrected AIC (AICc) and the Bayesian information criterion (BIC). The simulation data were generated by five growth models with different numbers of parameters. Four sets of real data were taken from the literature. The parameters in each of the five growth models were estimated using the maximum likelihood method under the assumption of an additive error structure for the data. The model best supported by the data was identified using each of the six approaches. The results show that R² and RMSE have the same properties and perform worst. The sample size has an effect on the performance of adj.-R², AIC, AICc and BIC. Adj.-R² does better in small samples than in large samples. AIC is not suitable for use in small samples and tends to select a more complex model when the sample size becomes large. AICc and BIC perform best in small and large samples, respectively. Use of AICc or BIC is recommended for selection of a fish growth model, according to the size of the length-at-age data set.
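The three information criteria can be sketched from a model's residual sum of squares. The example assumes independent Gaussian additive errors, so that the maximized log-likelihood has a closed form, and it counts the error variance as one extra estimated parameter; both choices, and the numbers used, are illustrative assumptions rather than details from the study.

```python
import math


def gaussian_log_likelihood(rss, n):
    """Maximized log-likelihood for additive Gaussian errors.

    rss: residual sum of squares of the fitted growth model.
    n:   number of length-at-age observations.
    """
    sigma2 = rss / n  # maximum likelihood estimate of the error variance
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def information_criteria(rss, n, n_growth_params):
    """Return AIC, AICc and BIC; lower values indicate better support."""
    k = n_growth_params + 1  # growth parameters plus the error variance
    log_lik = gaussian_log_likelihood(rss, n)
    aic = 2.0 * k - 2.0 * log_lik
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)  # small-sample correction
    bic = k * math.log(n) - 2.0 * log_lik
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


if __name__ == "__main__":
    # Hypothetical fits of a 3-parameter and a 4-parameter growth model
    # to the same 20 observations.
    print(information_criteria(rss=48.0, n=20, n_growth_params=3))
    print(information_criteria(rss=45.5, n=20, n_growth_params=4))
```

The AICc correction term grows as the sample size approaches the number of parameters, while the BIC penalty per parameter grows with ln(n); this is consistent with AICc being favored for small samples and BIC for large ones.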
Based on a review and comparison of the main statistical analysis models for estimating variety-environment cell means in regional crop trials, a new statistical model, the LR-PCA composite model, was proposed, and the predictive precision of these models was compared by cross validation on an example data set. Results showed that the order of model precision was LR-PCA model > AMMI model > PCA model > Treatment Means (TM) model > Linear Regression (LR) model > Additive Main Effects ANOVA model. The precision gain factor of the LR-PCA model was 1.55, an increase of 8.4% compared with AMMI.
This work correlated the detailed work zone location and time data from the WisLCS system with five-minute inductive loop detector data. A one-sample percentile value test and a two-sample Kolmogorov-Smirnov (K-S) test were applied to compare the speed and flow characteristics between work zone and non-work zone conditions. Furthermore, we analyzed the mobility characteristics of freeway work zones within the urban area of Milwaukee, WI, USA. More than 50% of the investigated work zones experienced speed reduction, and 15%-30% experienced reduced volumes. Speed reduction was more significant within and downstream of work zones than upstream.
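The two-sample K-S comparison can be illustrated by its test statistic, the largest vertical gap between the two empirical distribution functions. The sketch below computes only that statistic (not the p-value), and the speed samples are hypothetical.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|.

    F_a and F_b are the empirical distribution functions; because they are
    step functions, the maximum gap occurs at one of the observed values.
    """
    sorted_a = sorted(sample_a)
    sorted_b = sorted(sample_b)
    largest_gap = 0.0
    for x in sorted_a + sorted_b:
        cdf_a = bisect.bisect_right(sorted_a, x) / len(sorted_a)
        cdf_b = bisect.bisect_right(sorted_b, x) / len(sorted_b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


if __name__ == "__main__":
    # Hypothetical five-minute mean speeds (mph) at one detector.
    work_zone_speeds = [48.0, 51.5, 45.2, 50.1, 47.3, 44.8]
    normal_speeds = [58.4, 61.0, 55.7, 60.2, 57.9, 49.5]
    print(ks_statistic(work_zone_speeds, normal_speeds))
```

A statistic near 0 means the two speed distributions are almost indistinguishable, while a value near 1 means they barely overlap; a significance test would compare D against a critical value that depends on both sample sizes.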
QTL mapping for seven quality traits was conducted using 254 recombinant inbred lines (RIL) derived from a japonica-japonica rice cross of Xiushui 79/C Bao. The seven traits investigated were grain length (GL), grain length to width ratio (LWR), chalk grain rate (CGR), chalkiness degree (CD), gelatinization temperature (GT), amylose content (AC) and gel consistency (GC) of head rice. The three mapping methods employed were composite interval mapping in QTLMapper 2.0 software based on a mixed linear model (MCIM), inclusive composite interval mapping in QTL IciMapping 3.0 software based on a stepwise regression linear model (ICIM), and multiple interval mapping with regression forward selection in Windows QTL Cartographer 2.5 based on multiple regression analysis (MIMR). Results showed that five QTLs with additive effect (A-QTLs) were detected by all three methods simultaneously, two by two methods simultaneously, and 23 by only one method. Five A-QTLs were detected by MCIM, nine by ICIM and 28 by MIMR. The contribution rates of single A-QTLs ranged from 0.89% to 38.07%. None of the QTLs with epistatic effect (E-QTLs) detected by MIMR were detected by the other two methods. Fourteen pairs of E-QTLs were detected by both MCIM and ICIM, and 142 pairs of E-QTLs were detected by only one method. Twenty-five pairs of E-QTLs were detected by MCIM, 141 pairs by ICIM and four pairs by MIMR. The contribution rates of single pairs of E-QTLs ranged from 2.60% to 23.78%. In the Xiu-Bao RIL population, epistatic effect played a major role in the variation of GL and CD, additive effect was dominant in the variation of LWR, and epistatic and additive effects were equally important in the variation of CGR, AC, GT and GC. QTLs detected by two or more methods simultaneously were highly reliable and could be applied to improve the quality traits in japonica hybrid rice.
Funding: Funded by the "Genetic improvement of pig survival" project of the Danish Pig Levy Foundation (Aarhus, Denmark); the China Scholarship Council (CSC) provided a scholarship to the first author.
Abstract: Background: Survival from birth to slaughter is an important economic trait in commercial pig production. Increasing survival can improve both economic efficiency and animal welfare. The aim of this study is to explore the impact of genotyping strategies and statistical models on the accuracy of genomic prediction for survival in pigs during the total growing period from birth to slaughter. Results: We simulated pig populations with different direct and maternal heritabilities and used a linear mixed model, a logit model, and a probit model to predict genomic breeding values of pig survival based on individual survival records with binary outcomes (0, 1). The results show that when only live animals have genotype data, unbiased genomic predictions can be achieved by using variances estimated from a pedigree-based model. Models using genomic information achieved up to 59.2% higher accuracy of estimated breeding value than the pedigree-based model, depending on the genotyping scenario. The scenario of genotyping all individuals, both dead and alive, obtained the highest accuracy. When an equal number of individuals (80%) were genotyped, genotyping a random sample of individuals achieved higher accuracy than genotyping only live individuals. The linear model, logit model and probit model achieved similar accuracy. Conclusions: Genomic prediction of pig survival is feasible when only live pigs have genotypes, but genomic information from dead individuals can increase the accuracy of genomic prediction by 2.06% to 6.04%.
Funding: The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia, for funding this research work through project number RI-44-0525.
Abstract: Predicting the growth of the COVID-19 pandemic is critical in determining how to combat the disease and track its progression. COVID-19 data form a time-series dataset that can be projected using different methodologies. This work therefore aims to gauge the spread and severity of the outbreak over time. Furthermore, data analytics and Machine Learning (ML) techniques are employed to gain a broader understanding of virus infections. We have simulated, adjusted, and fitted several statistical time-series forecasting models, linear ML models, and nonlinear ML models. Examples of these models are Logistic Regression, Lasso, Ridge, ElasticNet, Huber Regressor, Lasso Lars, Passive Aggressive Regressor, K-Neighbors Regressor, Decision Tree Regressor, Extra Trees Regressor, Support Vector Regressions (SVR), AdaBoost Regressor, Random Forest Regressor, Bagging Regressor, AutoRegression, MovingAverage, Gradient Boosting Regressor, Autoregressive Moving Average (ARMA), Auto-Regressive Integrated Moving Averages (ARIMA), SimpleExpSmoothing, Exponential Smoothing, Holt-Winters, Simple Moving Average, Weighted Moving Average, Croston, and naive Bayes. Furthermore, our suggested methodology includes the development and evaluation of ensemble models built on top of the best-performing statistical and ML-based prediction methods. A third stage in the proposed system is to examine three different implementations to determine which model delivers the best performance. This best method is then used for future forecasts, so that the most accurate and dependable predictions can be collected.
Abstract: The objective of this study is to analyze the sensitivity of statistical models to sample size. The study, carried out in Ivory Coast, is based on annual maximum daily rainfall data collected from 26 stations. The methodological approach is based on the statistical modeling of maximum daily rainfall. Adjustments were made for several sample sizes and several return periods (2, 5, 10, 20, 50 and 100 years). The main results show that the 30-year series (1931-1960; 1961-1990; 1991-2020) are best fitted by the Gumbel (26.92% - 53.85%) and Inverse Gamma (26.92% - 46.15%) models. The 60-year series (1931-1990; 1961-2020) are best fitted by the Inverse Gamma (30.77%), Gamma (15.38% - 46.15%) and Gumbel (15.38% - 42.31%) models. For the full 1931-2020 record (90 years), the Gumbel model clearly dominates (50%) over the Gamma (34.62%) and Inverse Gamma (15.38%) models. The Gumbel model is the most dominant overall, and particularly in wet periods. The data for periods with normal and dry trends were better fitted by the Gamma and Inverse Gamma models.
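The link between a fitted Gumbel model and the return periods listed in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not say how the distributions were fitted, so the method-of-moments fit, the function names and the rainfall values are all assumptions made for the example.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(sample):
    """Method-of-moments fit of a Gumbel (maximum) distribution.

    Returns (mu, beta): location and scale parameters.
    Assumption: the source abstract does not name its fitting method.
    """
    beta = statistics.stdev(sample) * math.sqrt(6) / math.pi
    mu = statistics.mean(sample) - EULER_GAMMA * beta
    return mu, beta


def return_level(mu, beta, period_years):
    """Rainfall value exceeded on average once every `period_years` years."""
    non_exceedance = 1.0 - 1.0 / period_years
    return mu - beta * math.log(-math.log(non_exceedance))


# Hypothetical annual maximum daily rainfall values (mm), for illustration only.
annual_maxima = [50.0, 62.0, 71.0, 80.0, 95.0]
mu, beta = fit_gumbel_moments(annual_maxima)
for period in (2, 5, 10, 20, 50, 100):
    print(period, round(return_level(mu, beta, period), 1))
```

With real station data, the same quantile formula would be applied to each sample length (30, 60 or 90 years) to see how the estimated return levels shift with sample size.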
Funding: National Natural Science Foundation of China (No. 60803078); National High Technology Research and Development Programs of China (No. 2006AA010107, No. 2006AA010108)
Abstract: This paper proposes a method for incorporating syntax-based language models into phrase-based statistical machine translation (SMT) systems. The syntax-based language model used in this paper is based on link grammar, which is a highly lexical formalism. In order to apply language models based on link grammar in phrase-based models, the concept of linked phrases, an extension of the traditional phrase in phrase-based models, is introduced. Experiments were conducted, and the results showed that the use of syntax-based language models could greatly improve the performance of the phrase-based models.
Funding: The bread wheat project of the Dryland Agricultural Research Institute (DARI), supported by the Agricultural Research and Education Organization (AREO) of Iran
Abstract: Several statistical methods have been developed for analyzing genotype × environment (GE) interactions in crop breeding programs to identify genotypes with high yield and stability performance. Four statistical methods, namely joint regression analysis (JRA), additive main effects and multiplicative interaction (AMMI) analysis, genotype plus GE interaction (GGE) biplot analysis, and the yield–stability (YSi) statistic, were used to evaluate GE interaction in 20 winter wheat genotypes grown in 24 environments in Iran. The main objective was to evaluate the rank correlations among the four statistical methods in genotype rankings for yield, stability and yield–stability. Three kinds of genotypic ranks (yield ranks, stability ranks, and yield–stability ranks) were determined with each method. The results indicated the presence of GE interaction, suggesting the need for stability analysis. With respect to yield, the genotype rankings by the GGE biplot and AMMI analysis were significantly correlated (P<0.01). For stability ranking, the rank correlations ranged from 0.53 (GGE–YSi; P<0.05) to 0.97 (JRA–YSi; P<0.01). AMMI distance (AMMID) was highly correlated (P<0.01) with the variance of regression deviation (S²di) in JRA (r=0.83) and the Shukla stability variance (σ²) in YSi (r=0.86), indicating that these stability indices can be used interchangeably. No correlation was found between yield ranks and stability ranks (AMMID, S²di, σ², and GGE stability index), indicating that they measure static stability and accordingly could be used if selection is based primarily on stability. For yield–stability, rank correlation coefficients among the statistical methods varied from 0.64 (JRA–YSi; P<0.01) to 0.89 (AMMI–YSi; P<0.01), indicating that AMMI and YSi were closely associated in the genotype ranking for integrating yield with stability performance. Based on the results, it can be concluded that YSi was closely correlated with (i) JRA in ranking genotypes for stability and (ii) AMMI for integrating yield and stability.
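The rank correlations reported in the abstract above compare how two methods order the same genotypes. A minimal sketch of such a comparison is given below, assuming Spearman's coefficient without tied scores; the abstract does not state which rank correlation was used, and the function names and scores are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (largest) to n (smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman_rho(x, y):
    """Spearman rank correlation between two score lists (no ties assumed)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    rank_x, rank_y = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical scores for five genotypes under two methods.
method_a = [4.1, 3.8, 5.0, 2.9, 4.6]
method_b = [0.7, 0.5, 0.9, 0.6, 0.8]
print(spearman_rho(method_a, method_b))
```

When scores are tied, average ranks and the general Pearson-on-ranks form would be needed instead of this shortcut formula.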
Funding: Supported by the Guangdong Technological Program (2009B02001002); the Special Funds of National Agricultural Department for Commonweal Trade Research (nyhyzx07-019); the Earmarked Fund for Modern Agro-industry Technology Research System
Abstract: [Objective] The study aimed to compare several statistical analysis models for estimating the genotypic stability of sugarcane (Saccharum spp.). [Method] Data from the 2009 sugarcane regional trials in Guangdong were analyzed with three models in turn: the Finlay and Wilkinson model, the additive main effects and multiplicative interaction (AMMI) model, and the linear regression-principal components analysis (LR-PCA) model, so as to compare the models. [Result] The Finlay and Wilkinson model was simpler, but the analysis provided by the other two models was more comprehensive, and there was a slight difference between the AMMI model and the LR-PCA model. [Conclusion] In practice, while the statistical method is usually chosen to suit the data, the same data should also be analyzed with different statistical methods so that a more reasonable result can be obtained by comparison.
Abstract: The water resources of the Nadhour-Sisseb-El Alem Basin in Tunisia are subject to semi-arid and arid climatic conditions. This leads to excessive pumping of groundwater, which causes drops in water level of about 1-2 m/a. These unfavorable conditions call for interventions to rationalize integrated management in decision making. The aim of this study is to determine a water recharge index (WRI), delineate the potential groundwater recharge area and estimate the potential groundwater recharge rate based on the integration of statistical models derived from remote sensing imagery, GIS digital data (e.g., lithology, soil, runoff), measured artificial recharge data, fuzzy set theory and multi-criteria decision making (MCDM) using the analytical hierarchy process (AHP). Eight factors affecting potential groundwater recharge were determined, namely lithology, soil, slope, topography, land cover/use, runoff, drainage and lineaments. The WRI ranges between 1.2 and 3.1 and is classified into five classes of potential groundwater recharge area: poor, weak, moderate, good and very good. The very good and good classes occupy 27% and 44% of the study area, respectively. The potential groundwater recharge rate was 43% of total precipitation. According to the results of the study, river beds are favorable sites for groundwater recharge.
Abstract: Forecasting the movement of the stock market is a long-standing topic of interest. This paper implements different statistical learning models to predict the movement of the S&P 500 index. The S&P 500 index is influenced by other important financial indexes across the world, such as commodity prices and financial technical indicators. This paper systematically investigates four supervised learning models, namely Logistic Regression, Gaussian Discriminant Analysis (GDA), Naive Bayes and Support Vector Machine (SVM), for forecasting the S&P 500 index. After several optimization experiments on features and models, especially SVM kernel selection and feature selection for the different models, this paper concludes that an SVM model with a Radial Basis Function (RBF) kernel can achieve an accuracy rate of 62.51% for the future market trend of the S&P 500 index.
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 61773091 and 61603073); the LiaoNing Revitalization Talents Program (Grant No. XLYC1807106); the Natural Science Foundation of Liaoning Province, China (Grant No. 2020-MZLH-22).
Abstract: The establishment of effective null models can provide reference networks to accurately describe statistical properties of real-life signed networks. At present, two classical null models of signed networks (i.e., sign and full-edge randomized models) shuffle both positive and negative topologies at the same time, so it is difficult to distinguish the effect on network topology of positive edges, negative edges, and the correlation between them. In this study, we construct three refined edge-randomized null models by only randomizing link relationships without changing positive and negative degree distributions. The results of nontrivial statistical indicators of signed networks, such as average degree connectivity and clustering coefficient, show that the position of positive edges has a stronger effect on positive-edge topology, while the signs of negative edges have a greater influence on negative-edge topology. For some specific statistics (e.g., embeddedness), the results indicate that the proposed null models can more accurately describe real-life networks compared with the two existing ones, which can be selected to facilitate a better understanding of complex structures, functions, and dynamical behaviors on signed networks.
Abstract: Cause-effect relationships cannot always be traced in GCMs because of the simultaneous inclusion of several highly complex physical processes. Furthermore, the inter-GCM differences are large and there is no simple way to reconcile them. Simple climate models, such as statistical-dynamical models (SDMs), therefore appear to be useful in this context. Models of this kind are essentially mechanistic, being directed towards understanding the dependence of a particular mechanism on the other parameters of the problem. In this paper, the utility of SDMs for studies of climate change is discussed in some detail. We show that these models are an indispensable part of the hierarchy of climate models.
Abstract: The usability of an interface is a fundamental issue to elucidate. Many researchers have argued that usability results and recommendations often lack empirical and experimental data. In this research, the usability of web pages is evaluated using several carefully selected statistical models. University web pages were chosen as subjects for this work for ease of comparison and of data collection. A series of experiments was conducted to investigate the usability and design of university web pages. Prototype web pages were developed according to structured methodologies of web page design and usability. The university web pages were evaluated together with the prototype web pages using a questionnaire designed according to Human Computer Interaction (HCI) heuristics. Nine respondent (user) variables and 14 web page variables (items) were studied. Stringent statistical analysis was adopted to extract the required information from the data acquired, followed by an augmented interpretation of the statistical results. The analysis of variance (ANOVA) showed significant differences among the university web pages for most of the 23 items studied. The Duncan Multiple Range Test (DMRT) showed that the prototype performed significantly better in usability on most of the items. The correlation analysis showed significant positive and negative correlations between many items. The regression analysis revealed that the most significant factors (items) contributing to the best model of university web page design and usability were: multimedia in the web pages, the organisation and design of the web page icons (alone), and graphics attractiveness. The results showed some of the limitations of some heuristics used in conventional interface systems design and suggested some additional heuristics for web page design and usability.
Abstract: The paper presents a critical analysis of the problems that arise in matching the classical models of statistical and phenomenological thermodynamics. The analysis shows that some concepts of the statistical and phenomenological methods of describing classical systems do not quite correlate with each other. In particular, the two methods employ different caloric equations of state for the ideal gas, and the statistical methods do not allow for the possibility, which exists in thermodynamic cyclic processes, of obtaining the same distributions either through a change in particle concentration or through a change in temperature. The difference between the equations of state disappears if, in the statistical functions corresponding to the canonical Gibbs equations, Planck’s constant is replaced by a new scale factor that depends on the parameters of the system and coincides with Planck’s constant as the system passes to the degenerate state. Under such an approach, the statistical entropy is transformed into one of the forms of heat capacity. In turn, reconciling the two methods on the dependence of the molecular distributions on particle concentration will apparently call for further refinement of the physical model of the ideal gas and of the techniques for its statistical description.
Funding: The projects "The risk assessment of geological hazards induced by reservoir water level fluctuation in Chongqing, Three-Gorges Reservoir, China." (No. 2016065135) and "The study of mechanism and forecast criterion of the gentle-dip landslides in The Three Gorges Reservoir Region, China" (No. 41572292), funded by the National Natural Science Foundation of China
Abstract: Landslide susceptibility mapping is vital for landslide risk management and urban planning. In this study, we used three statistical models [frequency ratio, certainty factor and index of entropy (IOE)] and a machine learning model [random forest (RF)] for landslide susceptibility mapping in Wanzhou County, China. First, a landslide inventory map was prepared using earlier geotechnical investigation reports, aerial images, and field surveys. Then, the redundant factors were excluded from the initial fourteen landslide causal factors via factor correlation analysis. To determine the most effective causal factors, landslide susceptibility evaluations were performed for four different combinations of factors ("cases"). In the analysis, 465 (70%) landslide locations were randomly selected for model training, and 200 (30%) landslide locations were selected for verification. The results showed that case 3 produced the best performance for the statistical models and that case 2 produced the best performance for the RF model. Finally, the receiver operating characteristic (ROC) curve was used to verify the accuracy of each model's results for its respective optimal case. The ROC curve analysis showed that the machine learning model performed better than the other three models, and among the three statistical models, the IOE model with weight coefficients was superior.
Funding: Supported in part by the NIH grant R01CA241134; the NSF grant CMMI-1552764; the NSF grants DMS-1349724 and DMS-2052465; the NSF grant CCF-1740761; the U.S.-Norway Fulbright Foundation and the Research Council of Norway R&D Grant 309273; and the Norwegian Centennial Chair grant and the Doctoral Dissertation Fellowship from the University of Minnesota.
Abstract: The spread of an advantageous mutation through a population is of fundamental interest in population genetics. While the classical Moran model is formulated for a well-mixed population, it has long been recognized that in real-world applications, the population usually has an explicit spatial structure which can significantly influence the dynamics. In the context of cancer initiation in epithelial tissue, several recent works have analyzed the dynamics of advantageous mutant spread on integer lattices, using the biased voter model from particle systems theory. In this spatial version of the Moran model, individuals first reproduce according to their fitness and then replace a neighboring individual. From a biological standpoint, the opposite dynamics, where individuals first die and are then replaced by a neighboring individual according to its fitness, are equally relevant. Here, we investigate this death-birth analogue of the biased voter model. We construct the process mathematically, derive the associated dual process, establish bounds on the survival probability of a single mutant, and prove that the process has an asymptotic shape. We also briefly discuss alternative birth-death and death-birth dynamics, depending on how the mutant fitness advantage affects the dynamics. We show that birth-death and death-birth formulations of the biased voter model are equivalent when fitness affects the former event of each update of the model, whereas the birth-death model is fundamentally different from the death-birth model when fitness affects the latter event.
Funding: Supported by the High Technology Research and Development Program of China (863 Program, No. 2006AA100301)
Abstract: The performance of six statistical approaches that can be used to select the best model for describing the growth of individual fish was analyzed using simulated and real length-at-age data. The six approaches are the coefficient of determination (R²), adjusted coefficient of determination (adj.-R²), root mean squared error (RMSE), Akaike's information criterion (AIC), bias-corrected AIC (AICc) and Bayesian information criterion (BIC). The simulated data were generated by five growth models with different numbers of parameters. Four sets of real data were taken from the literature. The parameters in each of the five growth models were estimated using the maximum likelihood method under the assumption of an additive error structure for the data. The model best supported by the data was identified using each of the six approaches. The results show that R² and RMSE have the same properties and perform worst. The sample size has an effect on the performance of adj.-R², AIC, AICc and BIC. Adj.-R² does better in small samples than in large samples. AIC is not suitable for small samples and tends to select a more complex model as the sample size becomes large. AICc and BIC perform best in small and large samples, respectively. Use of AICc or BIC, depending on the size of the length-at-age data set, is recommended for selecting a fish growth model.
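The three information criteria compared in the abstract above can be computed directly from a fitted model's residual sum of squares. The sketch below assumes additive, normally distributed errors with the variance estimated by maximum likelihood, so that `k` counts the growth-model parameters plus one for the error variance; the function name and the example numbers are illustrative and not taken from the paper.

```python
import math


def information_criteria(rss, n, k):
    """Return (AIC, AICc, BIC) for a least-squares fit with Gaussian errors.

    rss: residual sum of squares of the fitted model
    n:   number of length-at-age observations
    k:   number of estimated parameters, including the error variance
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    # Maximized Gaussian log-likelihood with variance estimate rss / n.
    log_lik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    aic = 2 * k - 2 * log_lik
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction
    bic = k * math.log(n) - 2 * log_lik
    return aic, aicc, bic


# Hypothetical fit: a 3-parameter growth curve (k = 4 with the variance).
print(information_criteria(rss=10.0, n=20, k=4))
```

The correction term in AICc shrinks towards zero as `n` grows, while the BIC penalty `k * ln(n)` exceeds the AIC penalty `2k` once `n` is 8 or more, which is consistent with the small-sample versus large-sample behaviour the abstract reports.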
Abstract: Based on a review and comparison of the main statistical analysis models for estimating variety-environment cell means in regional crop trials, a new statistical model, the LR-PCA composite model, was proposed, and the predictive precision of these models was compared by cross validation on an example data set. Results showed that the order of model precision was LR-PCA model > AMMI model > PCA model > Treatment Means (TM) model > Linear Regression (LR) model > Additive Main Effects ANOVA model. The precision gain factor of the LR-PCA model was 1.55, an increase of 8.4% compared with AMMI.
Funding: Project (61620106002) supported by the National Natural Science Foundation of China; Project (2016YFB0100906) supported by the National Key R&D Program in China; Project (2015364X16030) supported by the Information Technology Research Project of Ministry of Transport of China; Project (2242015K42132) supported by the Fundamental Sciences of Southeast University, China
Abstract: This work correlated detailed work zone location and time data from the WisLCS system with five-minute inductive loop detector data. A one-sample percentile value test and a two-sample Kolmogorov-Smirnov (K-S) test were applied to compare the speed and flow characteristics between work zone and non-work zone conditions. Furthermore, we analyzed the mobility characteristics of freeway work zones within the urban area of Milwaukee, WI, USA. More than 50% of the investigated work zones experienced speed reduction, and 15%-30% experienced reduced volumes. Speed reduction was more significant within and downstream of work zones than upstream.
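The two-sample K-S comparison mentioned in the abstract above measures the largest gap between the empirical distribution functions of two samples. A minimal sketch of that statistic is shown below; it computes only the distance, not a p-value, and the function name and speed values are hypothetical rather than drawn from the study.

```python
def ks_two_sample_statistic(sample_a, sample_b):
    """Largest absolute difference between two empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    largest_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance past every observation equal to x in both samples.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        largest_gap = max(largest_gap, abs(i / len(a) - j / len(b)))
    return largest_gap


# Hypothetical five-minute mean speeds (mph), for illustration only.
work_zone = [48.0, 51.5, 53.0, 55.2, 57.1]
no_work_zone = [58.4, 60.0, 61.3, 63.7, 64.9]
print(ks_two_sample_statistic(work_zone, no_work_zone))
```

In practice the statistic would be compared against the K-S critical value for the two sample sizes, or passed through a statistics library, to decide whether the speed distributions differ significantly.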
Funding: Supported by the National High Technology Research and Development Program of China (Grant No. 2010AA101301); the Program of Introducing International Advanced Agricultural Science and Technology in China (Grant No. 2006-G8[4]-31-1); the Program of Science-Technology Basis and Conditional Platform in China (Grant No. 505005)
Abstract: QTL mapping for seven quality traits was conducted using 254 recombinant inbred lines (RIL) derived from a japonica-japonica rice cross of Xiushui 79/C Bao. The seven traits investigated were grain length (GL), grain length to width ratio (LWR), chalk grain rate (CGR), chalkiness degree (CD), gelatinization temperature (GT), amylose content (AC) and gel consistency (GC) of head rice. The three mapping methods employed were composite interval mapping in QTLMapper 2.0 software based on a mixed linear model (MCIM), inclusive composite interval mapping in QTL IciMapping 3.0 software based on a stepwise regression linear model (ICIM), and multiple interval mapping with regression forward selection in Windows QTL Cartographer 2.5 based on multiple regression analysis (MIMR). Results showed that five QTLs with additive effect (A-QTLs) were detected by all three methods simultaneously, two by two methods simultaneously, and 23 by only one method. Five A-QTLs were detected by MCIM, nine by ICIM and 28 by MIMR. The contribution rates of single A-QTLs ranged from 0.89% to 38.07%. None of the QTLs with epistatic effect (E-QTLs) detected by MIMR were detected by the other two methods. Fourteen pairs of E-QTLs were detected by both MCIM and ICIM, and 142 pairs of E-QTLs were detected by only one method. Twenty-five pairs of E-QTLs were detected by MCIM, 141 pairs by ICIM and four pairs by MIMR. The contribution rates of single pairs of E-QTLs ranged from 2.60% to 23.78%. In the Xiu-Bao RIL population, epistatic effect played a major role in the variation of GL and CD, additive effect was dominant in the variation of LWR, and epistatic and additive effects were equally important in the variation of CGR, AC, GT and GC. QTLs detected by two or more methods simultaneously were highly reliable and could be applied to improve the quality traits of japonica hybrid rice.