This paper transforms fuzzy numbers into crisp numbers using the centroid method, so that the fuzzy linear regression model can be converted into a traditional linear regression model and studied in that form. The model's input and output are fuzzy numbers, and the regression coefficients are crisp numbers. The paper considers parameter estimation and influence analysis based on data deletion. A worked example and a comparison with other models indicate that the proposed model is easier to apply and performs better.
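The centroid step and the data-deletion analysis can be illustrated with a minimal sketch. It assumes triangular fuzzy numbers written as (left, peak, right), whose centroid is the average of the three values; the abstract does not state the membership shape, and the data and the names `centroid`, `fit_line`, and `deletion_effects` are hypothetical.

```python
def centroid(tri):
    """Centroid of a triangular fuzzy number given as (left, peak, right)."""
    left, peak, right = tri
    return (left + peak + right) / 3.0


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def deletion_effects(xs, ys):
    """Change in the fitted slope when each observation is deleted in turn."""
    _, full_slope = fit_line(xs, ys)
    effects = []
    for i in range(len(xs)):
        _, slope_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        effects.append(slope_i - full_slope)
    return effects


# Hypothetical fuzzy observations (not taken from the paper).
fuzzy_x = [(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6)]
fuzzy_y = [(3, 5, 7), (5, 7, 9), (7, 9, 11), (10, 12, 14)]

crisp_x = [centroid(t) for t in fuzzy_x]
crisp_y = [centroid(t) for t in fuzzy_y]
intercept, slope = fit_line(crisp_x, crisp_y)
effects = deletion_effects(crisp_x, crisp_y)
```

A large entry in `effects` flags an observation whose removal changes the fit noticeably, which is the idea behind deletion-based influence analysis.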
Cost-effective sampling design is a major concern in some experiments, especially when measuring the characteristic of interest is costly, painful, or time-consuming. Ranked set sampling (RSS) was first proposed by McIntyre [1952. A method for unbiased selective sampling, using ranked sets. Australian Journal of Agricultural Research 3, 385-390] as an effective way to estimate the pasture mean. In the current paper, a modification of ranked set sampling called moving extremes ranked set sampling (MERSS) is considered for the best linear unbiased estimators (BLUEs) for the simple linear regression model. The BLUEs for this model under MERSS are derived. The BLUEs under MERSS are shown to be markedly more efficient for normal data than the BLUEs under simple random sampling.
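For orientation, basic ranked set sampling can be sketched as follows. This is plain RSS with perfect ranking, not the MERSS variant studied in the paper, and the normal population, the set size, and the function name `ranked_set_sample` are assumptions made for illustration.

```python
import random


def ranked_set_sample(draw, set_size, cycles, rng):
    """Basic RSS: in each cycle, draw `set_size` sets of `set_size` units,
    rank each set, and measure only the unit with the i-th rank in set i.
    Ranking is assumed to be perfect (done here by sorting true values)."""
    sample = []
    for _ in range(cycles):
        for rank in range(set_size):
            candidates = sorted(draw(rng) for _ in range(set_size))
            sample.append(candidates[rank])
    return sample


def draw_normal(rng):
    """Hypothetical population: normal with mean 10 and standard deviation 2."""
    return rng.gauss(10.0, 2.0)


rng = random.Random(0)
rss = ranked_set_sample(draw_normal, set_size=4, cycles=25, rng=rng)
srs = [draw_normal(rng) for _ in range(len(rss))]  # same number of measurements

rss_mean = sum(rss) / len(rss)
srs_mean = sum(srs) / len(srs)
```

Both samples use the same number of measured units; RSS spends extra effort only on cheap ranking, which is why it suits settings where measurement is the expensive step.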
This paper selects seven indicators of fiscal revenue and housing sales prices in China over the most recent 19 years and uses SPSS and Excel to carry out descriptive statistics, independent-samples t-tests, correlation analysis, and regression analysis in order to study comprehensively the correlation between fiscal revenue and housing sales prices in China. It establishes the relationship between the two: when the average selling price of commercial housing increases by one unit, fiscal revenue increases by 27.855 points.
In this paper, we study some robustness aspects of linear regression models in the presence of outliers or discordant observations, using stable distributions for the response in place of the usual normality assumption. It is well known that, in general, there is no closed form for the probability density function of stable distributions. However, under a Bayesian approach, the use of a latent or auxiliary random variable simplifies the derivation of posterior distributions related to stable distributions. To show the usefulness of the computational aspects, the methodology is applied to two examples: one is a standard linear regression model with one explanatory variable, and the other is a simulated data set assuming a 2^3 factorial experiment. Posterior summaries of interest are obtained using MCMC (Markov Chain Monte Carlo) methods and the OpenBUGS software.
In this paper, we consider the partial linear regression model y_i = x_i β^* + g(t_i) + ε_i, i = 1, 2, ..., n, where (x_i, t_i) are known fixed design points, g(·) is an unknown function, β^* is an unknown parameter to be estimated, and the random errors ε_i are (α, β)-mixing random variables. The p-th (p > 1) mean consistency, strong consistency, and complete consistency of the least squares estimators of β^* and g(·) are investigated under some mild conditions. In addition, a numerical simulation is carried out to study the finite sample performance of the theoretical results. Finally, a real data analysis is provided to further verify the effect of the model.
Social networks are the mainstream medium of information dissemination today, and accurately predicting how information propagates through them is particularly important. In this paper, we introduce a social network propagation model that integrates multiple linear regression with an infectious disease model. First, we propose features that affect social network communication along three dimensions. Then, we predict node influence via multiple linear regression. Lastly, we use the node influence as the state transition of the infectious disease model to predict the trend of information dissemination in social networks. Experimental results on a real social network dataset show that the model's predictions are consistent with the actual information dissemination trends.
Mathematical modeling of economic indices is a challenging topic in crop production systems. The present study aimed to model the economic indices of mechanized and semi-mechanized rainfed wheat production systems using various multiple linear regression models. The study area was Behshahr County, located in the east of Mazandaran Province, Northern Iran. The statistical population included all wheat producers in Behshahr County in the 2016/17 crop year. Five input variables were human labor, machinery, diesel fuel, chemical (chemical fertilizers and chemical pesticides) costs, and the income was considered to be the output. The results showed that the cost of wheat production in the semi-mechanized system was higher than in the mechanized system. In both systems, the highest cost was related to the agricultural machinery input. Moreover, seed cost was lower in the mechanized system than in the semi-mechanized system. The net return indicator was 993.68 $ ha^(-1) and 626.71 $ ha^(-1) for the mechanized and semi-mechanized systems, respectively. The average benefit-to-cost ratio was 3.46 and 2.40 for the mechanized and semi-mechanized systems, respectively, demonstrating the greater profitability of the mechanized system. Five types of regression models (Cobb-Douglas, linear, 2FI, quadratic, and pure-quadratic) were evaluated for both production systems. The R^2 value of the Cobb-Douglas model was higher than that of the quadratic model, while the RMSE and MAPE of the quadratic model were smaller than those of the Cobb-Douglas model. Therefore, the quadratic model was judged the best model for investigating the relationship between input costs and the income of wheat production in both mechanized and semi-mechanized systems.
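The two error measures used to rank the models can be written down directly. The sketch below uses the standard definitions of RMSE and MAPE; the income figures are hypothetical and do not come from the study.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


# Hypothetical incomes and model predictions.
observed_income = [100.0, 200.0, 300.0]
predicted_income = [110.0, 190.0, 300.0]

model_rmse = rmse(observed_income, predicted_income)
model_mape = mape(observed_income, predicted_income)
```

Because R^2 on the one hand and RMSE and MAPE on the other can rank models differently, as they do in this study, the choice of criterion should be stated alongside the chosen model.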
The paper investigates sequential monitoring for a variance change in the observations of a linear regression model. The procedure is based on a detection function constructed from the CUSUM of squared residuals and a boundary function designed so that the test has a small probability of false alarm and asymptotic power one. Simulation results show that the monitoring procedure performs well when the variance change occurs shortly after monitoring begins. The method remains feasible when the regression coefficients change, or when both the variance and the regression coefficients change.
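A rough sketch of such a monitoring scheme is given below. The abstract does not state the detector's normalization or the boundary function, so the boundary c * tau * sqrt(m) * (1 + k/m), the constant `c`, and the function name are assumptions borrowed from the general sequential-monitoring literature, not the paper's procedure. Residuals are taken as given; fitting the regression is omitted.

```python
import math


def monitor_variance(hist_resid, new_resid, c=3.0):
    """Return the first monitoring step k at which the CUSUM of centered
    squared residuals crosses the boundary, or None if it never does.

    hist_resid: residuals from the historical (no-change) period of length m
    new_resid:  residuals arriving one at a time during monitoring
    c:          hypothetical critical constant controlling false alarms
    """
    m = len(hist_resid)
    squares = [e * e for e in hist_resid]
    sigma2 = sum(squares) / m  # historical variance estimate
    # Scale of the squared residuals, used to normalize the detector.
    tau = math.sqrt(sum((s - sigma2) ** 2 for s in squares) / m)

    cusum = 0.0
    for k, e in enumerate(new_resid, start=1):
        cusum += e * e - sigma2
        boundary = c * tau * math.sqrt(m) * (1.0 + k / m)  # assumed boundary form
        if abs(cusum) > boundary:
            return k
    return None


# Hypothetical residuals: a stable history, then a stable and a changed stream.
history = [0.5, -1.5] * 10
stable_stream = [0.5, -1.5] * 5
changed_stream = [3.0] * 10

no_alarm = monitor_variance(history, stable_stream)
alarm_step = monitor_variance(history, changed_stream)
```

With these inputs the stable stream raises no alarm, while the stream with inflated residuals crosses the boundary at the second monitoring step.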
This paper concerns computational problems of the concave penalized linear regression model. We propose a fixed-point iterative algorithm to solve the computational problem, based on the fact that the penalized estimator satisfies a fixed-point equation. The convergence of the proposed algorithm is established. Numerical studies are conducted to evaluate the finite sample performance of the proposed algorithm.
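To make the fixed-point idea concrete, here is a minimal sketch for one concave penalty, the minimax concave penalty (MCP). The abstract does not give the paper's penalty or its exact fixed-point equation, so the update beta = T(beta + step * X'(y - X beta) / n), with T the MCP thresholding operator, is an assumed proximal-gradient form. The step size is assumed to be small enough for the design (here X'X / n is the identity, so step = 1 works), and all names and data are hypothetical.

```python
import math


def mcp_threshold(z, lam, gamma, step):
    """Thresholding (proximal) operator of the MCP penalty; requires gamma > step."""
    a = abs(z)
    if a <= step * lam:
        return 0.0
    if a <= gamma * lam:
        return math.copysign((a - step * lam) / (1.0 - step / gamma), z)
    return z


def fixed_point_mcp(X, y, lam, gamma=3.0, step=1.0, tol=1e-10, max_iter=1000):
    """Iterate beta <- T(beta + step * X'(y - X beta) / n) until it stops moving."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        new_beta = [
            mcp_threshold(beta[j] + step * grad[j], lam, gamma, step) for j in range(p)
        ]
        if max(abs(a - b) for a, b in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta


# Hypothetical orthogonal design: one strong signal and one weak signal.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.1, 1.9, -1.9, -2.1]  # equals 2.0 * column 1 + 0.1 * column 2

beta_hat = fixed_point_mcp(X, y, lam=0.5)
```

The weak coefficient is thresholded to exactly zero, while the strong one is left unshrunk, which is the behavior that motivates concave penalties over the lasso.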
In this paper, we introduce a generalized Liu estimator and a jackknifed Liu estimator in a linear regression model with correlated or heteroscedastic errors, thereby extending the Liu estimator. Under the mean square error (MSE) criterion, the jackknifed estimator is superior to the Liu estimator and the jackknifed ridge estimator. We also give a method for selecting the biasing parameter d. Furthermore, a numerical example is given to illustrate these theoretical results.
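For reference, the ordinary Liu estimator that the paper extends has the standard form below, where d is the biasing parameter; setting d = 1 recovers ordinary least squares. The second line is only an assumed analogue for errors with covariance proportional to a known matrix V, and the paper's own generalized and jackknifed definitions may differ.

```latex
\hat{\beta}_{\mathrm{OLS}} = (X^{\top}X)^{-1}X^{\top}y, \qquad
\hat{\beta}_{d} = (X^{\top}X + I)^{-1}\bigl(X^{\top}y + d\,\hat{\beta}_{\mathrm{OLS}}\bigr), \quad 0 < d < 1,

% Assumed analogue under Cov(\varepsilon) = \sigma^{2} V:
\hat{\beta}_{\mathrm{GLS}} = (X^{\top}V^{-1}X)^{-1}X^{\top}V^{-1}y, \qquad
\hat{\beta}_{d}^{(V)} = (X^{\top}V^{-1}X + I)^{-1}\bigl(X^{\top}V^{-1}y + d\,\hat{\beta}_{\mathrm{GLS}}\bigr).
```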
In this paper, we construct a random weighting statistic to approximate the distribution of the studentized least squares estimator in a linear regression model with ideal accuracy o(n^(-1/2)). This provides a more practical method for approximating the distribution.
Compositional data, which carry relative information, are a crucial aspect of machine learning and related fields. They are typically recorded as closed data, that is, data that sum to a constant such as 100%. The linear regression model is one of the most widely used statistical techniques for identifying relationships between variables of interest. However, data quality is a significant challenge in machine learning, especially when data are missing. When estimating linear regression parameters, which are useful for tasks such as prediction and partial effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, and recovering them can be costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations that involve missing data. The EM algorithm iteratively finds maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved variables or data. Using the current estimate as input, the expectation (E) step constructs the expected log-likelihood function. The maximization (M) step then finds the parameters that maximize the expected log-likelihood determined in the E step. This study examined how well the EM algorithm performed on a simulated compositional dataset with missing observations, using both robust least squares and ordinary least squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
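The Aitchison distance used as the evaluation criterion can be computed from the centered log-ratio (clr) transform, as in the sketch below. The mean-imputation helper is a simple baseline written for illustration (column-wise arithmetic means followed by re-closure to a unit sum); the study's exact imputation procedure is not given in the abstract, and the data are hypothetical.

```python
import math


def clr(parts):
    """Centered log-ratio transform of a composition with strictly positive parts."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of two compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


def mean_impute(rows):
    """Replace None by the column mean of observed values, then re-close each row."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        col_means.append(sum(observed) / len(observed))
    filled = [[col_means[j] if v is None else v for j, v in enumerate(row)] for row in rows]
    return [[v / sum(row) for v in row] for row in filled]


# Hypothetical compositions; None marks a missing part.
data = [[0.2, 0.3, 0.5], [0.1, None, 0.6], [0.3, 0.3, 0.4]]
completed = mean_impute(data)
truth = [0.1, 0.3, 0.6]
error = aitchison_distance(completed[1], truth)
```

The distance depends only on ratios between parts, so rescaling a composition (for example, from proportions to percentages) leaves it unchanged.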
Observed rainfall is an essential parameter for rainfall analysis, day-to-day weather forecasting, and forecast validation. Observed rainfall data are available from only five IMD observatories, while no rainfall data are available at various important locations in and around Delhi-NCR. However, the 24-hour rainfall observed by Doppler Weather Radar (DWR) for all of Delhi and the surrounding region (up to 150 km) is readily available in pictorial form. In this paper, efforts have been made to estimate the rainfall at desired locations using DWR hydrological products. First, the rainfall at the desired locations was estimated from the precipitation accumulation (PAC) product of the DWR using image processing in Python. Then, a linear regression model using the least squares method was developed in R. Estimated and observed rainfall data for 2018 (July, August, and September) were used to train the model. The model was then tested and validated on rainfall data for 2019 (July, August, and September). With the linear regression model, the error in mean rainfall estimation was reduced by 46.58% and the error in maximum rainfall estimation by 84.53% for 2019. For 2018, the error in mean rainfall estimation was reduced by 81.36% and the error in maximum rainfall estimation by 33.81%. Thus, rainfall can be estimated with a fair degree of accuracy at desired locations within the range of the Doppler Weather Radar using the radar rainfall products and the developed linear regression model.
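The train-then-validate workflow can be sketched as below. The study fitted its model in R; this illustration uses Python, the rainfall values are hypothetical, and the "error in mean rainfall" is assumed to be the absolute difference between the mean estimate and the mean observation, which the abstract does not define.

```python
def fit_line(xs, ys):
    """Least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_error(estimated, observed):
    """Absolute difference between mean estimated and mean observed rainfall."""
    return abs(sum(estimated) / len(estimated) - sum(observed) / len(observed))


# Hypothetical training season: radar-derived estimates vs. gauge observations (mm).
train_radar = [10.0, 20.0, 30.0, 40.0]
train_gauge = [15.0, 25.0, 35.0, 45.0]

# Hypothetical validation season.
test_radar = [12.0, 22.0]
test_gauge = [18.0, 28.0]

intercept, slope = fit_line(train_radar, train_gauge)
corrected = [intercept + slope * r for r in test_radar]

raw_error = mean_error(test_radar, test_gauge)
calibrated_error = mean_error(corrected, test_gauge)
reduction_percent = 100.0 * (raw_error - calibrated_error) / raw_error
```

Fitting on one season and scoring on another, as the study does, keeps the reported error reduction from being an artifact of evaluating on the training data.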
The change processes and trends of shorelines and tidal flats under human activities are essential issues for the sustainability of coastal areas and are of great significance for understanding changes in the coastal ecological environment and even global change. Based on field measurements, combined with a Linear Regression (LR) model and the Inverse Distance Weighting (IDW) method, this paper presents a detailed analysis of the change history and trend of the shoreline and tidal flat in Bohai Bay. The shoreline faces a high chance of erosion under the action of natural factors, while the tidal flat shows varying erosion and deposition patterns due to the impact of human activities. The implications of these change patterns for ecological protection and recovery are also discussed. Measures should be taken to protect the coastal ecological environment. The models used in this paper show a high correlation coefficient between observed and modeled data, which means that the method can be used to predict the changing trend of the shoreline and tidal flat. The results of the present study can provide scientific support for future coastal protection and management.
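Inverse distance weighting estimates a value at an unmeasured location as a weighted average of nearby measurements, with weights that decay with distance. The sketch below uses the common power-2 form; the power, the sample points, and the function name `idw` are assumptions, since the abstract does not give the settings used.

```python
import math


def idw(known, target, power=2.0):
    """Inverse distance weighted estimate at `target` = (x, y).

    known: list of (x, y, value) measurements
    power: distance exponent (2 is a common default, assumed here)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in known:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with a measured point
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical elevation measurements (x, y, elevation).
points = [(0.0, 0.0, 0.0), (2.0, 0.0, 10.0)]
midpoint_value = idw(points, (1.0, 0.0))
exact_value = idw(points, (2.0, 0.0))
```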
Forest fires are natural disasters that can occur suddenly and can be very damaging, burning thousands of square kilometers. Prevention is better than suppression, and prediction models of forest fire occurrence were developed using the logistic regression model, the geographically weighted logistic regression model, the Lasso regression model, the random forest model, and the support vector machine model, based on historical forest fire data from 2000 to 2019 in Jilin Province. The models, along with a distribution map, are presented in this paper to provide a theoretical basis for forest fire management in this area. Existing studies show that the prediction accuracies of the two machine learning models are higher than those of the three generalized linear regression models. The accuracies of the random forest model, the support vector machine model, the geographically weighted logistic regression model, the Lasso regression model, and the logistic model were 88.7%, 87.7%, 86.0%, 85.0%, and 84.6%, respectively. Weather is the main factor affecting forest fires, while the impacts of topographic, human, and socio-economic factors on fire occurrence were similar.
The nonlinearity of the strain energy over the interval in which seismic load is applied to geostructures makes it difficult for a seismic designer to make appropriate engineering judgments in a timely manner. The nonlinear stress and strain analysis of an embankment needs to be evaluated using a combination of suitable methods. In this study, a large-scale geostructure was seismically simulated and analyzed using the nonlinear finite element method (NFEM), and the linear regression method, a soft computing (SC) technique, was applied to evaluate the NFEM results. This supports engineering judgment, because the design of geostructures is usually considered an inexact process owing to the high nonlinearity of the seismic response of large-scale geostructures, and such nonlinearity may complicate decision making in seismic design. The probability distributions of nonlinear stress and nonlinear strain can be observed, and the densities of stress and strain are predicted, using histograms. The results of both the NFEM simulation and the linear regression method confirm the nonlinearity of the strain energy and stress behavior, with close values of R^2 and root-mean-square error (RMSE). The linear regression and histogram simulation show the accuracy of the NFEM results. The outcome of this study helps improve the quality of engineering judgment in the seismic analysis of an embankment by validating NFEM results with appropriate soft computing techniques.
Glacier response patterns at the catchment scale are highly heterogeneous and are defined by a complex interplay of various dynamic and surface factors. Previous studies have explained heterogeneous responses qualitatively, but a quantitative assessment is still lacking for settings where an intrazone homogeneous climate assumption can be valid. Hence, in the current study, the reasons for heterogeneous mass balance are explained quantitatively using a multiple linear regression model in the Sikkim Himalayan region. First, topographical parameters are selected from previously published studies; then the most significant topographical and geomorphological parameters are selected with backward stepwise subset selection; finally, the contributions of the selected parameters are calculated by the least squares method. The results show that the mass balance lies between -0.003±0.24 and -1.029±0.24 m w.e. a^(-1) between 2000 and 2020 in the Sikkim Himalayan region. The study also shows that, of the glacier terminus type, glacier area, debris cover, ice-mixed debris, slope, aspect, mean elevation, and snout elevation, only the terminus type and the mean elevation significantly alter the glacier mass balance in the Sikkim Himalayan region. Quantitatively, the mass loss is approximately 0.40 m w.e. a^(-1) higher in lake-terminating glaciers than in land-terminating glaciers in the same elevation zone. In addition, a drop of one thousand meters in mean elevation is associated with 0.179 m w.e. a^(-1) of mass loss regardless of the terminus type. The model using the terminus type and the mean elevation of the glaciers explains 76% of the variation in mass balance in the Sikkim Himalayan region.
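The two reported effects can be combined to compare glaciers, as in the sketch below. It assumes an additive model with a dummy variable for terminus type and a linear elevation term, with no interaction. The intercept is not given in the abstract, so only differences between glaciers are computed; the function name and elevations are hypothetical.

```python
# Effects reported in the abstract (m w.e. per year).
LAKE_EFFECT = -0.40              # lake-terminating relative to land-terminating
ELEVATION_EFFECT_PER_KM = 0.179  # change per 1000 m increase in mean elevation


def mass_balance_difference(lake_a, elev_a_m, lake_b, elev_b_m):
    """Predicted mass balance of glacier A minus glacier B (m w.e. per year),
    assuming an additive model without interaction terms."""
    terminus_term = LAKE_EFFECT * (int(lake_a) - int(lake_b))
    elevation_term = ELEVATION_EFFECT_PER_KM * (elev_a_m - elev_b_m) / 1000.0
    return terminus_term + elevation_term


# Lake-terminating vs. land-terminating glacier at the same mean elevation.
same_elevation = mass_balance_difference(True, 5500.0, False, 5500.0)

# Two land-terminating glaciers, one with a mean elevation 1000 m lower.
lower_glacier = mass_balance_difference(False, 4500.0, False, 5500.0)
```

Negative results mean glacier A is predicted to lose more mass than glacier B.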
Human biometric analysis has received much attention due to its widespread use in different research areas, such as security, surveillance, health, human identification, and classification. Human gait is one of the key human traits that can be used to identify and classify humans based on their age, gender, and ethnicity. Different approaches have been proposed for estimating human age based on gait. However, challenges remain, for which an efficient, low-cost technique or algorithm is needed. In this paper, we propose a three-dimensional real-time gait-based age detection system using a machine learning approach. The proposed system consists of training and testing phases. The training phase consists of gait feature extraction using the Microsoft Kinect (MS Kinect) controller, dataset generation based on joint positions, pre-processing of gait features, feature selection by calculating the standard error and standard deviation of the arithmetic mean, and best model selection using R^2 and adjusted R^2. T-test and ANOVA techniques show that nine joints (right shoulder, right elbow, right hand, left knee, right knee, right ankle, left ankle, left foot, and right foot) are statistically significant at the 5% level of significance for age estimation. The testing phase predicts the age of a walking person using the results obtained from the training phase. The proposed approach is evaluated on data recorded experimentally from users in a real-time scenario. Fifty (50) volunteers of different ages participated in the experimental study. Using the limited features, the proposed method estimates age with 98.0% accuracy on experimental images acquired in real time via a classical general linear regression model.
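The model selection criteria mentioned above can be computed as follows. The formulas are the standard definitions of R^2 and adjusted R^2; the example values (an R^2 of 0.9 with 50 observations and nine predictors) are hypothetical and are not results from the paper.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R^2, which penalizes additional predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical example: 50 observations and nine joint-position predictors.
example_adjusted = adjusted_r_squared(0.9, n_obs=50, n_predictors=9)
perfect_fit = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

Unlike R^2, the adjusted value can fall when a predictor adds little explanatory power, which makes it the more suitable criterion when comparing models with different numbers of features.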
Funding: Supported by the National Natural Science Foundation of China (11901236), the Scientific Research Fund of Hunan Provincial Science and Technology Department (2019JJ50479), the Scientific Research Fund of Hunan Provincial Education Department (18B322), the Winning Bid Project of Hunan Province for the 4th National Economic Census ([2020]1), the Young Core Teacher Foundation of Hunan Province ([2020]43), and the Fundamental Research Fund of Xiangxi Autonomous Prefecture (2018SF5026).
Funding: Thank you for your valuable comments and suggestions. This research was supported by the Yunnan Applied Basic Research Project (No. 2017FD150) and the Chuxiong Normal University General Research Project (No. XJYB2001).
Funding: Financial support from the Brazilian institution Conselho Nacional de Desenvolvimento Cientifico e Tecnologico (CNPq).
Funding: Supported by the National Social Science Foundation of China (Grant No. 22BTJ059).
Funding: This work was supported by the 2021 Project of the "14th Five-Year Plan" of Shaanxi Education Science, "Research on the Application of Educational Data Mining in Applied Undergraduate Teaching: Taking the Course of 'Computer Application Technology' as an Example" (SGH21Y0403); the Teaching Reform and Research Projects for Practical Teaching in 2022, "Research on Practical Teaching of Applied Undergraduate Projects Based on 'Combination of Courses and Certificates': Taking Computer Application Technology Courses as an Example" (SJJG02012); and the 11th batch of the Teaching Reform Research Project of Xi'an Jiaotong University City College, "Project-Driven Cultivation and Research on Information Literacy of Applied Undergraduate Students in the Information Times: Taking Computer Application Technology Course Teaching as an Example" (111001).
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 60972150, 10926197) and the Science and Technology Innovation Foundation of Northwestern Polytechnical University (Grant No. 2007KJ01033).
Funding: Supported by the National Natural Science Foundation of China (11701571).
Funding: Supported by the National Natural Science Foundation of China (11071022) and the Science and Technology Project of Hubei Provincial Department of Education (Q20122202).
Funding: Supported by the Doctoral Program Foundation of the Institute of Higher Education and the National Natural Science Foundation of China.
Abstract: In this paper, we construct a random weighting statistic to approximate the distribution of the studentized least squares estimator in a linear regression model with ideal accuracy o(n^(-1/2)). Thus, we provide a more practical method for approximating the distribution.
Abstract: Compositional data, which carry relative information, are a crucial aspect of machine learning and related fields. Such data are typically recorded as closed data, that is, data whose parts sum to a constant such as 100%. The statistical linear model is the most widely used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data are present. The linear regression model is a commonly used statistical modeling technique applied in many settings to find relationships between variables of interest. When estimating linear regression parameters, which are useful for tasks such as future prediction and partial effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, and recovering them can be costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested as a solution for situations involving missing data. The EM algorithm iteratively finds maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved variables or data. Using the current estimate as input, the expectation (E) step constructs the expected log-likelihood function. The maximization (M) step then finds the parameters that maximize the expected log-likelihood determined in the E step. This study examined how well the EM algorithm performed on a simulated compositional dataset with missing observations, using both robust least squares and ordinary least squares regression techniques. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
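The Aitchison distance used as a comparison criterion above can be sketched with the centered log-ratio (clr) transform: it is the Euclidean distance between the clr-transformed compositions. This is a minimal illustration with hypothetical function names and data, and it assumes all parts are strictly positive.

```python
import math


def closure(parts, total=1.0):
    """Rescale positive parts so that they sum to `total` (closed data)."""
    s = sum(parts)
    return [total * p / s for p in parts]


def clr(parts):
    """Centered log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of two compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


# Hypothetical three-part compositions, for illustration only.
true_row = closure([20.0, 30.0, 50.0])
imputed_row = closure([25.0, 25.0, 50.0])
print(aitchison_distance(true_row, imputed_row))
```

Because the clr transform removes any common scale factor, the distance is the same whether a composition is recorded in proportions or in percentages. A smaller distance between a true row and its imputed version indicates a better imputation.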
Abstract: Observed rainfall is an essential parameter for rainfall analysis and for day-to-day weather forecasting and its validation. Observed rainfall data are available from only five observatories of IMD, while no rainfall data are available at various important locations in and around Delhi-NCR. However, the 24-hour rainfall data observed by Doppler Weather Radar (DWR) for the whole of Delhi and the surrounding region (up to 150 km) are readily available in pictorial form. In this paper, efforts have been made to estimate the rainfall at desired locations using DWR hydrological products. First, the rainfall at desired locations was estimated from the precipitation accumulation product (PAC) of the DWR using image processing in the Python language. Next, a linear regression model using the least squares method was developed in the R language. Estimated and observed rainfall data for 2018 (July, August and September) were used to train the model. The model was then tested and validated on rainfall data for 2019 (July, August and September). With the linear regression model, the error in mean rainfall estimation was reduced by 46.58% and the error in maximum rainfall estimation by 84.53% for 2019. For 2018, the error in mean rainfall estimation was reduced by 81.36% and the error in maximum rainfall estimation by 33.81%. Thus, rainfall can be estimated with a fair degree of accuracy at desired locations within the range of the Doppler Weather Radar using the radar rainfall products and the developed linear regression model.
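The train-then-test workflow described above can be sketched as a simple least squares line that maps radar-estimated rainfall to gauge-observed rainfall. The study fitted its model in R; Python is used here only for illustration, and every value and name below is hypothetical rather than taken from the paper.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_abs_error(observed, estimated):
    """Mean absolute difference between observed and estimated values."""
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / len(observed)


# Hypothetical training season: radar estimates vs. gauge observations (mm).
radar_train = [0.0, 10.0, 20.0, 30.0]
gauge_train = [1.0, 6.0, 11.0, 16.0]
intercept, slope = fit_line(radar_train, gauge_train)

# Hypothetical test season: correct the raw radar estimates with the fitted line.
radar_test = [8.0, 24.0, 40.0]
gauge_test = [5.5, 12.5, 20.0]
corrected = [intercept + slope * r for r in radar_test]

error_raw = mean_abs_error(gauge_test, radar_test)
error_corrected = mean_abs_error(gauge_test, corrected)
print(f"raw error = {error_raw:.2f} mm, corrected error = {error_corrected:.2f} mm")
```

Comparing the error before and after correction on data not used for fitting mirrors the validation step in the abstract, where the model is trained on one monsoon season and tested on the next.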
Funding: Supported by the National Natural Science Foundation of China (41602205, 42293261), the China Geological Survey Program (DD20189506, DD20211301), the Special Investigation Project on Science and Technology Basic Resources of the Ministry of Science and Technology (2021FY101003), the Central Guidance for Local Scientific and Technological Development Fund of 2023, and the Project of Hebei University of Environmental Engineering (GCY202301).
Abstract: The change processes and trends of shorelines and tidal flats forced by human activities are essential issues for the sustainability of coastal areas, and are also of great significance for understanding changes in the coastal ecological environment and even global change. Based on field measurements, combined with the Linear Regression (LR) model and the Inverse Distance Weighting (IDW) method, this paper presents a detailed analysis of the change history and trend of the shoreline and tidal flat in Bohai Bay. The shoreline faces a high chance of erosion under the action of natural factors, while the tidal flat shows different erosion and deposition patterns in Bohai Bay due to the impact of human activities. The implications of these change patterns for ecological protection and recovery are also discussed. Measures should be taken to protect the coastal ecological environment. The models used in this paper show a high correlation coefficient between observed and modeled data, which means that this method can be used to predict the changing trend of the shoreline and tidal flat. The results of the present study can provide scientific support for future coastal protection and management.
Funding: This research was funded by the National Natural Science Foundation of China (grant no. 32271881).
Abstract: Forest fires are natural disasters that can occur suddenly and can be very damaging, burning thousands of square kilometers. Because prevention is better than suppression, prediction models of forest fire occurrence were developed using the logistic regression model, the geographical weighted logistic regression model, the Lasso regression model, the random forest model, and the support vector machine model, based on historical forest fire data from 2000 to 2019 in Jilin Province. The models, along with a distribution map, are presented in this paper to provide a theoretical basis for forest fire management in this area. Existing studies show that the prediction accuracies of the two machine learning models are higher than those of the three generalized linear regression models. The accuracies of the random forest model, the support vector machine model, the geographical weighted logistic regression model, the Lasso regression model, and the logistic model were 88.7%, 87.7%, 86.0%, 85.0% and 84.6%, respectively. Weather is the main factor affecting forest fires, while the impacts of topographic, human, and socio-economic factors on fire occurrence were similar.
Abstract: The nonlinearity of strain energy over the interval during which seismic load is applied to geostructures makes it difficult for a seismic designer to make timely and appropriate engineering judgments. The nonlinear stress and strain analysis of an embankment needs to be evaluated using a combination of suitable methods. In this study, a large-scale geostructure was seismically simulated and analyzed using the nonlinear finite element method (NFEM), and the linear regression method, a soft computing (SC) technique, was applied to evaluate the NFEM results. This supports engineering judgment, because the design of geostructures is usually considered an inaccurate process owing to the high nonlinearity of the seismic response of large-scale geostructures, and such nonlinearity may complicate decision making in seismic design. The probability distribution of nonlinear stress and nonlinear strain can be observed, and the density of stress and strain predicted, using the histogram. The results of both the NFEM simulation and the linear regression method confirm the nonlinearity of strain energy and stress behavior, with close values of R^2 and root-mean-square error (RMSE). The linear regression and histogram simulation show the accuracy of the NFEM results. The outcome of this study helps improve the quality of engineering judgment in the seismic analysis of an embankment by validating NFEM results with appropriate soft computing techniques.
Abstract: Glacier response patterns at the catchment scale are highly heterogeneous and are defined by a complex interplay of various dynamics and surface factors. Previous studies have explained heterogeneous responses qualitatively, but quantitative assessment is still lacking for settings where an intrazone homogeneous climate assumption can be valid. Hence, in the current study, the reasons for heterogeneous mass balance are explained quantitatively using a multiple linear regression model in the Sikkim Himalayan region. First, the topographical parameters are selected from previously published studies; then the most significant topographical and geomorphological parameters are selected with backward stepwise subset selection methods. Finally, the contributions of the selected parameters are calculated by least squares methods. The results show that the magnitude of mass balance lies between -0.003 ± 0.24 and -1.029 ± 0.24 m w.e. a^(-1) between 2000 and 2020 in the Sikkim Himalayan region. The study also shows that, out of the terminus type, glacier area, debris cover, ice-mixed debris, slope, aspect, mean elevation, and snout elevation of the glaciers, only the terminus type and mean elevation significantly alter the glacier mass balance in the Sikkim Himalayan region. Mathematically, the mass loss is approximately 0.40 m w.e. a^(-1) higher in lake-terminating glaciers than in land-terminating glaciers in the same elevation zone. On the other hand, a drop of one thousand meters in mean elevation is associated with 0.179 m w.e. a^(-1) of mass loss regardless of the terminus type of the glaciers. In the current study, the model using the terminus type and the mean elevation of the glaciers explains 76% of the fluctuation of mass balance in the Sikkim Himalayan region.
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through Large Groups RGP.2/212/1443.
Abstract: Human biometric analysis has received much attention due to its widespread use in different research areas, such as security, surveillance, health, human identification, and classification. Human gait is one of the key human traits that can identify and classify humans based on their age, gender, and ethnicity. Different approaches have been proposed so far for estimating human age based on gait. However, challenges remain, for which an efficient, low-cost technique or algorithm is needed. In this paper, we propose a three-dimensional real-time gait-based age detection system using a machine learning approach. The proposed system consists of training and testing phases. The training phase consists of gait feature extraction using the Microsoft Kinect (MS Kinect) controller, dataset generation based on joints' positions, pre-processing of gait features, feature selection by calculating the standard error and standard deviation of the arithmetic mean, and best model selection using R^2 and adjusted R^2 techniques. T-test and ANOVA techniques show that nine joints (right shoulder, right elbow, right hand, left knee, right knee, right ankle, left ankle, left foot, and right foot) are statistically significant at the 5% level of significance for age estimation. The testing phase correctly predicts the age of a walking person using the results obtained from the training phase. The proposed approach is evaluated on data experimentally recorded from users in a real-time scenario. Fifty (50) volunteers of different ages participated in the experimental study. Using the limited features, the proposed method estimates the age with 98.0% accuracy on experimental images acquired in real time via a classical general linear regression model.