Abstract: In case of heteroscedasticity, a Generalized Minimum Perpendicular Distance Square (GMPDS) method has been suggested instead of the traditionally used Generalized Least Square (GLS) method to fit a regression line, with the aim of obtaining a better-fitted regression line, so that the estimated line is the one closest to the observed points. The mathematical form of the estimator of the parameters is presented, and a logical argument for the relationship between the slopes of the fitted lines is given.
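To make the perpendicular-distance idea concrete, the following minimal Python sketch fits a line by minimizing a weighted sum of squared perpendicular distances. It is not the paper's GMPDS estimator: the function name `perpendicular_fit`, the Nelder-Mead optimizer, and the weights `w` (standing in for a heteroscedastic variance structure) are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def perpendicular_fit(x, y, w=None):
    """Fit y = a + b*x by minimizing the (weighted) sum of squared
    perpendicular distances from the points to the line.

    Illustrative weighted orthogonal-regression sketch, not the paper's
    GMPDS estimator; the weights w stand in for the heteroscedastic
    variance structure (e.g. w_i = 1 / sigma_i**2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = np.ones_like(x) if w is None else np.asarray(w, float)

    def objective(params):
        a, b = params
        # perpendicular distance of (x_i, y_i) to the line y = a + b*x
        d = (y - a - b * x) / np.sqrt(1.0 + b**2)
        return np.sum(w * d**2)

    # ordinary least squares gives a reasonable starting point
    b0, a0 = np.polyfit(x, y, 1)
    res = minimize(objective, x0=[a0, b0], method="Nelder-Mead")
    return res.x  # (intercept, slope)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(1, 10, 50)
    y = 2.0 + 0.5 * x + rng.normal(scale=0.1 * x)   # error variance grows with x
    a_hat, b_hat = perpendicular_fit(x, y, w=1.0 / (0.1 * x) ** 2)
    print(a_hat, b_hat)
```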
Funding: This work was financially supported by the National Natural Science Foundation of China under contract No. 69372031.
Abstract: The minimum distance method (MDM) is a very popular algorithm in state recognition, but it rests on a presupposition: the distances within a class must be sufficiently smaller than the distances between classes. When this presupposition is not satisfied, the method is no longer valid. To overcome this shortcoming of MDM, an improved minimum distance method (IMDM) based on artificial neural networks (ANN) is presented. Simulation results demonstrate that IMDM has two advantages over MDM: faster recognition and higher recognition accuracy.
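For reference, here is a minimal sketch of the baseline minimum distance rule (nearest class mean) that the paper sets out to improve. The class name `MinimumDistanceClassifier` and the synthetic two-state data are illustrative assumptions; the ANN-based IMDM itself is not reproduced here.

```python
import numpy as np

class MinimumDistanceClassifier:
    """Baseline minimum distance (nearest class mean) classifier.

    Sketch of plain MDM only; the paper's ANN-based IMDM replaces this
    fixed distance rule with a trained neural network."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # assign each sample to the class whose mean is closest (Euclidean)
        d = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=2)
        return self.classes_[np.argmin(d, axis=1)]

# usage: two well-separated synthetic states
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
clf = MinimumDistanceClassifier().fit(X, y)
print((clf.predict(X) == y).mean())   # training accuracy
```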
Abstract: Evaluating the minimum distance of linear block codes remains an open problem in coding theory, and its true value is not easy to determine by classical methods; for this reason the problem has been attacked in the literature with heuristic techniques such as genetic algorithms and local search algorithms. In this paper we propose two approaches to this hard problem. The first is based on genetic algorithms and yields good results compared with earlier genetic-algorithm-based work. The second is based on a new randomized algorithm, which we call the Multiple Impulse Method (MIM): the principle is to search for codewords locally around the all-zero codeword perturbed by a minimal level of noise, anticipating that the resulting nearest nonzero codewords will most likely contain a minimum-weight codeword, whose Hamming weight equals the minimum distance of the linear code.
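The link between minimum distance and minimum nonzero Hamming weight can be made concrete with a brute-force sketch, feasible only for small dimensions. The function name `exact_minimum_distance` and the [7,4] Hamming code example are illustrative assumptions, not the paper's genetic-algorithm or MIM heuristics, which target codes where exhaustive enumeration is out of reach.

```python
import numpy as np
from itertools import product

def exact_minimum_distance(G):
    """Exhaustive minimum distance of the binary linear code generated by G.

    For a linear code, d_min equals the smallest Hamming weight over all
    nonzero codewords.  This brute force enumerates all 2^k messages, so it
    is only feasible for small k."""
    G = np.asarray(G) % 2
    k, n = G.shape
    best = n  # a codeword weight can never exceed the code length n
    for msg in product([0, 1], repeat=k):
        if not any(msg):
            continue                      # skip the all-zero codeword
        codeword = np.dot(msg, G) % 2
        best = min(best, int(codeword.sum()))
    return best

# usage: the [7,4] Hamming code, whose minimum distance is 3
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
print(exact_minimum_distance(G))  # expected: 3
```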
Abstract: Compositional data, such as relative information, are a crucial aspect of machine learning and related fields. They are typically recorded as closed data, i.e. the components sum to a constant such as 100%. The linear regression model is the most commonly used statistical technique for identifying hidden relationships between underlying random variables of interest, but data quality is a significant challenge in machine learning, especially when observations are missing. When estimating linear regression parameters, which are useful for tasks such as prediction and the analysis of partial effects of the independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, and recovering the missing data can be costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds maximum likelihood or maximum a posteriori (MAP) estimates of the parameters of statistical models that depend on unobserved variables or data. Using the current parameter estimate, the expectation (E) step constructs the expected log-likelihood; the maximization (M) step then finds the parameters that maximize the expected log-likelihood computed in the E step. This study examined how well the EM algorithm performs on a simulated compositional dataset with missing observations, using both ordinary least squares and robust least squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) imputation and mean imputation, in terms of Aitchison distances and covariance.
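A hedged sketch of the E and M steps for data with missing entries is given below. It is a generic multivariate normal EM, not the paper's compositional-data pipeline (which would first move the parts to log-ratio coordinates); the function name `em_mvn_missing` and the toy data are illustrative assumptions.

```python
import numpy as np

def em_mvn_missing(X, n_iter=200, tol=1e-8):
    """EM estimation of (mu, Sigma) for multivariate normal data with NaNs,
    returning the data matrix with missing cells replaced by their
    conditional expectations.

    Generic Gaussian EM sketch, not the paper's compositional pipeline."""
    X = np.array(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)
    X_f = np.where(miss, mu, X)                    # initial mean fill
    Sigma = np.cov(X_f, rowvar=False, bias=True)
    for _ in range(n_iter):
        C_sum = np.zeros((p, p))
        for i in range(n):
            m, o = miss[i], ~miss[i]
            if m.any() and o.any():
                S_oo = Sigma[np.ix_(o, o)]
                S_mo = Sigma[np.ix_(m, o)]
                S_mm = Sigma[np.ix_(m, m)]
                B = S_mo @ np.linalg.inv(S_oo)
                # E step: conditional mean and covariance of the missing block
                X_f[i, m] = mu[m] + B @ (X_f[i, o] - mu[o])
                C_sum[np.ix_(m, m)] += S_mm - B @ S_mo.T
        # M step: update mu and Sigma from completed data plus E-step covariance
        mu_new = X_f.mean(axis=0)
        Sigma_new = np.cov(X_f, rowvar=False, bias=True) + C_sum / n
        converged = np.max(np.abs(mu_new - mu)) < tol
        mu, Sigma = mu_new, Sigma_new
        if converged:
            break
    return X_f, mu, Sigma

# usage on a toy 3-variable dataset with about 10% missing cells
rng = np.random.default_rng(2)
Z = rng.multivariate_normal([0, 1, 2], np.eye(3) * 0.5 + 0.2, size=30)
Z[rng.random(Z.shape) < 0.1] = np.nan
X_completed, mu_hat, Sigma_hat = em_mvn_missing(Z)
```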
Abstract: A class of pseudo-distances is used to derive test statistics, based on transformed data or spacings, for testing goodness-of-fit of parametric models. These statistics can be considered density-based statistics and are expressible as simple functions of spacings. It is known that when the null hypothesis is simple, these statistics follow asymptotic normal distributions with no unknown parameters. In this paper we emphasize results for the composite null hypothesis: the parameters are first estimated by a generalized spacing method (GSP), which is equivalent to minimizing a pseudo-distance from the class considered; the estimated parameters then replace the true parameters in the pseudo-distance used for estimation; goodness-of-fit statistics for the composite hypothesis can thus be constructed and shown again to have an asymptotic normal distribution with no unknown parameters. Since these statistics are related to a discrepancy measure, the resulting tests can be shown to be consistent in general. Furthermore, because of their simplicity and because they come at no extra cost after fitting the model, they can be considered alternatives to chi-square statistics, which require a choice of intervals, and to statistics based on the empirical distribution function (EDF) of the original data, whose null distributions are complicated and may depend on the parametric family being considered and on the vector of true parameters; EDF tests may, however, be more powerful against some specific alternatives.
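As one concrete member of the spacings family, the sketch below computes Moran's log-spacing statistic with a Monte Carlo p-value under a fully specified (simple) null. The functions `moran_spacing_statistic` and `moran_test_simple_null` are illustrative assumptions and do not implement the paper's GSP estimation for the composite case.

```python
import numpy as np
from scipy import stats

def moran_spacing_statistic(x, cdf):
    """Moran's log-spacing statistic M = -sum(log D_i), where the D_i are
    the spacings of the data transformed by the hypothesized CDF.

    One classical spacings statistic, used here only to illustrate the
    family of density-based statistics discussed in the abstract."""
    u = np.sort(cdf(np.asarray(x, float)))
    d = np.diff(np.concatenate(([0.0], u, [1.0])))   # n + 1 spacings
    return -np.sum(np.log(d))

def moran_test_simple_null(x, cdf, n_sim=5000, seed=0):
    """Monte Carlo p-value under a fully specified (simple) null hypothesis."""
    rng = np.random.default_rng(seed)
    n = len(x)
    m_obs = moran_spacing_statistic(x, cdf)
    # under H0 the transformed sample is uniform on (0, 1); large M = poor fit
    m_null = np.array([moran_spacing_statistic(rng.random(n), lambda u: u)
                       for _ in range(n_sim)])
    return m_obs, np.mean(m_null >= m_obs)

# usage: test N(0, 1) against a sample that is actually N(0.3, 1)
rng = np.random.default_rng(3)
sample = rng.normal(0.3, 1.0, size=80)
stat, pval = moran_test_simple_null(sample, stats.norm(0, 1).cdf)
print(stat, pval)
```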