Funding: The National Natural Science Foundation of China (No. 60672049); the Science Foundation of Henan University of Technology (No. 06XJC032).
Abstract: Based on a generalization of the central limit theorem (CLT) to a special class of dependent variables, this paper shows that maximizing a non-Gaussianity (NG) measure can separate statistically dependent source signals. A novel NG measure is given by Cook's Euclidean distance using the Chebyshev-Hermite series expansion. A novel blind source separation (BSS) algorithm for linearly mixed signals is then proposed using Cook's NG measure, which makes it possible to separate statistically dependent source signals. Moreover, the proposed separation algorithm can be reduced to the well-known FastICA algorithm. Simulation results show that the proposed separation algorithm is able to separate the dependent signals and yield ideal
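The abstract above links a Chebyshev-Hermite-based NG measure to FastICA. The exact form of Cook's Euclidean-distance measure is not given here, so the sketch below swaps in the classical cumulant-based negentropy approximation J(y) ≈ E[y³]²/12 + kurt(y)²/48, which is also obtained from a truncated Chebyshev-Hermite (Gram-Charlier) expansion, together with a one-unit kurtosis-based FastICA iteration. It is an illustrative baseline under stated assumptions, not the paper's algorithm: the demo uses *independent* sources, whereas the paper targets dependent ones, and the function names (`ng_hermite`, `whiten`, `fastica_one_unit`) are hypothetical.

```python
import numpy as np


def ng_hermite(y):
    """Cumulant-based non-Gaussianity from a truncated Chebyshev-Hermite
    (Gram-Charlier) expansion: J(y) ~ E[y^3]^2 / 12 + kurt(y)^2 / 48.
    This is a standard stand-in, not the paper's Cook-distance measure."""
    y = (y - y.mean()) / y.std()
    skew = np.mean(y ** 3)
    kurt = np.mean(y ** 4) - 3.0
    return skew ** 2 / 12.0 + kurt ** 2 / 48.0


def whiten(x):
    """Center and whiten x (channels x samples) so its covariance is the identity."""
    x = x - x.mean(axis=1, keepdims=True)
    d, e = np.linalg.eigh(np.cov(x))
    return e @ np.diag(d ** -0.5) @ e.T @ x


def fastica_one_unit(z, n_iter=200, tol=1e-10, seed=0):
    """One-unit FastICA fixed-point update with g(u) = u^3 on whitened data z:
    w <- E[z g(w^T z)] - E[g'(w^T z)] w, where E[g'] = 3 because E[(w^T z)^2] = 1."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        u = w @ z
        w_new = (z * u ** 3).mean(axis=1) - 3.0 * w
        w_new /= np.linalg.norm(w_new)
        converged = abs(abs(w_new @ w) - 1.0) < tol  # a sign flip counts as converged
        w = w_new
        if converged:
            break
    return w


# Demo with two independent non-Gaussian sources (uniform and Laplace).
rng = np.random.default_rng(42)
n = 5000
s = np.vstack([rng.uniform(-1.0, 1.0, n), rng.laplace(0.0, 1.0, n)])
a = np.array([[1.0, 0.6], [0.4, 1.0]])  # arbitrary nonsingular mixing matrix
z = whiten(a @ s)
y = fastica_one_unit(z) @ z

corr = [abs(np.corrcoef(y, src)[0, 1]) for src in s]
print("best |corr| with a source:", max(corr))
print("NG of Gaussian noise:", ng_hermite(rng.standard_normal(n)))
print("NG of recovered component:", ng_hermite(y))
```

The recovered component is only defined up to sign and order, which is why the check uses the absolute correlation with whichever source it matches best.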
Abstract: Outlier detection is an important type of data screening. RIM is an outlier-detection mechanism that identifies the contribution of individual data points to a regression model. This work develops a BIC-based RIM that simultaneously detects influential data points and selects optimal predictor variables. It adds to the existing literature in two ways: it provides an alternative to the RIMs based on AIC and Mallows' Cp statistic, and it proposes conditions for no influence, some influence, and a single outlying data point in an entire data set. The method is implemented in R by an algorithm that iterates over all data points, deleting one data point at a time while computing BICs and selecting optimal predictors alongside the RIMs. In analyses of evaporation data comparing the proposed method with the existing methods, the data cases that the two existing methods identify as highly influential are also identified by the proposed method. The three methods perform equally well; hence the relevance of the BIC-based RIM cannot be discounted.
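The deletion loop described in the abstract can be sketched as follows. The paper's implementation is in R and its exact RIM formula is not stated, so this Python sketch assumes a hypothetical influence score: the drop in the minimum BIC (over all predictor subsets) when a case is deleted. BIC is taken in its Gaussian OLS form n·ln(RSS/n) + k·ln(n), up to an additive constant. Because full-data and deletion BICs are computed on n and n − 1 cases, only the ranking of cases by this score is meaningful. The data are synthetic, and all function names are hypothetical.

```python
import itertools

import numpy as np


def ols_bic(design, y):
    """Gaussian OLS BIC up to an additive constant: n*ln(RSS/n) + k*ln(n)."""
    n, k = design.shape
    beta = np.linalg.lstsq(design, y, rcond=None)[0]
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + k * np.log(n)


def best_subset_bic(X, y):
    """Exhaustive BIC search over predictor subsets; the intercept is always kept."""
    n, p = X.shape
    ones = np.ones((n, 1))
    best_bic, best_subset = ols_bic(ones, y), ()
    for r in range(1, p + 1):
        for subset in itertools.combinations(range(p), r):
            bic = ols_bic(np.hstack([ones, X[:, list(subset)]]), y)
            if bic < best_bic:
                best_bic, best_subset = bic, subset
    return best_bic, best_subset


def bic_deletion_influence(X, y):
    """Delete each case in turn, reselect predictors by BIC, and record the drop in
    the minimum BIC (a hypothetical influence score, not the paper's RIM formula)."""
    full_bic, full_subset = best_subset_bic(X, y)
    results = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        bic_i, subset_i = best_subset_bic(X[keep], y[keep])
        results.append((i, full_bic - bic_i, subset_i))
    return full_bic, full_subset, results


# Synthetic data: y depends on predictor 0 only, and case 7 is shifted
# so that it acts as a single outlier.
rng = np.random.default_rng(7)
n = 30
X = rng.standard_normal((n, 3))
y = 2.0 + 1.5 * X[:, 0] + rng.standard_normal(n)
y[7] += 15.0

full_bic, full_subset, results = bic_deletion_influence(X, y)
top = sorted(results, key=lambda r: r[1], reverse=True)[:3]
print("full-data subset:", full_subset)
for i, delta, subset in top:
    print(f"case {i}: BIC drop {delta:.2f}, selected predictors {subset}")
```

Exhaustive subset search is feasible only for a handful of predictors; with many predictors a stepwise search would be the usual substitute.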
Funding: Partly supported by the Major Project of the National Social Science Foundation of China under Grant No. 18VZL006; the National Natural Science Foundation of China under Grant Nos. 71571126 and 71974139; the Excellent Youth Foundation of Sichuan Province under Grant No. 20JCQN0225; the Tianfu Ten-thousand Talents Program of Sichuan Province; the Excellent Youth Foundation of Sichuan University under Grant No. sksyl201709; the Leading Cultivation Talents Program of Sichuan University; the Teacher and Student Joint Innovation Project of Business School of Sichuan University under Grant No. LH2018011; and the 2018 Special Project for Cultivation and Innovation of New Academic Qian Platform Talent under Grant No. 5772-012.
Abstract: In many practical classification problems, datasets contain a portion of outliers, which can greatly degrade the performance of the constructed models. To address this issue, we apply the group method of data handling (GMDH) neural network to outlier detection and build a GMDH-based outlier detection (GOD) model. The model first performs feature selection on the training set L using a GMDH neural network. A new training set L′ is then obtained by mapping L onto the selected key feature subset, and a linear regression model is constructed on L′ by ordinary least squares estimation. Next, the model eliminates one sample from L′ at random each time and rebuilds the linear regression model. Finally, outlier detection is realized by calculating Cook's distance for each sample. Experiments on four different customer classification datasets show that the GOD model can effectively eliminate outliers and generally performs significantly better than five existing outlier detection models. This indicates that eliminating outliers can effectively enhance the classification accuracy of the trained classification model.
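The last two steps of the GOD pipeline (refit after removing a sample, then compute Cook's distance) can be illustrated with ordinary least squares. This sketch does not reproduce the GMDH feature-selection step: it assumes the key features have already been selected. It also visits every sample in turn rather than removing samples at random. It checks that the closed form D_i = e_i² h_ii / (p s² (1 − h_ii)²) matches the brute-force deletion definition. The 4/n cutoff is a common rule of thumb, assumed here rather than taken from the paper, and the function names are hypothetical.

```python
import numpy as np


def cooks_distance(X, y):
    """Closed-form Cook's distance: D_i = e_i^2 h_ii / (p s^2 (1 - h_ii)^2)."""
    n, p = X.shape
    hat = X @ np.linalg.pinv(X)          # hat matrix H = X (X^T X)^{-1} X^T
    h = np.diag(hat)                     # leverages h_ii
    e = y - hat @ y                      # OLS residuals
    s2 = e @ e / (n - p)                 # residual variance estimate
    return e ** 2 * h / (p * s2 * (1.0 - h) ** 2)


def cooks_distance_by_deletion(X, y):
    """Same quantity by brute force: refit OLS with each sample removed in turn."""
    n, p = X.shape
    yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    s2 = np.sum((y - yhat) ** 2) / (n - p)
    d = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        diff = yhat - X @ beta_i         # change in all fitted values
        d[i] = diff @ diff / (p * s2)
    return d


# Synthetic data; x plays the role of the (assumed) GMDH-selected key features.
rng = np.random.default_rng(0)
n = 40
x = rng.standard_normal((n, 2))
x[5] = [3.5, -3.0]                        # high-leverage sample
y = 0.5 + x @ np.array([1.0, -2.0]) + 0.5 * rng.standard_normal(n)
y[5] += 8.0                               # planted outlier
X = np.column_stack([np.ones(n), x])      # add intercept column

d = cooks_distance(X, y)
d_del = cooks_distance_by_deletion(X, y)
flagged = np.flatnonzero(d > 4.0 / n)
print("max |closed form - deletion|:", np.max(np.abs(d - d_del)))
print("flagged samples:", flagged)
```

The closed form needs only one fit, which is why Cook's distance is usually computed from residuals and leverages rather than by n separate refits.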
Funding: The National Natural Science Foundation of China (Nos. 10531030 and 60675013).
Abstract: When a real-world data set is fitted to a specific type of model, one or more observations often have undue influence on the fit, which may lead to misleading conclusions. It is therefore necessary for data analysts to identify such influential observations and assess their impact on various aspects of model fitting. In this paper, a type of modified Cook's distance is defined to gauge the influence of one observation or a set of observations on the estimate of the constant-coefficient part of partially varying-coefficient models, and these Cook's distances are expressed as functions of the corresponding residuals and leverages. A bootstrap procedure is also suggested to derive reference values for the proposed Cook's distances. Simulations are conducted, and a real-world data set is analyzed to examine the performance of the proposed method; the results are satisfactory.
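The abstract above pairs a set-deletion Cook's distance with bootstrap reference values. The modified distance for partially varying-coefficient models is not reproduced here. Instead, this sketch works in ordinary linear regression and computes the group-deletion Cook's distance ‖X(β̂ − β̂₍I₎)‖² / (p s²). It obtains a reference value as the 95% quantile of the maximum single-case Cook's distance under a parametric Gaussian bootstrap, which is an assumption, since the abstract does not specify the resampling scheme. In OLS, Cook's distance does not change when y is shifted within the column space of X or when the errors are rescaled, so simulating standard normal responses is enough. All names are hypothetical.

```python
import numpy as np


def group_cooks_distance(X, y, idx):
    """Cook's distance for deleting the set of cases idx from an OLS fit:
    ||X (beta - beta_(I))||^2 / (p * s^2)."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    s2 = np.sum((y - X @ beta) ** 2) / (n - p)
    keep = np.ones(n, dtype=bool)
    keep[list(idx)] = False
    beta_del = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    diff = X @ (beta - beta_del)
    return float(diff @ diff / (p * s2))


def single_case_cooks(X, y):
    """Closed-form single-case Cook's distances from residuals and leverages."""
    n, p = X.shape
    P = np.linalg.pinv(X)
    h = np.einsum("ij,ji->i", X, P)       # diagonal of the hat matrix X P
    e = y - X @ (P @ y)
    s2 = e @ e / (n - p)
    return e ** 2 * h / (p * s2 * (1.0 - h) ** 2)


def bootstrap_reference(X, n_boot=500, level=0.95, seed=0):
    """Parametric Gaussian bootstrap: the upper quantile of the maximum
    single-case Cook's distance serves as a reference value."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    maxima = [single_case_cooks(X, rng.standard_normal(n)).max() for _ in range(n_boot)]
    return float(np.quantile(maxima, level))


rng = np.random.default_rng(1)
n = 40
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
x1[5] = 4.0                               # high-leverage case
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.standard_normal(n)
y[5] += 12.0                              # planted outlier at that case
X = np.column_stack([np.ones(n), x1, x2])

d = single_case_cooks(X, y)
ref = bootstrap_reference(X)
flagged = np.flatnonzero(d > ref)
print("reference value:", round(ref, 3), "flagged cases:", flagged)
print("D for the set {5, 6}:", round(group_cooks_distance(X, y, [5, 6]), 3))
```

A residual bootstrap is a natural alternative. However, resampling residuals that include the outlier's own residual tends to inflate the reference value, which is one reason this sketch simulates from the Gaussian model instead.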