Funding: Science Council of Taiwan Province under Grant Nos. NSC 96-2628-E-366-004-MY2 and 96-2628-E-132-001-MY2
Abstract: Logit regression analysis is widely applied in scientific studies and laboratory experiments, where skewed observations in a data set are often encountered. A number of problems with this method, for example outliers and influential observations, can cause overdispersion when a model is fitted. In this study, a systematic statistical approach, including the plotting of several indices, is used to diagnose the lack-of-fit of a logistic regression model. The outliers and influential observations in data from laboratory experiments are then detected. Specifically, we consider the interaction of an internal solitary wave (ISW) with an obstacle, i.e., an underwater ridge, and analyze the effects of the ridge height, the lower-layer water depth, and the potential energy on the amplitude-based transmission rate of the ISW. We conclude that the goodness-of-fit of the revised logit regression model is better than that of the model fitted without this approach.
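The abstract does not say which indices were plotted. As an illustration only, the sketch below computes one common lack-of-fit index for a logit model: Pearson residuals and the Pearson dispersion statistic. It assumes grouped binomial data (successes out of trials) and fitted probabilities that are already available. The function names and the sample numbers are hypothetical and are not taken from the study.

```python
import math


def pearson_residuals(successes, trials, fitted):
    """Pearson residual for each grouped binomial observation.

    successes[i] out of trials[i] were observed; fitted[i] is the
    probability predicted by an already-fitted logit model (assumed given).
    """
    residuals = []
    for y, n, p in zip(successes, trials, fitted):
        expected = n * p
        variance = n * p * (1.0 - p)
        residuals.append((y - expected) / math.sqrt(variance))
    return residuals


def dispersion(residuals, n_params):
    """Pearson chi-square divided by the residual degrees of freedom."""
    dof = len(residuals) - n_params
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    return sum(r * r for r in residuals) / dof


if __name__ == "__main__":
    # Hypothetical data: two groups of 10 trials, both fitted at p = 0.5.
    res = pearson_residuals([5, 8], [10, 10], [0.5, 0.5])
    print(res, dispersion(res, n_params=1))
```

A dispersion value well above 1 is usually read as a sign of overdispersion, and observations with large absolute residuals are candidates for closer inspection as outliers.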
Abstract: The structural equation model (SEM) is generally influenced by the presence of outliers and controlling variables. To a very large extent, this can have consequential effects on the parameters and the model fit. Although previous research has studied outliers and controlling observations from various perspectives, including box plots and normal probability plots, among others, the use of the uniform horizontal QQ plot is yet to be explored. This study therefore aims to apply uniform QQ plots to identify outliers and possible controlling observations in SEM. The results showed that all three estimation methods were able to identify outliers and possible controlling observations in SEM. The QQ plot based on the Anderson-Rubin estimator gave a more efficient visual display for spotting outliers and possible controlling observations than the other estimation methods. Therefore, this paper provides an efficient way of identifying outliers as it fragments the data set.
Funding: Supported by the National Natural Science Foundation (Grant Nos. 91644216 and 41575128), the CAS Information Technology Program (Grant No. XXH13506-302), and the Guangdong Provincial Science and Technology Development Special Fund (No. 2017B020216007)
Abstract: Although quality assurance and quality control procedures are routinely applied in most air quality networks, outliers can still occur due to instrument malfunctions, the influence of harsh environments and the limitations of measuring methods. Such outliers pose challenges for data-powered applications such as data assimilation, statistical analysis of pollution characteristics and ensemble forecasting. Here, a fully automatic outlier detection method was developed based on the probability of residuals, which are the discrepancies between the observed and the estimated concentration values. The estimation can be conducted using filtering (or regression when appropriate) to discriminate four types of outliers, characterized respectively by temporal and spatial inconsistency, instrument-induced low variances, periodic calibration exceptions, and PM10 concentrations lower than PM2.5 concentrations. This probabilistic method was applied to detect all four types of outliers in hourly surface measurements of six pollutants (PM2.5, PM10, SO2, NO2, CO and O3) from 1436 stations of the China National Environmental Monitoring Network during 2014-16. Among the measurements, 0.65%-5.68% are marked as outliers, with PM10 and CO more prone to outliers. Our method successfully identifies a trend of decreasing outliers from 2014 to 2016, which corresponds to known improvements in the quality assurance and quality control procedures of the China National Environmental Monitoring Network. The outliers can have a significant impact on the annual mean concentrations of PM2.5, with differences exceeding 10 μg m⁻³ at 66 sites.
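The idea of flagging a measurement when its residual is improbable can be sketched as follows. This is a minimal illustration, not the authors' algorithm. It assumes a running median of neighbouring hours as the "filtering" estimate, a normal residual distribution with a robust (MAD-based) scale, and an arbitrary probability threshold. All names, the window size and the threshold are hypothetical.

```python
import math
from statistics import median


def flag_outliers(values, half_window=2, p_threshold=1e-3):
    """Flag points whose residual from a running-median estimate is improbable."""
    n = len(values)
    residuals = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        # Estimate the value from its neighbours only (the point itself is excluded).
        neighbours = [values[j] for j in range(lo, hi) if j != i]
        residuals.append(values[i] - median(neighbours))

    # Robust scale: median absolute deviation, rescaled to a normal sigma.
    centre = median(residuals)
    mad = median([abs(r - centre) for r in residuals])
    sigma = 1.4826 * mad
    if sigma == 0:
        # Degenerate case: every non-typical residual counts as an outlier.
        return [r != centre for r in residuals]

    flags = []
    for r in residuals:
        z = abs(r - centre) / sigma
        # Two-sided normal tail probability P(|Z| > z).
        prob = math.erfc(z / math.sqrt(2))
        flags.append(prob < p_threshold)
    return flags


def pm_inconsistent(pm25, pm10):
    """Physical consistency check: PM10 should not be lower than PM2.5."""
    return pm10 < pm25


if __name__ == "__main__":
    hourly = [10, 11, 10, 12, 11, 95, 10, 11, 12, 10]  # made-up concentrations
    print(flag_outliers(hourly))
    print(pm_inconsistent(pm25=50.0, pm10=40.0))
```

The estimate for each hour uses only its neighbours, so a single spike does not pull the estimate towards itself. The other outlier types named in the abstract (low variance, periodic calibration exceptions, spatial inconsistency) would need their own estimators and are not covered here.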
Funding: Supported by the National Basic Research Program (973) of China (No. 2004CB117306) and the Hi-Tech Research and Development Program (863) of China (No. 2006AA10A102)
Abstract: A method was proposed for the detection of outliers and influential observations in the framework of a mixed linear model, prior to quantitative trait locus (QTL) mapping analysis. We investigated the impact of outliers on QTL mapping for complex traits in a mouse BXD population, and observed that dropping outliers could provide evidence of additional QTL and epistatic loci affecting the 1stBrain-OB and the 2ndBrain-OB in a cross of the abovementioned population. The results also revealed a remarkable increase in the estimated heritabilities of QTL in the absence of outliers. In addition, simulations were conducted to investigate the detection powers and false discovery rates (FDRs) of QTLs in the presence and absence of outliers. The results suggested that the presence of a small proportion of outliers could increase the FDR and hence decrease the detection power of QTLs. In the presence of outliers, the estimated standard errors for the position, additive and additive × environment interaction effects of QTLs could increase drastically.