Funding: the National Natural Science Foundation of China (10571139)
Abstract: We study the asymptotics of the chi-square statistic under type II error. Using the contraction principle, we obtain large deviation and moderate deviation results; the rate function for moderate deviations can be computed explicitly and is a quadratic function.
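For orientation, the contraction principle invoked here can be written in its standard textbook form. The symbols $X_n$, $I$, $f$, and $J$ are notation assumed for this sketch and are not taken from the paper.

```latex
% Standard contraction principle (notation assumed for illustration).
% If (X_n) satisfies a large deviation principle with good rate function I
% and f is a continuous map, then (f(X_n)) satisfies a large deviation
% principle with rate function
\[
  J(y) \;=\; \inf\{\, I(x) : f(x) = y \,\}.
\]
```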
Abstract: In large sample studies where distributions may be skewed and not readily transformed to symmetry, it may be of greater interest to compare different distributions in terms of percentiles rather than means. For example, it may be more informative to compare two or more populations with respect to their within-population distributions by testing the hypothesis that their respective 10th, 50th, and 90th percentiles are equal. As a generalization of the median test, the proposed test statistic is asymptotically distributed as chi-square, with degrees of freedom that depend on the number of percentiles tested and the constraints of the null hypothesis. Simulation studies are used to validate the nominal 0.05 significance level under the null hypothesis and to show that the asymptotic power properties are suitable for testing equality of percentile profiles against selected profile discrepancies for a variety of underlying distributions. A practical example illustrates the comparison of the percentile profiles of four body mass index distributions.
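The idea of extending the median test to several percentiles can be sketched as a contingency-table computation. The code below is only an illustration under stated assumptions: the function names, the nearest-rank percentile convention, and the unconstrained degrees of freedom (g - 1) * m for g groups and m percentiles are choices made here, and the statistic proposed in the paper may be constructed differently.

```python
import math
from bisect import bisect_left


def pooled_cut_points(groups, probs):
    """Nearest-rank percentiles of the pooled sample (one simple convention)."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    return [pooled[max(0, math.ceil(p * n) - 1)] for p in probs]


def percentile_profile_chi2(groups, probs=(0.10, 0.50, 0.90)):
    """Pearson chi-square on a groups-by-intervals count table.

    Each observation is assigned to the interval between consecutive pooled
    percentiles; a value equal to a cut point goes to the lower interval.
    Hypothetical helper, not the statistic defined in the paper.
    """
    cuts = pooled_cut_points(groups, probs)
    n_cells = len(cuts) + 1
    table = []
    for g in groups:
        row = [0] * n_cells
        for x in g:
            row[bisect_left(cuts, x)] += 1
        table.append(row)

    total = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(row[j] for row in table) for j in range(n_cells)]

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total
            if expected > 0:  # skip empty columns (e.g. tied cut points)
                stat += (observed - expected) ** 2 / expected

    # Degrees of freedom for the unconstrained table: (rows - 1) * (cols - 1).
    df = (len(groups) - 1) * (n_cells - 1)
    return stat, df, table
```

With a single percentile, `probs=(0.50,)`, the table has two columns and the computation reduces to a median-test-style comparison.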
Abstract: Zero-inflated distributions are common in statistical problems where there is interest in testing homogeneity of two or more independent groups. Often, the underlying distribution that has an inflated number of zero-valued observations is asymmetric, and its functional form may not be known or easily characterized. In this case, comparisons of the groups in terms of their respective percentiles may be appropriate, as these estimates are nonparametric and more robust to outliers and other irregularities. The median test is often used to compare distributions with similar but asymmetric shapes but may be uninformative when there are excess zeros or dissimilar shapes. For zero-inflated distributions, it is useful to compare the distributions with respect to their proportion of zeros, coupled with a comparison of percentile profiles for the observed non-zero values. A simple chi-square test for simultaneous testing of these two components is proposed, applicable to both continuous and discrete data. Results of simulation studies are reported to summarize empirical power under several scenarios. We give recommendations for the minimum sample size that is necessary to achieve suitable test performance in specific examples.
Abstract: We describe two new derivations of the chi-square distribution. The first derivation uses induction and requires only a single integral to calculate. The second derivation uses the Laplace transform and requires minimal assumptions. The new derivations are compared with established derivations, such as those by convolution, moment generating function, and Bayesian inference. Chi-square testing has seen many applications in physics and other fields. We describe a unique version of the chi-square test in which both the variance and the location are tested, and apply it to environmental data. The chi-square test is used to judge whether a laboratory method is capable of detecting gross alpha and beta radioactivity in drinking water for regulatory monitoring to protect the health of the population. A case of failure of the chi-square test and its amelioration are described. The chi-square test is compared to and supplemented by the t-test.
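As a point of reference, a Laplace-transform route to the chi-square density can be sketched from standard facts. This is a generic textbook-style argument under the usual definition (a sum of k independent squared standard normals), not necessarily the derivation given in the paper; the symbols Z, X, s, k, and f_k are assumed notation.

```latex
% Sketch of a Laplace-transform argument (standard facts; notation assumed).
% For Z ~ N(0,1) and s > -1/2:
\[
  \mathbb{E}\!\left[e^{-sZ^{2}}\right]
  = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-(1+2s)z^{2}/2}\, dz
  = (1+2s)^{-1/2}.
\]
% For independent Z_1, ..., Z_k ~ N(0,1) and X = Z_1^2 + ... + Z_k^2:
\[
  \mathbb{E}\!\left[e^{-sX}\right] = (1+2s)^{-k/2}.
\]
% The density below has the same transform, so by uniqueness of the
% Laplace transform it is the density of X:
\[
  f_k(x) = \frac{x^{k/2-1} e^{-x/2}}{2^{k/2}\,\Gamma(k/2)}, \quad x > 0,
  \qquad
  \int_{0}^{\infty} e^{-sx} f_k(x)\, dx = (1+2s)^{-k/2}.
\]
```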
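Abstract: Feature selection is a key step in improving the accuracy and efficiency of retweet prediction with machine learning methods, and it presupposes feature extraction. Commonly used feature selection methods currently include information gain (IG), mutual information, and the chi-square test (CHI). In traditional feature selection methods, low-frequency words cause problems such as negative correlation and interference in the computation of information gain and the chi-square test, which lowers classification accuracy. This paper first studies these problems and introduces a balance factor and a word-frequency factor, respectively, to improve the accuracy of the algorithms. Second, based on the characteristics of information propagation on microblogs, it combines the improved IG and CHI algorithms to propose a feature selection method based on BIG-WFCHI (Balance Information Gain-Word Frequency CHI-square test). In the experiments, two heterogeneous datasets were tested with five classifiers: a maximum entropy model, a support vector machine, a naive Bayes classifier, KNN, and a multilayer perceptron. The experimental results show that the proposed method effectively eliminates irrelevant and redundant features, improves classification accuracy, and reduces computation time.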
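For readers unfamiliar with the baseline, the classical CHI score for a term and a category is the Pearson chi-square of a 2x2 document-count table. The sketch below shows only that unmodified baseline; the balance factor and word-frequency factor of BIG-WFCHI are not specified in the abstract and are not reproduced. The function names, the document representation, and the labels "retweeted" and "ignored" are hypothetical.

```python
def chi2_term_category(a, b, c, d):
    """Classical CHI score from a 2x2 table of document counts.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:  # a row or column is empty, so the score is undefined
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def chi2_scores(docs, category):
    """Score every term against one category.

    docs is a list of (set_of_terms, label) pairs (hypothetical format).
    """
    vocab = set().union(*(terms for terms, _ in docs))
    scores = {}
    for term in vocab:
        a = sum(1 for terms, label in docs if term in terms and label == category)
        b = sum(1 for terms, label in docs if term in terms and label != category)
        c = sum(1 for terms, label in docs if term not in terms and label == category)
        d = sum(1 for terms, label in docs if term not in terms and label != category)
        scores[term] = chi2_term_category(a, b, c, d)
    return scores


# Toy corpus: four short posts with made-up labels.
docs = [
    ({"win", "prize"}, "retweeted"),
    ({"win", "now"}, "retweeted"),
    ({"meeting", "now"}, "ignored"),
    ({"meeting", "notes"}, "ignored"),
]
scores = chi2_scores(docs, "retweeted")
```

In this toy corpus "win" (present only in the category) and "meeting" (present only outside it) receive the same score, because the squared term discards the direction of the association. That behaviour may be related to the negative-correlation issue the abstract mentions.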
Abstract: In this paper, we generalize the proof that the Cochran statistic asymptotically follows a chi-square distribution to the case of a two-way ANOVA structure. While the construction of homogeneity test statistics usually requires determining the covariance matrix and its Moore-Penrose inverse, our approach avoids this step. We also show that the Cochran statistic in two-way ANOVA is equivalent to the conventional homogeneity test statistic. In particular, we show that it satisfies the invariance property. Finally, we conduct an empirical verification on a meta-analysis that confirms our theoretical results.
Funding: the Natural Science Foundation of Beijing (No. 1062001); Academic Human Resources Development in Institutions of Higher Learning Under the Jurisdiction of Beijing Municipality (No. 05006011200702).
Acknowledgements: The authors cordially thank the Associate Editor and Reviewers for their constructive comments, which led to improvement of the manuscript. They are also very grateful to Prof. Adelaide Figueiredo for his help.
Abstract: In goodness-of-fit testing, Pearson's chi-squared test is one of the most widely used tools of formal statistical analysis. However, Pearson's chi-squared test depends on the partition of the sample space, and different constructions of the partition may lead to different conclusions. Based on an equiprobable partition of the sample space, a modified chi-squared test is proposed, together with a method for constructing it. As an application, the proposed test is used to test whether vectorial data come from a uniform distribution defined on the hypersphere. Simulation studies show that the modified chi-squared test is robust against different alternatives.
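The role of an equiprobable partition can be illustrated in one dimension. The sketch below builds k cells of equal probability under a fully specified null distribution from its inverse CDF and computes the ordinary Pearson statistic. It is a univariate illustration under assumptions made here (a standard normal null by default, hypothetical function name), not the modified test on the hypersphere that the paper proposes.

```python
from bisect import bisect_right
from statistics import NormalDist


def equiprobable_chi2(sample, k, dist=NormalDist()):
    """Pearson chi-square with k cells of equal probability under `dist`.

    Cell edges are the j/k quantiles of the null distribution, so every
    cell has expected count n / k. Returns (statistic, df, counts), where
    df = k - 1 assumes no parameters were estimated from the sample.
    """
    n = len(sample)
    edges = [dist.inv_cdf(j / k) for j in range(1, k)]
    counts = [0] * k
    for x in sample:
        counts[bisect_right(edges, x)] += 1
    expected = n / k
    stat = sum((observed - expected) ** 2 for observed in counts) / expected
    return stat, k - 1, counts
```

Because every cell has the same expected count, the statistic depends on the data only through the cell counts, and the choice of partition is fixed by the null distribution and k rather than by the analyst.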
Funding: supported by the Natural Science Foundation of China under Grant Nos. 11071022, 11028103, 11231010, 11471223; BCMIIS; the Beijing Municipal Educational Commission Foundation under Grant Nos. KZ201410028030, KM201210028005; and Jishou University Subject in 2014 (No. 14JD035)
Abstract: The classical chi-squared goodness-of-fit test assumes that the number of classes is fixed, in which case the test statistic has a limiting chi-square distribution under the null hypothesis. Tests in which the number of classes varies with the sample size have attracted increasing attention. However, in this situation there are no theoretical results on the asymptotic properties of the chi-squared test statistic. This paper proves the consistency of the chi-squared test with a varying number of classes under some conditions. The authors also give a convergence rate for the Kolmogorov-Smirnov distance between the test statistic and the corresponding chi-square distributed random variable. In addition, a real example and simulation results support the reasonableness of the theoretical result and the superiority of the chi-squared test with a varying number of classes.