Abstract: Modeling non-coding background sequences appropriately is important for detecting regulatory elements in DNA sequences. Based on the chi-square test, this paper explains why a higher-order Markov chain model should be chosen and how the proper order can be selected automatically. The chi-square test is first run on synthetic data sets to show that it can efficiently find the proper order of a Markov chain. Using the chi-square test, distinct higher-order context dependences are then found in ten sets of yeast S. cerevisiae sequences taken from the literature. A higher-order Markov chain is therefore more suitable for modeling non-coding background sequences than an independent model.
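The order-selection idea can be illustrated with a minimal sketch that tests only order 0 (independent symbols) against order 1, using a Pearson chi-square test on adjacent-symbol counts. It is not the paper's procedure: the function names, the synthetic generator, and the `stay_prob` parameter are assumptions made for illustration, and the tabulated 5% critical value for 9 degrees of freedom (a 4 × 4 table) is hard-coded.

```python
import random
from collections import Counter

ALPHABET = "ACGT"
CRIT_DF9_05 = 16.919  # tabulated chi-square critical value, 9 df, alpha = 0.05


def chi_square_order0_vs_1(seq):
    """Pearson chi-square statistic for independence of adjacent symbols."""
    pairs = Counter(zip(seq, seq[1:]))  # observed counts of (current, next)
    n = len(seq) - 1                    # number of adjacent pairs
    row = Counter(seq[:-1])             # marginal counts of the current symbol
    col = Counter(seq[1:])              # marginal counts of the next symbol
    stat = 0.0
    for a in ALPHABET:
        for b in ALPHABET:
            expected = row[a] * col[b] / n
            if expected > 0:
                stat += (pairs[(a, b)] - expected) ** 2 / expected
    return stat


def simulate_order1(length, stay_prob, rng):
    """Hypothetical order-1 chain: repeat the last symbol with extra probability."""
    seq = [rng.choice(ALPHABET)]
    for _ in range(length - 1):
        if rng.random() < stay_prob:
            seq.append(seq[-1])
        else:
            seq.append(rng.choice(ALPHABET))
    return "".join(seq)


rng = random.Random(0)
independent = "".join(rng.choice(ALPHABET) for _ in range(5000))
dependent = simulate_order1(5000, 0.5, rng)

stat_indep = chi_square_order0_vs_1(independent)
stat_dep = chi_square_order0_vs_1(dependent)
# A statistic above CRIT_DF9_05 rejects independence in favour of order >= 1.
print(stat_indep, stat_dep, stat_dep > CRIT_DF9_05)
```

Testing order k against order k + 1 would follow the same pattern with the preceding k symbols as the row category, at the cost of much sparser tables.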
Funding: National Natural Science Foundation of China (10571139)
Abstract: We study the asymptotics of the chi-square statistic under type II error. By the contraction principle, large deviations and moderate deviations are obtained, and the rate function of the moderate deviations can be calculated explicitly; it is a quadratic function.
Abstract: In large-sample studies where distributions may be skewed and not readily transformed to symmetry, it may be of greater interest to compare distributions in terms of percentiles rather than means. For example, it may be more informative to compare two or more populations with respect to their within-population distributions by testing the hypothesis that their respective 10th, 50th, and 90th percentiles are equal. As a generalization of the median test, the proposed test statistic is asymptotically chi-square distributed, with degrees of freedom that depend on the number of percentiles tested and the constraints of the null hypothesis. Simulation studies are used to validate the nominal 0.05 significance level under the null hypothesis and to assess the asymptotic power for testing equality of percentile profiles against selected profile discrepancies for a variety of underlying distributions. A practical example illustrates the comparison of the percentile profiles of four body mass index distributions.
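One way to read "a generalization of the median test" is sketched below: pooled-sample percentiles define bins, each group's observations are counted per bin, and a Pearson chi-square statistic is computed on the resulting table. This is an assumption about the general shape of such a test, not the paper's statistic. The nearest-rank percentile rule, the function names, and the degrees of freedom (groups − 1) × (number of percentiles) are illustrative choices.

```python
import bisect
import math


def pooled_percentile_cutoffs(groups, percents):
    """Nearest-rank percentiles of the pooled sample (an illustrative choice)."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    return [pooled[max(0, math.ceil(p / 100 * n) - 1)] for p in percents]


def percentile_profile_chi_square(groups, percents=(10, 50, 90)):
    """Return (statistic, df) for a groups-by-percentile-bins contingency table."""
    cuts = pooled_percentile_cutoffs(groups, percents)
    table = []
    for g in groups:
        counts = [0] * (len(cuts) + 1)
        for x in g:
            # Bin i holds values <= cuts[i] and > cuts[i - 1].
            counts[bisect.bisect_left(cuts, x)] += 1
        table.append(counts)

    total = sum(map(sum, table))
    row_sums = [sum(r) for r in table]
    col_sums = [sum(c) for c in zip(*table)]
    stat = 0.0
    for i, r in enumerate(table):
        for j, observed in enumerate(r):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    df = (len(groups) - 1) * len(cuts)
    return stat, df


# Hypothetical data: two identical groups, then one group shifted upward.
same_stat, same_df = percentile_profile_chi_square([list(range(100)), list(range(100))])
shift_stat, shift_df = percentile_profile_chi_square([list(range(100)), list(range(50, 150))])
print(same_stat, shift_stat, shift_df)
```

With a single percentile (the 50th), the table reduces to the above/below-median counts of the ordinary median test.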
Abstract: Zero-inflated distributions are common in statistical problems where there is interest in testing the homogeneity of two or more independent groups. Often, the underlying distribution with an inflated number of zero-valued observations is asymmetric, and its functional form may not be known or easily characterized. In this case, comparing the groups in terms of their respective percentiles may be appropriate, as these estimates are nonparametric and more robust to outliers and other irregularities. The median test is often used to compare distributions with similar but asymmetric shapes, but it may be uninformative when there are excess zeros or dissimilar shapes. For zero-inflated distributions, it is useful to compare the distributions with respect to their proportion of zeros, coupled with a comparison of percentile profiles for the observed non-zero values. A simple chi-square test for simultaneously testing these two components is proposed, applicable to both continuous and discrete data. Results of simulation studies are reported to summarize empirical power under several scenarios. We give recommendations for the minimum sample size needed to achieve suitable test performance in specific examples.
Abstract: We describe two new derivations of the chi-square distribution. The first derivation uses induction and requires only a single integral to be calculated. The second derivation uses the Laplace transform and requires minimal assumptions. The new derivations are compared with established derivations, such as those by convolution, moment generating function, and Bayesian inference. Chi-square testing has seen many applications in physics and other fields. We describe a unique version of the chi-square test in which both the variance and the location are tested, and apply it to environmental data. The chi-square test is used to judge whether a laboratory method is capable of detecting gross alpha and beta radioactivity in drinking water for regulatory monitoring to protect the health of the population. A case in which the chi-square test fails, and its amelioration, are described. The chi-square test is compared with and supplemented by the t-test.
Funding: Supported by the National Natural Science Foundation of China (60634030, 60702066), the Aerospace Science Foundation (20090853013), the Fundamental Research Foundation of NWPU (JC201015), and the Soaring Star of NWPU
Abstract: In system fault detection algorithms, the false alarm rate and the missed-detection rate of the residual chi-square test can affect the stability of filters. This paper proposes a fault detection algorithm based on a sequential residual chi-square test and applies it to fault detection in an integrated navigation system. The simulation results show that the algorithm can accurately detect faults in the global positioning system (GPS), eliminate the influence of false alarms and missed detections on the filter, and enhance the fault tolerance of integrated navigation systems.
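For orientation, the ordinary (non-sequential) residual chi-square test that this abstract builds on can be sketched for a scalar measurement: the squared filter residual divided by its predicted variance is compared with a chi-square threshold. The sequential variant proposed in the paper is not reproduced here. The function names, the sample residuals, and the 1% threshold are assumptions for illustration.

```python
CHI2_CRIT_1DF_01 = 6.635  # tabulated chi-square critical value, 1 df, alpha = 0.01


def residual_chi_square(residual, innovation_var):
    """Normalized squared residual; chi-square with 1 df when no fault is present."""
    return residual ** 2 / innovation_var


def detect_faults(residuals, innovation_vars, threshold=CHI2_CRIT_1DF_01):
    """Flag each epoch whose statistic exceeds the threshold."""
    return [
        residual_chi_square(r, s) > threshold
        for r, s in zip(residuals, innovation_vars)
    ]


# Hypothetical residuals with unit innovation variance; the last one is an outlier.
flags = detect_faults([0.1, -0.5, 5.0], [1.0, 1.0, 1.0])
print(flags)
```

For a vector measurement of dimension m, the statistic becomes the quadratic form of the residual with the inverse innovation covariance, and the threshold is taken from the chi-square distribution with m degrees of freedom. Lowering the threshold reduces missed detections at the price of more false alarms, which is the trade-off the abstract refers to.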
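Abstract: Feature selection is a key step in using machine learning to improve the accuracy and efficiency of retweet prediction, and it presupposes feature extraction. Commonly used feature selection methods include information gain (IG), mutual information, and the chi-square test (CHI). In traditional feature selection methods, low-frequency words cause problems such as negative correlation and interference in the computation of information gain and the chi-square test, which leads to low classification accuracy. This paper first studies these problems caused by low-frequency words and introduces a balance factor and a word-frequency factor, respectively, to improve the accuracy of the algorithms. Second, based on the characteristics of information propagation on microblogs, and combining the improved IG and CHI algorithms, it proposes a feature selection method based on BIG-WFCHI (Balance Information Gain-Word Frequency CHI-square test). In the experimental analysis, five classifiers (maximum entropy model, support vector machine, naive Bayes classifier, KNN, and multilayer perceptron) are tested on two heterogeneous data sets. The experimental results show that the proposed method effectively eliminates irrelevant and redundant features, improves classification accuracy, and reduces computation time.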
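As background for the CHI method mentioned here, the sketch below computes the standard term-category chi-square score from a 2 × 2 document-count table. It shows the textbook statistic only, not the paper's BIG-WFCHI variant, whose balance and word-frequency factors are not specified in the abstract. The argument names and the sample counts are hypothetical.

```python
def chi_square_term(a, b, c, d):
    """Textbook CHI score for a term t and a category k.

    a: documents in k that contain t
    b: documents outside k that contain t
    c: documents in k that do not contain t
    d: documents outside k that do not contain t
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # a degenerate table carries no information
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical counts: a term that marks the category, one that is independent
# of it, and one that occurs only outside it.
positive = chi_square_term(10, 0, 0, 10)
independent = chi_square_term(5, 5, 5, 5)
negative = chi_square_term(0, 10, 10, 0)
print(positive, independent, negative)
```

Because the difference a·d − b·c is squared, a term that occurs only outside the category scores exactly as high as one that occurs only inside it, which is one form of the negative-correlation problem the abstract describes.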
Abstract: In this paper, we generalize the proof that the Cochran statistic asymptotically follows a chi-square distribution to the case of a two-way ANOVA structure. While the construction of homogeneity test statistics usually requires determining the covariance matrix and its Moore-Penrose inverse, our approach avoids this step. We also show that the Cochran statistic in two-way ANOVA is equivalent to the conventional homogeneity test statistic. In particular, we show that it satisfies the invariance property. Finally, we conduct an empirical verification using a meta-analysis, which confirms our theoretical results.
Abstract: Genetic association studies usually apply the simple chi-square (χ²) test to test for association between a single-nucleotide polymorphism (SNP) and a particular phenotype, under the null hypothesis that genotypes and phenotypes are independent. The conventional χ² test therefore does not account for the increased risk of an individual who carries a larger number of disease-responsible alleles (a particular genotype). Association tests should instead take this disease risk into account according to the mode of inheritance (additive, dominant, recessive). The purpose of this paper is to give a practical demonstration, using SNP genotype data, of two possible methods for considering such order or trend in the contingency tables of genetic association studies. One method pools the genotypes, and the other scores the individual genotypes, in both cases based on the disease risk implied by the inheritance pattern. The results show that the p-values obtained from the two methods are similar for the dominant and recessive models. Other important features of the methods are also extracted using the SNP genotype data for different inheritance patterns.
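The pooling method can be sketched as follows: a 2 × 3 case-control by genotype table is collapsed to 2 × 2 according to the assumed mode of inheritance, and the ordinary χ² statistic is computed on the pooled table. The genotype counts, the choice of `a` as the risk allele, and the function names are hypothetical, and the genotype-scoring method is not shown.

```python
def pool_genotypes(counts, model):
    """Collapse (AA, Aa, aa) counts to two classes, with 'a' as the risk allele."""
    aa_ref, het, aa_risk = counts
    if model == "dominant":    # one risk allele is enough: AA vs (Aa + aa)
        return [aa_ref, het + aa_risk]
    if model == "recessive":   # two risk alleles are needed: (AA + Aa) vs aa
        return [aa_ref + het, aa_risk]
    raise ValueError("model must be 'dominant' or 'recessive'")


def chi_square_2x2(table):
    """Pearson chi-square statistic (1 df, no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical genotype counts (AA, Aa, aa) for 100 cases and 100 controls.
cases, controls = (30, 50, 20), (50, 40, 10)

dominant_table = [pool_genotypes(cases, "dominant"), pool_genotypes(controls, "dominant")]
recessive_table = [pool_genotypes(cases, "recessive"), pool_genotypes(controls, "recessive")]
dominant_stat = chi_square_2x2(dominant_table)
recessive_stat = chi_square_2x2(recessive_table)
print(dominant_table, dominant_stat, recessive_table, recessive_stat)
```

Each pooled statistic has 1 degree of freedom, compared with 2 for the unpooled 2 × 3 table, so the test concentrates its power on the assumed inheritance pattern.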
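Abstract: To address the low efficiency of traditional fault detection algorithms in detecting slowly varying faults in integrated navigation systems, a fault detection algorithm based on the orthogonality principle is proposed. In the fault-free case, adjacent Kalman filter residuals are orthogonal, and the residual orthogonality value is a zero-mean white noise sequence. When a fault is present, adjacent residuals are highly correlated, and the residual orthogonality value no longer satisfies the zero-mean condition. A chi-square test algorithm is therefore built on the residual orthogonality value to detect faults. The special structure of the residual orthogonality value makes it sensitive to faults. Experimental results show that the algorithm detects slowly varying faults better than the residual chi-square test algorithm and the fading sequential probability ratio test (SPRT) algorithm, and that it improves the estimation accuracy and reliability of the integrated navigation system.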
Abstract: Knowledge of an individual's HIV/AIDS status provides a tool to reduce or avoid HIV transmission, spread, and mortality due to HIV-related illness. However, most people still do not know their HIV status because, for various reasons, they are not willing to test for HIV/AIDS. The aim of this paper is therefore to investigate the effects of various risk factors that are likely to influence the decision to ever test for HIV/AIDS. The data used in this paper were obtained from the Ghana Demographic and Health Survey (n = 1828 observations and 32 risk factors). We applied the chi-square test and the logistic regression model to the data in order to study the effects of these risk factors on the decision to ever test for HIV. STATA version 14.1 and R version 3.5.2 were used to carry out the statistical analyses. Generally, the results show that education, especially higher education, significantly (OR = 0.53, 95% CI: 0.230, 0.837) increases the likelihood of ever testing for HIV. Also, the younger the age group, the greater the effect and its significance for the likelihood of ever testing for HIV. We found that HIV-TB co-infection (OR = 0.53, 95% CI: 0.165, 0.893), using a condom whenever one has sex (OR = 0.31, 95% CI: 0.054, 0.573), wealth index (OR = 0.46, 95% CI: 0.137, 0.791), awareness of HIV transmission during child delivery, and number of partners significantly affect HIV testing. Those with many partners are less likely (OR = -0.26, 95% CI: -0.504, -0.007) to ever test for HIV, and those who know that a healthy person may have HIV are more likely (OR = 0.41, 95% CI: 0.137, 0.679) to ever test for HIV. Age is the common significant risk factor for ever having tested for HIV across the 10 regions of Ghana. Resources should be allocated for more education on these significant risk factors in order to help in the fight against HIV-related health issues.