Abstract: This study aimed to examine the performance of the Siegel-Tukey and Savage tests on data sets with heterogeneous variances. The analysis, considering normal, platykurtic, and skewed distributions and a standard deviation ratio of 1, was conducted for both small and large sample sizes. For small sample sizes, two main categories were established: equal and different sample sizes. Analyses were performed using Monte Carlo simulations with 20,000 repetitions for each scenario, and the simulations were run in SAS software. For small sample sizes, the Type I error rate of the Siegel-Tukey test generally ranged from 0.045 to 0.055, while that of the Savage test ranged from 0.016 to 0.041. Similar trends were observed for the platykurtic and skewed distributions. In scenarios with different sample sizes, the Savage test generally exhibited lower Type I error rates. For large sample sizes, the same two categories, equal and different sample sizes, were used. There, the Type I error rate of the Siegel-Tukey test ranged from 0.047 to 0.052, while that of the Savage test ranged from 0.043 to 0.051. With equal sample sizes, both tests generally had lower error rates, and the Savage test gave more consistent results for large sample sizes. In conclusion, the Savage test provides lower Type I error rates for small sample sizes, and both tests have similar error rates for large sample sizes. These findings suggest that the Savage test could be a more reliable option when analyzing variance differences.
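A minimal sketch of the Monte Carlo harness this kind of study relies on is shown below: simulate two samples under the null (equal standard deviations), apply a two-sample scale test, and record how often it rejects at α = 0.05. scipy does not ship the Siegel-Tukey or Savage tests, so the Ansari-Bradley test is used here purely as a stand-in scale test; the function name, replication count and sample sizes are illustrative, not taken from the study.

```python
# Monte Carlo estimate of the empirical Type I error rate of a two-sample
# scale test when both samples share the same distribution (SD ratio = 1).
# Ansari-Bradley is a stand-in: scipy has no Siegel-Tukey or Savage test.
import numpy as np
from scipy import stats

def empirical_type1_error(n1, n2, n_rep=2000, alpha=0.05, seed=42):
    """Fraction of replications in which the test rejects H0 when both
    samples are drawn from the same N(0, 1) distribution."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_rep):
        x = rng.normal(0.0, 1.0, size=n1)
        y = rng.normal(0.0, 1.0, size=n2)
        _, p_value = stats.ansari(x, y)
        rejections += p_value < alpha
    return rejections / n_rep

if __name__ == "__main__":
    for n1, n2 in [(10, 10), (10, 20), (50, 50)]:
        rate = empirical_type1_error(n1, n2)
        print(f"n1={n1:3d}, n2={n2:3d}  empirical Type I error = {rate:.3f}")
```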
Abstract: Heteroscedasticity and multicollinearity are serious problems when they exist in econometric data. These problems arise from violating the assumptions of equal variance between the error terms and of independence between the explanatory variables of the model. Under these assumption violations, the Ordinary Least Squares (OLS) estimator will not give a best linear unbiased, efficient and consistent estimator. In practice, there are several structures of heteroscedasticity and several methods of heteroscedasticity detection. For better estimation results, the best heteroscedasticity detection method must be determined for any structure of heteroscedasticity in the presence of multicollinearity between the explanatory variables of the model. In this paper we examine the effects of multicollinearity on the Type I error rates of some methods of heteroscedasticity detection in the linear regression model, in order to determine the best detection method to use when both problems exist in the model. Nine heteroscedasticity detection methods were considered with seven heteroscedasticity structures. The simulation study was done via a Monte Carlo experiment on a multiple linear regression model with 3 explanatory variables. This experiment was conducted 1000 times with linear model parameters β0 = 4, β1 = 0.4, β2 = 1.5 and β3 = 3.6. Five (5) levels of multicollinearity were combined with seven (7) different sample sizes. The methods' performances were compared with the aid of a set confidence interval (C.I.) criterion. Results showed that whenever multicollinearity exists in the model with any of the heteroscedasticity structures, the Breusch-Godfrey (BG) test is the best method for detecting heteroscedasticity at all chosen levels of significance.
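As an illustration of the simulation setup described above, the sketch below generates a three-regressor model with the stated coefficients (β0 = 4, β1 = 0.4, β2 = 1.5, β3 = 3.6), correlated regressors, and one possible heteroscedasticity structure, and then applies two common detection tests from statsmodels (Breusch-Pagan and White). These two tests are stand-ins for illustration; the paper's nine detection methods, its BG variant, and its C.I. criterion are not reproduced here.

```python
# Simulated linear model with multicollinear regressors and a
# variance-increasing error, followed by two heteroscedasticity tests.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

rng = np.random.default_rng(0)
n = 200

# Three regressors; x2 and x3 are built from x1, so the design is collinear.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
x3 = 0.7 * x1 + 0.7 * rng.normal(size=n)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

# Heteroscedastic errors: the error variance grows with x1 (one structure).
eps = rng.normal(scale=np.exp(0.5 * x1))
y = X @ np.array([4.0, 0.4, 1.5, 3.6]) + eps

resid = sm.OLS(y, X).fit().resid
bp_stat, bp_pval, _, _ = het_breuschpagan(resid, X)
w_stat, w_pval, _, _ = het_white(resid, X)
print(f"Breusch-Pagan: LM={bp_stat:.2f}, p={bp_pval:.4f}")
print(f"White:         LM={w_stat:.2f}, p={w_pval:.4f}")
```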
Abstract: In this simulation study, five correlation coefficients, namely Pearson, Spearman, Kendall tau, permutation-based, and Winsorized, were compared in terms of Type I error rate and power under different scenarios in which the underlying distributions of the variables of interest, the sample sizes, and the correlation patterns were varied. Simulation results showed that the Type I error rate and power of the Pearson correlation coefficient were negatively affected by the distribution shapes, especially for small sample sizes, and this effect was even more pronounced for the Spearman rank and Kendall tau correlation coefficients. In general, the permutation-based and Winsorized correlation coefficients are more robust to distribution shapes and correlation patterns, regardless of sample size. In conclusion, when the assumptions of the Pearson correlation coefficient are not satisfied, the permutation-based and Winsorized correlation coefficients seem to be better alternatives.
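The sketch below computes the five correlation measures compared above on one simulated heavy-tailed data set: Pearson, Spearman, Kendall tau, a Winsorized correlation (taken here as Pearson on 10% winsorized data), and a permutation p-value for the Pearson statistic. The data-generating choices and the winsorization level are illustrative assumptions, not the study's settings.

```python
# Five correlation measures on one simulated heavy-tailed data set.
import numpy as np
from scipy import stats
from scipy.stats.mstats import winsorize

rng = np.random.default_rng(1)
n = 30
x = rng.standard_t(df=3, size=n)             # heavy-tailed variable
y = 0.5 * x + rng.standard_t(df=3, size=n)   # correlated, heavy-tailed

print("Pearson :", stats.pearsonr(x, y))
print("Spearman:", stats.spearmanr(x, y))
print("Kendall :", stats.kendalltau(x, y))

# Winsorized correlation: Pearson on data with 10% of each tail pulled in.
xw = np.asarray(winsorize(x, limits=(0.1, 0.1)))
yw = np.asarray(winsorize(y, limits=(0.1, 0.1)))
print("Winsorized Pearson:", stats.pearsonr(xw, yw))

# Permutation-based test: shuffle y to build the null distribution of r
# and take the two-sided tail proportion.
r_obs, _ = stats.pearsonr(x, y)
perm_r = np.array([stats.pearsonr(x, rng.permutation(y))[0] for _ in range(5000)])
p_perm = np.mean(np.abs(perm_r) >= abs(r_obs))
print(f"Permutation p-value for Pearson r: {p_perm:.4f}")
```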
Abstract: The use of the statistical hypothesis testing procedure to determine Type I and Type II errors was linked to the measurement of sensitivity and specificity in clinical trial tests and experimental pathogen detection techniques. A theoretical analysis of establishing these types of errors was made and compared with the determination of false positives, false negatives, true positives and true negatives. Experimental laboratory methods used to detect Cryptosporidium spp. were used to highlight the relationship between hypothesis testing, sensitivity, specificity and predictive values. The study finds that sensitivity and specificity for the two laboratory methods used for Cryptosporidium detection were low, hence lowering the probability of detecting a "false null hypothesis" for the presence of Cryptosporidium in the water samples using either microscopy or PCR. Nevertheless, both detection procedures had high "true negative" counts, increasing the probability of failing to reject a "true null hypothesis", with a specificity of 1.00 for both the microscopic and PCR laboratory detection methods.
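A small sketch of the bookkeeping behind this comparison follows: given a 2x2 detection table, compute sensitivity, specificity and the predictive values, and read them as 1 − β and 1 − α when "pathogen absent" plays the role of the null hypothesis. The counts used are hypothetical, not the study's data.

```python
# Relating a 2x2 confusion table to Type I / Type II errors.
# With H0: "pathogen absent", a false positive is a Type I error and a
# false negative is a Type II error.

def diagnostic_summary(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity and predictive values from a confusion table."""
    return {
        "sensitivity (1 - beta)": tp / (tp + fn),   # chance of detecting a true positive
        "specificity (1 - alpha)": tn / (tn + fp),  # chance of a true negative
        "PPV": tp / (tp + fp) if (tp + fp) else float("nan"),
        "NPV": tn / (tn + fn) if (tn + fn) else float("nan"),
    }

if __name__ == "__main__":
    # Hypothetical counts for a method with low sensitivity but perfect
    # specificity, the pattern the abstract describes.
    for name, value in diagnostic_summary(tp=6, fp=0, fn=14, tn=40).items():
        print(f"{name:24s}: {value:.2f}")
```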
Funding: National Natural Science Foundation of China (81202283, 81473070, 81373102 and 81202267); Key Grant of the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (10KJA330034 and 11KJA330001); Research Fund for the Doctoral Program of Higher Education of China (20113234110002); Priority Academic Program for the Development of Jiangsu Higher Education Institutions (Public Health and Preventive Medicine).
Abstract: With recent advances in biotechnology, the genome-wide association study (GWAS) has been widely used to identify genetic variants that underlie human complex diseases and traits. In case-control GWAS, the typical statistical strategy is traditional logistic regression (LR) based on single-locus analysis. However, such single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating the Type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR) and partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high-dimensional genomic data. However, the performance of these methods is still not clear, especially in GWAS. We conducted simulations and a real data application to compare the Type I error and power of PC-LR, PLS-LR and LR applied to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS-LR can reasonably control the Type I error under the null hypothesis. In contrast, LR corrected by the Bonferroni method was more conservative in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with the genotyped ones and had a small effect size in the simulations. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data.
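A rough sketch of the dimension-reduction idea, using scikit-learn on a simulated SNP set, is given below: extract a few components by PCA or PLS and fit a logistic regression on the component scores instead of on every SNP separately. The simulated genotypes, effect size and component counts are illustrative assumptions; this is not the pipeline or software used in the study.

```python
# PC-LR / PLS-LR style analysis of a simulated SNP set (illustration only).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n, n_snps = 500, 20

# Correlated pseudo-genotypes (each column mixed with its neighbour to mimic LD).
base = rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)
G = 0.7 * base + 0.3 * np.roll(base, 1, axis=1)

# Case-control status driven weakly by one causal column.
logit = -0.5 + 0.4 * G[:, 5]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# PC-LR: logistic regression on the top principal components of the SNP set.
pcs = PCA(n_components=3).fit_transform(G)
pc_lr = LogisticRegression().fit(pcs, y)
print("PC-LR in-sample accuracy :", pc_lr.score(pcs, y))

# PLS-LR: logistic regression on components extracted by partial least squares.
pls_scores = PLSRegression(n_components=3).fit(G, y).transform(G)
pls_lr = LogisticRegression().fit(pls_scores, y)
print("PLS-LR in-sample accuracy:", pls_lr.score(pls_scores, y))
```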
Abstract: A maximum test is proposed in lieu of forcing a choice between the two dependent samples t-test and the Wilcoxon signed-ranks test. The maximum test, which requires a new table of critical values, maintains nominal α while guaranteeing the maximum power of the two constituent tests. The critical values, obtained via Monte Carlo methods, are uniformly smaller than the Bonferroni-Dunn adjustment, giving the maximum test superior power when testing for treatment alternatives of a shift in the location parameter with data sampled from non-normal distributions.
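A sketch of the maximum-test idea under stated assumptions: compute both the paired t-test and Wilcoxon signed-rank p-values, take the smaller one as the test statistic, and compare it to a critical value simulated under the null so the combined procedure keeps the nominal α. The on-the-fly Monte Carlo calibration below stands in for the paper's tabulated critical values.

```python
# Maximum test over the paired t-test and Wilcoxon signed-rank test,
# calibrated by Monte Carlo so the combined procedure holds nominal alpha.
import numpy as np
from scipy import stats

def max_test_pvalue(x, y):
    """Smaller of the paired t-test and Wilcoxon signed-rank p-values."""
    p_t = stats.ttest_rel(x, y).pvalue
    p_w = stats.wilcoxon(x, y).pvalue
    return min(p_t, p_w)

def mc_critical_value(n, alpha=0.05, n_rep=5000, seed=3):
    """Monte Carlo alpha-quantile of the min-p statistic under H0 (no shift)."""
    rng = np.random.default_rng(seed)
    null_stats = []
    for _ in range(n_rep):
        x = rng.normal(size=n)
        y = x + rng.normal(size=n)        # paired data, zero treatment effect
        null_stats.append(max_test_pvalue(x, y))
    return np.quantile(null_stats, alpha)

if __name__ == "__main__":
    n = 20
    crit = mc_critical_value(n)
    rng = np.random.default_rng(4)
    x = rng.normal(size=n)
    y = x + rng.normal(loc=0.8, size=n)   # shifted alternative
    observed = max_test_pvalue(x, y)
    print(f"critical value = {crit:.4f}, observed min-p = {observed:.4f}")
    print("reject H0" if observed <= crit else "fail to reject H0")
```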
Abstract: Personal credit scoring is an application of financial risk forecasting. It has become an even more important task as financial institutions have been experiencing serious competition and challenges. In this paper, the techniques used for credit scoring are summarized and classified, and a new method, the ensemble learning model, is introduced. The article also discusses some problems in current research. It points out that shifting the focus from static credit scoring to dynamic behavioral scoring, and maximizing revenue by decreasing Type I and Type II errors, are two open issues. It is also suggested that more complex models cannot always be applied to actual situations. Therefore, how to apply the assessment models widely and improve prediction accuracy is the main task for future research.
Abstract: In testing statistical hypotheses, as in other statistical problems, we may be confronted with fuzzy concepts. This paper deals with the problem of testing hypotheses when the hypotheses are fuzzy and the data are crisp. We first give new definitions of the probability mass (density) function with a fuzzy parameter and of the probabilities of Type I and Type II errors, and then state and prove the sequential probability ratio test, on the basis of these new error definitions, for testing fuzzy hypotheses. Numerical examples are also provided to illustrate the approach.
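For context, the sketch below implements the classical (crisp) sequential probability ratio test with Wald's boundaries for a normal mean; the paper's contribution, the fuzzy-hypothesis version of the error probabilities, is not reproduced here, but the sequential accept/continue/reject scheme is the same.

```python
# Classical Wald SPRT for H0: mu = mu0 versus H1: mu = mu1, normal data.
import numpy as np
from scipy import stats

def sprt_normal_mean(data, mu0, mu1, sigma, alpha=0.05, beta=0.10):
    """Sequentially accumulate the log-likelihood ratio and stop at a boundary."""
    upper = np.log((1 - beta) / alpha)   # accept H1 above this log-LR
    lower = np.log(beta / (1 - alpha))   # accept H0 below this log-LR
    log_lr = 0.0
    for i, x in enumerate(data, start=1):
        log_lr += stats.norm.logpdf(x, mu1, sigma) - stats.norm.logpdf(x, mu0, sigma)
        if log_lr >= upper:
            return f"accept H1 after {i} observations"
        if log_lr <= lower:
            return f"accept H0 after {i} observations"
    return "no decision within the available sample"

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    sample = rng.normal(loc=0.5, scale=1.0, size=200)   # data generated under H1
    print(sprt_normal_mean(sample, mu0=0.0, mu1=0.5, sigma=1.0))
```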
Abstract: Objective: To investigate the effect of the randomization test on Type I error and power in sample size adjustment for internal pilot study (IPS) adaptive designs. Methods: The Monte Carlo method was used to simulate IPS sample size adjustment when the sample size is small; the final data were analyzed with the randomization test and the t-test, respectively, and the effects of the two on Type I error and power were compared. Results: The recalculated second-stage sample size fluctuated considerably; the t-test could not adequately control the Type I error, while the randomization test controlled it well, with only a slight reduction in power. Conclusion: When the sample size of a clinical trial is small, the randomization test applied after blinded sample size adjustment in an internal pilot study keeps the Type I error from inflating while still maintaining adequate power.
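A minimal sketch of the randomization (permutation) test compared with the t-test above is given below, for a two-arm comparison of means; the internal-pilot-study sample-size re-estimation step itself is not reproduced. The group sizes and effect are illustrative.

```python
# Two-sample randomization test on the difference in group means.
import numpy as np

def randomization_test(x, y, n_perm=10000, seed=6):
    """Two-sided permutation p-value for the difference in group means."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    n_x = len(x)
    observed = abs(np.mean(x) - np.mean(y))
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                               # relabel the pooled data
        diff = abs(np.mean(pooled[:n_x]) - np.mean(pooled[n_x:]))
        count += diff >= observed
    return (count + 1) / (n_perm + 1)                     # add-one keeps p > 0

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    treat = rng.normal(loc=0.6, scale=1.0, size=15)       # small trial arms
    control = rng.normal(loc=0.0, scale=1.0, size=15)
    print(f"randomization-test p-value = {randomization_test(treat, control):.4f}")
```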