Funding: National Natural Science Foundation of China (No. 12271261); Postgraduate Research and Practice Innovation Program of Jiangsu Province, China (Grant No. SJCX230368).
Abstract: Normality testing is a fundamental hypothesis test in the statistical analysis of key biological indicators of diabetes. If the normality assumption is violated, test results may deviate from the true value, leading to incorrect inferences and conclusions and ultimately undermining the validity and accuracy of statistical inference. With this in mind, the study designs a unified analysis scheme for different data types based on parametric and non-parametric statistical tests. The data were grouped by sample type and divided into discrete and continuous data. To account for differences among subgroups, the conventional chi-squared test was used for discrete data. The normal distribution is the basis of many statistical methods; if the data do not follow a normal distribution, many of these methods fail or produce incorrect results. Therefore, before data analysis and modeling, the data were divided into normal and non-normal groups through normality testing. For normally distributed data, parametric statistical methods were used to assess the differences between groups. For non-normal data, non-parametric tests were employed to improve the accuracy of the analysis. Statistically significant indicators were retained according to the P-value of the statistical test or the corresponding test statistic. These indicators were then combined with the relevant medical background to further explore the etiology behind the occurrence or transformation of diabetes status.
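The routing logic of such a scheme can be sketched as below. This is a minimal illustration, not the authors' implementation: the abstract names only the chi-squared test, so the function names `jarque_bera_pvalue` and `choose_test`, the use of the asymptotic Jarque-Bera test as the normality check, and the 0.05 threshold are all assumptions made for the example.

```python
import math
from statistics import NormalDist


def jarque_bera_pvalue(xs):
    """Asymptotic Jarque-Bera p-value (assumed stand-in for the study's normality test)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    if m2 == 0:
        raise ValueError("constant data: normality test undefined")
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Survival function of a chi-squared variable with 2 degrees of freedom.
    return math.exp(-jb / 2.0)


def choose_test(data_type, groups, alpha=0.05):
    """Route an indicator to a family of tests (labels are illustrative)."""
    if data_type == "discrete":
        return "chi-squared test"
    if all(jarque_bera_pvalue(g) > alpha for g in groups):
        return "parametric test"
    return "non-parametric test"


# Deterministic toy groups: normal quantiles and exponential quantiles.
probs = [(i - 0.5) / 50 for i in range(1, 51)]
normal_like = [NormalDist().inv_cdf(p) for p in probs]
skewed = [-math.log(1.0 - p) for p in probs]

print(choose_test("continuous", [normal_like, normal_like]))
print(choose_test("continuous", [normal_like, skewed]))
```

An indicator is sent to the parametric branch only if every group passes the normality check; a single non-normal group is enough to fall back to the non-parametric branch.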
Abstract: This study compares the power of different univariate normality testing procedures using a new algorithm. Several univariate and multivariate tests are also analyzed. The study also reviews an efficient algorithm for calculating the size-corrected power of a test, which can be used to compare the efficiency of tests, and tests the randomness of generated random numbers. For this purpose, 1000 data sets with sample sizes n = 10, 20, 25, 30, 40, 50, 100, 200, 300 were generated from the uniform distribution and checked with different tests for randomness. The assessment of normality using statistical tests is sensitive to the sample size. Overall power increases with n, and the Shapiro-Wilk (SW), Shapiro-Francia (SF) and Anderson-Darling (AD) tests are the most powerful of the tests considered. The Cramér-von Mises (CVM) test performs better than the Pearson chi-square test, and the Lilliefors test has better power than the Jarque-Bera (JB) test. The JB test is the least powerful of the tests considered.
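The idea of size-corrected power can be illustrated with a small Monte Carlo sketch. It is not the algorithm reviewed in the paper: it assumes the Jarque-Bera statistic as the example test, takes the critical value from a simulated normal null distribution so that the nominal size holds at the given n, and uses hypothetical names (`size_corrected_power`, `sampler`) and an arbitrary replication count and seed.

```python
import random


def jarque_bera(xs):
    """Jarque-Bera statistic from sample skewness and kurtosis."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def empirical_critical_value(n, alpha, reps, rng):
    """Upper-alpha quantile of the statistic under simulated normal data."""
    null_stats = sorted(
        jarque_bera([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)
    )
    idx = min(reps - 1, int((1.0 - alpha) * reps))
    return null_stats[idx]


def size_corrected_power(sampler, n, alpha=0.05, reps=500, seed=0):
    """Share of samples from `sampler` rejected at the simulated critical value."""
    rng = random.Random(seed)
    crit = empirical_critical_value(n, alpha, reps, rng)
    rejections = sum(
        jarque_bera([sampler(rng) for _ in range(n)]) > crit for _ in range(reps)
    )
    return rejections / reps


# Power against the uniform alternative, and the rejection rate under the null.
power_uniform = size_corrected_power(lambda rng: rng.random(), n=200)
size_normal = size_corrected_power(lambda rng: rng.gauss(0.0, 1.0), n=200)
print(power_uniform, size_normal)
```

Because every test is calibrated to the same empirical size, differences in the rejection rates of competing tests reflect power rather than differences in their finite-sample type I error.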
Abstract: The objective of this study is to propose the Parametric Seven-Number Summary (PSNS) as a significance test for normality and to verify its accuracy and power in comparison with two well-known tests: Royston’s W test and the D’Agostino-Belanger-D’Agostino K-squared test. An experiment with 384 conditions was simulated. The conditions were generated by crossing 24 sample sizes and 16 types of continuous distributions: one normal and 15 non-normal. The percentage of success in maintaining the null hypothesis of normality against normal samples and in rejecting the null hypothesis against non-normal samples (accuracy) was calculated. In addition, the type II error against normal samples and the statistical power against normal samples were computed. Comparisons of percentages and means were performed using Cochran’s Q-test, Friedman’s test, and repeated measures analysis of variance. With sample sizes of 150 or greater, high accuracy and mean power or type II error (≥0.70 and ≥0.80, respectively) were achieved. All three normality tests were similarly accurate; however, the PSNS-based test showed lower mean power than the K-squared and W tests, especially against non-normal samples of symmetrical-platykurtic distributions, such as the uniform, semicircle, and arcsine distributions. It is concluded that the PSNS-based omnibus test is accurate and powerful for testing normality with samples of at least 150 observations.
Funding: Supported by the National Natural Science Foundation of China under Grant No 61371170, the Fundamental Research Funds for the Central Universities under Grant Nos NP2015404 and NS2016038, the Aeronautical Science Foundation of China under Grant No 20152052028, and the Funding of Jiangsu Innovation Program for Graduate Education under Grant No KYLX15_0282.
Abstract: Based on the asymptotic spectral distribution of Wigner matrices, a new normality test method is proposed via reforming the white noise sequence. In this work, the asymptotic cumulative distribution function (CDF) of the eigenvalues of the Wigner matrix is deduced. A numerical Kullback-Leibler divergence of the empirical spectral CDF based on test samples from the deduced asymptotic CDF is established and treated as the test statistic. To validate the proposed normality test, the method is applied to weak SIPSK signal detection in the single-input single-output (SISO) system and the single-input multiple-output (SIMO) system. Compared with other common normality tests and existing signal detection methods, simulation results show that the proposed method is superior and robust.
Abstract: This paper investigates the normality of a real data set obtained from waist measurements of a group of 49 young adults. The quantile-quantile (Q-Q) plot and the analysis of correlation coefficients for the Q-Q plot are used to determine whether the data set is normal. To this end, the probabilities of the quantiles were computed, modified and plotted. Thereafter, the correlation coefficients for the Q-Q plots were obtained. Results indicate that, at the 0.1 level of significance, the data for the young adult males in the sample were not normally distributed and had a mean value within the low-risk range in terms of health, whereas the data for the young adult females showed reasonable normality, also with a mean value within the low-risk range.
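The Q-Q correlation coefficient described above can be sketched as follows. The abstract does not say how the quantile probabilities were modified, so the plotting positions (i - 0.5)/n used here are an assumption, as is the function name `qq_correlation`. The decision step, which compares the coefficient with a tabulated critical value for the chosen significance level, is left out because it depends on published tables.

```python
import math
from statistics import NormalDist, mean


def qq_correlation(data):
    """Pearson correlation between ordered data and standard normal quantiles."""
    xs = sorted(data)
    n = len(xs)
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    qs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, mq = mean(xs), mean(qs)
    sxy = sum((x - mx) * (q - mq) for x, q in zip(xs, qs))
    sxx = sum((x - mx) ** 2 for x in xs)
    sqq = sum((q - mq) ** 2 for q in qs)
    return sxy / math.sqrt(sxx * sqq)


# Toy inputs of size 49: exact normal quantiles (rescaled) and exponential quantiles.
probs = [(i - 0.5) / 49 for i in range(1, 50)]
normal_like = [80.0 + 7.0 * NormalDist().inv_cdf(p) for p in probs]
skewed = [-math.log(1.0 - p) for p in probs]

print(qq_correlation(normal_like))
print(qq_correlation(skewed))
```

A coefficient close to 1 indicates that the points of the Q-Q plot lie near a straight line; markedly smaller values point to a departure from normality.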
Abstract: Asphalt mixture is a highly heterogeneous material, which is one of the reasons for high measurement uncertainty when it is subjected to tests. The results of such tests are often unreliable, which may lead to poor professional judgments. These can be avoided by carrying out, before the actual research, reliable analyses of measurement uncertainty that are adequate for the research methods used. This paper presents the calculation of measurement uncertainty, using as an example the determination of the stiffness modulus of an asphalt mixture by the indirect tension method. The paper also shows the use of basic methods of statistical analysis, such as testing two mean values and conformity tests. Essential concepts in measurement uncertainty are compiled, and the determination of the stiffness modulus parameters is discussed. It is demonstrated that the biggest source of error in the stiffness modulus measuring process is the displacement measurement. The aim of the research was to find the measurement uncertainty for the stiffness modulus obtained by an indirect tensile test and to present examples of the statistical methods used.
Abstract: Due to differences in the distribution of scores for different trials, the performance of a speaker verification system is seriously diminished if raw scores are used directly for detection with a unified threshold value. The scores must therefore be normalized. To tackle the shortcomings of existing score normalization methods, we propose a speaker verification system based on log-likelihood normalization (LLN). Without a priori knowledge, LLN increases the separation between the scores of target and non-target speaker models, mitigating score aliasing between “same-speaker” and “different-speaker” trials corresponding to the same test speech and enabling better discrimination and decision capability. The experiment shows that LLN is an effective method of score normalization.
Funding: Supported by the National Natural Science Foundation of China.
Abstract: The following problem is called the everywhere-cover problem: “Given a set of dependencies over a database scheme, is the set of dependencies explicitly given for each relation scheme equivalent to the dependencies implied for that relation scheme?” It is shown that when the everywhere-cover problem has a ‘yes’ answer, examining only the dependencies explicitly given suffices to test 3NF, BCNF and 4NF of a database scheme. This does not hold for 2NF. Consequently, in such cases, tests of BCNF and 4NF all take polynomial time. A proof is then given that testing 3NF of a database scheme is Co-NP-complete, and from this result it is shown that everywhere-cover is also Co-NP-complete when only functional dependencies are allowed. These results cast doubt on the widely believed conjecture that no polynomial time algorithm for designing a lossless BCNF database scheme is likely to exist.
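A BCNF check that examines only the explicitly given functional dependencies can be sketched as below. This is the textbook attribute-closure test, not code from the paper; the helper names `closure` and `bcnf_violations` are hypothetical, attributes are represented as single characters, and only functional dependencies are handled (no multivalued dependencies, so 4NF is out of scope). As the abstract notes, restricting attention to the given dependencies is justified only when the everywhere-cover problem has a 'yes' answer.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under functional dependencies (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Given non-trivial dependencies whose left side is not a superkey."""
    attrs = set(relation)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)            # skip trivial dependencies
        and not attrs <= closure(lhs, fds)     # left side must determine everything
    ]


# R(A, B, C) with A -> B and B -> C: B is not a superkey, so B -> C violates BCNF.
print(bcnf_violations("ABC", [("A", "B"), ("B", "C")]))
# R(A, B) with A -> B: A is a key, so there is no violation.
print(bcnf_violations("AB", [("A", "B")]))
```

Each closure computation is polynomial in the number of dependencies and attributes, which is consistent with the polynomial-time claim for the BCNF test in this restricted setting.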
Funding: Supported by the National S&T Major Project of China (No. 2017ZX05035004-005).
Abstract: To quantitatively characterize horizontal shale gas well productivity and identify the dominant productivity factors in the Weiyuan Shale Gas Field, Sichuan Basin, a practical productivity method involving multiple indicators was proposed to analyze the production performance of 150 horizontal wells. The normalized test production, flowback ratio, first-year initial production and estimated/expected ultimate recovery (EUR) were introduced to estimate well productivity in different production stages. The correlation between these four indicators was determined to reveal their effects on production performance forecasts. In addition, the dominant productivity factors in the present stage were identified to provide guidance for production performance enhancement. The research indicates that favorable linear relations exist between the normalized test production, first-year initial production and EUR. The normalized test production is regarded as an important indicator for preliminarily characterizing well productivity in the initial stage. The first-year initial production is the most accurate productivity evaluation indicator after a year. The flowback ratio is a supplementary indicator that qualitatively represents well productivity and fracturing performance. Well productivity depends greatly on the lateral target interval, the drilling length of Longmaxi1$_1^1$ (LM1$_1^1$) and wellbore integrity. The first-year recovery degree of EUR is 24%–58%, with a P50 value of 35%.
Abstract: For text-independent speaker verification, the Gaussian mixture model (GMM) using a universal background model strategy and the GMM using support vector machines (SVMs) are the two most commonly used methodologies. Recently, a new SVM-based speaker verification method using GMM supervectors has been proposed. This paper describes the construction of a new speaker verification system and investigates the use of nuisance attribute projection and test normalization to further enhance performance. Experiments were conducted on the core test of the 2006 NIST speaker recognition evaluation corpus. The experimental results indicate that an SVM-based speaker verification system using GMM supervectors can achieve appealing performance. With nuisance attribute projection and test normalization, system performance improves significantly: the equal error rate falls from 7.78% to 4.92% and the detection cost function from 0.0376 to 0.0251.
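Test normalization (often called T-norm) is commonly described as standardizing a raw trial score with the mean and standard deviation of the scores that the same test utterance obtains against a cohort of impostor models. The sketch below follows that common description and is not taken from the paper; the function name `t_norm`, the cohort values, and the use of the sample standard deviation are assumptions.

```python
from statistics import mean, stdev


def t_norm(raw_score, cohort_scores):
    """Standardize a trial score with impostor-cohort statistics for the same test utterance."""
    mu = mean(cohort_scores)
    sigma = stdev(cohort_scores)  # sample standard deviation (assumed convention)
    if sigma == 0:
        raise ValueError("cohort scores have zero spread")
    return (raw_score - mu) / sigma


# Hypothetical scores of one test utterance against three impostor models.
cohort = [1.0, 2.0, 3.0]
print(t_norm(4.0, cohort))
```

Because the cohort statistics are recomputed for every test utterance, a single decision threshold can be applied across trials whose raw score distributions differ.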
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11601202, 11471314 and 11401312), the Natural Science Foundation of the Jiangsu Higher Education Institutions (Grant No. 14KJB110012), the High-Level Talent Scientific Research Foundation of Jinling Institute of Technology (Grant No. jit-b-201527), and the National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences.
Abstract: We give an algorithm for computing the factor ring of a given ideal in a Dedekind domain with finite rank, which runs in deterministic polynomial time. We provide two applications of the algorithm: deciding whether a given ideal is prime or a prime power. The main algorithm is based on a basis representation of finite rings, which is computed via Hermite and Smith normal forms.