This study introduces a new classifier tailored to address the limitations inherent in conventional classifiers such as K-nearest neighbor (KNN), random forest (RF), decision tree (DT), and support vector machine (SVM) for arrhythmia detection. The proposed classifier leverages the Chi-square distance as a primary metric, providing a specialized and original approach for precise arrhythmia detection. To optimize feature selection and refine the classifier's performance, particle swarm optimization (PSO) is integrated with the Chi-square distance as a fitness function. This integration enhances the classifier's capabilities, resulting in a substantial improvement in accuracy for arrhythmia detection. Experimental results demonstrate the efficacy of the proposed method, achieving an accuracy of 98% with PSO, up from 89% without prior optimization. The classifier outperforms established machine learning (ML) and deep learning (DL) techniques, underscoring its reliability in arrhythmia classification. The promising results render it an effective method to support both academic and medical communities, offering an advanced and precise solution for arrhythmia detection in electrocardiogram (ECG) data.
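A minimal sketch of the two ingredients named above, with all specifics assumed rather than taken from the paper: a 1-nearest-neighbour classifier under one common definition of the chi-square distance, and a small binary PSO loop whose fitness is the validation accuracy of that classifier on the selected features. The distance formula, fitness, hyperparameters, and synthetic data are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def chi2_dist(x, y, eps=1e-12):
    # One common chi-square distance for non-negative feature vectors;
    # the paper's exact formulation is not given in the abstract.
    return 0.5 * np.sum((x - y) ** 2 / (x + y + eps))

def predict_1nn(X_tr, y_tr, X_te, mask):
    # 1-NN under the chi-square distance, restricted to the selected features.
    cols = mask.astype(bool)
    return np.array([y_tr[int(np.argmin([chi2_dist(x, t) for t in X_tr[:, cols]]))]
                     for x in X_te[:, cols]])

def fitness(mask, X_tr, y_tr, X_va, y_va):
    # Assumed fitness: validation accuracy of the masked chi-square classifier.
    if mask.sum() == 0:
        return 0.0
    return float(np.mean(predict_1nn(X_tr, y_tr, X_va, mask) == y_va))

def binary_pso(X_tr, y_tr, X_va, y_va, n_particles=10, iters=20,
               w=0.7, c1=1.5, c2=1.5):
    d = X_tr.shape[1]
    pos = rng.random((n_particles, d))    # positions in [0, 1]; > 0.5 selects a feature
    vel = np.zeros_like(pos)
    def score(p):
        return fitness((p > 0.5).astype(int), X_tr, y_tr, X_va, y_va)
    pbest = pos.copy()
    pbest_fit = np.array([score(p) for p in pos])
    gbest = pbest[np.argmax(pbest_fit)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((n_particles, d)), rng.random((n_particles, d))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)
        fit = np.array([score(p) for p in pos])
        better = fit > pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        gbest = pbest[np.argmax(pbest_fit)].copy()
    return (gbest > 0.5).astype(int)

# Toy demo with synthetic non-negative "ECG feature" data.
X = rng.random((200, 16))
y = (X[:, 0] + X[:, 3] > 1.0).astype(int)
mask = binary_pso(X[:120], y[:120], X[120:160], y[120:160])
print(mask, np.mean(predict_1nn(X[:120], y[:120], X[160:], mask) == y[160:]))
```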
In large sample studies where distributions may be skewed and not readily transformed to symmetry, it may be of greater interest to compare different distributions in terms of percentiles rather than means. For example, it may be more informative to compare two or more populations with respect to their within-population distributions by testing the hypothesis that their corresponding 10th, 50th, and 90th percentiles are equal. As a generalization of the median test, the proposed test statistic is asymptotically distributed as Chi-square with degrees of freedom dependent upon the number of percentiles tested and the constraints of the null hypothesis. Results from simulation studies are used to validate the nominal 0.05 significance level under the null hypothesis, and asymptotic power properties that are suitable for testing equality of percentile profiles against selected profile discrepancies for a variety of underlying distributions. A pragmatic example is provided to illustrate the comparison of the percentile profiles for four body mass index distributions.
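The construction can be sketched as a chi-square homogeneity test on cells cut at the pooled sample percentiles; with percentiles=(50,) it reduces to the classical median test, and a k-group comparison at p percentiles gives (k-1)p degrees of freedom, consistent with the dependence on the number of percentiles noted above. The authors' exact statistic may differ in detail.

```python
import numpy as np
from scipy.stats import chi2_contingency

def percentile_profile_test(groups, percentiles=(10, 50, 90)):
    # Cut the pooled sample at the requested percentiles, count each
    # group's observations per inter-percentile bin, then test homogeneity.
    pooled = np.concatenate(groups)
    cuts = np.percentile(pooled, percentiles)
    bins = np.concatenate(([-np.inf], cuts, [np.inf]))
    table = np.array([np.histogram(g, bins=bins)[0] for g in groups])
    stat, pval, dof, _ = chi2_contingency(table)
    return stat, pval, dof

# Example: three skewed samples compared at the 10th, 50th, and 90th percentiles.
rng = np.random.default_rng(1)
groups = [rng.lognormal(0.0, 1.0, 300),
          rng.lognormal(0.1, 1.0, 300),
          rng.lognormal(0.0, 1.2, 300)]
print(percentile_profile_test(groups))
```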
In this paper a Singular Value Decomposition (SVD) formula and a modified Chi-square solution are provided, and the modified Chi-square method is combined with an FT-IR instrument to control a biochemical reaction process. Using the modified Chi-square technique, the unknown concentrations of reactants and products in test samples withdrawn from the process are determined. The technique avoids the need for the spectral data to conform to Beer's Law, and the best spectral range is determined automatically.
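The abstract does not spell out the modified Chi-square procedure, but the SVD least-squares step that such spectral methods build on can be sketched as follows; the reference spectra, noise level, and goodness-of-fit check are simulated stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated setup: 3 pure-component reference spectra over 200 wavenumbers,
# and a noisy mixture spectrum with unknown concentrations to recover.
A = np.abs(rng.normal(size=(200, 3)))
c_true = np.array([0.5, 1.2, 0.3])
b = A @ c_true + rng.normal(scale=0.01, size=200)

# Solve the least-squares problem via the SVD pseudo-inverse.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
c_hat = Vt.T @ ((U.T @ b) / s)

# Chi-square goodness of fit of the residual against the assumed noise level.
chi2 = np.sum((b - A @ c_hat) ** 2) / 0.01 ** 2
print(c_hat, chi2)
```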
The traditional Pearson correlation coefficient is not robust: outliers can cause the computed value to diverge from the true correlation. To address this problem, this paper presents a robust estimation method. Comparing traditional and robust correlation coefficient values in simulations with sample sizes of 20, 50, 100, and 200 and contamination rates of 1%, 5%, and 10%, the robust formula achieves significantly higher accuracy than the traditional coefficient in every setting. The feasibility and effectiveness of the robust correlation coefficient are further verified in a worked example. The conclusions of this paper can be used for robust estimation of correlation coefficients for variables containing outliers.
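The paper's specific robust estimator is not given in the abstract; one standard way to robustify Pearson's r along these lines is to winsorize both variables before applying the usual formula, sketched below on simulated data with 5% contamination.

```python
import numpy as np
from scipy.stats import mstats, pearsonr

def winsorized_pearson(x, y, limits=0.1):
    # Clamp the extreme tails of both variables, then apply Pearson's formula.
    # This is an illustrative robust variant, not the paper's exact estimator.
    xw = np.asarray(mstats.winsorize(np.asarray(x, float), limits=(limits, limits)))
    yw = np.asarray(mstats.winsorize(np.asarray(y, float), limits=(limits, limits)))
    return pearsonr(xw, yw)[0]

rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 0.8 * x + rng.normal(scale=0.6, size=100)
x[:5] += 20                      # 5% contamination with gross outliers
print(pearsonr(x, y)[0], winsorized_pearson(x, y))
```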
The application of frequency distribution statistics to data provides objective means to assess the nature of the data distribution and the viability of numerical models that are used to visualize and interpret data. Two commonly used tools are kernel density estimation and the reduced chi-squared statistic used in combination with a weighted mean. Due to the wide applicability of these tools, we present a Java-based computer application called KDX to facilitate the visualization of data and the utilization of these numerical tools.
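The two tools KDX implements are standard and can be sketched compactly; the simulated measurements and uncertainties below are placeholders. (KDX itself is Java-based; Python is used here only for brevity.)

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(4)
values = rng.normal(100.0, 2.0, size=30)   # measurements
sigmas = np.full(30, 2.0)                  # their 1-sigma uncertainties

# Inverse-variance weighted mean and the reduced chi-squared (MSWD) about it.
w = 1.0 / sigmas ** 2
wmean = np.sum(w * values) / np.sum(w)
red_chi2 = np.sum(((values - wmean) / sigmas) ** 2) / (len(values) - 1)

# Kernel density estimate of the same data, evaluated on a coarse grid.
kde = gaussian_kde(values)
grid = np.linspace(values.min(), values.max(), 5)
print(wmean, red_chi2, kde(grid))
```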
Diabetes mellitus is a metabolic disease ranked among the top 10 causes of death by the World Health Organization. During the last few years, an alarming increase has been observed worldwide, with a 70% rise in the disease since 2000 and an 80% rise in male deaths. If untreated, it results in complications of many vital organs of the human body, which may be fatal. Early detection of diabetes is therefore of significant importance for starting timely treatment. This study introduces a methodology for the classification of diabetic and normal people using an ensemble machine learning model and feature fusion of Chi-square and principal component analysis features. An ensemble model, the logistic tree classifier (LTC), is proposed, which incorporates logistic regression and an extra tree classifier through a soft voting mechanism. Experiments are also performed using several well-known machine learning algorithms to analyze their performance, including logistic regression, extra tree classifier, AdaBoost, Gaussian naive Bayes, decision tree, random forest, and k nearest neighbor. In addition, several experiments are carried out using principal component analysis (PCA) and Chi-square (Chi-2) features to analyze the influence of feature selection on the performance of machine learning classifiers. Results indicate that Chi-2 features yield higher performance than both PCA features and the original features. However, the highest accuracy is obtained when the proposed ensemble model LTC is used with the proposed feature fusion framework, achieving an accuracy score of 0.85, the highest among the available approaches for diabetes prediction. In addition, a statistical t-test confirms the statistical significance of the proposed approach over the other approaches.
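The pipeline described above maps directly onto scikit-learn components; the sketch below fuses Chi-square-selected and PCA features and feeds them to a soft-voting ensemble of logistic regression and extra trees. The synthetic dataset, k, and component counts are placeholders, not the paper's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder data shaped like a tabular diabetes dataset (768 x 8).
X, y = make_classification(n_samples=768, n_features=8, random_state=0)
X -= X.min(axis=0)                  # chi2 scoring requires non-negative features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Feature fusion: Chi-square-selected features stacked with PCA features,
# both fitted on the training split only.
sel = SelectKBest(chi2, k=4).fit(X_tr, y_tr)
pca = PCA(n_components=4).fit(X_tr)
F_tr = np.hstack([sel.transform(X_tr), pca.transform(X_tr)])
F_te = np.hstack([sel.transform(X_te), pca.transform(X_te)])

# "LTC"-style soft-voting ensemble of logistic regression and extra trees.
ltc = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("et", ExtraTreesClassifier(n_estimators=200, random_state=0))],
    voting="soft")
print(accuracy_score(y_te, ltc.fit(F_tr, y_tr).predict(F_te)))
```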
We study the asymptotics for the chi-square statistic under type II error. By the contraction principle, the large deviations and moderate deviations are obtained, and the rate function of the moderate deviations can be calculated explicitly; it is a squared function.
With the improvement of equipment reliability, human factors have become the most uncertain part of the system. The Standardized Plant Analysis of Risk-Human Reliability Analysis (SPAR-H) method is a reliable method in the field of human reliability analysis (HRA) for evaluating human reliability and assessing risk in large complex systems. However, the classical SPAR-H method does not consider the dependencies among performance shaping factors (PSFs), which may cause overestimation or underestimation of the risk of the actual situation. To address this issue, this paper proposes a new method to deal with the dependencies among PSFs in SPAR-H based on the Pearson correlation coefficient. First, the dependence between every two PSFs is measured by the Pearson correlation coefficient. Second, the weights of the PSFs are obtained by considering the total dependence degree. Finally, the PSFs' multipliers are modified based on the weights of the corresponding PSFs, and then used in calculating the human error probability (HEP). A case study is used to illustrate the procedure and effectiveness of the proposed method.
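A sketch of the three steps, with stand-in formulas: the abstract specifies Pearson correlation for pairwise dependence but not the exact weighting or multiplier-modification rules, so the weight and damping expressions below, along with the ratings data and nominal values, are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical historical ratings of 8 PSFs across 50 tasks, nominal
# multipliers, and a nominal HEP: all illustrative stand-ins.
ratings = rng.normal(size=(50, 8))
multipliers = np.array([1.0, 2.0, 1.0, 5.0, 1.0, 1.0, 2.0, 1.0])
nhep = 1e-3

# Step 1: pairwise dependence between PSFs via the Pearson correlation.
corr = np.corrcoef(ratings, rowvar=False)

# Step 2: weights from each PSF's total dependence on the others
# (stand-in rule: more dependent PSFs are down-weighted).
dependence = np.abs(corr).sum(axis=1) - 1.0
weights = 1.0 - dependence / dependence.sum()

# Step 3: damp each multiplier by its weight (stand-in rule), then
# combine into the human error probability as in SPAR-H.
mod_mult = multipliers ** weights
print(nhep * np.prod(mod_mult))
```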
We describe two new derivations of the chi-square distribution. The first derivation uses the induction method, which requires only a single integral to calculate. The second derivation uses the Laplace transform and requires minimal assumptions. The new derivations are compared with the established derivations, such as by convolution, by moment generating function, and by Bayesian inference. Chi-square testing has seen many applications in physics and other fields. We describe a unique version of the chi-square test in which both the variance and the location are tested, which is then applied to environmental data. The chi-square test is used to judge whether a laboratory method is capable of detecting gross alpha and beta radioactivity in drinking water for regulatory monitoring to protect the health of the population. A case of a failure of the chi-square test and its amelioration are described. The chi-square test is compared to and supplemented by the t-test.
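One way to test variance and location simultaneously, which may differ in detail from the authors' construction, is to sum squared deviations about a known target mean rather than the sample mean, so the statistic inflates under either a location shift or excess variance:

```python
import numpy as np
from scipy.stats import chi2

def chi2_location_variance_test(x, mu0, sigma0, alpha=0.05):
    # Squared deviations about the target mu0 (not the sample mean):
    # sensitive to both a shifted location and an inflated variance.
    x = np.asarray(x, float)
    stat = np.sum((x - mu0) ** 2) / sigma0 ** 2
    crit = chi2.ppf(1 - alpha, df=len(x))   # one-sided: reject if too large
    return stat, crit, stat > crit

rng = np.random.default_rng(6)
readings = rng.normal(10.3, 1.2, size=20)   # e.g. repeated activity measurements
print(chi2_location_variance_test(readings, mu0=10.0, sigma0=1.0))
```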
A new six-parameter continuous distribution called the Generalized Kumaraswamy Generalized Power Gompertz (GKGPG) distribution is proposed in this study, and a graphical illustration of the probability density function and cumulative distribution function is presented. The statistical features of the GKGPG distribution are systematically derived and adequately studied. The estimation of the model parameters in the absence of censoring and under right censoring is performed using the method of maximum likelihood. The test statistic for right-censored data, the criteria test for the GKGPG distribution, the estimated matrices Ŵ, Ĉ, and Ĝ, the criteria test Y_n^2, and the quadratic form of the test statistic are derived. Mean simulated values of the maximum likelihood estimates and their corresponding mean squared errors are presented and confirmed to agree closely with the true parameter values. Simulated levels of significance of the Y_n^2(γ) test for the GKGPG model are recorded alongside their theoretical values. We conclude that the null hypothesis that simulated samples are fitted by the GKGPG distribution is widely validated for the different significance levels considered. From the summary of the results on a dataset of strengths of a specific type of braided cord, it is observed that the proposed GKGPG model fits the data set at significance level ε = 0.05.
Zero-inflated distributions are common in statistical problems where there is interest in testing homogeneity of two or more independent groups. Often, the underlying distribution that has an inflated number of zero-valued observations is asymmetric, and its functional form may not be known or easily characterized. In this case, comparisons of the groups in terms of their respective percentiles may be appropriate as these estimates are nonparametric and more robust to outliers and other irregularities. The median test is often used to compare distributions with similar but asymmetric shapes but may be uninformative when there are excess zeros or dissimilar shapes. For zero-inflated distributions, it is useful to compare the distributions with respect to their proportion of zeros, coupled with the comparison of percentile profiles for the observed non-zero values. A simple chi-square test for simultaneous testing of these two components is proposed, applicable to both continuous and discrete data. Results of simulation studies are reported to summarize empirical power under several scenarios. We give recommendations for the minimum sample size which is necessary to achieve suitable test performance in specific examples.
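A sketch of the combined test: a homogeneity chi-square on the zero/non-zero split plus a percentile-profile chi-square on the non-zero values, with the two statistics and degrees of freedom added; the authors' statistic may differ in detail.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

def zero_inflated_profile_test(groups, percentiles=(50,)):
    # Part 1: homogeneity of the zero vs. non-zero counts across groups.
    zeros = np.array([[np.sum(g == 0), np.sum(g != 0)] for g in groups])
    s1, _, d1, _ = chi2_contingency(zeros)
    # Part 2: percentile profile of the non-zero values, cut at pooled percentiles.
    nz = [g[g != 0] for g in groups]
    cuts = np.percentile(np.concatenate(nz), percentiles)
    bins = np.concatenate(([-np.inf], cuts, [np.inf]))
    table = np.array([np.histogram(g, bins=bins)[0] for g in nz])
    s2, _, d2, _ = chi2_contingency(table)
    # Combine: sum the statistics and the degrees of freedom.
    stat, dof = s1 + s2, d1 + d2
    return stat, dof, chi2.sf(stat, dof)

rng = np.random.default_rng(7)
g1 = np.where(rng.random(400) < 0.30, 0.0, rng.lognormal(0.0, 1.0, 400))
g2 = np.where(rng.random(400) < 0.45, 0.0, rng.lognormal(0.3, 1.0, 400))
print(zero_inflated_profile_test([g1, g2]))
```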
The use of metamaterials enhances the performance of a specific class of antennas known as metamaterial antennas. The radiation cost and quality factor of the antenna are influenced by the size of the antenna. Metamaterial antennas allow the bandwidth restriction of small antennas to be circumvented. Antenna parameters have recently been predicted using machine learning algorithms in the existing literature. Machine learning can replace the manual process of experimenting to find the ideal simulated antenna parameters. The accuracy of the prediction depends primarily on the model that is used. In this paper, a novel method for forecasting the bandwidth of a metamaterial antenna is proposed, based on using the Pearson kernel as a standard kernel. Along with this approach, the paper suggests a unique hypersphere-based normalization to normalize the values of the dataset attributes and a dimensionality reduction method based on the Pearson kernel. A novel algorithm for optimizing the parameters of a Convolutional Neural Network (CNN), based on an improved Bat Algorithm-based Optimization with Pearson Mutation (BAO-PM), is also presented in this work. The prediction results of the proposed work are better than those of the existing models in the literature.
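The Pearson kernel and the surrounding methods are specific to this paper and not fully specified in the abstract; one plausible reading of the kernel-based dimensionality reduction, sketched below on placeholder data, is kernel PCA on a matrix of pairwise Pearson correlations between samples (a valid positive semidefinite kernel, since it is a Gram matrix of centered, normalized rows).

```python
import numpy as np
from sklearn.decomposition import KernelPCA

def pearson_kernel(A, B):
    # Pairwise Pearson correlation between the rows of A and B.
    Ac = A - A.mean(axis=1, keepdims=True)
    Bc = B - B.mean(axis=1, keepdims=True)
    den = np.outer(np.linalg.norm(Ac, axis=1), np.linalg.norm(Bc, axis=1))
    return (Ac @ Bc.T) / den

rng = np.random.default_rng(8)
X = rng.normal(size=(60, 12))      # hypothetical antenna design features
K = pearson_kernel(X, X)
X_low = KernelPCA(n_components=3, kernel="precomputed").fit_transform(K)
print(X_low.shape)
```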
Background: The signal-to-noise ratio (SNR) is recognized as an index of measurement reproducibility. We derive the maximum likelihood estimators of the SNR and discuss confidence interval construction for the difference between two correlated SNRs when the readings are from a bivariate normal or bivariate lognormal distribution. We use the Pearson system of curves to approximate the distribution of the difference between the two estimates and use bootstrap methods to validate the approximate distributions of the statistic of interest. Methods: The paper uses the delta method to find the first four central moments, and hence the skewness and kurtosis, which are important in the determination of the parameters of the Pearson distribution. Results: The approach is illustrated in two examples: one from veterinary microbiology and food safety data and the other on data from clinical medicine. We derived the four central moments of the target statistics, together with the bootstrap method, to evaluate the parameters of the Pearson distribution. Fitted Pearson curves of Types I and II were recommended based on the available data. The R code is also provided for ready use by readers.
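The bootstrap side of the approach can be sketched on simulated paired readings: resample, recompute the difference of the two SNR estimates, and read off the skewness and kurtosis that drive the choice of Pearson curve. The simulated means, covariance, and sample size are placeholders (the paper itself provides R code; Python is used here for consistency with the other sketches).

```python
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(9)

# Simulated correlated readings from two measurement methods (bivariate normal).
cov = [[1.0, 0.6], [0.6, 1.0]]
readings = rng.multivariate_normal([10.0, 9.0], cov, size=80)

def snr_diff(sample):
    # Difference of the two per-method SNR estimates (mean / sd).
    m, s = sample.mean(axis=0), sample.std(axis=0, ddof=1)
    return m[0] / s[0] - m[1] / s[1]

# Bootstrap the statistic; skewness and kurtosis select the Pearson curve type.
boot = np.array([snr_diff(readings[rng.integers(0, 80, 80)]) for _ in range(2000)])
print(boot.mean(), boot.std(), skew(boot), kurtosis(boot, fisher=False))
```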