This study introduces a new classifier tailored to address the limitations inherent in conventional classifiers such as K-nearest neighbor (KNN), random forest (RF), decision tree (DT), and support vector machine (SVM) for arrhythmia detection. The proposed classifier leverages the Chi-square distance as its primary metric, providing a specialized and original approach for precise arrhythmia detection. To optimize feature selection and refine the classifier's performance, particle swarm optimization (PSO) is integrated with the Chi-square distance as a fitness function. This integration enhances the classifier's capabilities, resulting in a substantial improvement in accuracy for arrhythmia detection. Experimental results demonstrate the efficacy of the proposed method, which achieves an accuracy of 98% with PSO, compared with 89% without any prior optimization. The classifier outperforms machine learning (ML) and deep learning (DL) techniques, underscoring its reliability in arrhythmia classification. The promising results make it an effective method to support both academic and medical communities, offering an advanced and precise solution for arrhythmia detection in electrocardiogram (ECG) data.
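To make the role of the Chi-square distance concrete, the following is a minimal sketch of a distance-based classifier. The abstract does not say how the distance is turned into a decision rule, so the nearest-centroid rule, the function names, and the toy feature vectors are all assumptions made for illustration; the PSO feature-selection step is not shown.

```python
def chi_square_distance(x, y, eps=1e-12):
    """Chi-square distance between two non-negative feature vectors."""
    # eps avoids division by zero when both entries are zero (illustrative choice)
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(x, y))


def nearest_class(sample, centroids):
    """Return the label whose centroid is closest in chi-square distance.

    Nearest-centroid is an assumed decision rule, not the paper's method.
    """
    return min(centroids, key=lambda label: chi_square_distance(sample, centroids[label]))


# Hypothetical per-class mean feature vectors (e.g. normalized histograms)
centroids = {"normal": [0.6, 0.3, 0.1], "arrhythmia": [0.2, 0.3, 0.5]}
print(nearest_class([0.25, 0.35, 0.40], centroids))
```

A feature-selection wrapper such as PSO would evaluate candidate feature subsets by calling a function like `chi_square_distance` on the selected coordinates only.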
Diabetes mellitus is a metabolic disease that is ranked among the top 10 causes of death by the World Health Organization. Over the last few years, an alarming increase has been observed worldwide, with a 70% rise in the disease since 2000 and an 80% rise in male deaths. If untreated, it results in complications in many vital organs of the human body, which may lead to fatality. Early detection of diabetes is therefore important for starting timely treatment. This study introduces a methodology for the classification of diabetic and normal people using an ensemble machine learning model and feature fusion of Chi-square and principal component analysis. An ensemble model, the logistic tree classifier (LTC), is proposed, which combines logistic regression and an extra tree classifier through a soft voting mechanism. Experiments are also performed using several well-known machine learning algorithms to analyze their performance, including logistic regression, extra tree classifier, AdaBoost, Gaussian naive Bayes, decision tree, random forest, and k-nearest neighbor. In addition, several experiments are carried out using principal component analysis (PCA) and Chi-square (Chi-2) features to analyze the influence of feature selection on the performance of machine learning classifiers. Results indicate that Chi-2 features show higher performance than both PCA features and the original features. However, the highest accuracy is obtained when the proposed ensemble model LTC is used with the proposed feature fusion framework, which achieves an accuracy score of 0.85, the highest among the available approaches for diabetes prediction. In addition, a statistical t-test indicates the statistical significance of the proposed approach over the other approaches.
In this paper, the singular value decomposition (SVD) formula and a modified Chi-square solution are provided, and the modified Chi-square is combined with an FT-IR instrument to control a biochemical reaction process. Using the modified Chi-square technique, the unknown concentrations of reactants and products in test samples withdrawn from the process are determined. The technique avoids the need for the spectral data to conform to Beer's Law, and the best spectral range is determined automatically.
The application of frequency distribution statistics to data provides an objective means to assess the nature of the data distribution and the viability of numerical models that are used to visualize and interpret data. Two commonly used tools are kernel density estimation and the reduced chi-squared statistic used in combination with a weighted mean. Due to the wide applicability of these tools, we present a Java-based computer application called KDX to facilitate the visualization of data and the use of these numerical tools.
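As an illustration of the second tool, the sketch below computes an inverse-variance weighted mean and the reduced chi-squared statistic about that mean. It uses the standard textbook formulas and assumes independent measurements with known one-sigma uncertainties; it is not taken from KDX, and the function name is hypothetical.

```python
def weighted_mean_and_reduced_chi2(values, sigmas):
    """Inverse-variance weighted mean, its uncertainty, and reduced chi-squared.

    Assumes independent measurements with known 1-sigma uncertainties.
    """
    if len(values) < 2 or len(values) != len(sigmas):
        raise ValueError("need at least two values, each with an uncertainty")
    weights = [1.0 / s ** 2 for s in sigmas]
    wsum = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / wsum
    mean_sigma = (1.0 / wsum) ** 0.5
    chi2 = sum(w * (v - mean) ** 2 for w, v in zip(weights, values))
    dof = len(values) - 1  # one degree of freedom is used by the fitted mean
    return mean, mean_sigma, chi2 / dof


mean, mean_sigma, red_chi2 = weighted_mean_and_reduced_chi2([10.0, 12.0], [1.0, 1.0])
print(mean, mean_sigma, red_chi2)
```

A reduced chi-squared near 1 suggests the scatter is consistent with the stated uncertainties; values well above 1 suggest excess scatter or underestimated uncertainties.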
We study the asymptotics of the chi-square statistic under type II error. By the contraction principle, large deviations and moderate deviations are obtained, and the rate function of the moderate deviations can be calculated explicitly and is a quadratic function.
We describe two new derivations of the chi-square distribution. The first derivation uses the induction method, which requires only a single integral to calculate. The second derivation uses the Laplace transform and requires minimal assumptions. The new derivations are compared with established derivations, such as those by convolution, moment generating function, and Bayesian inference. Chi-square testing has seen many applications in physics and other fields. We describe a unique version of the chi-square test in which both the variance and the location are tested, and we then apply it to environmental data. The chi-square test is used to judge whether a laboratory method is capable of detecting gross alpha and beta radioactivity in drinking water for regulatory monitoring to protect the health of the population. A case of failure of the chi-square test and its amelioration are described. The chi-square test is compared to and supplemented by the t-test.
Zero-inflated distributions are common in statistical problems where there is interest in testing homogeneity of two or more independent groups. Often, the underlying distribution that has an inflated number of zero-valued observations is asymmetric, and its functional form may not be known or easily characterized. In this case, comparisons of the groups in terms of their respective percentiles may be appropriate, as these estimates are nonparametric and more robust to outliers and other irregularities. The median test is often used to compare distributions with similar but asymmetric shapes but may be uninformative when there are excess zeros or dissimilar shapes. For zero-inflated distributions, it is useful to compare the distributions with respect to their proportion of zeros, coupled with a comparison of percentile profiles for the observed non-zero values. A simple chi-square test for simultaneous testing of these two components is proposed, applicable to both continuous and discrete data. Results of simulation studies are reported to summarize empirical power under several scenarios. We give recommendations for the minimum sample size necessary to achieve suitable test performance in specific examples.
In large sample studies where distributions may be skewed and not readily transformed to symmetry, it may be of greater interest to compare different distributions in terms of percentiles rather than means. For example, it may be more informative to compare two or more populations with respect to their within-population distributions by testing the hypothesis that their corresponding 10th, 50th, and 90th percentiles are equal. As a generalization of the median test, the proposed test statistic is asymptotically distributed as Chi-square with degrees of freedom dependent upon the number of percentiles tested and the constraints of the null hypothesis. Results from simulation studies are used to validate the nominal 0.05 significance level under the null hypothesis, as well as asymptotic power properties that are suitable for testing equality of percentile profiles against selected profile discrepancies for a variety of underlying distributions. A pragmatic example is provided to illustrate the comparison of the percentile profiles for four body mass index distributions.
A new six-parameter continuous distribution called the Generalized Kumaraswamy Generalized Power Gompertz (GKGPG) distribution is proposed in this study, and a graphical illustration of its probability density function and cumulative distribution function is presented. The statistical features of the GKGPG distribution are systematically derived and studied. The model parameters are estimated by the method of maximum likelihood, both in the absence of censoring and under right censoring. The test statistic for right-censored data, the criteria test for the GKGPG distribution, the estimated matrices Ŵ, Ĉ, and Ĝ, and the criteria test Y_n^2, alongside the quadratic form of the test statistic, are derived. Mean simulated values of the maximum likelihood estimates and their corresponding mean squared errors are presented and confirmed to agree closely with the true parameter values. Simulated levels of significance for the Y_n^2(γ) test for the GKGPG model were recorded against their theoretical values. We conclude that the null hypothesis that the simulated samples are fitted by the GKGPG distribution is widely validated for the different levels of significance considered. From the summary of the results for a dataset on the strength of a specific type of braided cord, it is observed that the proposed GKGPG model fits the data set at a significance level ε = 0.05.
Background: In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of the population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with regard to comparison between native residents and the floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. Methods: As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6-12 months, together with their parents/caregivers, living in Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. Results: The prevalence of anemia was 12.60%, with a range of 3.47%-40.00% across different subgroup characteristics. The CHAID decision tree model demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by the logistic regression model (maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population), CHAID decision tree analysis also identified a fourth risk factor, the maternal educational level, with higher overall classification accuracy and a larger area under the receiver operating characteristic curve. Conclusions: The infant anemic status in the metropolis is complex and should be carefully considered by basic health care practitioners. CHAID decision tree analysis demonstrated better performance in the hierarchical analysis of a population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities.
It is well known that smart thermostats (STs) have become key devices in the implementation of smart homes; thus, they are considered primary elements for the control of electrical energy consumption in households. Moreover, energy consumption is drastically affected when end users select unsuitable STs or do not use the STs correctly. Furthermore, in the future, Mexico will face serious electrical energy challenges that can be considerably reduced if end users operate STs correctly. Hence, it is important to carry out an in-depth study and analysis of thermostats, focusing on the social aspects that influence the technological use and performance of the thermostats. This paper proposes the use of signal detection theory (SDT), fuzzy detection theory (FDT), and the chi-square (CS) test in order to understand the perceptions and beliefs of end users about the use of STs in Mexico. This paper extensively shows the perceptions and beliefs about the selected thermostats in Mexico. It also presents an in-depth discussion of the cognitive perceptions and beliefs of end users and shows why the expectations of end users about STs are not met. Finally, it promotes the technological and social development of STs so that they are more readily accepted in complex electrical grids such as smart grids.
In goodness-of-fit testing, Pearson's chi-squared test is one of the most widely used tools of formal statistical analysis. However, Pearson's chi-squared test depends on the partition of the sample space, and different constructions of the partition may lead to different conclusions. Based on an equiprobable partition of the sample space, a modified chi-squared test is proposed, together with a method for constructing it. As an application, the proposed test is used to test whether vectorial data come from a uniform distribution defined on the hypersphere. Simulation studies show that the modified chi-squared test is robust against different alternatives.
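The idea of an equiprobable partition can be illustrated in one dimension. The sketch below computes Pearson's statistic for the null hypothesis that a sample is Uniform[0, 1), using k cells of equal probability under the null. This is only a simplified stand-in: the paper's modification and its hypersphere setting are not reproduced, and the function name is hypothetical.

```python
def equiprobable_chi2(sample, k):
    """Pearson statistic for H0: sample ~ Uniform[0, 1), with k equal-probability cells."""
    n = len(sample)
    counts = [0] * k
    for x in sample:
        # cell index; min() keeps x == 1.0 in the last cell
        counts[min(int(x * k), k - 1)] += 1
    expected = n / k  # every cell has probability 1/k under H0
    return sum((c - expected) ** 2 / expected for c in counts)


print(equiprobable_chi2([0.1, 0.3, 0.6, 0.9], 2))
print(equiprobable_chi2([0.1, 0.2, 0.3, 0.4], 2))
```

For a fully specified null distribution and a large sample, the statistic is compared with a chi-square distribution with k - 1 degrees of freedom.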
The classical chi-squared goodness-of-fit test assumes that the number of classes is fixed, in which case the test statistic has a limiting chi-square distribution under the null hypothesis. Tests in which the number of classes varies with the sample size have attracted increasing attention. However, in this situation there are no theoretical results on the asymptotic properties of the chi-squared test statistic. This paper proves the consistency of the chi-squared test with a varying number of classes under some conditions. The authors also give a convergence rate for the Kolmogorov-Smirnov distance between the test statistic and the corresponding chi-square distributed random variable. In addition, a real example and simulation results support the theoretical result and the superiority of the chi-squared test with a varying number of classes.
Human-elephant conflict (HEC) is an alarming issue that has attracted the attention of environmentalists and policy makers. The rising conflict between human beings and wild elephants is common in Buxa Tiger Reserve (BTR) and its adjoining area in West Bengal State, India, making the area volatile. People's attitudes towards elephant conservation activity are crucial to resolving HEC, because people's proximity to wild elephants' habitat can trigger the occurrence of HEC. The aim of this study is to conduct an in-depth investigation of the association of people's attitudes towards HEC with their locational, demographic, and socio-economic characteristics in BTR and its adjoining area by using Pearson's bivariate chi-square test and binary logistic regression analysis. BTR is one of the constituent parts of the Eastern Dooars Elephant Reserve (EDER). We interviewed 500 respondents to understand their perceptions of HEC and investigated their locational, demographic, and socio-economic characteristics, including location of village, gender, age, ethnicity, religion, caste, poverty level, education level, primary occupation, secondary occupation, household type, and source of firewood. The results indicate that respondents who live in enclave forest villages (EFVs), peripheral forest villages (PFVs), corridor villages (CVs), or forest and corridor villages (FCVs), who are mainly male, aged 18–48 years, engaged in agriculture, and living in kancha and mixed houses, are more likely to witness HEC. In addition, respondents who are illiterate or have only primary education are more likely to regard the elephant as a main problematic animal around their villages and to refuse to participate in elephant conservation activity. For the sake of a sustainable environment for both human beings and wildlife, people's attitudes towards elephants must become friendlier, so that the two communities can live in harmony.
Traditional distribution network planning relies on the professional knowledge of planners, especially when analyzing the correlations between the problems existing in the network and the crucial influencing factors. The inherent laws reflected by the historical data of the distribution network are ignored, which affects the objectivity of the planning scheme. In this study, to improve the efficiency and accuracy of distribution network planning, the characteristics of distribution network data were extracted using a data-mining technique, and correlation knowledge of existing problems in the network was obtained. A data-mining model based on correlation rules was established. The inputs of the model were the electrical characteristic indices screened using the gray correlation method. The Apriori algorithm was used to extract correlation knowledge from the operational data of the distribution network and obtain strong correlation rules. The degree of promotion (lift) and chi-square tests were used to verify the rationality of the strong correlation rules output by the model. In this study, the correlation between heavy-load or overload problems of distribution network feeders in different regions and the related characteristic indices was determined, and the confidence of the correlation rules was obtained. These results can provide an effective basis for the formulation of a distribution network planning scheme.
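The two rule-validation measures mentioned above can be sketched for a single rule A -> B from its support counts. This assumes that "degree of promotion" corresponds to the usual lift measure and that the chi-square test is the standard 2x2 test of independence; the function name and the counts are hypothetical.

```python
def lift_and_chi2(n, n_a, n_b, n_ab):
    """Lift and 2x2 chi-square statistic for a rule A -> B.

    n: total records; n_a: records containing A; n_b: records containing B;
    n_ab: records containing both A and B.
    """
    lift = (n_ab / n) / ((n_a / n) * (n_b / n))
    observed = [[n_ab, n_a - n_ab],
                [n_b - n_ab, n - n_a - n_b + n_ab]]
    row_totals = [n_a, n - n_a]
    col_totals = [n_b, n - n_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # count under independence
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return lift, chi2


# Hypothetical counts: 100 feeder records, A and B each occur 50 times, 40 times together
print(lift_and_chi2(100, 50, 50, 40))
```

A lift above 1 together with a chi-square value exceeding the critical value for one degree of freedom (about 3.84 at the 0.05 level) indicates that A and B co-occur more often than independence would predict.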
The Internet service provider (ISP) is the heart of any country's Internet infrastructure and plays an important role in connecting to the World Wide Web. An Internet exchange point (IXP) allows the interconnection of two or more separate network infrastructures. All Internet traffic entering a country should pass through its IXP. Thus, it is an ideal location for performing malicious traffic analysis. Distributed denial of service (DDoS) attacks are becoming a more serious daily threat. Malicious actors in DDoS attacks control numerous infected machines known as botnets. Botnets are used to send numerous fake requests to overwhelm the resources of victims and make them unavailable for some periods. To date, such attacks present a major security threat on the Internet. This paper proposes an effective and efficient machine learning (ML)-based DDoS detection approach for the early warning and protection of the Saudi Arabia Internet exchange point (SAIXP) platform. The effectiveness and efficiency of the proposed approach are verified by selecting an accurate ML method with a small number of input features. A chi-square method is used for feature selection because it is easier to compute than other methods and does not require any assumption about the distribution of feature values. Several ML methods are assessed using holdout and 10-fold tests on a large public dataset. The experiments showed that the decision tree (DT) classifier achieved high accuracy (99.98%) with a small number of features (10 features). The experimental results confirm the applicability of using DT and chi-square for DDoS detection and early warning in SAIXP.
Polycystic ovarian syndrome (PCOS) is one of the significant health issues affecting women; it impacts their fertility and results in serious health concerns. Consequently, timely screening for PCOS can help in the process of recovery. Finding a method to aid doctors in this procedure is crucial because of the difficulties in detecting this condition. This research aimed to determine whether it is possible to optimize the detection of PCOS using deep learning algorithms and methodologies. Additionally, feature selection methods that produce the most important subset of features can speed up calculation and enhance the effectiveness of classifiers. In this research, the tri-stage wrapper method is used because it reduces the computation time. The proposed approach for the automatic diagnosis of PCOS comprises preprocessing, data normalization, feature selection, and classification. A dataset with 39 characteristics, including metabolism, neuroimaging, hormones, and biochemical information for 541 subjects, was employed. First, the data were pre-processed. Next, for feature selection, a tri-stage wrapper method using Mutual Information, ReliefF, Chi-Square, and Xvariance is applied. Then, various classification methods are trained and tested. Deep learning techniques including the convolutional neural network (CNN), multi-layer perceptron (MLP), recurrent neural network (RNN), and bidirectional long short-term memory (Bi-LSTM) are used for classification. The experimental findings demonstrate that feature selection with the tri-stage wrapper method combined with CNN delivers the highest precision (97%), high accuracy (98.67%), and recall (89%) when compared with the other machine learning algorithms.
Funding: Supported by the Florida Center for Advanced Analytics and Data Science, funded by Ernesto.Net (under the Algorithms for Good Grant).
Funding: The National Natural Science Foundation of China (10571139).
文摘We study the asymptotics tot the statistic of chi-square in type Ⅱ error. By the contraction principle, the large deviations and moderate deviations are obtained, and the rate function of moderate deviations can be calculated explicitly which is a squared function.
文摘We describe two new derivations of the chi-square distribution. The first derivation uses the induction method, which requires only a single integral to calculate. The second derivation uses the Laplace transform and requires minimum assumptions. The new derivations are compared with the established derivations, such as by convolution, moment generating function, and Bayesian inference. The chi-square testing has seen many applications to physics and other fields. We describe a unique version of the chi-square test where both the variance and location are tested, which is then applied to environmental data. The chi-square test is used to make a judgment whether a laboratory method is capable of detection of gross alpha and beta radioactivity in drinking water for regulatory monitoring to protect health of population. A case of a failure of the chi-square test and its amelioration are described. The chi-square test is compared to and supplemented by the t-test.
文摘Zero-inflated distributions are common in statistical problems where there is interest in testing homogeneity of two or more independent groups. Often, the underlying distribution that has an inflated number of zero-valued observations is asymmetric, and its functional form may not be known or easily characterized. In this case, comparisons of the groups in terms of their respective percentiles may be appropriate as these estimates are nonparametric and more robust to outliers and other irregularities. The median test is often used to compare distributions with similar but asymmetric shapes but may be uninformative when there are excess zeros or dissimilar shapes. For zero-inflated distributions, it is useful to compare the distributions with respect to their proportion of zeros, coupled with the comparison of percentile profiles for the observed non-zero values. A simple chi-square test for simultaneous testing of these two components is proposed, applicable to both continuous and discrete data. Results of simulation studies are reported to summarize empirical power under several scenarios. We give recommendations for the minimum sample size which is necessary to achieve suitable test performance in specific examples.
文摘In large sample studies where distributions may be skewed and not readily transformed to symmetry, it may be of greater interest to compare different distributions in terms of percentiles rather than means. For example, it may be more informative to compare two or more populations with respect to their within population distributions by testing the hypothesis that their corresponding respective 10th, 50th, and 90th percentiles are equal. As a generalization of the median test, the proposed test statistic is asymptotically distributed as Chi-square with degrees of freedom dependent upon the number of percentiles tested and constraints of the null hypothesis. Results from simulation studies are used to validate the nominal 0.05 significance level under the null hypothesis, and asymptotic power properties that are suitable for testing equality of percentile profiles against selected profile discrepancies for a variety of underlying distributions. A pragmatic example is provided to illustrate the comparison of the percentile profiles for four body mass index distributions.
Abstract: A new six-parameter continuous distribution called the Generalized Kumaraswamy Generalized Power Gompertz (GKGPG) distribution is proposed in this study, and a graphical illustration of its probability density function and cumulative distribution function is presented. The statistical features of the GKGPG distribution are systematically derived and studied. The model parameters are estimated by maximum likelihood, both in the absence of censoring and under right censoring. The test statistic for right-censored data, the criteria test for the GKGPG distribution, the estimated matrices Ŵ, Ĉ, and Ĝ, and the criteria test Y_n^2, together with the quadratic form of the test statistic, are derived. Mean simulated values of the maximum likelihood estimates and their corresponding mean squared errors are presented and confirmed to agree closely with the true parameter values. Simulated levels of significance for the Y_n^2(γ) test for the GKGPG model were recorded against their theoretical values. We conclude that the null hypothesis that simulated samples are fitted by the GKGPG distribution is widely validated for the different levels of significance considered. From the results of fitting the GKGPG model to a dataset on the strength of a specific type of braided cord, it is observed that the proposed model fits the data at a significance level ε = 0.05.
Abstract: Background: In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of the population in recent years, available information on infant anemia in large cities of China is inconclusive, especially regarding comparisons between native residents and the floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. Methods: As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6-12 months, together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing, were surveyed from January 1, 2013 to December 31, 2014. Results: The prevalence of anemia was 12.60%, with a range of 3.47%-40.00% across subgroup characteristics. The CHAID decision tree model demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by the logistic regression model (maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population), CHAID decision tree analysis also identified a fourth risk factor, the maternal educational level, with higher overall classification accuracy and a larger area under the receiver operating characteristic curve. Conclusions: The infant anemic status in a metropolis is complex and should be carefully considered by basic health care practitioners. CHAID decision tree analysis demonstrated better performance in hierarchical analysis of a population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities.
Abstract: It is well known that smart thermostats (STs) have become key devices in the implementation of smart homes; thus, they are considered primary elements for the control of electrical energy consumption in households. Moreover, energy consumption is drastically affected when end users select unsuitable STs or do not use them correctly. Furthermore, Mexico will in future face serious electrical energy challenges that can be considerably alleviated if end users operate STs correctly. Hence, it is important to carry out an in-depth study and analysis of thermostats, focusing on the social aspects that influence their technological use and performance. This paper proposes the use of signal detection theory (SDT), fuzzy detection theory (FDT), and the chi-square (CS) test to understand the perceptions and beliefs of end users about the use of STs in Mexico. This paper extensively shows the perceptions and beliefs about the selected thermostats in Mexico. Besides, it presents an in-depth discussion of the cognitive perceptions and beliefs of end users. Moreover, it shows why the expectations of end users about STs are not met. It also promotes the technological and social development of STs so that they become more widely accepted in complex electrical grids such as smart grids.
Funding: Natural Science Foundation of Beijing (No. 1062001); Academic Human Resources Development in Institutions of Higher Learning Under the Jurisdiction of Beijing Municipality (No. 05006011200702). Acknowledgements: The authors cordially thank the Associate Editor and Reviewers for their constructive comments, which led to improvement of the manuscript. They are also very grateful to Prof. Adelaide Figueiredo for his help.
Abstract: In goodness-of-fit testing, Pearson's chi-squared test is one of the most widely used tools of formal statistical analysis. However, Pearson's chi-squared test depends on the partition of the sample space, and different constructions of the partition may lead to different conclusions. Based on an equiprobable partition of the sample space, a modified chi-squared test is proposed, together with a method for constructing it. As an application, the proposed test is used to test whether vectorial data come from a uniform distribution defined on the hypersphere. Simulation studies show that the modified chi-squared test is robust against different alternatives.
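The idea of an equiprobable partition can be illustrated in one dimension: mapping each observation through the hypothesized CDF and cutting [0, 1] into k equal-width intervals gives cells that each have probability 1/k under the null hypothesis. The sketch below shows only this classical one-dimensional case, not the authors' modified test on the hypersphere; `equiprobable_chi2` and its parameters are hypothetical names.

```python
import math


def equiprobable_chi2(sample, cdf, k):
    """Pearson statistic over k cells that are equiprobable under `cdf`.

    Returns (statistic, degrees of freedom), assuming the hypothesized
    distribution is fully specified (no estimated parameters).
    """
    n = len(sample)
    counts = [0] * k
    for x in sample:
        u = cdf(x)                        # probability integral transform
        idx = min(int(u * k), k - 1)      # u == 1.0 falls in the last cell
        counts[idx] += 1
    expected = n / k                      # same expected count in every cell
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, k - 1


def exp_cdf(x):
    """CDF of the unit-rate exponential distribution."""
    return 1.0 - math.exp(-x)


# Quantile midpoints of the unit exponential: a perfectly balanced sample.
balanced = [-math.log(1.0 - (i + 0.5) / 20) for i in range(20)]
stat, dof = equiprobable_chi2(balanced, exp_cdf, k=4)
```

Because every cell has the same expected count, the result does not depend on an arbitrary choice of cell boundaries on the original scale, which is the motivation the abstract gives for the equiprobable construction.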
Funding: Supported by the Natural Science Foundation of China under Grant Nos. 11071022, 11028103, 11231010, 11471223; BCMIIS; the Beijing Municipal Educational Commission Foundation under Grant Nos. KZ201410028030, KM201210028005; and Jishou University Subject in 2014 (No. 14JD035).
Abstract: The classical chi-squared goodness-of-fit test assumes that the number of classes is fixed, in which case the test statistic has a limiting chi-square distribution under the null hypothesis. The case where the number of classes varies with the sample size has attracted increasing attention. However, in this situation there are no theoretical results on the asymptotic properties of the chi-squared test statistic. This paper proves the consistency of the chi-squared test with a varying number of classes under some conditions. The authors also give a convergence rate for the Kolmogorov-Smirnov distance between the test statistic and the corresponding chi-square distributed random variable. In addition, a real example and simulation results validate the reasonableness of the theoretical result and the superiority of the chi-squared test with a varying number of classes.
Abstract: Human-elephant conflict (HEC) is an alarming issue that has attracted the attention of environmentalists and policy makers. The rising conflict between human beings and wild elephants is common in Buxa Tiger Reserve (BTR) and its adjoining area in West Bengal State, India, making the area volatile. People's attitudes towards elephant conservation activity are crucial to resolving HEC, because people's proximity to wild elephants' habitat can trigger the occurrence of HEC. The aim of this study is to conduct an in-depth investigation of the association of people's attitudes towards HEC with their locational, demographic, and socio-economic characteristics in BTR and its adjoining area, using Pearson's bivariate chi-square test and binary logistic regression analysis. BTR is one of the constituent parts of Eastern Doors Elephant Reserve (EDER). We interviewed 500 respondents to understand their perceptions of HEC and investigated their locational, demographic, and socio-economic characteristics, including location of village, gender, age, ethnicity, religion, caste, poverty level, education level, primary occupation, secondary occupation, household type, and source of firewood. The results indicate that respondents who live in enclave forest villages (EFVs), peripheral forest villages (PFVs), corridor villages (CVs), or forest and corridor villages (FCVs), and who are mainly male, aged 18–48 years, engaged in agriculture, and living in kancha and mixed houses, are more likely to witness HEC. Besides, respondents who are illiterate or have only primary education are more likely to regard the elephant as a main problematic animal around their villages and to refuse to participate in elephant conservation activity. For the sake of a sustainable environment for both human beings and wildlife, people's attitudes towards elephants must become friendlier and more prudent, so that the two communities can live in harmony.
Funding: Supported by the Science and Technology Project of China Southern Power Grid (GZHKJXM20210043-080041KK52210002).
Abstract: Traditional distribution network planning relies on the professional knowledge of planners, especially when analyzing the correlations between the problems existing in the network and the crucial influencing factors. The inherent laws reflected by the historical data of the distribution network are ignored, which affects the objectivity of the planning scheme. In this study, to improve the efficiency and accuracy of distribution network planning, the characteristics of distribution network data were extracted using a data-mining technique, and correlation knowledge of existing problems in the network was obtained. A data-mining model based on correlation rules was established. The inputs of the model were the electrical characteristic indices screened using the gray correlation method. The Apriori algorithm was used to extract correlation knowledge from the operational data of the distribution network and obtain strong correlation rules. Degree of promotion and chi-square tests were used to verify the rationality of the strong correlation rules of the model output. In this study, the correlation between heavy-load or overload problems of distribution network feeders in different regions and the related characteristic indices was determined, and the confidence of the correlation rules was obtained. These results can provide an effective basis for the formulation of a distribution network planning scheme.
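Checking a mined rule A → B with these two measures can be sketched from four counts. The sketch assumes that "degree of promotion" corresponds to what the association-rule literature usually calls lift, and it uses the standard 2 x 2 Pearson chi-square without continuity correction. The function name, parameter names, and counts are hypothetical and do not come from the paper.

```python
def rule_lift_and_chi2(n_ab, n_a, n_b, n):
    """Lift and 2x2 chi-square statistic for a rule A -> B.

    n_ab: records containing both A and B
    n_a:  records containing A
    n_b:  records containing B
    n:    total number of records
    """
    # Lift > 1 means A and B co-occur more often than under independence.
    lift = n * n_ab / (n_a * n_b)

    # Cells of the 2x2 contingency table (A / not A) x (B / not B).
    a = n_ab
    b = n_a - n_ab
    c = n_b - n_ab
    d = n - n_a - n_b + n_ab
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return lift, chi2


# Illustrative counts only: 100 feeder records, 50 with antecedent A,
# 50 with consequent B (e.g. an overload flag), 40 with both.
lift, chi2 = rule_lift_and_chi2(n_ab=40, n_a=50, n_b=50, n=100)
```

A rule would typically be kept when the lift exceeds 1 and the chi-square statistic exceeds the critical value for one degree of freedom (3.841 at the 0.05 level).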
Abstract: The Internet service provider (ISP) is the heart of any country's Internet infrastructure and plays an important role in connecting to the World Wide Web. An Internet exchange point (IXP) allows the interconnection of two or more separate network infrastructures. All Internet traffic entering a country should pass through its IXP. Thus, it is an ideal location for performing malicious traffic analysis. Distributed denial of service (DDoS) attacks are becoming a more serious daily threat. Malicious actors in DDoS attacks control numerous infected machines known as botnets. Botnets are used to send numerous fake requests to overwhelm the resources of victims and make them unavailable for some periods. To date, such attacks present a major security threat on the Internet. This paper proposes an effective and efficient machine learning (ML)-based DDoS detection approach for the early warning and protection of the Saudi Arabia Internet exchange point (SAIXP) platform. The effectiveness and efficiency of the proposed approach are verified by selecting an accurate ML method with a small number of input features. A chi-square method is used for feature selection because it is easier to compute than other methods and does not require any assumption about the distribution of feature values. Several ML methods are assessed using holdout and 10-fold tests on a large public dataset. The experiments showed that the decision tree (DT) classifier achieved high accuracy (99.98%) with a small number of features (10 features). The experimental results confirm the applicability of DT and chi-square for DDoS detection and early warning in SAIXP.
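Chi-square feature selection of the kind mentioned here can be sketched by scoring each categorical feature against the class label and keeping the top k. This is a minimal sketch under stated assumptions, not the paper's pipeline: features are assumed to be discrete, and the feature names, data, and helper functions are hypothetical.

```python
from collections import Counter


def chi2_score(feature, labels):
    """Pearson chi-square statistic between one discrete feature and the labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))   # observed cell counts
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    score = 0.0
    for f, n_f in feature_totals.items():
        for y, n_y in label_totals.items():
            expected = n_f * n_y / n        # count expected under independence
            score += (joint[(f, y)] - expected) ** 2 / expected
    return score


def select_k_best(columns, labels, k):
    """Return the names of the k features with the highest chi-square score."""
    scores = {name: chi2_score(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: 1 = attack traffic, 0 = benign traffic (hypothetical features).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "syn_flag": [0, 0, 0, 0, 1, 1, 1, 1],   # perfectly aligned with the label
    "partial":  [0, 0, 0, 1, 1, 1, 1, 0],   # partly informative
    "noise":    [0, 1, 0, 1, 0, 1, 0, 1],   # independent of the label
}
selected = select_k_best(columns, labels, k=2)
```

The selected columns would then be passed to a classifier such as a decision tree. Continuous traffic features would need to be discretized first for this contingency-table form of the score.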
Funding: The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia, for funding this research work through Project Number WE-44-0033.
Abstract: Polycystic ovarian syndrome (PCOS) is one of the significant health issues affecting women; it impacts their fertility and results in serious health concerns. Consequently, timely screening for PCOS can help in the process of recovery. Finding a method to aid doctors in this procedure is crucial because the condition is difficult to detect. This research aimed to determine whether the detection of PCOS can be optimized using deep learning algorithms and methodologies. Additionally, feature selection methods that produce the most important subset of features can speed up calculation and enhance the effectiveness of classifiers. In this research, the tri-stage wrapper method is used because it reduces the computation time. The proposed approach for the automatic diagnosis of PCOS comprises preprocessing, data normalization, feature selection, and classification. A dataset with 39 characteristics, including metabolism, neuroimaging, hormones, and biochemical information for 541 subjects, was employed. To start, the data were pre-processed. Next, for feature selection, a tri-stage wrapper method using techniques such as Mutual Information, ReliefF, Chi-Square, and Xvariance is used. Then, various classification methods are trained and tested. Deep learning techniques including convolutional neural network (CNN), multi-layer perceptron (MLP), recurrent neural network (RNN), and bidirectional long short-term memory (Bi-LSTM) are used for classification. The experimental findings demonstrate that the tri-stage wrapper method combined with CNN delivers the highest precision (97%), high accuracy (98.67%), and recall (89%) compared with the other machine learning algorithms.