This study introduces a new classifier designed to address the limitations of conventional classifiers such as K-nearest neighbor (KNN), random forest (RF), decision tree (DT), and support vector machine (SVM) for arrhythmia detection. The proposed classifier uses the Chi-square distance as its primary metric, providing a specialized and original approach for precise arrhythmia detection. To optimize feature selection and refine the classifier's performance, particle swarm optimization (PSO) is integrated with the Chi-square distance as a fitness function. This integration substantially improves accuracy for arrhythmia detection. Experimental results demonstrate the efficacy of the proposed method, which achieves an accuracy of 98% with PSO, compared with 89% without optimization. The classifier outperforms machine learning (ML) and deep learning (DL) techniques, underscoring its reliability in arrhythmia classification. The promising results make it an effective method to support both academic and medical communities, offering a precise solution for arrhythmia detection in electrocardiogram (ECG) data.
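To make the idea of a Chi-square distance metric concrete, the following is a minimal sketch of a nearest-neighbor rule built on one common definition of that distance, d(x, y) = Σ (x_i − y_i)² / (x_i + y_i), for non-negative feature vectors. The abstract does not state the exact form used by the study, so the formula, the function names `chi_square_distance` and `nearest_label`, and the toy feature vectors are all assumptions for illustration; the PSO feature-selection step is not shown.

```python
def chi_square_distance(x, y, eps=1e-12):
    """Chi-square distance between two non-negative feature vectors.

    Assumed form: sum((x_i - y_i)^2 / (x_i + y_i)); eps guards against 0/0.
    """
    return sum((a - b) ** 2 / (a + b + eps) for a, b in zip(x, y))


def nearest_label(query, samples, labels):
    """Return the label of the training sample closest to `query`."""
    distances = [chi_square_distance(query, s) for s in samples]
    return labels[distances.index(min(distances))]


# Hypothetical toy data: two normalized feature histograms with class labels.
train = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.7]]
labels = ["normal", "arrhythmia"]
prediction = nearest_label([0.8, 0.2, 0.0], train, labels)
```

In a feature-selection setting, a fitness function could evaluate such a classifier on a candidate subset of feature indices, but how the study couples PSO with the distance is not described in the abstract.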
In this paper, a singular value decomposition (SVD) formula and a modified Chi-square solution are provided, and the modified Chi-square method is combined with an FT-IR instrument to control a biochemical reaction process. Using the modified Chi-square technique, the unknown concentrations of reactants and products in test samples withdrawn from the process are determined. The technique avoids the need for the spectral data to conform to Beer's Law, and the best spectral range is determined automatically.
The application of frequency distribution statistics to data provides an objective means to assess the nature of the data distribution and the viability of numerical models that are used to visualize and interpret data. Two commonly used tools are kernel density estimation and the reduced chi-squared statistic used in combination with a weighted mean. Due to the wide applicability of these tools, we present a Java-based computer application called KDX to facilitate the visualization of data and the utilization of these numerical tools.
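As an illustration of the second tool, here is a minimal sketch of an inverse-variance weighted mean and the associated reduced chi-squared statistic, χ²_red = (1/(n − 1)) Σ ((x_i − x̄_w)/σ_i)². This follows the standard textbook definitions and is not taken from KDX itself (which is a Java application); the function names and sample numbers are assumptions.

```python
def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its 1-sigma uncertainty."""
    weights = [1.0 / s ** 2 for s in sigmas]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    mean_sigma = (1.0 / sum(weights)) ** 0.5
    return mean, mean_sigma


def reduced_chi_squared(values, sigmas):
    """Chi-squared about the weighted mean divided by n - 1 degrees of freedom."""
    mean, _ = weighted_mean(values, sigmas)
    dof = len(values) - 1
    return sum(((v - mean) / s) ** 2 for v, s in zip(values, sigmas)) / dof


# Hypothetical measurements with equal uncertainties.
mean, mean_sigma = weighted_mean([10.0, 12.0], [1.0, 1.0])
rcs = reduced_chi_squared([10.0, 12.0], [1.0, 1.0])
```

A value near 1 suggests that the scatter is consistent with the stated uncertainties; values well above 1 suggest excess scatter or underestimated uncertainties.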
Diabetes mellitus is a metabolic disease that is ranked among the top 10 causes of death by the World Health Organization. During the last few years, an alarming increase has been observed worldwide, with a 70% rise in the disease since 2000 and an 80% rise in male deaths. If untreated, it results in complications of many vital organs of the human body, which may lead to fatality. Early detection of diabetes is therefore important for starting timely treatment. This study introduces a methodology for the classification of diabetic and normal people using an ensemble machine learning model and feature fusion of Chi-square and principal component analysis. An ensemble model, the logistic tree classifier (LTC), is proposed, which combines logistic regression and an extra tree classifier through a soft voting mechanism. Experiments are also performed using several well-known machine learning algorithms to analyze their performance, including logistic regression, extra tree classifier, AdaBoost, Gaussian naive Bayes, decision tree, random forest, and k-nearest neighbor. In addition, several experiments are carried out using principal component analysis (PCA) and Chi-square (Chi-2) features to analyze the influence of feature selection on the performance of machine learning classifiers. Results indicate that Chi-2 features perform better than both PCA features and the original features. However, the highest accuracy is obtained when the proposed ensemble model LTC is used with the proposed feature fusion framework, which achieves an accuracy score of 0.85, the highest among the available approaches for diabetes prediction. In addition, a statistical T-test shows the statistical significance of the proposed approach over other approaches.
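For readers unfamiliar with Chi-square feature scoring, the sketch below computes the Pearson chi-square statistic of the contingency table between one categorical feature and the class label; features with larger statistics are more strongly associated with the label. This is one common way to rank categorical features and is only an assumption about the general idea: the abstract does not describe the study's actual scoring code, and the function name and toy data are hypothetical.

```python
from collections import Counter


def chi_square_statistic(feature, labels):
    """Pearson chi-square statistic for a feature/label contingency table."""
    n = len(feature)
    joint = Counter(zip(feature, labels))   # observed cell counts
    feature_totals = Counter(feature)       # row totals
    label_totals = Counter(labels)          # column totals
    stat = 0.0
    for f, f_count in feature_totals.items():
        for l, l_count in label_totals.items():
            expected = f_count * l_count / n  # count expected under independence
            stat += (joint[(f, l)] - expected) ** 2 / expected
    return stat


# Hypothetical binary features scored against a binary label.
labels = [0, 0, 1, 1]
informative = chi_square_statistic([0, 0, 1, 1], labels)
uninformative = chi_square_statistic([0, 1, 0, 1], labels)
```

Ranking features by this statistic and keeping the top few gives a simple filter-style selection step that could precede any of the classifiers listed above.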
We study the asymptotics of the chi-square statistic in type II error. By the contraction principle, the large deviations and moderate deviations are obtained, and the rate function of the moderate deviations, which is a squared function, can be calculated explicitly.
We describe two new derivations of the chi-square distribution. The first derivation uses the induction method, which requires only a single integral to calculate. The second derivation uses the Laplace transform and requires minimum assumptions. The new derivations are compared with the established derivations, such as by convolution, moment generating function, and Bayesian inference. Chi-square testing has seen many applications in physics and other fields. We describe a unique version of the chi-square test where both the variance and location are tested, which is then applied to environmental data. The chi-square test is used to judge whether a laboratory method is capable of detecting gross alpha and beta radioactivity in drinking water for regulatory monitoring to protect the health of the population. A case of a failure of the chi-square test and its amelioration are described. The chi-square test is compared to and supplemented by the t-test.
Zero-inflated distributions are common in statistical problems where there is interest in testing homogeneity of two or more independent groups. Often, the underlying distribution that has an inflated number of zero-valued observations is asymmetric, and its functional form may not be known or easily characterized. In this case, comparisons of the groups in terms of their respective percentiles may be appropriate as these estimates are nonparametric and more robust to outliers and other irregularities. The median test is often used to compare distributions with similar but asymmetric shapes but may be uninformative when there are excess zeros or dissimilar shapes. For zero-inflated distributions, it is useful to compare the distributions with respect to their proportion of zeros, coupled with the comparison of percentile profiles for the observed non-zero values. A simple chi-square test for simultaneous testing of these two components is proposed, applicable to both continuous and discrete data. Results of simulation studies are reported to summarize empirical power under several scenarios. We give recommendations for the minimum sample size which is necessary to achieve suitable test performance in specific examples.
In large sample studies where distributions may be skewed and not readily transformed to symmetry, it may be of greater interest to compare different distributions in terms of percentiles rather than means. For example, it may be more informative to compare two or more populations with respect to their within population distributions by testing the hypothesis that their corresponding respective 10th, 50th, and 90th percentiles are equal. As a generalization of the median test, the proposed test statistic is asymptotically distributed as Chi-square with degrees of freedom dependent upon the number of percentiles tested and constraints of the null hypothesis. Results from simulation studies are used to validate the nominal 0.05 significance level under the null hypothesis, and asymptotic power properties that are suitable for testing equality of percentile profiles against selected profile discrepancies for a variety of underlying distributions. A pragmatic example is provided to illustrate the comparison of the percentile profiles for four body mass index distributions.
A new six-parameter continuous distribution called the Generalized Kumaraswamy Generalized Power Gompertz (GKGPG) distribution is proposed in this study, and a graphical illustration of its probability density function and cumulative distribution function is presented. The statistical features of the GKGPG distribution are systematically derived and studied. The model parameters are estimated by the method of maximum likelihood, both in the absence of censoring and under right censoring. The test statistic for right-censored data, the criteria test for the GKGPG distribution, the estimated matrices Ŵ, Ĉ, and Ĝ, and the criteria test $Y_n^2$, alongside the quadratic form of the test statistic, are derived. Mean simulated values of the maximum likelihood estimates and their corresponding mean squared errors are presented and confirmed to agree closely with the true parameter values. Simulated levels of significance for the $Y_n^2(\gamma)$ test for the GKGPG model were recorded against their theoretical values. We conclude that the null hypothesis that the simulated samples are fitted by the GKGPG distribution is widely validated for the different levels of significance considered. From the summary of the results for a dataset on the strength of a specific type of braided cord, it is observed that the proposed GKGPG model fits the data set at a significance level ε = 0.05.
Background: In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of the population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with regard to comparison between native residents and the floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. Methods: As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6-12 months, together with their parents/caregivers, living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. Results: The prevalence of anemia was 12.60%, with a range of 3.47%-40.00% across different subgroup characteristics. The CHAID decision tree model demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by the logistic regression model (maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population), CHAID decision tree analysis also identified a fourth risk factor, the maternal educational level, with higher overall classification accuracy and a larger area under the receiver operating characteristic curve. Conclusions: The infant anemic status in a metropolis is complex and should be carefully considered by basic health care practitioners. CHAID decision tree analysis demonstrated better performance in hierarchical analysis of a population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities.
It is well known that smart thermostats (STs) have become key devices in the implementation of smart homes; thus, they are considered primary elements for the control of electrical energy consumption in households. Moreover, energy consumption is drastically affected when the end users select unsuitable STs or when they do not use the STs correctly. Furthermore, in the future, Mexico will face serious electrical energy challenges that can be considerably resolved if the end users operate the STs correctly. Hence, it is important to carry out an in-depth study and analysis of thermostats, focusing on the social aspects that influence their technological use and performance. This paper proposes the use of signal detection theory (SDT), fuzzy detection theory (FDT), and the chi-square (CS) test in order to understand the perceptions and beliefs of end users about the use of STs in Mexico. This paper extensively shows the perceptions and beliefs about the selected thermostats in Mexico. Besides, it presents an in-depth discussion of the cognitive perceptions and beliefs of end users. Moreover, it shows why the expectations of the end users about STs are not met. It also promotes the technological and social development of STs such that they are more readily accepted in complex electrical grids such as smart grids.
In goodness-of-fit tests, Pearson's chi-squared test is one of the most widely used tools of formal statistical analysis. However, Pearson's chi-squared test depends on the partition of the sample space. Different constructions of the partition of the sample space may lead to different conclusions. Based on an equiprobable partition of the sample space, a modified chi-squared test is proposed, together with a method for constructing it. As an application, the proposed test is used to test whether vectorial data come from a uniform distribution defined on the hypersphere. Simulation studies show that the modified chi-squared test is robust against different alternatives.
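The role of the partition can be illustrated with a minimal sketch of the ordinary Pearson statistic computed on an equiprobable partition: each observation is mapped through the hypothesized cumulative distribution function, so that k cells of equal width on [0, 1] each carry probability 1/k under the null hypothesis. This is the textbook one-dimensional construction, not the modified test on the hypersphere proposed in the paper; the function name, the `cdf` argument, and the sample values are assumptions.

```python
def equiprobable_chi_squared(samples, cdf, k):
    """Pearson chi-squared statistic over k equiprobable cells.

    `cdf` is the hypothesized cumulative distribution function; under the
    null hypothesis cdf(x) is uniform on [0, 1], so each cell has
    probability 1/k and expected count n/k.
    """
    n = len(samples)
    counts = [0] * k
    for x in samples:
        cell = min(int(cdf(x) * k), k - 1)  # clamp cdf(x) == 1.0 into last cell
        counts[cell] += 1
    expected = n / k
    return sum((c - expected) ** 2 / expected for c in counts)


# Hypothetical data tested against the uniform distribution on [0, 1).
def uniform_cdf(x):
    return x


balanced = equiprobable_chi_squared([0.1, 0.3, 0.6, 0.9], uniform_cdf, 4)
clustered = equiprobable_chi_squared([0.1, 0.1, 0.1, 0.1], uniform_cdf, 2)
```

For a fully specified null distribution and a fixed number of cells, the statistic is compared with a chi-squared distribution with k − 1 degrees of freedom.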
The classical chi-squared goodness-of-fit test assumes that the number of classes is fixed, in which case the test statistic has a limiting chi-square distribution under the null hypothesis. Tests in which the number of classes varies with the sample size have attracted more and more attention. However, in this situation, there are no theoretical results on the asymptotic properties of such a chi-squared test statistic. This paper proves the consistency of the chi-squared test with a varying number of classes under some conditions. The authors also give a convergence rate for the Kolmogorov-Smirnov distance between the test statistic and the corresponding chi-square distributed random variable. In addition, a real example and simulation results validate the theoretical result and show the superiority of the chi-squared test with a varying number of classes.
In an era dominated by information dissemination through various channels like newspapers, social media, radio, and television, the surge in content production, especially on social platforms, has amplified the challenge of distinguishing between truthful and deceptive information. Fake news, a prevalent issue, particularly on social media, complicates the assessment of news credibility. The pervasive spread of fake news not only misleads the public but also erodes trust in legitimate news sources, creating confusion and polarizing opinions. As the volume of information grows, individuals increasingly struggle to discern credible content from false narratives, leading to widespread misinformation and potentially harmful consequences. Despite numerous methodologies proposed for fake news detection, including knowledge-based, language-based, and machine-learning approaches, their efficacy often diminishes when confronted with high-dimensional datasets and data riddled with noise or inconsistencies. Our study addresses this challenge by evaluating the benefits of combining feature extraction and feature selection techniques in fake news detection. We employ multiple feature extraction methods, including Count Vectorizer, Bag of Words, Global Vectors for Word Representation (GloVe), Word to Vector (Word2Vec), and Term Frequency-Inverse Document Frequency (TF-IDF), alongside feature selection techniques such as Information Gain, Chi-Square, Principal Component Analysis (PCA), and Document Frequency. This comprehensive approach enhances the model's ability to identify and analyze relevant features, leading to more accurate and effective fake news detection. Our findings highlight the importance of a multi-faceted approach, offering a significant improvement in model accuracy and reliability. Moreover, the study emphasizes the adaptability of the proposed ensemble model across diverse datasets, reinforcing its potential for broader application in real-world scenarios. We introduce an ensemble technique that leverages both machine-learning and deep-learning classifiers. To identify the optimal ensemble configuration, we systematically tested various combinations. Experimental evaluations conducted on three diverse fake news datasets show that the proposed ensemble model achieves accuracies of 97%, 99%, and 98% on Dataset 1, Dataset 2, and Dataset 3, respectively, demonstrating robustness and effectiveness in discerning fake news. This research contributes to the advancement of fake news detection methodologies and underscores the significance of integrating feature extraction and feature selection strategies for enhanced performance, especially in the context of intricate, high-dimensional datasets.
Human-elephant conflict (HEC) is an alarming issue that has attracted the attention of environmentalists and policy makers. The rising conflict between human beings and wild elephants is common in Buxa Tiger Reserve (BTR) and its adjoining area in West Bengal State, India, making the area volatile. People's attitudes towards elephant conservation activity are crucial to reducing HEC, because people's proximity to wild elephants' habitat can trigger the occurrence of HEC. The aim of this study is to conduct an in-depth investigation of the association of people's attitudes towards HEC with their locational, demographic, and socio-economic characteristics in BTR and its adjoining area by using Pearson's bivariate chi-square test and binary logistic regression analysis. BTR is one of the constituent parts of Eastern Doors Elephant Reserve (EDER). We interviewed 500 respondents to understand their perceptions of HEC and investigated their locational, demographic, and socio-economic characteristics, including location of village, gender, age, ethnicity, religion, caste, poverty level, education level, primary occupation, secondary occupation, household type, and source of firewood. The results indicate that respondents who live in enclave forest villages (EFVs), peripheral forest villages (PFVs), corridor villages (CVs), or forest and corridor villages (FCVs), and who are mainly male, aged 18–48 years, engaged in agriculture, and living in kancha and mixed houses, are more likely to witness HEC. Besides, respondents who are illiterate or at primary education level are more likely to regard the elephant as a main problematic animal around their villages and to refuse to participate in elephant conservation activity. For the sake of a sustainable environment for both human beings and wildlife, people's attitudes towards elephants must become more favorable, so that the two communities can live in harmony.
Funding (diabetes classification study): Supported by the Florida Center for Advanced Analytics and Data Science, funded by Ernesto.Net (under the Algorithms for Good Grant).
Funding (chi-square asymptotics in type II error study): the National Natural Science Foundation of China (10571139).
文摘We study the asymptotics tot the statistic of chi-square in type Ⅱ error. By the contraction principle, the large deviations and moderate deviations are obtained, and the rate function of moderate deviations can be calculated explicitly which is a squared function.
文摘We describe two new derivations of the chi-square distribution. The first derivation uses the induction method, which requires only a single integral to calculate. The second derivation uses the Laplace transform and requires minimum assumptions. The new derivations are compared with the established derivations, such as by convolution, moment generating function, and Bayesian inference. The chi-square testing has seen many applications to physics and other fields. We describe a unique version of the chi-square test where both the variance and location are tested, which is then applied to environmental data. The chi-square test is used to make a judgment whether a laboratory method is capable of detection of gross alpha and beta radioactivity in drinking water for regulatory monitoring to protect health of population. A case of a failure of the chi-square test and its amelioration are described. The chi-square test is compared to and supplemented by the t-test.
文摘Zero-inflated distributions are common in statistical problems where there is interest in testing homogeneity of two or more independent groups. Often, the underlying distribution that has an inflated number of zero-valued observations is asymmetric, and its functional form may not be known or easily characterized. In this case, comparisons of the groups in terms of their respective percentiles may be appropriate as these estimates are nonparametric and more robust to outliers and other irregularities. The median test is often used to compare distributions with similar but asymmetric shapes but may be uninformative when there are excess zeros or dissimilar shapes. For zero-inflated distributions, it is useful to compare the distributions with respect to their proportion of zeros, coupled with the comparison of percentile profiles for the observed non-zero values. A simple chi-square test for simultaneous testing of these two components is proposed, applicable to both continuous and discrete data. Results of simulation studies are reported to summarize empirical power under several scenarios. We give recommendations for the minimum sample size which is necessary to achieve suitable test performance in specific examples.
文摘In large sample studies where distributions may be skewed and not readily transformed to symmetry, it may be of greater interest to compare different distributions in terms of percentiles rather than means. For example, it may be more informative to compare two or more populations with respect to their within population distributions by testing the hypothesis that their corresponding respective 10th, 50th, and 90th percentiles are equal. As a generalization of the median test, the proposed test statistic is asymptotically distributed as Chi-square with degrees of freedom dependent upon the number of percentiles tested and constraints of the null hypothesis. Results from simulation studies are used to validate the nominal 0.05 significance level under the null hypothesis, and asymptotic power properties that are suitable for testing equality of percentile profiles against selected profile discrepancies for a variety of underlying distributions. A pragmatic example is provided to illustrate the comparison of the percentile profiles for four body mass index distributions.
文摘A new six-parameter continuous distribution called the Generalized Kumaraswamy Generalized Power Gompertz (GKGPG) distribution is proposed in this study, a graphical illustration of the probability density function and cumulative distribution function is presented. The statistical features of the Generalized Kumaraswamy Generalized Power Gompertz distribution are systematically derived and adequately studied. The estimation of the model parameters in the absence of censoring and under-right censoring is performed using the method of maximum likelihood. The test statistic for right-censored data, criteria test for GKGPG distribution, estimated matrix Ŵ, Ĉ, and Ĝ, criteria test Y<sup>2</sup>n</sub>, alongside the quadratic form of the test statistic is derived. Mean simulated values of maximum likelihood estimates and their corresponding square mean errors are presented and confirmed to agree closely with the true parameter values. Simulated levels of significance for Y<sup>2</sup>n</sub> (γ) test for the GKGPG model against their theoretical values were recorded. We conclude that the null hypothesis for which simulated samples are fitted by GKGPG distribution is widely validated for the different levels of significance considered. From the summary of the results of the strength of a specific type of braided cord dataset on the GKGPG model, it is observed that the proposed GKGPG model fits the data set for a significance level ε = 0.05.
文摘Background: In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with comparison between native residents and floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. Methods: As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6-12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1,2013 to December 31, 2014. Results: The prevalence of anemia was 12.60% with a range of 3.47%-40.00% in different subgroup characteristics. The CHAID decision tree model has demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by logistic regression model including maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population, CHAID decision tree analysis also identified the fourth risk factor, the maternal educational level, with higher overall classification accuracy and larger area below the receiver operating characteristic curve. Conclusions: The infant anemic status in metropolis is complex and should be carefully considered by the basic health care practitioners. CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities.
Abstract: Smart thermostats (STs) have become key devices in the implementation of smart homes; thus, they are considered primary elements for controlling electrical energy consumption in households. Energy consumption is drastically affected when end users select unsuitable STs or do not use them correctly. Furthermore, in the future, Mexico will face serious electrical energy challenges that could be considerably alleviated if end users operate STs correctly. Hence, it is important to carry out an in-depth study and analysis of thermostats, focusing on the social aspects that influence their technological use and performance. This paper proposes the use of signal detection theory (SDT), fuzzy detection theory (FDT), and the chi-square (CS) test to understand the perceptions and beliefs of end users about the use of STs in Mexico. It reports the perceptions and beliefs about the selected thermostats in Mexico and presents an in-depth discussion of the cognitive perceptions and beliefs of end users. Moreover, it shows why the expectations of end users about STs are not met. It also promotes the technological and social development of STs so that they are more readily accepted in complex electrical grids such as smart grids.
Funding: Foundation item: the Natural Science Foundation of Beijing (No. 1062001); Academic Human Resources Development in Institutions of Higher Learning Under the Jurisdiction of Beijing Municipality (No. 05006011200702). Acknowledgements: The authors cordially thank the Associate Editor and Reviewers for their constructive comments, which led to improvement of the manuscript. They are also very grateful to Prof. Adelaide Figueiredo for his help.
Abstract: In goodness-of-fit testing, Pearson's chi-squared test is one of the most widely used tools of formal statistical analysis. However, Pearson's chi-squared test depends on the partition of the sample space, and different constructions of the partition may lead to different conclusions. Based on an equiprobable partition of the sample space, a modified chi-squared test is proposed, together with a method for constructing it. As an application, the proposed test is used to test whether vectorial data come from a uniform distribution defined on the hypersphere. Simulation studies show that the modified chi-squared test is robust against different alternatives.
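To make the idea of an equiprobable partition concrete, the sketch below applies an ordinary Pearson statistic to the simplest case, uniformity on the circle, where `k` arcs of equal length each have probability 1/k under the null. This is only an illustration under that assumption, not the authors' modified test or their construction for the hypersphere; the function name and the sample angles are hypothetical.

```python
import math


def equiprobable_chi_square(angles, k):
    """Chi-square statistic for uniformity of angles on the circle.

    The circle [0, 2*pi) is cut into k arcs of equal length, so each cell has
    probability 1/k under the null and expected count n/k.
    Returns (statistic, observed cell counts).
    """
    n = len(angles)
    counts = [0] * k
    for theta in angles:
        # Map the angle to a fraction of the full turn in [0, 1].
        frac = (theta % (2 * math.pi)) / (2 * math.pi)
        # Guard against frac == 1.0 caused by floating-point rounding.
        idx = min(int(frac * k), k - 1)
        counts[idx] += 1
    expected = n / k
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, counts


# Hypothetical samples: evenly spread angles versus angles piled in one arc.
spread = [2 * math.pi * (i + 0.5) / 8 for i in range(8)]
clustered = [0.1] * 8
```

With a fully specified null, the statistic is referred to a chi-square distribution with `k - 1` degrees of freedom. Because every cell has the same expected count, the result does not depend on an arbitrary choice of unequal cell boundaries, although it still depends on `k` and on where the first arc starts.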
Funding: Supported by the Natural Science Foundation of China under Grant Nos. 11071022, 11028103, 11231010, 11471223, BCMIIS, the Beijing Municipal Educational Commission Foundation under Grant Nos. KZ201410028030, KM201210028005, and Jishou University Subject in 2014 (No. 14JD035).
Abstract: The classical chi-squared goodness-of-fit test assumes that the number of classes is fixed, in which case the test statistic has a limiting chi-square distribution under the null hypothesis. Tests in which the number of classes varies with the sample size have attracted more and more attention. However, in this situation there are no theoretical results on the asymptotic properties of the chi-squared test statistic. This paper proves the consistency of the chi-squared test with a varying number of classes under some conditions. The authors also give a convergence rate for the Kolmogorov-Smirnov distance between the test statistic and the corresponding chi-square distributed random variable. In addition, a real example and simulation results validate the theoretical result and the superiority of the chi-squared test with a varying number of classes.
Funding: Supported by the MSIT (Ministry of Science and ICT), Korea, under the ICT Creative Consilience Program (IITP-2024-2020-0-01819) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).
Abstract: In an era dominated by information dissemination through channels such as newspapers, social media, radio, and television, the surge in content production, especially on social platforms, has amplified the challenge of distinguishing truthful from deceptive information. Fake news, a prevalent issue particularly on social media, complicates the assessment of news credibility. The pervasive spread of fake news not only misleads the public but also erodes trust in legitimate news sources, creating confusion and polarizing opinions. As the volume of information grows, individuals increasingly struggle to discern credible content from false narratives, leading to widespread misinformation and potentially harmful consequences. Despite the numerous methodologies proposed for fake news detection, including knowledge-based, language-based, and machine-learning approaches, their efficacy often diminishes when confronted with high-dimensional datasets and data riddled with noise or inconsistencies. Our study addresses this challenge by evaluating the benefits of combining feature extraction and feature selection techniques in fake news detection. We employ multiple feature extraction methods, including Count Vectorizer, Bag of Words, Global Vectors for Word Representation (GloVe), Word to Vector (Word2Vec), and Term Frequency-Inverse Document Frequency (TF-IDF), alongside feature selection techniques such as Information Gain, Chi-Square, Principal Component Analysis (PCA), and Document Frequency. This comprehensive approach enhances the model's ability to identify and analyze relevant features, leading to more accurate and effective fake news detection. Our findings highlight the importance of a multi-faceted approach, offering a significant improvement in model accuracy and reliability. Moreover, the study emphasizes the adaptability of the proposed ensemble model across diverse datasets, reinforcing its potential for broader application in real-world scenarios. We introduce an ensemble technique that leverages both machine-learning and deep-learning classifiers. To identify the optimal ensemble configuration, we systematically tested various combinations. Experimental evaluations conducted on three diverse fake news datasets show that the proposed ensemble model achieves accuracy levels of 97%, 99%, and 98% on Dataset 1, Dataset 2, and Dataset 3, respectively, indicating robustness and effectiveness in discerning fake news. This research contributes to the advancement of fake news detection methodologies and underscores the significance of integrating feature extraction and feature selection strategies for enhanced performance, especially for intricate, high-dimensional datasets.
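The Chi-Square feature selection mentioned in this abstract can be sketched as follows: each term is scored by the chi-square statistic of a 2 x 2 table (term present or absent versus fake or real label), and the highest-scoring terms are kept. This is a minimal sketch under stated assumptions, not the paper's pipeline: it uses term presence rather than counts or TF-IDF weights, and the tiny corpus, labels, and function names are invented for illustration.

```python
def chi2_term_score(docs, labels, term):
    """Chi-square score of one term against a binary label (1 = fake, 0 = real)."""
    # 2 x 2 counts: rows = term present / absent, columns = label 1 / label 0.
    table = [[0, 0], [0, 0]]
    for tokens, label in zip(docs, labels):
        row = 0 if term in tokens else 1
        col = 0 if label == 1 else 1
        table[row][col] += 1
    n = len(docs)
    row_totals = [sum(r) for r in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    # A term that is in every document (or none) carries no information.
    if 0 in row_totals or 0 in col_totals:
        return 0.0
    score = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            score += (table[i][j] - expected) ** 2 / expected
    return score


def select_top_terms(docs, labels, k):
    """Return the k terms with the highest chi-square score (ties broken alphabetically)."""
    vocabulary = sorted({token for tokens in docs for token in tokens})
    scored = [(-chi2_term_score(docs, labels, t), t) for t in vocabulary]
    return [term for _, term in sorted(scored)[:k]]


# Hypothetical toy corpus of tokenized headlines with made-up labels.
docs = [
    ["shocking", "miracle", "cure"],
    ["miracle", "weight", "loss"],
    ["council", "budget", "vote"],
    ["budget", "report", "released"],
]
labels = [1, 1, 0, 0]
```

On this toy corpus, `select_top_terms(docs, labels, 2)` keeps `budget` and `miracle`, the only two terms whose presence separates the classes perfectly. Reducing the vocabulary this way before training is one way to cope with the high-dimensional inputs the abstract describes.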
Abstract: Human-elephant conflict (HEC), an alarming issue in the present-day context, has attracted the attention of environmentalists and policy makers. The rising conflict between human beings and wild elephants is common in Buxa Tiger Reserve (BTR) and its adjoining area in West Bengal State, India, making the area volatile. People's attitudes towards elephant conservation activity are crucial to resolving HEC, because people's proximity to wild elephants' habitat can trigger the occurrence of HEC. The aim of this study is to conduct an in-depth investigation of the association of people's attitudes towards HEC with their locational, demographic, and socio-economic characteristics in BTR and its adjoining area, using Pearson's bivariate chi-square test and binary logistic regression analysis. BTR is one of the constituent parts of the Eastern Doors Elephant Reserve (EDER). We interviewed 500 respondents to understand their perceptions of HEC and investigated their locational, demographic, and socio-economic characteristics, including location of village, gender, age, ethnicity, religion, caste, poverty level, education level, primary occupation, secondary occupation, household type, and source of firewood. The results indicate that respondents who live in enclave forest villages (EFVs), peripheral forest villages (PFVs), corridor villages (CVs), or forest and corridor villages (FCVs), who are mainly male, aged 18–48 years, engaged in agriculture, and living in kancha and mixed houses, are more likely to witness HEC. In addition, respondents who are illiterate or have only primary education are more likely to regard the elephant as the main problematic animal around their villages and to refuse to participate in elephant conservation activity. For the sake of a sustainable environment for both human beings and wildlife, people's attitudes towards elephants need to become more tolerant, so that the two communities can live in harmony.