Abstract: We present here an alternative definition of the P-value for a statistical hypothesis test of a real-valued parameter of a continuous random variable X. Our approach uses neither the notion of Type I error nor the assumption that the null hypothesis is true. Instead, the new P-value involves the maximum likelihood estimator, which is usually available for a parameter such as the mean μ or standard deviation σ of a random variable X with a common distribution.
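The abstract does not spell out the new P-value's construction, but it rests on maximum likelihood estimators for parameters such as μ and σ. As background, a minimal sketch of those MLEs for an i.i.d. normal sample; the function name `normal_mles` is illustrative, not the paper's:

```python
import math

def normal_mles(xs):
    """Maximum likelihood estimates of (mu, sigma) for an i.i.d. normal
    sample: the sample mean, and the root-mean-square deviation about it.
    Note the MLE of sigma divides by n, not the unbiased n - 1."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)
    return mu_hat, sigma_hat
```

For the sample [1, 2, 3, 4] this gives μ̂ = 2.5 and σ̂ = √1.25 ≈ 1.118, slightly smaller than the unbiased sample standard deviation because of the divisor n.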
Abstract: The increasing volume of data in the environmental sciences demands analysis and interpretation. Among the challenges posed by this "data deluge", the development of efficient strategies for knowledge discovery is an important issue. Here, statistical tools and tools from computational intelligence are applied to analyze large data sets from meteorology and climate science. Our approach yields a geographical mapping of the statistical property that meteorologists can easily interpret. Our data analysis comprises two main steps of knowledge extraction, applied successively to reduce the complexity of the original data set. The goal is to identify a much smaller subset of climatic variables that might still describe, or even predict, the probability of occurrence of an extreme event. The first step applies a class-comparison technique: p-value estimation. The second step builds a decision tree (DT) from the available data and the p-value analysis. The DT is used as a predictive model, identifying the climate variables most statistically significant for precipitation intensity. The methodology is applied to study the climatic causes of an extreme precipitation event that occurred in the Alagoas and Pernambuco States (Brazil) in June 2010.
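The first knowledge-extraction step, class-comparison p-value estimation, can be sketched with a generic permutation test. The paper does not specify its estimator, so this is an illustrative stand-in rather than the authors' procedure:

```python
import random
import statistics

def permutation_pvalue(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in means between two
    classes (e.g. an extreme-precipitation class vs. the rest).
    Returns the estimated p-value."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling under the null hypothesis
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one smoothing avoids p = 0
```

Variables whose p-values survive this screen would then feed the second step, e.g. a decision tree trained with a library such as scikit-learn's `DecisionTreeClassifier`.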
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11971064, 12371262, and 12171374.
Abstract: Exclusive hypothesis testing is a new and special class of hypothesis testing. This kind of testing can be applied in survival analysis to understand the association between genomic information and clinical information about survival time. It is well known that Cox's proportional hazards model is the most commonly used model for the regression analysis of failure times. In this paper, the authors consider exclusive hypothesis testing for Cox's proportional hazards model with right-censored data. The authors propose comprehensive test statistics for decision making, and show that the corresponding decision rule can control the asymptotic Type I errors and has good power in theory. The numerical studies indicate that the proposed approach works well in practical situations; it is applied to a set of real data from the Rotterdam Breast Cancer study that motivated this work.
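The exclusive test statistics themselves are not reproduced in the abstract. As background, a minimal sketch of the Cox partial log-likelihood that the model maximizes for right-censored data, for a single covariate and assuming no tied event times (Breslow/Efron tie handling is omitted):

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Cox proportional hazards partial log-likelihood for right-censored
    data with one covariate. events[i] is 1 for an observed failure and
    0 for censoring; the risk set at t_i is every subject with time >= t_i."""
    n = len(times)
    ll = 0.0
    for i in range(n):
        if events[i] == 1:
            # sum of relative hazards exp(beta * x_j) over the risk set
            risk = sum(math.exp(beta * x[j]) for j in range(n) if times[j] >= times[i])
            ll += beta * x[i] - math.log(risk)
    return ll
```

In practice one maximizes this in beta (e.g. by Newton's method); at beta = 0 each event contributes minus the log of its risk-set size, a convenient sanity check.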
Abstract: Today, the coronavirus appears as a serious challenge to the whole world. Epidemiological data on the coronavirus are collected through media and web sources for the purpose of analysis. New data on COVID-19 become available daily, yet information about the biological aspects of SARS-CoV-2 and the epidemiological characteristics of COVID-19 remains limited, and uncertainty surrounds nearly all of its parameter values. This research provides the scientific and public health communities with better resources, knowledge, and tools to improve their ability to control infectious diseases. Using the publicly available data on the ongoing pandemic, the present study investigates the incubation period and the other time intervals that govern the epidemiological dynamics of COVID-19 infections. Testing hypotheses are formulated for different countries at a 95% level of confidence, and descriptive statistics are calculated to analyze in which region COVID-19 will fall according to the hypothesized mean tested for each country. The results will be helpful in decision making as well as in further mathematical analysis and control strategy. Statistical tools are used to investigate the pandemic and will be useful for further research. Hypothesis testing is carried out for the differences in various effects, including standard errors. Changes in state variables are observed over time. The rapid outbreak of the coronavirus can be stopped by reducing its transmission. Susceptible individuals should maintain a safe distance and follow precautionary measures regarding COVID-19 transmission.
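The abstract does not name its exact test statistic; a plain large-sample z-test of a hypothesized mean, the standard tool for this kind of 95%-confidence comparison, might look like the following (illustrative only, with the incubation-period example as an assumed use case):

```python
import math
import statistics

def z_test_mean(sample, mu0):
    """Two-sided large-sample z-test of H0: population mean == mu0,
    e.g. comparing a country's observed mean incubation period (days)
    against a hypothesized value. Returns (z statistic, p-value);
    reject H0 at the 95% confidence level when p < 0.05."""
    n = len(sample)
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    z = (xbar - mu0) / se
    # standard normal two-sided tail probability via the error function
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p
```

For small samples a t-distribution tail would replace the normal one; the z form keeps the sketch dependency-free.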
Abstract: This paper discusses the nonlinearity of fish acoustic signals using the surrogate data method. We compare the differences in three test statistics - the time-irreversibility T_rev, the correlation dimension D2, and the auto mutual information function I - between the original data and the surrogate data. We conclude that nonlinearity exists in the fish acoustic signals and that deterministic nonlinear components are present; therefore nonlinear dynamic theory can be used to analyze fish acoustic signals.
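As a sketch of the surrogate-data idea, the following computes a time-irreversibility statistic and ranks it against shuffle surrogates. The paper's method would use phase-randomized (FFT) surrogates, which preserve the linear correlation structure; random shuffling, used here to keep the example self-contained, only tests against i.i.d. structure:

```python
import random

def time_irreversibility(xs, tau=1):
    """Time-irreversibility statistic T_rev: the mean cubed increment at
    lag tau. It is near zero for time-reversible (e.g. linear Gaussian)
    processes and nonzero for asymmetric waveforms."""
    n = len(xs) - tau
    return sum((xs[i + tau] - xs[i]) ** 3 for i in range(n)) / n

def surrogate_test(xs, n_surr=200, seed=0):
    """Fraction of shuffle surrogates whose |T_rev| is at least as extreme
    as the data's. A small value means the temporal asymmetry in the data
    is unlikely under the surrogate null hypothesis."""
    rng = random.Random(seed)
    t_obs = abs(time_irreversibility(xs))
    hits = 0
    for _ in range(n_surr):
        s = xs[:]
        rng.shuffle(s)  # destroys all temporal structure
        if abs(time_irreversibility(s)) >= t_obs:
            hits += 1
    return (hits + 1) / (n_surr + 1)
```

A sawtooth (slow rise, sharp fall) is strongly irreversible and rejects the null; a symmetric triangle wave does not.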
Abstract: We start with a description of the statistical inferential framework and the duality between observed data and the true state of nature that underlies it. We demonstrate that the usual testing of dueling hypotheses, with the acceptance of one and the rejection of the other, is a framework that can often be faulty when such inferences are applied to individual subjects. This follows from noting that the statistical inferential framework is predominantly based on conclusions drawn for aggregates, and that what is true in the aggregate frequently does not hold for individuals: an ecological fallacy. Such a fallacy is usually seen as problematic when each data record represents aggregate statistics for counties or districts rather than data for individuals. Here we demonstrate strong ecological fallacies even when using subject-level data. Inverted simulations of trials correctly sized to detect meaningful differences, yielding a statistically significant p-value of 0.000001 (1 in a million) and associated with clinically meaningful differences between a hypothetical new therapy and a standard therapy, had a proportion of subjects whose standard-therapy effect exceeded their new-therapy effect close to 30%. A "winner take all" choice between two hypotheses may not be supported by statistically significant differences based on stochastic data. We also argue that other summaries, such as correlations, density estimates, standard deviations, and predictions based on machine learning models, are incorrect across many individuals. Despite these artifacts, we support the use of prospective clinical trials and careful unbiased model building as necessary first steps. In health care, high-touch personalized care based on patient-level data will remain relevant even as we adopt more high-tech, data-intensive personalized therapeutic strategies based on aggregates.
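The simulation finding can be reproduced in miniature. The sample size and effect size below are illustrative assumptions, not the paper's; they are chosen so that a two-arm trial gives a vanishingly small p-value while roughly 30% of individual standard-therapy outcomes still beat new-therapy outcomes:

```python
import math
import random

def trial_simulation(n_per_arm=600, effect=0.74, seed=1):
    """Simulate a two-arm trial: new therapy ~ N(effect, 1), standard ~ N(0, 1).
    Returns (two-sided z-test p-value for the mean difference, fraction of
    standard-therapy subjects who outscore a paired new-therapy subject).
    Illustrates the ecological point: a near-zero p-value can coexist with
    ~30% of individual comparisons favoring the 'losing' arm, since
    P(standard > new) = Phi(-effect / sqrt(2)) ~ 0.30 for effect 0.74."""
    rng = random.Random(seed)
    new = [rng.gauss(effect, 1) for _ in range(n_per_arm)]
    std = [rng.gauss(0, 1) for _ in range(n_per_arm)]
    mean_diff = sum(new) / n_per_arm - sum(std) / n_per_arm
    se = math.sqrt(2 / n_per_arm)  # known unit variances in both arms
    z = mean_diff / se
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    overlap = sum(s > n for s, n in zip(std, new)) / n_per_arm
    return p, overlap
```

The aggregate conclusion (new therapy wins, decisively) is sound, yet the individual-level picture is far from winner-take-all.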