We start with a description of the statistical inferential framework and the duality between observed data and the true state of nature that underlies it. We demonstrate that the usual testing of dueling hypotheses, with acceptance of one and rejection of the other, is a framework that can often be faulty when such inferences are applied to individual subjects. This follows from noting that the statistical inferential framework predominantly draws conclusions for aggregates, and that what is true in the aggregate frequently does not hold for individuals: an ecological fallacy. Such a fallacy is usually seen as problematic when each data record represents aggregate statistics for counties or districts rather than data for individuals. Here we demonstrate strong ecological fallacies even when using subject-level data. In inverted simulations of trials appropriately sized to detect meaningful differences, yielding a statistically significant p-value of 0.000001 (1 in a million) and a clinically meaningful difference between a hypothetical new therapy and a standard therapy, the proportion of instances in which the standard-therapy effect was better than the new-therapy effect was close to 30%. A "winner take all" choice between two hypotheses may not be supported by statistically significant differences based on stochastic data. We also argue that other summaries, such as correlations, density estimates, standard deviations, and predictions from machine learning models, can be incorrect for many individuals. Despite these artifacts, we support the use of prospective clinical trials and careful unbiased model building as necessary first steps. In health care, high-touch personalized care based on patient-level data will remain relevant even as we adopt more high-tech, data-intensive personalized therapeutic strategies based on aggregates.
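A minimal sketch of the kind of inversion described above, assuming normally distributed outcomes with unit standard deviation; the arm size, effect size, and seed are illustrative choices of ours, not values taken from the study:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

# Hypothetical two-arm trial: a 0.7-SD mean improvement, with arms sized
# so that the aggregate comparison is overwhelmingly significant.
n_per_arm = 100
effect = 0.7  # mean benefit of the new therapy, in SD units (assumed)

new = rng.normal(loc=effect, scale=1.0, size=n_per_arm)
std = rng.normal(loc=0.0, scale=1.0, size=n_per_arm)

t_stat, p_value = stats.ttest_ind(new, std)
print(f"aggregate p-value: {p_value:.1e}")  # typically near 1e-6 here

# Subject-level discordance: the fraction of (standard, new) subject pairs
# in which the standard-therapy outcome beats the new-therapy outcome.
discordant = np.mean(std[:, None] > new[None, :])
print(f"pairwise proportion favoring standard: {discordant:.2f}")

# Closed form under normality with unit-SD arms: Phi(-effect / sqrt(2)).
print(f"theoretical proportion: {stats.norm.cdf(-effect / np.sqrt(2)):.2f}")
```

With these assumed parameters the theoretical discordant proportion is Φ(-0.7/√2) ≈ 0.31, consistent with the roughly 30% figure quoted above even as the aggregate p-value sits near one in a million.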
Someone or other is always pointing to a published study to justify a point of view or the need for a change in what we do or how we live. There are many such studies, often reported in top-notch journals, with results inconsistent across studies and often inconsistent within them. It is in the interest of increasing the credibility of science, and of safeguarding a general public living with its overt and covert influence, to filter good science from bad. Some inferences are good, even when counter-intuitive or seemingly inconsistent, and are likely to withstand scrutiny; others may represent marginal effects in the aggregate that are not useful for individual choices or decisions, and are often non-reproducible. The New York Times featured an article in August 2018 debunking some of the reported studies supporting testing for Vitamin D deficiency and the recommendation of large supplemental doses of Vitamin D; some of these claims, among others, were reported as not holding up on replication in controlled trials [1]. We have noted in Ref. [2] that, as individuals, we need to be wary of reported signals detected in studies using stochastic data, even when these aggregate signals are of large magnitude. We demonstrated discordance rates of 30% or higher between subject-level assessments of effect and the conclusion drawn in the aggregate. Here we provide a computation of this discordant proportion, as well as post-hoc assessments of aggregate inferences, with emphasis on evaluating studies with time-to-event endpoints such as those in cancer trials. Similar evaluations for continuous data, binomial data, and correlations are also provided. We also discuss the use of response thresholds.
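For a time-to-event endpoint, the discordant proportion has a simple closed form under an exponential survival model; the hazard ratio below is an illustrative assumption of ours, not a value from the paper:

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Exponential survival times with hazard ratio HR = lambda_new / lambda_std.
# HR = 0.7 (assumed for illustration) means the new therapy cuts the event
# hazard by 30% -- a clinically meaningful aggregate benefit.
hr = 0.7
lam_std = 1.0
lam_new = hr * lam_std

n = 200_000
t_std = rng.exponential(scale=1.0 / lam_std, size=n)
t_new = rng.exponential(scale=1.0 / lam_new, size=n)

# Discordant proportion: the probability that a standard-therapy subject
# outlives an independently drawn new-therapy subject.
print(f"simulated   P(T_std > T_new): {np.mean(t_std > t_new):.3f}")

# Closed form for independent exponentials: HR / (1 + HR), about 0.41 here
# despite the aggregate benefit of the new therapy.
print(f"theoretical P(T_std > T_new): {hr / (1 + hr):.3f}")
```

Under these assumptions, even a hazard ratio of 0.7 leaves roughly 41% of subject pairs in which the standard-therapy subject fares better, illustrating why an aggregate survival benefit need not translate into a benefit for any given individual.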