Regression is a widely used econometric tool in research. In observational studies, based on a number of assumptions, regression-based statistical control methods attempt to analyze the causation between treatment and outcome by adding control variables. However, this approach may not produce reliable estimates of causal effects. In addition to the shortcomings of the method, this lack of confidence is mainly related to ambiguous formulations in econometrics, such as the definition of selection bias, selection of core control variables, and method of testing for robustness. Within the framework of causal models, we clarify the assumption of causal inference using regression-based statistical controls, as described in econometrics, and discuss how to select core control variables to satisfy this assumption and conduct robustness tests for regression estimates.
Causal inference is a powerful modeling tool for explanatory analysis, which might enable current machine learning to become explainable. How to marry causal inference with machine learning to develop explainable artificial intelligence (XAI) algorithms is one of the key steps toward artificial intelligence 2.0. With the aim of bringing knowledge of causal inference to scholars of machine learning and artificial intelligence, we invited researchers working on causal inference to write this survey from different aspects of causal inference. This survey includes the following sections: “Estimating average treatment effect: A brief review and beyond” from Dr. Kun Kuang, “Attribution problems in counterfactual inference” from Prof. Lian Li, “The Yule–Simpson paradox and the surrogate paradox” from Prof. Zhi Geng, “Causal potential theory” from Prof. Lei Xu, “Discovering causal information from observational data” from Prof. Kun Zhang, “Formal argumentation in causal reasoning and explanation” from Profs. Beishui Liao and Huaxin Huang, “Causal inference with complex experiments” from Prof. Peng Ding, “Instrumental variables and negative controls for observational studies” from Prof. Wang Miao, and “Causal inference with interference” from Dr. Zhichao Jiang.
Causal inference prevails in the field of laparoscopic surgery. Once the causality between an intervention and outcome is established, the intervention can be applied to a target population to improve clinical outcomes. In many clinical scenarios, interventions are applied longitudinally in response to patients’ conditions. Such longitudinal data comprise static variables, such as age, gender, and comorbidities; and dynamic variables, such as the treatment regime, laboratory variables, and vital signs. Some dynamic variables can act as both the confounder and mediator for the effect of an intervention on the outcome; in such cases, simple adjustment with a conventional regression model will bias the effect sizes. To address this, numerous statistical methods are being developed for causal inference; these include, but are not limited to, the structural marginal Cox regression model, dynamic treatment regime, and Cox regression model with time-varying covariates. This technical note provides a gentle introduction to such models and illustrates their use with an example in the field of laparoscopic surgery.
Statistical approaches for evaluating causal effects and for discovering causal networks are discussed in this paper. A causal relation between two variables is different from an association or correlation between them. An association measure between two variables may change dramatically from positive to negative when a third variable is omitted, which is called the Yule-Simpson paradox. We shall discuss how to evaluate the causal effect of a treatment or exposure on an outcome so as to avoid the Yule-Simpson paradox. Surrogates and intermediate variables are often used to reduce measurement costs or duration when measurement of endpoint variables is expensive, inconvenient, infeasible or unobservable in practice. Many criteria for surrogates have been proposed. However, it is possible that for a surrogate satisfying these criteria, a treatment has a positive effect on the surrogate, which in turn has a positive effect on the outcome, but the treatment has a negative effect on the outcome, which is called the surrogate paradox. We shall discuss criteria for surrogates that avoid the surrogate paradox. Causal networks, which describe the causal relationships among a large number of variables, have been applied to many research fields. It is important to discover the structures of causal networks from observed data. We propose a recursive approach for discovering a causal network in which the structural learning of a large network is decomposed recursively into the learning of small networks. To further discover causal relationships, we present an active learning approach based on external interventions on some variables. When we focus on the causes of an outcome of interest, instead of discovering a whole network, we propose a local learning approach to discover the causes that affect the outcome.
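The sign reversal described in the abstract above can be made concrete with a small numeric sketch. The counts below are illustrative only (they follow a widely quoted two-treatment, two-stratum example and are not taken from the paper), and the function names `pooled_rate`, `stratum_rates`, and `standardized_rate` are hypothetical helpers introduced here. The sketch assumes the stratifying variable is a confounder, in which case weighting stratum-specific rates by the overall stratum shares is one common way to adjust for it.

```python
from fractions import Fraction

# Illustrative (successes, total) counts per treatment arm and stratum.
# These numbers are an assumption for demonstration, not data from the paper.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def stratum_rates(arm):
    """Success rate of one arm within each stratum."""
    return {s: Fraction(succ, tot) for s, (succ, tot) in counts[arm].items()}


def pooled_rate(arm):
    """Success rate of one arm when the stratifying variable is ignored."""
    successes = sum(succ for succ, _ in counts[arm].values())
    total = sum(tot for _, tot in counts[arm].values())
    return Fraction(successes, total)


def standardized_rate(arm):
    """Stratum rates weighted by each stratum's share of the whole sample."""
    stratum_totals = {
        s: counts["A"][s][1] + counts["B"][s][1] for s in counts["A"]
    }
    n = sum(stratum_totals.values())
    rates = stratum_rates(arm)
    return sum(rates[s] * Fraction(stratum_totals[s], n) for s in stratum_totals)


if __name__ == "__main__":
    for arm in ("A", "B"):
        print(arm, "pooled:", float(pooled_rate(arm)),
              "standardized:", float(standardized_rate(arm)))
```

In this toy table, arm A has the higher success rate within every stratum, yet arm B looks better once the strata are pooled; the standardized comparison agrees with the within-stratum ordering.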
The utilization of big Earth data has provided insights into the planet we inhabit in unprecedented dimensions and scales. Unraveling the concealed causal connections within intricate data holds paramount importance for attaining a profound comprehension of the Earth system. Statistical methods founded on correlation have predominated in Earth system science (ESS) for a long time. Nevertheless, correlation does not imply causation, especially when confronted with spurious correlations resulting from big data. Consequently, traditional correlation and regression methods are inadequate for addressing causation-related problems in the Earth system. In recent years, propelled by advancements in causal theory and inference methods, particularly the maturity of causal discovery and causal graphical models, causal inference has demonstrated vigorous vitality in various research directions in the Earth system, such as revealing regularities, understanding processes, testing hypotheses, and improving physical models. This paper commences by delving into the origins, connotations, and development of causality, subsequently outlining the principal frameworks of causal inference and the commonly used methods in ESS. Additionally, it reviews the applications of causal inference in the main branches of the Earth system and summarizes the challenges and development directions of causal inference in ESS. In the big Earth data era, as an important method of big data analysis, causal inference, along with physical models and machine learning, can assist the transformation of ESS from a model-driven paradigm to a paradigm that integrates both mechanism and data. Looking forward, the establishment of a meticulously structured and normalized causal theory can act as a cornerstone for fostering causal cognition in ESS and propel the leap from fragmented research towards a comprehensive understanding of the Earth system.
BACKGROUND: Despite being one of the most prevalent sleep disorders, obstructive sleep apnea hypoventilation syndrome (OSAHS) has limited information on its immunologic foundation. The immunological underpinnings of certain major psychiatric diseases have been uncovered in recent years thanks to the extensive use of genome-wide association studies (GWAS) and genotyping techniques using high-density genetic markers (e.g., SNPs or CNVs). But this tactic has not yet been applied to OSAHS. Using a Mendelian randomization analysis, we analyzed the causal link between immune cells and the illness in order to comprehend the immunological bases of OSAHS. AIM: To investigate the immune cells' association with OSAHS via genetic methods, guiding future clinical research. METHODS: A comprehensive two-sample Mendelian randomization study was conducted to investigate the causal relationship between immune cell characteristics and OSAHS. Summary statistics for each immune cell feature were obtained from the GWAS catalog. Information on 731 immune cell properties, such as morphologic parameters, median fluorescence intensity, absolute cell counts, and relative cell counts, was compiled using publicly available genetic databases. The results' robustness, heterogeneity, and horizontal pleiotropy were confirmed using extensive sensitivity analyses. RESULTS: Following false discovery rate (FDR) correction, no statistically significant effect of OSAHS on immunophenotypes was observed. However, two lymphocyte subsets were found to have a significant association with the risk of OSAHS: Basophil %CD33dim HLA DR- CD66b- (OR = 1.03, 95% CI = 1.01-1.03, P < 0.001); CD38 on IgD+ CD24- B cell (OR = 1.04, 95% CI = 1.02-1.04, P = 0.019). CONCLUSION: This study shows a strong link between immune cells and OSAHS through a genetic approach, thus offering direction for potential future medical research.
This paper reviewed the fruitful achievements in the science of science, sociology of science and economics of science, and their benefits to scientometric research. Then, causal inference was introduced, which has the potential to shape scientometric research by determining cause and effect among variables. In the end, we proposed two detailed reasons why we need causal inference in scientometric research: (1) correlation-based scientometric research is not sufficient to support science & technology policy; (2) scientometrics needs to go beyond metrics by explaining the mechanisms in science.
Deep learning-based models are vulnerable to adversarial attacks. Defense against adversarial attacks is essential for sensitive and safety-critical scenarios. However, deep learning methods still lack effective and efficient defense mechanisms against adversarial attacks. Most of the existing methods are just stopgaps for specific adversarial samples. The main obstacle is that it is still unclear how adversarial samples fool deep learning models. The underlying working mechanism of adversarial samples has not been well explored, and it is the bottleneck of adversarial attack defense. In this paper, we build a causal model to interpret the generation and performance of adversarial samples. The self-attention/transformer is adopted as a powerful tool in this causal model. Compared to existing methods, causality enables us to analyze adversarial samples more naturally and intrinsically. Based on this causal model, the working mechanism of adversarial samples is revealed, and instructive analysis is provided. Then, we propose simple and effective adversarial sample detection and recognition methods according to the revealed working mechanism. The causal insights enable us to detect and recognize adversarial samples without any extra model or training. Extensive experiments are conducted to demonstrate the effectiveness of the proposed methods. Our methods outperform the state-of-the-art defense methods under various adversarial attacks.
Multimodal documents combining language and graphs are widespread in print media as well as in electronic media. One of the most important tasks to be solved in comprehending graph-text combinations is the construction of causal chains among the meaning entities provided by the modalities. In this study we focus on the role of annotation position and the shape of graph lines in simple line graphs on causal attributions concerning the event presented by the annotation and the processes (i.e., increases and decreases) and states (no-changes) in the domain value of the graphs presented by the process-lines and state-lines. Based on the experimental investigation of readers' inferences under different conditions, guidelines for the design of multimodal documents including text and statistical information graphics are suggested. One suggestion is that the position and the number of verbal annotations should be selected appropriately; another is that graph line smoothing should be done cautiously.
Propensity score (PS) adjustment can control confounding effects and reduce bias when estimating treatment effects in non-randomized trials or observational studies. PS methods are becoming increasingly used to estimate causal effects, including when the sample size is small compared to the number of confounders. With numerous confounders, quasi-complete separation can easily occur in logistic regression used for estimating the PS, but this has not been addressed. We focused on a Bayesian PS method to address the limitations of quasi-complete separation faced by small trials. Bayesian methods are useful because they estimate the PS and causal effects simultaneously while considering the uncertainty of the PS by modelling it as a latent variable. In this study, we conducted simulations to evaluate the performance of Bayesian simultaneous PS estimation by considering the specification of prior distributions for model comparison. We propose a method to improve predictive performance with discrete outcomes in small trials. We found that the specification of prior distributions assigned to logistic regression coefficients was more important in the second step than in the first step, even when there was a quasi-complete separation in the first step. Assigning Cauchy (0, 2.5) to coefficients improved the predictive performance for estimating causal effects and improved the balancing properties of the confounder.
Modern industrial systems are usually large in scale, consisting of massive numbers of components and variables that form a complex system topology. Owing to the interconnections among devices, a fault may occur and propagate to exert widespread influences and lead to a variety of alarms. Obtaining the root causes of alarms supports decisions on corrective alarm responses. Existing data-driven methods for alarm root cause analysis detect causal relations among alarms mainly based on historical alarm event data. To improve the accuracy, this paper proposes a causal fusion inference method for industrial alarm root cause analysis based on process topology and alarm events. A Granger causality inference method considering process topology is exploited to find the causal relations among alarms. The topological nodes are used as the inputs of the model, and the alarm causal adjacency matrix between alarm variables is obtained by calculating the likelihood of the topological Hawkes process. The root cause is then obtained from the directed acyclic graph (DAG) among alarm variables. The effectiveness of the proposed method is verified by simulations based on both a numerical example and the Tennessee Eastman process (TEP) model.
The main purpose in many randomized trials is to make an inference about the average causal effect of a treatment. Therefore, on a binary outcome, the null hypothesis for the hypothesis test should be that the causal risks are equal in the two groups. This null hypothesis is referred to as the weak causal null hypothesis. Nevertheless, at present, hypothesis tests applied in actual randomized trials are not for this null hypothesis; Fisher’s exact test is a test for the sharp causal null hypothesis that the causal effect of treatment is the same for all subjects. In general, the rejection of the sharp causal null hypothesis does not mean that the weak causal null hypothesis is rejected. Recently, Chiba developed new exact tests for the weak causal null hypothesis: a conditional exact test, which requires that a marginal total is fixed, and an unconditional exact test, which does not require that a marginal total is fixed and depends rather on the ratio of random assignment. To apply these exact tests in actual randomized trials, a sample size calculation must be performed during the study design. In this paper, we present a sample size calculation procedure for these exact tests. Given the sample size, the procedure can derive the exact test power, because it examines all the patterns that can be obtained as observed data under the alternative hypothesis without large sample theories or any assumptions.
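For orientation, the conventional Fisher's exact test that the abstract above contrasts with Chiba's tests can be sketched in a few lines. This is a minimal illustration of the standard two-sided test on a 2x2 table with all margins fixed; it is not the conditional or unconditional test for the weak causal null hypothesis proposed in the paper, and the function name `fisher_exact_two_sided` is a hypothetical helper.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. The p-value sums the probabilities of every table that is
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(k):
        # Probability that the top-left cell equals k given the margins.
        return Fraction(comb(row1, k) * comb(row2, col1 - k), denominator)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    return sum(
        (prob(k) for k in range(lowest, highest + 1) if prob(k) <= observed),
        Fraction(0),
    )


if __name__ == "__main__":
    # Hypothetical trial: 3 of 4 treated and 1 of 4 control subjects respond.
    print(float(fisher_exact_two_sided(3, 1, 1, 3)))
```

Exact rational arithmetic is used so that the "no more likely than observed" comparison is not affected by floating-point rounding.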
With the advent of digital therapeutics (DTx), the development of software as a medical device (SaMD) for mobile and wearable devices has gained significant attention in recent years. Existing DTx evaluations, such as randomized clinical trials, mostly focus on verifying the effectiveness of DTx products. To acquire a deeper understanding of DTx engagement and behavioral adherence, beyond efficacy, a large amount of contextual and interaction data from mobile and wearable devices during field deployment would be required for analysis. In this work, the overall flow of data-driven DTx analytics is reviewed to help researchers and practitioners to explore DTx datasets, to investigate contextual patterns associated with DTx usage, and to establish the (causal) relationship between DTx engagement and behavioral adherence. This review of the key components of data-driven analytics provides novel research directions in the analysis of mobile sensor and interaction datasets, which helps to iteratively improve the receptivity of existing DTx.
Purpose: With the availability of large-scale scholarly datasets, scientists from various domains hope to understand the underlying mechanisms behind science, forming a vibrant area of inquiry in the emerging “science of science” field. As the results from the science of science often have strong policy implications, understanding the causal relationships between variables becomes prominent. However, the most credible quasi-experimental method among all causal inference methods, and a highly valuable tool in the empirical toolkit, Regression Discontinuity Design (RDD), has not been fully exploited in the field of science of science. In this paper, we provide a systematic survey of the RDD method and its practical applications in the science of science. Design/methodology/approach: First, we introduce the basic assumptions, mathematical notations, and two types of RDD, i.e., sharp and fuzzy RDD. Second, we use the Web of Science and the Microsoft Academic Graph datasets to study the evolution and citation patterns of RDD papers. Moreover, we provide a systematic survey of the applications of RDD methodologies in various scientific domains, as well as in the science of science. Finally, we demonstrate a case study to estimate the effect of Head Start Funding Proposals on child mortality. Findings: RDD was almost neglected for 30 years after it was first introduced in 1960. Afterward, scientists used mathematical and economic tools to develop the RDD methodology. After 2010, RDD methods showed strong applications in various domains, including medicine, psychology, political science and environmental science. However, we also notice that the RDD method has not been well developed in science of science research. Research limitations: This work uses a keyword search to obtain RDD papers, which may neglect some related work. Additionally, our work does not aim to develop rigorous mathematical and technical details of RDD but rather focuses on its intuitions and applications. Practical implications: This work proposes how to use the RDD method in science of science research. Originality/value: This work systematically introduces the RDD and calls for awareness of using such a method in the field of science of science.
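The sharp RDD mentioned in the abstract above can be illustrated with a minimal sketch: fit a separate straight line on each side of the cutoff within a bandwidth and take the difference of the two fitted values at the cutoff. This is an assumption-laden toy version (uniform kernel, user-chosen bandwidth, no standard errors), not the procedure used in the paper's case study; `fit_line` and `sharp_rdd_estimate` are hypothetical names.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sharp_rdd_estimate(running, outcome, cutoff, bandwidth):
    """Jump in the outcome at the cutoff, from two local linear fits.

    Units with running variable >= cutoff are treated as the treated side.
    The running variable is centred at the cutoff, so each fitted intercept
    is the predicted outcome exactly at the cutoff.
    """
    left = [(r - cutoff, y) for r, y in zip(running, outcome)
            if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, y) for r, y in zip(running, outcome)
             if cutoff <= r <= cutoff + bandwidth]
    if len(left) < 2 or len(right) < 2:
        raise ValueError("need at least two observations on each side")
    left_intercept, _ = fit_line(*zip(*left))
    right_intercept, _ = fit_line(*zip(*right))
    return right_intercept - left_intercept


if __name__ == "__main__":
    # Synthetic data with a built-in jump of 3 at the cutoff 0 (illustrative).
    running = [i / 10 for i in range(-10, 11)]
    outcome = [1 + 2 * r + (3 if r >= 0 else 0) for r in running]
    print(sharp_rdd_estimate(running, outcome, cutoff=0.0, bandwidth=0.5))
```

In practice the bandwidth choice drives the bias-variance trade-off, so an applied analysis would normally report estimates for several bandwidths.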
Forecasting electricity demand is an essential part of the smart grid to ensure a stable and reliable power grid. With the increasing integration of renewable energy resources into the grid, forecasting the demand for electricity is critical at all levels, from the distribution to the household. Most existing forecasting methods, however, can be considered black-box models as a result of deep digitalization enablers, such as deep neural networks, which remain difficult to interpret by humans. Moreover, capture of the inter-dependencies among variables presents a significant challenge for multivariate time series forecasting. In this paper we propose eXplainable Causal Graph Neural Network (X-CGNN) for multivariate electricity demand forecasting that overcomes these limitations. As part of this method, we have intrinsic and global explanations based on causal inferences as well as local explanations based on post-hoc analyses. We have performed extensive validation on two real-world electricity demand datasets from both the household and distribution levels to demonstrate that our proposed method achieves state-of-the-art performance.
The era of big data brings opportunities and challenges to developing new statistical methods and models to evaluate social programs, economic policies, or interventions. This paper provides a comprehensive review of some recent advances in statistical methodologies and models to evaluate programs with high-dimensional data. In particular, four kinds of methods for making valid statistical inferences for treatment effects in high dimensions are addressed. The first one is the so-called doubly robust type estimation, which models the outcome regression and propensity score functions simultaneously. The second one is the covariate balance method to construct the treatment effect estimators. The third one is the sufficient dimension reduction approach for causal inferences. The last one uses machine learning procedures, directly or indirectly, to make statistical inferences about treatment effects. Some of these methods and models are closely related to the de-biased Lasso type methods for high-dimensional regression models in the statistical literature. Finally, some future research topics are also discussed.
Objective: Traditional epidemiological studies have shown that C-reactive protein (CRP) is associated with the risk of cardiovascular diseases (CVDs). However, whether this association is causal remains unclear. Therefore, Mendelian randomization (MR) was used to explore the causal relationship of CRP with cardiovascular outcomes including ischemic stroke, atrial fibrillation, arrhythmia and congestive heart failure. Methods: We performed two-sample MR by using summary-level data obtained from Japanese Encyclopedia of Genetic association by Riken (JENGER), and we selected four single-nucleotide polymorphisms associated with CRP level as instrumental variables. MR estimates were calculated with the inverse-variance weighted (IVW), penalized weighted median and weighted median methods. MR-Egger regression was used to explore pleiotropy. Results: No significant causal association of genetically determined CRP level with ischemic stroke, atrial fibrillation or arrhythmia was found with all four MR methods (all Ps > 0.05). The IVW method indicated suggestive evidence of a causal association between CRP and congestive heart failure (OR: 1.337, 95% CI: 1.005–1.780, P = 0.046), whereas the other three methods did not. No clear pleiotropy or heterogeneity was observed. Conclusions: Suggestive evidence was found only in the analysis of congestive heart failure; therefore, further studies are necessary. Furthermore, no causal association was found between CRP and the other three cardiovascular outcomes.
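The inverse-variance weighted (IVW) estimator named in the abstract above has a simple closed form when only summary statistics are available. The sketch below shows the standard fixed-effect version, which is equivalent to a weighted regression of the SNP-outcome effects on the SNP-exposure effects through the origin. The input numbers are invented for illustration and are not the JENGER data; `ivw_estimate` is a hypothetical name, and the formula assumes uncorrelated instruments and negligible error in the SNP-exposure effects.

```python
from math import exp, sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted Mendelian randomization estimate.

    beta_exposure: per-SNP effects on the exposure
    beta_outcome:  per-SNP effects on the outcome (e.g. log odds ratios)
    se_outcome:    standard errors of the SNP-outcome effects
    Returns (causal estimate, standard error).
    """
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, sqrt(1 / denominator)


if __name__ == "__main__":
    # Made-up summary statistics for four instruments (illustration only).
    bx = [0.10, 0.20, 0.30, 0.40]
    by = [0.04, 0.11, 0.14, 0.21]
    se = [0.01, 0.02, 0.01, 0.02]
    estimate, std_error = ivw_estimate(bx, by, se)
    # For a binary outcome the estimate is a log odds ratio per unit exposure.
    print("OR:", exp(estimate),
          "95% CI:", exp(estimate - 1.96 * std_error),
          exp(estimate + 1.96 * std_error))
```

Median-based and MR-Egger estimators relax different assumptions about invalid instruments, which is why studies of this kind typically report them alongside IVW.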
Natural systems are typically nonlinear and complex, and it is of great interest to be able to reconstruct a system in order to understand its mechanism, which can not only recover nonlinear behaviors but also predict future dynamics. Due to the advances of modern technology, big data is becoming increasingly accessible, and consequently the problem of reconstructing systems from measured data or time series plays a central role in many scientific disciplines. In recent decades, nonlinear methods rooted in state space reconstruction have been developed; they do not assume any model equations but can recover the dynamics purely from the measured time series data. In this review, the development of state space reconstruction techniques is introduced and the recent advances in systems prediction and causality inference using state space reconstruction are presented. In particular, the review focuses on cutting-edge methods for dealing with short-term time series data. Finally, the advantages as well as the remaining problems in this field are discussed.
It has been shown that peer review activities are positively correlated with scientists’ bibliometric performance (e.g., Ortega, 2017, 2019). However, how a scientist’s amount of “reviewing” interacts with his or her “publishing” has not been addressed in previous studies. This paper attempts to employ Granger causality inference to explore the directionality between a scientist’s publication performance and his/her review activities. Our dataset comprises scientists’ reviewed articles derived from Publons in the Web of Knowledge database, and their publications retrieved from PubMed. We find that scientists who reviewed less or published less tend to have Granger causality between reviewing and publishing activities. In addition, compared with early-career researchers, reviewing advances publishing for senior scientists.
Limited evidence exists on the effect of submicronic particulate matter (PM1) on hypertension hospitalization. Evidence based on causal inference and large cohorts is even more scarce. In 2015, 36,271 participants were enrolled in South China and followed up through 2020. Each participant was assigned single-year, lag0–1, and lag0–2 moving average concentrations of PM1 and fine inhalable particulate matter (PM2.5) simulated based on satellite data at a 1-km resolution. We used an inverse probability weighting approach to balance confounders and utilized a marginal structural Cox model to evaluate the underlying causal links between PM1 exposure and hypertension hospitalization, with the PM2.5-hypertension association for comparison. Several sensitivity studies and analyses of effect modification were also conducted. We found that a higher hospitalization risk from both overall (HR: 1.13, 95% CI: 1.05–1.22) and essential hypertension (HR: 1.15, 95% CI: 1.06–1.25) was linked to each 1 μg/m³ increase in the yearly average PM1 concentration. At lag0–1 and lag0–2, we observed a 17%–21% higher risk of hypertension associated with PM1. The effect of PM1 was 6%–11% higher compared with PM2.5. Linear concentration-exposure associations between PM1 exposure and hypertension were identified, without safety thresholds. Women and participants that engaged in physical exercise exhibited higher susceptibility, with 4%–22% greater risk than their counterparts. This large cohort study identified a detrimental relationship between chronic PM1 exposure and hypertension hospitalization, which was more pronounced compared with PM2.5 and among certain groups.
Funding (regression-based statistical control abstract): This research was funded by the National Natural Science Foundation of China (Grant No. 72074060).
Funding: National Natural Science Foundation of China (82272180); Open Foundation of Key Laboratory of Digital Technology in Medical Diagnostics of Zhejiang Province (SZZD202206); Sichuan Medical Association Scientific Research Project (S21019); Key Research and Development Project of Zhejiang Province (2021C03071); Zhejiang Medical and Health Science and Technology Project (2017ZD001).
Abstract: Causal inference is prevalent in the field of laparoscopic surgery. Once the causality between an intervention and an outcome is established, the intervention can be applied to a target population to improve clinical outcomes. In many clinical scenarios, interventions are applied longitudinally in response to patients' conditions. Such longitudinal data comprise static variables, such as age, gender, and comorbidities, and dynamic variables, such as the treatment regime, laboratory variables, and vital signs. Some dynamic variables can act as both a confounder and a mediator for the effect of an intervention on the outcome; in such cases, simple adjustment with a conventional regression model will bias the effect sizes. To address this, numerous statistical methods are being developed for causal inference; these include, but are not limited to, the marginal structural Cox regression model, the dynamic treatment regime, and the Cox regression model with time-varying covariates. This technical note provides a gentle introduction to such models and illustrates their use with an example in the field of laparoscopic surgery.
Abstract: Statistical approaches for evaluating causal effects and for discovering causal networks are discussed in this paper. A causal relation between two variables is different from an association or correlation between them. A measure of association between two variables may change dramatically from positive to negative when a third variable is omitted, which is called the Yule-Simpson paradox. We discuss how to evaluate the causal effect of a treatment or exposure on an outcome so as to avoid the Yule-Simpson paradox. Surrogates and intermediate variables are often used to reduce measurement costs or duration when measurement of endpoint variables is expensive, inconvenient, infeasible or unobservable in practice. Many criteria for surrogates have been proposed. However, even for a surrogate satisfying these criteria, it is possible that a treatment has a positive effect on the surrogate, which in turn has a positive effect on the outcome, while the treatment has a negative effect on the outcome; this is called the surrogate paradox. We discuss criteria for surrogates that avoid the surrogate paradox. Causal networks, which describe the causal relationships among a large number of variables, have been applied in many research fields. It is important to discover the structures of causal networks from observed data. We propose a recursive approach for discovering a causal network, in which structural learning of a large network is decomposed recursively into learning of small networks. To further discover causal relationships, we present an active learning approach based on external interventions on some variables. When the focus is on the causes of an outcome of interest rather than on the whole network, we propose a local learning approach to discover the causes that affect the outcome.
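The sign reversal described in this abstract can be illustrated with a small numeric sketch. The counts below are illustrative values chosen for the example and are not taken from the paper; the names `counts`, `stratum_diffs`, `pooled_diff`, and `adjusted_diff` are hypothetical helpers. The sketch compares the pooled difference in recovery rates with the stratum-specific differences and with a simple standardized (stratum-weighted) difference.

```python
from fractions import Fraction

# Illustrative (recovered, total) counts per stratum and treatment arm.
# These numbers are assumptions for the example, not data from the paper.
counts = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}


def rate(recovered, total):
    """Exact recovery rate as a fraction."""
    return Fraction(recovered, total)


def pooled_rate(arm):
    """Recovery rate for one arm after collapsing over the third variable."""
    recovered = sum(counts[s][arm][0] for s in counts)
    total = sum(counts[s][arm][1] for s in counts)
    return Fraction(recovered, total)


def stratum_diffs():
    """Rate difference (A minus B) within each stratum."""
    return {s: rate(*counts[s]["A"]) - rate(*counts[s]["B"]) for s in counts}


def pooled_diff():
    """Rate difference (A minus B) when the stratum variable is omitted."""
    return pooled_rate("A") - pooled_rate("B")


def adjusted_diff():
    """Stratum differences averaged over the overall stratum distribution."""
    n_total = sum(counts[s][arm][1] for s in counts for arm in ("A", "B"))
    diffs = stratum_diffs()
    result = Fraction(0)
    for s in counts:
        n_s = counts[s]["A"][1] + counts[s]["B"][1]
        result += Fraction(n_s, n_total) * diffs[s]
    return result


if __name__ == "__main__":
    for stratum, diff in stratum_diffs().items():
        print(f"{stratum:>5}: A - B = {float(diff):+.3f}")
    print(f"pooled  : A - B = {float(pooled_diff()):+.3f}")
    print(f"adjusted: A - B = {float(adjusted_diff()):+.3f}")
```

In this sketch arm A does better within every stratum, yet looks worse once the stratum variable is dropped. The standardized difference is only a valid causal summary under the assumption that the stratum variable is the sole confounder, which the example takes for granted.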
Funding: Basic Science Center for Tibetan Plateau Earth System (BCTPES, NSFC project Grant No. 41988101); National Natural Science Foundation of China (Grant No. 42101397).
Abstract: The utilization of big Earth data has provided insights into the planet we inhabit in unprecedented dimensions and scales. Unraveling the concealed causal connections within intricate data is of paramount importance for attaining a profound comprehension of the Earth system. Statistical methods founded on correlation have long predominated in Earth system science (ESS). Nevertheless, correlation does not imply causation, especially when confronted with spurious correlations resulting from big data. Consequently, traditional correlation and regression methods are inadequate for addressing causation-related problems in the Earth system. In recent years, propelled by advancements in causal theory and inference methods, particularly the maturity of causal discovery and causal graphical models, causal inference has demonstrated vigorous vitality in various research directions in the Earth system, such as revealing regularities, understanding processes, testing hypotheses, and improving physical models. This paper commences by delving into the origins, connotations, and development of causality, subsequently outlining the principal frameworks of causal inference and the commonly used methods in ESS. Additionally, it reviews the applications of causal inference in the main branches of the Earth system and summarizes the challenges and development directions of causal inference in ESS. In the big Earth data era, as an important method of big data analysis, causal inference, along with physical models and machine learning, can assist the transformation of ESS from a model-driven paradigm to a paradigm that integrates both mechanism and data. Looking forward, the establishment of a meticulously structured and normalized causal theory can act as a cornerstone for fostering causal cognition in ESS and propel the leap from fragmented research towards a comprehensive understanding of the Earth system.
Funding: Supported by Doctoral Research Fund Project of Henan Provincial Hospital of Traditional Chinese Medicine, No. 2022BSJJ10.
Abstract: BACKGROUND: Despite obstructive sleep apnea hypoventilation syndrome (OSAHS) being one of the most prevalent sleep disorders, information on its immunologic foundation is limited. The immunological underpinnings of certain major psychiatric diseases have been uncovered in recent years thanks to the extensive use of genome-wide association studies (GWAS) and genotyping techniques using high-density genetic markers (e.g., SNPs or CNVs). However, this approach has not yet been applied to OSAHS. Using a Mendelian randomization analysis, we analyzed the causal link between immune cells and the illness in order to understand the immunological bases of OSAHS. AIM: To investigate the association of immune cells with OSAHS via genetic methods, guiding future clinical research. METHODS: A comprehensive two-sample Mendelian randomization study was conducted to investigate the causal relationship between immune cell characteristics and OSAHS. Summary statistics for each immune cell feature were obtained from the GWAS catalog. Information on 731 immune cell properties, such as morphologic parameters, median fluorescence intensity, absolute cellular, and relative cellular, was compiled using publicly available genetic databases. The robustness, heterogeneity, and horizontal pleiotropy of the results were assessed using extensive sensitivity analyses. RESULTS: Following false discovery rate (FDR) correction, no statistically significant effect of OSAHS on immunophenotypes was observed. However, two lymphocyte subsets were found to have a significant association with the risk of OSAHS: Basophil %CD33dim HLA DR- CD66b- (OR=1.03, 95%CI=1.01-1.03, P<0.001); CD38 on IgD+ CD24- B cell (OR=1.04, 95%CI=1.02-1.04, P=0.019). CONCLUSION: This study shows a strong link between immune cells and OSAHS through a genetic approach, thus offering direction for potential future medical research.
Abstract: This paper reviewed the fruitful achievements in the science of science, the sociology of science and the economics of science, and their benefits to scientometric research. Then, causal inference was introduced, which has the potential to shape scientometric research by determining cause and effect among variables. Finally, we proposed two detailed reasons why causal inference is needed in scientometric research: (1) correlation-based scientometric research is not sufficient to support science and technology policy; (2) scientometrics needs to go beyond metrics by explaining the mechanisms in science.
Funding: National Key Research and Development Program of China (No. 2020AAA0140002); Natural Science Foundation of China (Nos. U1836217, 62076240, 62006225, 61906199, 62071468, 62176025 and U21B200389); CAAI-Huawei MindSpore Open Fund.
Abstract: Deep learning-based models are vulnerable to adversarial attacks. Defense against adversarial attacks is essential for sensitive and safety-critical scenarios. However, deep learning methods still lack effective and efficient defense mechanisms against adversarial attacks. Most of the existing methods are just stopgaps for specific adversarial samples. The main obstacle is that it remains unclear how adversarial samples fool deep learning models. The underlying working mechanism of adversarial samples has not been well explored, and this is the bottleneck of adversarial attack defense. In this paper, we build a causal model to interpret the generation and performance of adversarial samples. The self-attention/transformer is adopted as a powerful tool in this causal model. Compared to existing methods, causality enables us to analyze adversarial samples more naturally and intrinsically. Based on this causal model, the working mechanism of adversarial samples is revealed, and instructive analysis is provided. Then, we propose simple and effective adversarial sample detection and recognition methods according to the revealed working mechanism. The causal insights enable us to detect and recognize adversarial samples without any extra model or training. Extensive experiments are conducted to demonstrate the effectiveness of the proposed methods. Our methods outperform the state-of-the-art defense methods under various adversarial attacks.
Funding: Supported in part by DFG (German Science Foundation) in ITRG1247 ‘Cross-modal Interaction in Natural and Artificial Cognitive Systems’ (CI-NACS).
Abstract: Multimodal documents combining language and graphs are widespread in print media as well as in electronic media. One of the most important tasks in comprehending graph-text combinations is the construction of causal chains among the meaning entities provided by the modalities. In this study we focus on how annotation position and the shape of graph lines in simple line graphs influence causal attributions concerning the event presented by the annotation and the processes (i.e., increases and decreases) and states (no changes) in the domain value of the graphs, presented by the process-lines and state-lines. Based on an experimental investigation of readers' inferences under different conditions, guidelines are suggested for the design of multimodal documents that include text and statistical information graphics. One suggestion is that the position and the number of verbal annotations should be selected appropriately; another is that graph line smoothing should be done cautiously.
Abstract: Propensity score (PS) adjustment can control confounding effects and reduce bias when estimating treatment effects in non-randomized trials or observational studies. PS methods are increasingly used to estimate causal effects, including when the sample size is small compared to the number of confounders. With numerous confounders, quasi-complete separation can easily occur in the logistic regression used for estimating the PS, but this has not been addressed. We focused on a Bayesian PS method to address the limitations of quasi-complete separation faced by small trials. Bayesian methods are useful because they estimate the PS and causal effects simultaneously while considering the uncertainty of the PS by modelling it as a latent variable. In this study, we conducted simulations to evaluate the performance of Bayesian simultaneous PS estimation by considering the specification of prior distributions for model comparison. We propose a method to improve predictive performance with discrete outcomes in small trials. We found that the specification of prior distributions assigned to logistic regression coefficients was more important in the second step than in the first step, even when there was quasi-complete separation in the first step. Assigning Cauchy (0, 2.5) to coefficients improved the predictive performance for estimating causal effects and improved the balancing properties of the confounder.
Funding: National Natural Science Foundation of China (Nos. 61903345 and 61973287).
Abstract: Modern industrial systems are usually large in scale, consisting of massive numbers of components and variables that form a complex system topology. Owing to the interconnections among devices, a fault may occur and propagate to exert widespread influence and lead to a variety of alarms. Obtaining the root causes of alarms supports decisions on corrective alarm responses. Existing data-driven methods for alarm root cause analysis detect causal relations among alarms mainly based on historical alarm event data. To improve the accuracy, this paper proposes a causal fusion inference method for industrial alarm root cause analysis based on process topology and alarm events. A Granger causality inference method that considers process topology is exploited to find the causal relations among alarms. The topological nodes are used as the inputs of the model, and the causal adjacency matrix between alarm variables is obtained by calculating the likelihood of the topological Hawkes process. The root cause is then obtained from the directed acyclic graph (DAG) among alarm variables. The effectiveness of the proposed method is verified by simulations based on both a numerical example and the Tennessee Eastman process (TEP) model.
Abstract: The main purpose of many randomized trials is to make an inference about the average causal effect of a treatment. Therefore, for a binary outcome, the null hypothesis of the hypothesis test should be that the causal risks are equal in the two groups. This null hypothesis is referred to as the weak causal null hypothesis. Nevertheless, at present, the hypothesis tests applied in actual randomized trials are not tests of this null hypothesis; Fisher's exact test is a test of the sharp causal null hypothesis that the treatment has no causal effect on any subject. In general, the rejection of the sharp causal null hypothesis does not mean that the weak causal null hypothesis is rejected. Recently, Chiba developed new exact tests for the weak causal null hypothesis: a conditional exact test, which requires that a marginal total is fixed, and an unconditional exact test, which does not require that a marginal total is fixed and depends instead on the ratio of random assignment. To apply these exact tests in actual randomized trials, a sample size calculation must be performed during the study design. In this paper, we present a sample size calculation procedure for these exact tests. Given the sample size, the procedure can derive the exact power of the test, because it examines all the patterns that can be obtained as observed data under the alternative hypothesis, without relying on large-sample theory or additional assumptions.
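For orientation, the conventional Fisher's exact test mentioned in this abstract can be sketched as an exhaustive sum over hypergeometric probabilities. This is a minimal sketch of the classical one-sided test with all margins fixed; it is not Chiba's conditional or unconditional test for the weak causal null, and the function name `fisher_one_sided` and the 2×2 counts are hypothetical.

```python
from fractions import Fraction
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns P(top-left cell >= a) under the hypergeometric distribution
    obtained by fixing both row totals and both column totals.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)          # number of ways to pick the first column
    upper = min(row1, col1)        # largest feasible top-left count
    return sum(
        Fraction(comb(row1, k) * comb(row2, col1 - k), denom)
        for k in range(a, upper + 1)
    )


if __name__ == "__main__":
    # Hypothetical trial: 3 of 4 treated and 1 of 4 control subjects respond.
    p = fisher_one_sided(3, 1, 1, 3)
    print(f"one-sided p = {p} ({float(p):.4f})")
```

Enumerating every table compatible with the margins is the same brute-force idea that lets an exact power calculation avoid large-sample approximations, although the enumeration needed for the tests in the paper is over a different set of data patterns.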
Funding: Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Korea government (MSIT) (2020R1A4A1018774).
Abstract: With the advent of digital therapeutics (DTx), the development of software as a medical device (SaMD) for mobile and wearable devices has gained significant attention in recent years. Existing DTx evaluations, such as randomized clinical trials, mostly focus on verifying the effectiveness of DTx products. To acquire a deeper understanding of DTx engagement and behavioral adherence, beyond efficacy, a large amount of contextual and interaction data from mobile and wearable devices during field deployment would be required for analysis. In this work, the overall flow of data-driven DTx analytics is reviewed to help researchers and practitioners explore DTx datasets, investigate contextual patterns associated with DTx usage, and establish the (causal) relationship between DTx engagement and behavioral adherence. This review of the key components of data-driven analytics provides novel research directions in the analysis of mobile sensor and interaction datasets, which helps to iteratively improve the receptivity of existing DTx.
Funding: This work was supported by grants from the National Natural Science Foundation of China under Grant Nos. 72004177 and L1924078.
Abstract: Purpose: With the availability of large-scale scholarly datasets, scientists from various domains hope to understand the underlying mechanisms behind science, forming a vibrant area of inquiry in the emerging “science of science” field. As the results from the science of science often have strong policy implications, understanding the causal relationships between variables becomes prominent. However, Regression Discontinuity Design (RDD), the most credible quasi-experimental method among all causal inference methods and a highly valuable tool in the empirical toolkit, has not been fully exploited in the field of science of science. In this paper, we provide a systematic survey of the RDD method and its practical applications in the science of science. Design/methodology/approach: First, we introduce the basic assumptions, mathematical notations, and two types of RDD, i.e., sharp and fuzzy RDD. Second, we use the Web of Science and the Microsoft Academic Graph datasets to study the evolution and citation patterns of RDD papers. Moreover, we provide a systematic survey of the applications of RDD methodologies in various scientific domains, as well as in the science of science. Finally, we demonstrate a case study to estimate the effect of Head Start Funding Proposals on child mortality. Findings: RDD was almost neglected for 30 years after it was first introduced in 1960. Afterward, scientists used mathematical and economic tools to develop the RDD methodology. After 2010, RDD methods showed strong applications in various domains, including medicine, psychology, political science and environmental science. However, we also notice that the RDD method has not been well developed in science of science research. Research limitations: This work uses a keyword search to obtain RDD papers, which may neglect some related work. Additionally, our work does not aim to develop rigorous mathematical and technical details of RDD but rather focuses on its intuitions and applications. Practical implications: This work proposes how to use the RDD method in science of science research. Originality/value: This work systematically introduces the RDD, and calls for the awareness of using such a method in the field of science of science.
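The sharp RDD mentioned in this abstract can be sketched as two local linear fits, one on each side of the cutoff, with the treatment effect read off as the jump between the fitted values at the cutoff. This is a minimal sketch under assumptions: a uniform kernel, a hand-picked bandwidth, and synthetic noise-free data. The names `fit_line` and `sharp_rdd` are hypothetical and are not from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sharp_rdd(running, outcome, cutoff, bandwidth):
    """Jump in the fitted outcome at the cutoff (treated side minus control).

    Units with running >= cutoff are treated. Only observations within
    `bandwidth` of the cutoff are used, and the running variable is
    centered so each intercept is the fitted value at the cutoff.
    """
    left = [(x - cutoff, y) for x, y in zip(running, outcome)
            if cutoff - bandwidth <= x < cutoff]
    right = [(x - cutoff, y) for x, y in zip(running, outcome)
             if cutoff <= x <= cutoff + bandwidth]
    left_intercept, _ = fit_line(*zip(*left))
    right_intercept, _ = fit_line(*zip(*right))
    return right_intercept - left_intercept


# Synthetic data (an assumption for illustration): linear trend plus a
# jump of 3 at the cutoff 0, with no noise.
running = [i / 10 for i in range(-20, 21)]
outcome = [1 + 2 * x + (3 if x >= 0 else 0) for x in running]

if __name__ == "__main__":
    print(sharp_rdd(running, outcome, cutoff=0.0, bandwidth=1.0))
```

In applied work the bandwidth and kernel are usually chosen by a data-driven rule, and a fuzzy RDD would additionally rescale the jump in the outcome by the jump in treatment probability; neither refinement is attempted here.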
Abstract: Forecasting electricity demand is an essential part of the smart grid to ensure a stable and reliable power grid. With the increasing integration of renewable energy resources into the grid, forecasting the demand for electricity is critical at all levels, from the distribution level to the household. Most existing forecasting methods, however, can be considered black-box models as a result of deep digitalization enablers, such as deep neural networks, which remain difficult for humans to interpret. Moreover, capturing the inter-dependencies among variables presents a significant challenge for multivariate time series forecasting. In this paper we propose the eXplainable Causal Graph Neural Network (X-CGNN) for multivariate electricity demand forecasting, which overcomes these limitations. As part of this method, we provide intrinsic and global explanations based on causal inferences as well as local explanations based on post-hoc analyses. We have performed extensive validation on two real-world electricity demand datasets from both the household and distribution levels to demonstrate that our proposed method achieves state-of-the-art performance.
Funding: Supported by the National Natural Science Foundation of China (71631004, 72033008); National Science Foundation for Distinguished Young Scholars (71625001); Science Foundation of Ministry of Education of China (19YJA910003).
Abstract: The era of big data brings opportunities and challenges for developing new statistical methods and models to evaluate social programs, economic policies or interventions. This paper provides a comprehensive review of some recent advances in statistical methodologies and models for evaluating programs with high-dimensional data. In particular, four kinds of methods for making valid statistical inferences for treatment effects in high dimensions are addressed. The first is the so-called doubly robust type of estimation, which models the outcome regression and propensity score functions simultaneously. The second is the covariate balance method for constructing treatment effect estimators. The third is the sufficient dimension reduction approach for causal inference. The last is the use of machine learning procedures, directly or indirectly, to make statistical inferences about treatment effects. Some of these methods and models are closely related to the de-biased Lasso type of methods for high-dimensional regression models in the statistical literature. Finally, some future research topics are also discussed.
Funding: China-Australian Collaborative Grant [NSFC 81561128020-NHMRC APP1112767].
Abstract: Objective: Traditional epidemiological studies have shown that C-reactive protein (CRP) is associated with the risk of cardiovascular diseases (CVDs). However, whether this association is causal remains unclear. Therefore, Mendelian randomization (MR) was used to explore the causal relationship of CRP with cardiovascular outcomes including ischemic stroke, atrial fibrillation, arrhythmia and congestive heart failure. Methods: We performed two-sample MR using summary-level data obtained from the Japanese Encyclopedia of Genetic association by Riken (JENGER), and we selected four single-nucleotide polymorphisms associated with CRP level as instrumental variables. MR estimates were calculated with the inverse-variance weighted (IVW), penalized weighted median and weighted median methods. MR-Egger regression was used to explore pleiotropy. Results: No significant causal association of genetically determined CRP level with ischemic stroke, atrial fibrillation or arrhythmia was found with any of the four MR methods (all Ps>0.05). The IVW method indicated suggestive evidence of a causal association between CRP and congestive heart failure (OR: 1.337, 95% CI: 1.005–1.780, P=0.046), whereas the other three methods did not. No clear pleiotropy or heterogeneity was observed. Conclusions: Suggestive evidence was found only in the analysis of congestive heart failure; therefore, further studies are necessary. Furthermore, no causal association was found between CRP and the other three cardiovascular outcomes.
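The inverse-variance weighted (IVW) estimate named in this abstract can be sketched from per-variant summary statistics. The sketch assumes the common fixed-effect form, in which each variant's ratio estimate (outcome effect divided by exposure effect) is weighted by the squared exposure effect over the squared outcome standard error. The function name `ivw_estimate` and all numbers are hypothetical and are not the study's data.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and standard error from summary statistics.

    Each variant j contributes the ratio estimate by_j / bx_j with weight
    bx_j**2 / se_j**2 (first-order weights that ignore uncertainty in bx_j).
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return beta, se


if __name__ == "__main__":
    # Hypothetical summary statistics for four variants (log-odds scale
    # for the outcome); these are placeholders, not values from the paper.
    bx = [0.10, 0.20, 0.15, 0.05]
    by = [0.04, 0.11, 0.06, 0.03]
    se_y = [0.02, 0.03, 0.025, 0.02]

    beta, se = ivw_estimate(bx, by, se_y)
    low, high = beta - 1.96 * se, beta + 1.96 * se
    print(f"OR = {math.exp(beta):.3f} "
          f"(95% CI {math.exp(low):.3f} to {math.exp(high):.3f})")
```

With only four instruments, as in the study, the IVW estimate is sensitive to any single pleiotropic variant, which is one reason median-based and MR-Egger estimates are typically reported alongside it.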
Funding: National Key Research and Development Program of China (Grant No. 2017YFA0505500); Japan Society for the Promotion of Science KAKENHI Program (Grant No. JP15H05707); National Natural Science Foundation of China (Grant Nos. 11771010, 31771476, 91530320, 91529303, 91439103 and 81471047).
Abstract: Natural systems are typically nonlinear and complex, and it is of great interest to be able to reconstruct a system in order to understand its mechanism, which can not only recover nonlinear behaviors but also predict future dynamics. Owing to advances in modern technology, big data are becoming increasingly accessible, and consequently the problem of reconstructing systems from measured data or time series plays a central role in many scientific disciplines. In recent decades, nonlinear methods rooted in state space reconstruction have been developed; they do not assume any model equations but can recover the dynamics purely from the measured time series data. In this review, the development of state space reconstruction techniques is introduced, and recent advances in systems prediction and causality inference using state space reconstruction are presented. In particular, the review focuses on cutting-edge methods for dealing with short-term time series data. Finally, the advantages as well as the remaining problems in this field are discussed.
Abstract: It has been shown that peer review activities are positively correlated with scientists' bibliometric performance (e.g., Ortega, 2017, 2019). However, how the number of papers a scientist reviews interacts with that scientist's publishing has not been addressed in previous studies. This paper employs Granger causality inference to explore the directionality between a scientist's publication performance and his/her review activities. Our dataset comprises scientists' reviewed articles derived from Publons in the Web of Knowledge database, and their publications retrieved from PubMed. We find that scientists who reviewed less or published less tend to show Granger causality between reviewing and publishing activities. In addition, compared with early-career researchers, reviewing advances publishing for senior scientists.
Funding: National Key Research and Development Program of China (2022YFC3600804); National Natural Science Foundation of China (82204162, 82204154); Young Elite Scientist Sponsorship Program by China Association for Science and Technology (2023QNRC001); Guangdong Provincial Pearl River Talents Program (0920220207); Basic and Applied Basic Research Foundation of Guangdong Province (2022A1515010823); Guangzhou Municipal Science and Technology Bureau (2023A04J2072); Fundamental Research Funds for the Central Universities, Sun Yat-sen University (23qnpy108).
Abstract: Limited evidence exists on the effect of submicronic particulate matter (PM1) on hypertension hospitalization. Evidence based on causal inference and large cohorts is even scarcer. In 2015, 36,271 participants were enrolled in South China and followed up through 2020. Each participant was assigned single-year, lag0–1, and lag0–2 moving average concentrations of PM1 and fine inhalable particulate matter (PM2.5), simulated from satellite data at a 1-km resolution. We used an inverse probability weighting approach to balance confounders and a marginal structural Cox model to evaluate the underlying causal links between PM1 exposure and hypertension hospitalization, with the PM2.5-hypertension association for comparison. Several sensitivity analyses and analyses of effect modification were also conducted. We found that a higher hospitalization risk from both overall hypertension (HR: 1.13, 95% CI: 1.05–1.22) and essential hypertension (HR: 1.15, 95% CI: 1.06–1.25) was linked to each 1 μg/m³ increase in the yearly average PM1 concentration. At lag0–1 and lag0–2, we observed a 17%–21% higher risk of hypertension associated with PM1. The effect of PM1 was 6%–11% higher than that of PM2.5. Linear concentration-exposure associations between PM1 exposure and hypertension were identified, without safety thresholds. Women and participants who engaged in physical exercise exhibited higher susceptibility, with 4%–22% greater risk than their counterparts. This large cohort study identified a detrimental relationship between chronic PM1 exposure and hypertension hospitalization, which was more pronounced than that for PM2.5 and among certain groups.