Discovering causal relationships purely from observational data is a fundamental problem in science. In its most basic form, the problem is to decide, given joint observations of two variables X and Y, whether X causes Y or Y causes X. In statistics, path analysis is used to describe the direct dependencies among a set of variables. In practice, however, the causal order of the variables is usually unknown, and ignoring the direction of causal paths prevents researchers from analyzing or using causal models. In this study, we propose a method for estimating causal relationships from observational data. First, the observed variables are cleaned and only valid variables are retained. Second, DirectLiNGAM, a direct estimation method for linear non-Gaussian acyclic models (LiNGAM), is used to estimate the causal order K of the variables. Third, the adjacency matrix B of causal effects is estimated given K. Because B in this form is difficult to interpret, adaptive lasso is used to prune causal paths and variables. From the pruned structure, a causal path diagram and a recursive model are constructed. Finally, the recursive model is tested and refined until it fits well, and the direct, indirect, and total effects among the causal variables are estimated. In this way, the method removes the arbitrariness of assigning a causal order to the variables. Unlike approaches in which researchers check their understanding of their own model by generating simulated data, this study estimates the causal structure from the observed data. The simple, relatively non-smooth statistical learning method used here has clear advantages for interpretable machine learning.
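The estimation pipeline described above (causal order, adjacency matrix, adaptive lasso pruning, path effects) can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the independence measure or how the lasso penalty is chosen, so the sketch assumes a pairwise measure built on a maximum-entropy approximation of differential entropy, a fixed hypothetical penalty `lam=0.05`, and coefficients reported on standardized data (standardized path coefficients). The cleaning and model-testing steps are omitted, and all function names (`causal_order`, `adaptive_lasso`, `estimate_B`, `path_effects`) are hypothetical.

```python
import numpy as np


def _entropy(u):
    # Maximum-entropy approximation of the differential entropy of a standardized variable.
    k1, k2, gamma = 79.047, 7.4129, 0.37457
    return ((1 + np.log(2 * np.pi)) / 2
            - k1 * (np.mean(np.log(np.cosh(u))) - gamma) ** 2
            - k2 * np.mean(u * np.exp(-u ** 2 / 2)) ** 2)


def _std(x):
    return (x - x.mean()) / x.std()


def _residual(xi, xj):
    # Residual of xi after least-squares regression on xj.
    return xi - np.cov(xi, xj, bias=True)[0, 1] / np.var(xj) * xj


def causal_order(X):
    """DirectLiNGAM-style search: repeatedly pick the most exogenous remaining variable."""
    X = np.array(X, dtype=float)
    remaining, order = list(range(X.shape[1])), []
    while len(remaining) > 1:
        scores = []
        for i in remaining:
            xi, penalty = _std(X[:, i]), 0.0
            for j in remaining:
                if j == i:
                    continue
                xj = _std(X[:, j])
                ri_j, rj_i = _residual(xi, xj), _residual(xj, xi)
                # diff estimates I(xj; ri_j) - I(xi; rj_i); positive values favour i -> j.
                diff = (_entropy(xj) + _entropy(_std(ri_j))) - (_entropy(xi) + _entropy(_std(rj_i)))
                penalty += min(0.0, diff) ** 2
            scores.append(-penalty)
        m = remaining[int(np.argmax(scores))]
        order.append(m)
        remaining.remove(m)
        for i in remaining:  # regress out the chosen variable before the next round
            X[:, i] = _residual(X[:, i], X[:, m])
    return order + remaining


def adaptive_lasso(Z, y, lam, gamma=1.0, n_iter=200):
    # Adaptive lasso (weights 1/|beta_ols|^gamma) rewritten as a plain lasso on rescaled
    # columns, solved by coordinate descent on (1/2n)||y - Zs b||^2 + lam * ||b||_1.
    scale = np.abs(np.linalg.lstsq(Z, y, rcond=None)[0]) ** gamma
    Zs, b, n = Z * scale, np.zeros(Z.shape[1]), len(y)
    for _ in range(n_iter):
        for k in range(Z.shape[1]):
            z = Zs[:, k] @ Zs[:, k] / n
            if z == 0.0:
                continue
            r = y - Zs @ b + Zs[:, k] * b[k]  # partial residual without column k
            rho = Zs[:, k] @ r / n
            b[k] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return b * scale


def estimate_B(X, order, lam=0.05):
    """Regress each variable on its predecessors in `order`, prune with adaptive lasso,
    then refit OLS on the surviving parents (standardized path coefficients)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    p = Xs.shape[1]
    B = np.zeros((p, p))  # B[i, j] = direct effect of x_j on x_i
    for pos in range(1, p):
        target, preds = order[pos], order[:pos]
        coef = adaptive_lasso(Xs[:, preds], Xs[:, target], lam)
        parents = [c for c, w in zip(preds, coef) if w != 0.0]
        if parents:
            B[target, parents] = np.linalg.lstsq(Xs[:, parents], Xs[:, target], rcond=None)[0]
    return B


def path_effects(B):
    """Direct, indirect and total effects for the recursive model x = Bx + e."""
    eye = np.eye(B.shape[0])
    total = np.linalg.inv(eye - B) - eye  # (I - B)^-1 = I + B + B^2 + ...
    return B, total - B, total


# Simulated chain x0 -> x1 -> x2 with uniform (non-Gaussian) disturbances.
rng = np.random.default_rng(0)
n = 20000
x0 = rng.uniform(-1, 1, n)
x1 = x0 + rng.uniform(-1, 1, n)
x2 = x1 + rng.uniform(-1, 1, n)
X = np.column_stack([x0, x1, x2])

order = causal_order(X)
B = estimate_B(X, order)
direct, indirect, total = path_effects(B)
print("causal order:", order)
print("direct effects:\n", np.round(direct, 2))
print("total effect x0 -> x2:", round(float(total[2, 0]), 2))
```

With non-Gaussian disturbances the search should recover the order x0, x1, x2. The standardized direct effects should then be close to their population values 1/√2 ≈ 0.71 and √(2/3) ≈ 0.82, the spurious x0 → x2 path should be pruned, and the total effect of x0 on x2 (entirely indirect, through x1) should be close to 1/√3 ≈ 0.58. Rescaling the columns by |β_ols|^γ lets a plain lasso solver implement the adaptive weights.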
Funding: Supported by the National Natural Science Foundation of China (61573266).