Abstract
Discovering causal relationships among variables is a core problem in data science. In practice, missing values pose a major challenge to both constraint-based methods and methods based on structural equation models. Existing causal learning methods for missing data can learn causal structure when data are missing at random, but for data that are not missing at random several problems remain unsolved: learning causal pairs and Markov equivalence class structures in the causal network, and correcting causal directions that are wrong because of missingness. To address this, this paper proposes MV-SELF, a new causal learning algorithm for missing data based on the structural equation likelihood framework. The algorithm exploits the property that the conditional probability distribution of a nonlinear Additive Noise Model (ANM) can be expressed through the noise distribution, and uses it to design a maximum-likelihood score that supports score-based causal structure search. To handle causal structure learning under non-random missingness, MV-SELF applies Inverse Probability Weighting (IPW) correction to recover the joint distribution of the data with missing values, thereby correcting the redundant edges and wrong causal directions caused by missingness and enabling high-dimensional causal structure search on such data. Simulation experiments show that, compared with the TD-PC, MVPC, and Structure EM algorithms, MV-SELF improves the F1 score by 3% to 19% and can effectively distinguish Markov equivalence classes.
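The role of the IPW correction described in the abstract can be illustrated with a toy sketch. This is not the MV-SELF implementation: it assumes two independent binary variables, whole-record deletion, and an observation probability that is known exactly (the hypothetical function `observe_prob`), whereas in practice such probabilities would have to be estimated. All function names are illustrative. The sketch shows that complete cases alone suggest a dependence (a redundant edge) between X and Y, and that reweighting each record by the inverse of its observation probability restores the independent joint distribution.

```python
from collections import Counter
from itertools import product


def observe_prob(x, y):
    """Hypothetical, assumed-known probability that record (x, y) is fully observed."""
    return 0.9 if x == y else 0.3


def build_complete_cases(n_per_cell=1000):
    """Deterministically keep the expected number of records from each (x, y) cell.

    The full population has X and Y independent and uniform on {0, 1}.
    """
    records = []
    for x, y in product((0, 1), repeat=2):
        kept = round(n_per_cell * observe_prob(x, y))
        records.extend([(x, y)] * kept)
    return records


def joint_distribution(records, weights):
    """Normalised weighted frequency of each (x, y) cell."""
    totals = Counter()
    for rec, w in zip(records, weights):
        totals[rec] += w
    norm = sum(totals.values())
    return {cell: totals[cell] / norm for cell in product((0, 1), repeat=2)}


def dependence_gap(joint):
    """|P(X=1, Y=1) - P(X=1) P(Y=1)|; zero when binary X and Y are independent."""
    px1 = joint[(1, 0)] + joint[(1, 1)]
    py1 = joint[(0, 1)] + joint[(1, 1)]
    return abs(joint[(1, 1)] - px1 * py1)


records = build_complete_cases()
# Complete-case analysis: every surviving record counts equally.
naive = joint_distribution(records, [1.0] * len(records))
# IPW: each surviving record is weighted by 1 / P(observed | x, y).
ipw = joint_distribution(records, [1.0 / observe_prob(x, y) for x, y in records])

print("complete-case dependence gap:", dependence_gap(naive))
print("IPW-corrected dependence gap:", dependence_gap(ipw))
```

In this toy setting the complete-case gap is 0.125, which a dependence test would read as an edge between X and Y, while the IPW-weighted gap is zero up to floating-point error.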
Authors
HAO Zhifeng (郝志峰)
YU Jianhua (喻建华)
QIAO Jie (乔杰)
CAI Ruichu (蔡瑞初)
Affiliations: School of Computer, Guangdong University of Technology, Guangzhou 510006, China; College of Science, Shantou University, Shantou 515063, Guangdong, China
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journals (北大核心)
2023, No. 12, pp. 63-70 (8 pages)
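Funding
National Natural Science Foundation of China (61876043, 61976052, 62206064)
National Excellent Young Scientists Fund (62122022)
Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0111501).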
Keywords
structural equation likelihood framework
missing data
Inverse Probability Weighting (IPW)
causal direction learning
Additive Noise Model (ANM)