Abstract
Exploring and uncovering the causal relationships between variables in observational data is one of the fundamental tasks of the big-data era, and it will play a key role in a wide range of future data-driven applications. Inferring the direction of causality between observed variables is a basic problem within this task. Recent studies have shown that the global and local regression (GLR) algorithm, which is based on the minimum description length (MDL) principle, achieves relatively high inference accuracy and broad applicability. However, redundant models within the GLR model severely limit the algorithm's efficiency. To avoid this redundancy, we build the GLR model separately according to the different characteristics of the candidate models. On this basis, we propose ISLOPE, an improved SLOPE algorithm for causal orientation. Experimental results show that ISLOPE effectively reduces running time, and thus improves efficiency, while keeping the accuracy of the original algorithm approximately unchanged.
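The abstract describes orienting a cause-effect pair by comparing MDL description lengths. The Python sketch below illustrates that general idea only. It is not the authors' ISLOPE: it omits SLOPE's local-regression component and its handling of model redundancy. It scores each direction as L(X) + L(Y|X) versus L(Y) + L(X|Y), using global polynomial regressions of degree 0 to 3. The Gaussian residual code, the BIC-style parameter cost, the RESOLUTION constant and all function names are assumptions made for this example.

```python
import math

# Hypothetical quantization resolution used to encode real values (an assumption).
RESOLUTION = 1e-3


def standardize(v):
    """Rescale a sample to zero mean and unit variance."""
    n = len(v)
    mu = sum(v) / n
    sd = math.sqrt(sum((a - mu) ** 2 for a in v) / n)
    return [(a - mu) / sd for a in v]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def residual_variance(x, y, degree):
    """Mean squared residual of a least-squares fit y ~ poly(x, degree) via normal equations."""
    k = degree + 1
    gram = [[sum(xi ** (i + j) for xi in x) for j in range(k)] for i in range(k)]
    rhs = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(k)]
    coef = solve(gram, rhs)
    res = [yi - sum(c * xi ** i for i, c in enumerate(coef)) for xi, yi in zip(x, y)]
    return sum(r * r for r in res) / len(res)


def gaussian_bits(n, var):
    """Bits to encode n values at RESOLUTION under a zero-mean Gaussian with variance var."""
    var = max(var, 1e-12)  # guard against log of zero for a perfect fit
    return 0.5 * n * math.log2(2 * math.pi * math.e * var) - n * math.log2(RESOLUTION)


def conditional_cost(x, y, max_degree=3):
    """L(Y | X): shortest two-part code over global polynomial models of degree 0..max_degree."""
    n = len(x)
    best = math.inf
    for d in range(max_degree + 1):
        model_bits = (d + 1) * 0.5 * math.log2(n)  # BIC-style cost per coefficient (assumed)
        best = min(best, model_bits + gaussian_bits(n, residual_variance(x, y, d)))
    return best


def infer_direction(x, y):
    """Return 'X->Y' if L(X) + L(Y|X) < L(Y) + L(X|Y), 'Y->X' if the reverse, else 'undecided'."""
    x, y = standardize(x), standardize(y)
    n = len(x)
    marginal = gaussian_bits(n, 1.0)  # standardized marginals cost the same under this code
    forward = marginal + conditional_cost(x, y)
    backward = marginal + conditional_cost(y, x)
    if forward < backward:
        return "X->Y"
    if backward < forward:
        return "Y->X"
    return "undecided"


if __name__ == "__main__":
    xs = [-1 + 2 * i / 199 for i in range(200)]
    ys = [v ** 3 + 0.01 * math.sin(37 * i) for i, v in enumerate(xs)]
    print(infer_direction(xs, ys))  # expected: X->Y
```

Consider data generated as y = x³ plus small noise. The cubic fit of Y on X leaves almost no residual. No cubic in y recovers x = ∛y closely, so under these assumptions the forward direction receives the shorter total code. A full GLR model would also allow local (per-region) fits, which this sketch deliberately leaves out.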
Authors
Pan Mengjiao (潘孟姣); Cai Qingsong (蔡青松)
School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University core journal (北大核心)
2018, Issue 10, pp. 238-244 (7 pages)
Funding
Beijing Natural Science Foundation (Grant No. 4172013)
Keywords
Global/local regression model
Minimum description length
Model redundancy
Causal orientation
Additive noise model