Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60972150, 10926197, and 61201323.
Abstract: Detection and clarification of cause-effect relationships among variables is an important problem in time series analysis. This paper provides a method that employs both mutual information and conditional mutual information to identify the causal structure of multivariate time series causal graphical models. A three-step procedure is developed to learn the contemporaneous and lagged causal relationships of time series causal graphs. In contrast to conventional constraint-based algorithms, the proposed algorithm is nonparametric and does not assume any particular kind of distribution. These properties are especially appealing for inferring time series causal graphs when prior knowledge about the data model is unavailable. Simulations and a case analysis demonstrate the effectiveness of the method.
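To make the two quantities concrete, the sketch below computes plug-in estimates of mutual information I(X;Y) and conditional mutual information I(X;Y|Z) for discrete samples and uses CMI to score a candidate lagged link X_{t-1} -> Y_t given Y_{t-1}. It is an illustration under stated assumptions, not the paper's three-step procedure: the binary toy system, the 10% flip noise, and the function names are hypothetical, and a real test would compare the CMI against a permutation or chi-square threshold rather than eyeballing it.

```python
from collections import Counter
from math import log
import random


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X;Y|Z) in nats; each z may itself be a tuple."""
    pxyz = Counter(zip(xs, ys, zs))
    pxz, pyz, pz = Counter(zip(xs, zs)), Counter(zip(ys, zs)), Counter(zs)
    n = len(xs)
    # I(X;Y|Z) = sum p(x,y,z) log[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ]
    return sum(
        (c / n) * log(c * pz[z] / (pxz[(x, z)] * pyz[(y, z)]))
        for (x, y, z), c in pxyz.items()
    )


def simulate(n, seed=0):
    """Hypothetical toy system: Y_t copies X_{t-1}, flipped with probability 0.1."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    y = [0]
    for t in range(1, n):
        y.append(x[t - 1] ^ int(rng.random() < 0.1))
    return x, y


x, y = simulate(5000)
# Candidate lagged link X_{t-1} -> Y_t, conditioned on Y's own past
cmi_xy = conditional_mutual_information(x[:-1], y[1:], y[:-1])
# Reverse candidate Y_{t-1} -> X_t, conditioned on X's own past (no true link)
cmi_yx = conditional_mutual_information(y[:-1], x[1:], x[:-1])
```

Here `cmi_xy` lands near ln 2 minus the binary entropy of 0.1 (about 0.37 nats), while `cmi_yx` stays close to zero, which is the asymmetry a CMI-based procedure would exploit to orient lagged edges.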
Abstract: Fault diagnostics is important for the safe operation of nuclear power plants (NPPs). In recent years, data-driven approaches have been proposed and implemented to tackle the problem, e.g., neural networks, fuzzy and neuro-fuzzy approaches, support vector machines, K-nearest neighbor classifiers, and inference methodologies. Among these methods, the dynamic uncertain causality graph (DUCG) has proven effective in many practical cases. However, constructing the causal graph behind a DUCG is complicated, and the result is, in many cases, redundant with respect to the symptoms needed to correctly classify a fault. In this paper, we propose a method to simplify causal graph construction automatically. The method transforms the expert knowledge-based DUCG into a fuzzy decision tree (FDT) by extracting from the DUCG a fuzzy rule base that summarizes the symptoms used as the basis of the FDT. A genetic algorithm (GA) is then used to optimize the FDT by performing a wrapper search around it: the set of symptoms selected during the iterative search is taken as the best set of symptoms for diagnosing the faults that can occur in the system. The effectiveness of the approach is shown on a DUCG model initially built to diagnose 23 faults using 262 symptoms of Unit-1 in the Ningde NPP of the China Guangdong Nuclear Power Corporation. The results show that the FDT, with GA-optimized symptoms and diagnosis strategy, can drive the construction of the DUCG and lower the computational burden without loss of diagnostic accuracy.
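A GA wrapper search treats each candidate symptom subset as a binary mask and scores it by running the downstream classifier. The sketch below shows that loop with tournament selection, one-point crossover, bit-flip mutation, and elitism. It is a generic illustration, not the authors' implementation: in the paper the fitness comes from the FDT's diagnostic performance, whereas `toy_fitness`, the informative set `{0, 3}`, the size penalty, and all GA parameters here are hypothetical stand-ins.

```python
import random
from typing import Callable, List, Tuple

Mask = Tuple[int, ...]  # 1 = symptom kept, 0 = symptom dropped


def ga_select(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 40,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Mask, List[float]]:
    """Evolve symptom masks; return the best mask and the best-fitness history.

    Requires n_features >= 2 (for crossover) and pop_size >= 3 (for tournaments).
    """
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(scored[0]))
        next_pop = scored[:2]  # elitism: the two best masks survive unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of two parents
            p1 = max(rng.sample(scored, 3), key=fitness)
            p2 = max(rng.sample(scored, 3), key=fitness)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation
            child = tuple(b ^ 1 if rng.random() < mutation_rate else b for b in child)
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


INFORMATIVE = {0, 3}  # hypothetical: symptoms that actually discriminate the faults


def toy_fitness(mask: Mask) -> float:
    """Stand-in for classifier accuracy: reward informative symptoms, penalize subset size."""
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits / len(INFORMATIVE) - 0.05 * sum(mask)


best, history = ga_select(8, toy_fitness)
```

Because of elitism the best fitness never decreases from one generation to the next; the size penalty is one simple way to push the search toward smaller symptom sets, in the spirit of removing redundant symptoms.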
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60972150 and 10926197.
Abstract: Detection and clarification of cause-effect relationships among variables is an important problem in time series analysis. Traditional causality inference methods have the salient limitation that the model must be linear with Gaussian noise. Although additive model regression can effectively infer the nonlinear causal relationships of additive nonlinear time series, it requires the contemporaneous causal relationships among variables to be linear, and it is not always valid for testing conditional independence relations. This paper provides a nonparametric method that employs both mutual information and conditional mutual information to identify the causal structure of a class of nonlinear time series models, extending additive nonlinear time series to nonlinear structural vector autoregressive models. An algorithm is developed to learn the contemporaneous and lagged causal relationships among variables. Simulations demonstrate the effectiveness of the proposed method.
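The limitation of linear methods can be seen with a single nonlinear lagged link. In the sketch below, a hypothetical series Y_t = X_{t-1}^2 + noise with X uniform on [-1, 1] has a Pearson correlation near zero with X_{t-1}, yet a binned mutual information estimate is clearly positive. The quadratic link, the noise level, and the 8-bin equal-width discretization are illustrative assumptions; the paper's own estimator may differ.

```python
import random
from collections import Counter
from math import log, sqrt


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / sqrt(vx * vy)


def binned_mi(xs, ys, bins=8):
    """Plug-in mutual information (nats) after equal-width binning of each variable."""
    def to_bins(vs):
        lo, hi = min(vs), max(vs)
        width = (hi - lo) / bins or 1.0  # guard against a constant series
        return [min(int((v - lo) / width), bins - 1) for v in vs]

    bx, by = to_bins(xs), to_bins(ys)
    n = len(xs)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


rng = random.Random(1)
n = 20000
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
# Hypothetical nonlinear lagged link: Y_t = X_{t-1}^2 + small Gaussian noise
y = [0.0] + [x[t - 1] ** 2 + rng.gauss(0.0, 0.05) for t in range(1, n)]

r = pearson(x[:-1], y[1:])     # near 0: a linear measure misses the link
mi = binned_mi(x[:-1], y[1:])  # clearly positive: the dependence is detected
```

Since X is symmetric about zero, its covariance with X^2 vanishes, so any method that relies on linear correlation would report no link here.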
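Abstract: Owing to its broad application prospects, link prediction in dynamic networks has gradually become a hot topic in network science research. The evolution of links in a dynamic network exhibits complex spatial correlations and temporal dependencies, which makes link prediction highly challenging. We propose a dynamic network link prediction model based on sequential graph convolution (dynamic network link prediction based on sequential graph convolution, DNLP-SGC). To address the problem that a sequence of network snapshots cannot effectively reflect the continuity of a dynamic network, an edge-triggering mechanism is used to correct the original network weight matrix, compensating for the loss of temporal information that occurs when a dynamic network is represented by discrete snapshots. Starting from the network evolution process and jointly considering the feature similarity between nodes and their historical interaction information, sequential graph convolution is used to extract node features in the dynamic network; this approach fuses the spatiotemporal dependencies of nodes. Furthermore, a causal convolutional network is used to capture the latent global temporal features of the network evolution process, enabling dynamic network link prediction. Experimental results on 2 real-world network datasets show that DNLP-SGC outperforms the baseline models in precision, recall, and AUC.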
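A causal convolution only looks backward in time: the output at step t depends on inputs at t and earlier, never later. The sketch below shows a single 1-D causal convolution with optional dilation and left zero-padding. It illustrates the general building block, not DNLP-SGC's actual layer; the function name, the scalar (single-channel) input, and the padding choice are assumptions.

```python
from typing import List, Sequence


def causal_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int = 1) -> List[float]:
    """y[t] = sum_k kernel[k] * x[t - k*dilation], treating x at negative indices as 0."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            i = t - k * dilation  # only current and past positions are read
            if i >= 0:
                acc += w * x[i]
        out.append(acc)
    return out


y1 = causal_conv1d([1, 2, 3, 4], [1, 1])              # each step plus its predecessor
y2 = causal_conv1d([1, 2, 3, 4], [1, 1], dilation=2)  # each step plus the value two steps back

# Changing the last input must not change any earlier output
a = causal_conv1d([1, 2, 3, 4], [0.5, 0.25])
b = causal_conv1d([1, 2, 3, 99], [0.5, 0.25])
```

Stacking such layers with dilations 1, 2, and 4 and kernel size 2 gives a receptive field of 8 steps, which is one common way to capture long-range (global) temporal patterns cheaply.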
Funding: Supported by the National Natural Science Foundation of China (No. U1736218) and the National Key R&D Program of China (No. 2018YFB0804704); partially supported by CNCERT/CC.
Abstract: To combat increasingly sophisticated cyber attacks, the security community has proposed and deployed a large body of threat detection approaches to discover malicious behaviors on host systems and attack payloads in network traffic. Several studies have begun to focus on threat detection methods based on provenance data from host-level event tracing. On the other hand, with the significant development of big data and artificial intelligence technologies, large-scale graph computing has been widely used. Accordingly, a number of studies try to bridge the gap between threat detection based on host-log provenance data and graph algorithms, proposing threat detection algorithms based on system provenance graphs. These approaches usually generate a system provenance graph by tagging and tracking system events, and then leverage the characteristics of the graph to conduct threat detection and attack investigation. To gain a deep understanding of the correctness, effectiveness, and efficiency of different graph-based threat detection algorithms, we focus on mainstream threat detection methods based on provenance graphs. From the large number of existing studies, we select and implement 5 state-of-the-art threat detection approaches as evaluation objects for further analysis. To build our evaluation datasets, we collect about 40 GB of host-level raw log data in a real-world IT environment and simulate 6 types of cyber attack scenarios in an isolated environment to obtain malicious provenance data. Side-by-side comparison and longitudinal assessment explain in detail which attack scenarios each approach detects well, and why. Our empirical evaluation provides a solid foundation for guiding improvements to threat detection approaches.
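The core data structure behind these approaches can be sketched in a few lines: system events become directed edges along the direction of information flow, and attack investigation walks the graph backward from an alert while respecting time order. The event tuple format, the operation names, the flow-direction rules, and the sample events below are hypothetical and much simpler than real audit logs; none of the 5 evaluated systems is reproduced here.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical event record: (timestamp, subject process, operation, object)
Event = Tuple[int, str, str, str]


def build_provenance_graph(events: Iterable[Event]) -> Dict[str, List[Tuple[str, int]]]:
    """Reverse adjacency: node -> [(source node, timestamp)] for each flow into it."""
    incoming = defaultdict(list)
    for ts, subj, op, obj in events:
        # Assumed flow rules: read/recv/exec move data object -> process;
        # everything else (write, send, fork, ...) moves process -> object.
        src, dst = (obj, subj) if op in ("read", "recv", "exec") else (subj, obj)
        incoming[dst].append((src, ts))
    return incoming


def backward_trace(incoming, alert: str, alert_time: int) -> Set[str]:
    """All nodes that could have causally influenced `alert` by `alert_time`."""
    latest = {alert: alert_time}  # latest time a node can still have influenced the alert
    queue = deque([alert])
    while queue:
        node = queue.popleft()
        for src, ts in incoming.get(node, []):
            # Causality: follow only flows no later than the current node's time bound
            if ts <= latest[node] and ts > latest.get(src, float("-inf")):
                latest[src] = ts
                queue.append(src)
    return set(latest)


events = [
    (1, "curl", "recv", "203.0.113.5:80"),
    (2, "curl", "write", "/tmp/x"),
    (3, "bash", "exec", "/tmp/x"),
    (4, "bash", "read", "/etc/passwd"),
    (5, "bash", "send", "10.0.0.9:443"),
    (6, "vim", "write", "/home/u/notes"),
    (7, "cron", "write", "/tmp/x"),  # happens after bash ran /tmp/x, so irrelevant
]
graph = build_provenance_graph(events)
suspects = backward_trace(graph, "10.0.0.9:443", 5)
```

The time constraint is what keeps the later `cron` write out of the result even though it touches the same file; dropping it would wrongly implicate `cron`.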
Funding: Supported by the National Natural Science Foundation of China under Grant 71671103.
Abstract: It is desirable to obtain the joint probability distribution (JPD) over a set of random variables from local data, so as to avoid the hard work of collecting statistical data over all variables jointly. Much work has been done for the case in which all variables are in a known directed acyclic graph (DAG). However, steady directed cyclic graphs (DCGs) may arise when we simply combine modules containing local data, where a module is composed of a child variable and its parent variables. So far, the physical and statistical meaning of steady DCGs has remained unclear and unresolved. This paper illustrates the physical and statistical meaning of steady DCGs and presents a method to calculate the JPD from local data, given that all variables are in a known single-valued Dynamic Uncertain Causality Graph (S-DUCG), thereby defining a new Bayesian network with steady DCGs. Here, "single-valued" means that only the causes of the true state of a variable are specified, while the false state is the complement of the true state.
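For the known-DAG case mentioned above, the JPD follows from local modules by the chain rule: the product over variables of P(child | parents), evaluated in a topological order. The sketch below does this for binary variables where each module specifies only P(child = 1 | parents) and the false state is the complement, echoing the single-valued idea. It does not implement the paper's steady-DCG method: if the modules formed a cycle, no topological order would exist and this product would not apply directly. The variables A, B, C and all probabilities are hypothetical.

```python
from itertools import product
from typing import Dict

# Hypothetical local modules: child -> (parents, P(child = 1 | parent values))
MODULES = {
    "A": ((), {(): 0.3}),
    "B": (("A",), {(0,): 0.1, (1,): 0.8}),
    "C": (("B",), {(0,): 0.2, (1,): 0.9}),
}
ORDER = ["A", "B", "C"]  # a topological order; exists only because the graph is acyclic


def joint(assignment: Dict[str, int]) -> float:
    """Chain-rule factorization P(a, b, c) = P(a) P(b | a) P(c | b)."""
    p = 1.0
    for var in ORDER:
        parents, table = MODULES[var]
        p_true = table[tuple(assignment[q] for q in parents)]
        # Only the true state is specified; the false state is its complement
        p *= p_true if assignment[var] == 1 else 1.0 - p_true
    return p


jpd = {vals: joint(dict(zip(ORDER, vals))) for vals in product((0, 1), repeat=len(ORDER))}
```

For example, P(A=1, B=1, C=1) = 0.3 × 0.8 × 0.9 = 0.216, and the eight entries of `jpd` sum to 1.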
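Abstract: In natural language processing, traditional causality extraction mainly relies on pattern-matching methods or machine learning algorithms, which yield low accuracy and can only extract explicit causal relations marked by causal cue words. To address these problems, we propose BERT-GCN, an algorithm that combines a large-scale pre-trained model with a graph convolutional neural network. First, BERT (bidirectional encoder representations from transformers) is used to encode the corpus and generate word vectors; then, the generated word vectors are fed into a graph convolutional neural network for training; finally, the output is passed to a Softmax layer to complete causal relation extraction. Experimental results show that the model achieves good results on the SEDR-CE dataset and also performs well on implicit causal relations.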
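The GCN-then-Softmax stage can be illustrated with one graph convolution using the standard normalized propagation rule ReLU(D^{-1/2}(A+I)D^{-1/2} H W), followed by mean pooling and a softmax over two classes. This is a minimal pure-Python sketch, not the authors' model: the 3-token chain graph, the 2-dimensional vectors standing in for BERT outputs, the weight matrices, and the pooling choice are all hypothetical.

```python
from math import exp, sqrt
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """D^{-1/2} (A + I) D^{-1/2}: add self-loops, then symmetric degree normalization."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution: ReLU(A_norm H W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_norm, h), w)]


def softmax(v: List[float]) -> List[float]:
    m = max(v)  # subtract the max for numerical stability
    e = [exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


# Hypothetical 3-token sentence with edges 0-1 and 1-2 (e.g. dependency links)
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
h = [[0.2, 0.5], [0.9, 0.1], [0.4, 0.7]]  # stand-ins for BERT token vectors
w = [[1.0, -0.5], [0.3, 0.8]]
a_norm = normalize_adjacency(adj)
h1 = gcn_layer(a_norm, h, w)
pooled = [sum(col) / len(h1) for col in zip(*h1)]  # mean over tokens
w_out = [[0.7, -0.2], [-0.1, 0.6]]
logits = [sum(p * w_out[i][j] for i, p in enumerate(pooled)) for j in range(2)]
probs = softmax(logits)  # e.g. [P(causal), P(non-causal)]
```

The self-loops let each token keep its own BERT features while mixing in its neighbors', which is the usual motivation for adding the identity before normalizing.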