Funding: Supported by the Fundamental Research Funds for the Central Universities (17CX02035A); the NNSF of China (11601197, 11461029, 61563018); the China Postdoctoral Science Foundation funded project (2016M600511, 2017T100475); the NSF of Jiangxi Province (20171ACB21030, 20161BAB201024, 20161ACB200009); and the Key Science Fund Project of the Jiangxi Provincial Education Department (GJJ150439).
Abstract: Conditional independence is a basic assumption in many fields of statistics. Testing its validity is of great importance and has been studied extensively in the literature. However, existing methods focus on the case where the data are fully observed, and none appears to account for the presence of missing data. Motivated by this, this paper develops two test statistics for that setting, based on inverse probability weighting and augmented inverse probability weighting. The asymptotic distributions of the proposed statistics are derived under the null hypothesis. Simulation studies indicate that both test statistics perform well in terms of size and power.
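The weighting idea behind the first statistic can be sketched as follows. This is only a minimal illustration of inverse probability weighting for a mean, not the paper's test statistics; the function name `ipw_mean` and the assumption that the observation probabilities are known (in practice they would be estimated) are introduced here for illustration.

```python
def ipw_mean(values, observed, obs_prob):
    """Inverse-probability-weighted estimate of E[Y] under missing data.

    values   : outcomes; entries for unobserved units are ignored (may be None)
    observed : booleans, True where the outcome was observed
    obs_prob : probability that each unit is observed (assumed known here)
    """
    n = len(values)
    # Each observed unit is up-weighted by 1 / P(observed), so that it also
    # stands in for similar units whose outcome is missing.
    weighted_sum = sum(y / p for y, r, p in zip(values, observed, obs_prob) if r)
    return weighted_sum / n


# Two of four outcomes are missing; every unit had a 50% chance of being observed.
estimate = ipw_mean([2.0, None, 4.0, None], [True, False, True, False], [0.5] * 4)
```

With fully observed data and all probabilities equal to 1, the estimate reduces to the ordinary sample mean.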
Abstract: The original Bell inequality was obtained in a statistical derivation assuming three mutually cross-correlated random variables (four in the later version). Given that observations destroy the particles, the physical realization of three variables from an experiment producing two particles per trial requires two separate trial runs. One assumed variable value (for particle 1) occurs at a fixed instrument setting in both trial runs, while a second variable (for particle 2) occurs at alternative instrument settings in the two trial runs. Given that measurements on the two particles occurring in each trial are themselves correlated, measurements from independent realizations at mutually exclusive settings on particle 2 are conditionally independent, i.e., conditionally dependent on particle 1, through probability. This situation is realized from variables defined by Bell using entangled particle pairs. Two correlations have the form that Bell computed from entanglement, but a third correlation from conditionally independent measurements has a different form. When the correlations are computed using quantum probabilities, the Bell inequality is satisfied without recourse to assumptions of non-locality or non-reality.
Funding: Supported by the National Natural Science Foundation of China (60974082, 61075055) and the Fundamental Research Funds for the Central Universities (K50510700004).
Abstract: Learning Bayesian network (BN) structure from data is an NP-hard problem and still one of the most exciting challenges in machine learning. In this work, a novel algorithm is presented which combines ideas from local learning, constraint-based, and search-and-score techniques in a principled and effective way. It first reconstructs the junction tree of a BN and then performs a K2-scoring greedy search to orient the local edges in the cliques of the junction tree. Theoretical and experimental results show that the proposed algorithm is capable of handling networks with a large number of variables. Its comparison with the well-known K2 algorithm is also presented.
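For readers unfamiliar with K2 scoring, the per-node score used by such a greedy search can be sketched as below. This is the standard Cooper-Herskovits K2 metric in log form, not the paper's junction-tree algorithm; the function name `k2_log_score` and the `arity` argument (number of states per variable) are assumptions made for this illustration.

```python
from collections import Counter
from math import lgamma, log


def k2_log_score(data, child, parents, arity):
    """Log K2 score of `child` given a candidate parent set.

    data    : list of dicts mapping variable name -> discrete value
    arity   : dict mapping variable name -> number of states (hypothetical helper)
    """
    r = arity[child]
    n_ijk = Counter()  # (parent configuration, child value) -> count
    n_ij = Counter()   # parent configuration -> count
    for row in data:
        cfg = tuple(row[p] for p in parents)
        n_ijk[(cfg, row[child])] += 1
        n_ij[cfg] += 1
    score = 0.0
    # log[(r - 1)! / (N_ij + r - 1)!] for each observed parent configuration;
    # unobserved configurations contribute log(1) = 0.
    for total in n_ij.values():
        score += lgamma(r) - lgamma(total + r)
    # log(N_ijk!) for each observed (configuration, value) pair.
    for count in n_ijk.values():
        score += lgamma(count + 1)
    return score


# Toy data in which B always copies A, so A should score well as a parent of B.
rows = [{"A": 0, "B": 0}] * 4 + [{"A": 1, "B": 1}] * 4
with_parent = k2_log_score(rows, "B", ["A"], {"A": 2, "B": 2})
without_parent = k2_log_score(rows, "B", [], {"A": 2, "B": 2})
```

A greedy K2 search would add, at each step, the parent that most increases this score and stop when no candidate improves it.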
Funding: The authors wish to thank the referees for very helpful comments which greatly improved the presentation of this paper. This work was partially supported by the National Natural Science Foundation of China (Grant No. 11025102), the Program for Changjiang Scholars and Innovative Research Team in University, and the Jilin Project (20100401).
Abstract: We consider the problems of semi-graphoid inference and of independence implication from a set of conditional-independence statements. Based on ideas from R. Hemmecke et al. [Combin. Probab. Comput., 2008, 17: 239-257], we present algebraic-geometry characterizations of these two problems, and propose two corresponding algorithms. These algorithms can be realized with any computer algebra system when the number of variables is small.
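As background for the inference problem, the standard semi-graphoid axioms are recalled below for pairwise disjoint variable sets $A$, $B$, $C$, $D$. The abstract does not list them; they are given here in their usual textbook form, with notation chosen for this note.

```latex
\begin{align*}
\text{Symmetry:}      &\quad A \perp B \mid C \;\Rightarrow\; B \perp A \mid C \\
\text{Decomposition:} &\quad A \perp B \cup D \mid C \;\Rightarrow\; A \perp B \mid C \\
\text{Weak union:}    &\quad A \perp B \cup D \mid C \;\Rightarrow\; A \perp B \mid C \cup D \\
\text{Contraction:}   &\quad A \perp B \mid C \cup D \;\text{ and }\; A \perp D \mid C \;\Rightarrow\; A \perp B \cup D \mid C
\end{align*}
```

Semi-graphoid inference asks whether a given statement follows from a set of statements by repeated use of these rules.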
Abstract: In this paper, a module-level fault diagnosis method is presented which considers a multi-port device or subnetwork as the basic unit. The fault model in this method is quite similar to actual conditions, hence it has practical meaning. The equations of module-level fault diagnosis are derived, and the testability problem for module-fault diagnosis is discussed in general. The paper then gives several topological conditions for module-fault testability, which are applicable to a general nonreciprocal network by introducing a generalized independent path.
Abstract: It is well known that constraint-based algorithms for learning Bayesian network (BN) structure rely on a large number of conditional independence (CI) tests. Therefore, it is difficult to learn a BN that indicates the original causal relations in the true graph. In this paper, a two-phase method for learning the equivalence class of a BN is introduced. The first phase of the method learns a skeleton of the BN by CI tests. In this way, it reduces the number of tests compared with other existing algorithms and decreases the running time drastically. The second phase of the method orients edges that exist in all BN equivalence classes. Our method is tested on the ALARM network, and experimental results show that our approach outperforms the other algorithms.
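Skeleton learning by CI tests can be sketched in the generic PC style shown below. This is not the paper's two-phase method and does not include its test-reduction strategy; `pc_skeleton` and the `ci_test` callback (here a hand-written oracle) are hypothetical names used only for illustration.

```python
from itertools import combinations


def pc_skeleton(nodes, ci_test):
    """Generic PC-style skeleton search.

    ci_test(x, y, s) must return True when x and y are judged
    conditionally independent given the set s.
    """
    adj = {v: set(nodes) - {v} for v in nodes}  # start from the complete graph
    sepset = {}
    size = 0
    # Grow the conditioning-set size until no node has enough neighbours left.
    while any(len(adj[x]) - 1 >= size for x in nodes):
        for x in nodes:
            for y in list(adj[x]):
                others = adj[x] - {y}
                if len(others) < size:
                    continue
                for s in combinations(sorted(others), size):
                    if ci_test(x, y, set(s)):
                        # Remove the edge and remember the separating set.
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(s)
                        break
        size += 1
    return adj, sepset


# Oracle for the chain A - B - C: the only independence is A _||_ C given B.
def chain_oracle(x, y, s):
    return {x, y} == {"A", "C"} and "B" in s


skeleton, separators = pc_skeleton(["A", "B", "C"], chain_oracle)
```

In practice `ci_test` would be a statistical test on data, and the number of calls to it is exactly the cost that methods of this kind try to reduce.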
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60972150, 10926197, and 61201323.
Abstract: Detection and clarification of cause-effect relationships among variables is an important problem in time series analysis. This paper provides a method that employs both mutual information and conditional mutual information to identify the causal structure of multivariate time series causal graphical models. A three-step procedure is developed to learn the contemporaneous and the lagged causal relationships of time series causal graphs. Unlike conventional constraint-based algorithms, the proposed algorithm does not assume any particular kind of distribution and is nonparametric. These properties are especially appealing for inference of time series causal graphs when prior knowledge about the data model is not available. Simulations and case analysis demonstrate the effectiveness of the method.
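Conditional mutual information, the quantity the method relies on, can be estimated for discrete data with a simple plug-in formula. The sketch below assumes discrete samples and is not the paper's three-step procedure or its nonparametric estimator; the function name is chosen here for illustration.

```python
from collections import Counter
from math import log


def conditional_mutual_information(samples):
    """Plug-in estimate of I(X; Y | Z) in nats from (x, y, z) tuples."""
    n = len(samples)
    c_xyz = Counter(samples)
    c_xz = Counter((x, z) for x, _, z in samples)
    c_yz = Counter((y, z) for _, y, z in samples)
    c_z = Counter(z for _, _, z in samples)
    cmi = 0.0
    for (x, y, z), n_xyz in c_xyz.items():
        # p(x,y,z) * log[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ], written with counts.
        cmi += (n_xyz / n) * log(n_xyz * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)]))
    return cmi


# X and Y are both copies of Z: once Z is known, X tells nothing more about Y.
explained = conditional_mutual_information([(0, 0, 0), (1, 1, 1)] * 5)
# X equals Y while Z is unrelated: the dependence survives conditioning on Z.
direct = conditional_mutual_information([(0, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1)])
```

A value near zero suggests that the link between X and Y is explained by Z, which is the basis for removing an edge in a constraint-based search.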
Funding: Supported by the National Key R&D Program of China (Grant No. 2017YFA0505500), the National Natural Science Foundation of China (Grant Nos. 31771476 and 31930022), and the Shanghai Municipal Science and Technology Major Project, China (Grant No. 2017SHZDZX01).
Abstract: The rapid advancement of single-cell technologies has shed new light on the complex mechanisms of cellular heterogeneity. However, compared to bulk RNA sequencing (RNA-seq), single-cell RNA-seq (scRNA-seq) suffers from higher noise and lower coverage, which brings new computational difficulties. Based on statistical independence, the cell-specific network (CSN) is able to quantify the overall associations between genes for each cell, yet it suffers from overestimation related to indirect effects. To overcome this problem, we propose the c-CSN method, which constructs the conditional cell-specific network (CCSN) for each cell. The c-CSN method measures the direct associations between genes by eliminating the indirect associations. c-CSN can be used for cell clustering and dimension reduction on a network basis of single cells. Intuitively, each CCSN can be viewed as the transformation from less "reliable" gene expression to more "reliable" gene-gene associations in a cell. Based on CCSN, we further design network flow entropy (NFE) to estimate the differentiation potency of a single cell. A number of scRNA-seq datasets were used to demonstrate the advantages of our approach: 1) one direct association network is generated for each cell; 2) most existing scRNA-seq methods designed for gene expression matrices are also applicable to c-CSN-transformed degree matrices; 3) CCSN-based NFE helps resolve the direction of differentiation trajectories by quantifying the potency of each cell. c-CSN is publicly available at https://github.com/LinLi-0909/c-CSN.
Funding: Supported by the Fundamental Research Funds for the Central Universities, the Science and Technology Commission of Shanghai Municipality (No. 19511120601), the Scientific and Technological Innovation 2030 Major Projects (No. 2018AAA0100902), the CCF-AFSG Research Fund (No. CCF-AFSG RF20220205), and the "Chenguang Program" sponsored by Shanghai Education Development Foundation and Shanghai Municipal Education Commission (No. 21CGA32).
Abstract: The Bayesian network is a popular approach to uncertain knowledge representation and reasoning. Structure learning is the first step in learning a Bayesian network. Score-based methods are among the most popular ways of learning the structure. In most cases, the score of a Bayesian network is defined as the sum of a log-likelihood score and a complexity score given by a penalty function. If the penalty function is set unreasonably, it may hurt the performance of the structure search. Thus, Bayesian network structure learning is essentially a bi-objective optimization problem. However, the existing bi-objective structure learning algorithms can only be applied to small-scale networks. To this end, this paper proposes a bi-objective evolutionary Bayesian network structure learning algorithm via skeleton constraint (BBS) for medium-scale networks. To boost search performance, BBS introduces the random order prior (ROP) initial operator. ROP generates a skeleton to constrain the search space, which is the key to expanding the scale of structure learning problems. Then, acyclic structures are guaranteed by adding the orders of variables in the initial skeleton. After that, BBS designs the Pareto-rank-based crossover and skeleton-guided mutation operators. The operators operate on the skeleton obtained in ROP to make the search more targeted. Finally, BBS provides a strategy to choose the final solution. The experimental results show that BBS can always find a structure that is closer to the ground truth than those found by the single-objective structure learning methods. Furthermore, compared with the existing bi-objective structure learning methods, BBS is scalable and can be applied to medium-scale Bayesian network datasets.
On the educational problem of discovering the factors that influence students' academic performance, BBS provides higher-quality solutions and offers flexibility in solution selection compared with the widely used Bayesian network structure learning methods.
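The bi-objective view described above, fit versus complexity without a fixed penalty weight, can be illustrated with a small Pareto filter. This sketch does not reproduce BBS, its ROP operator, or its crossover and mutation operators; the function name and the candidate scores are made up for illustration.

```python
def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates.

    candidates : list of (name, log_likelihood, complexity) tuples, where
                 log-likelihood is maximised and complexity is minimised.
    """
    def dominates(a, b):
        # a is at least as good as b on both objectives and better on one.
        no_worse = a[1] >= b[1] and a[2] <= b[2]
        better = a[1] > b[1] or a[2] < b[2]
        return no_worse and better

    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical structures with illustrative scores (not taken from the paper).
structures = [
    ("empty", -120.0, 3),
    ("sparse", -100.0, 5),
    ("dense", -99.0, 12),
    ("bad", -110.0, 9),
]
front = [name for name, _, _ in pareto_front(structures)]
```

Choosing one structure from the resulting front is the "final solution" step that a single penalised score would otherwise fix in advance.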
Funding: Supported by the National Natural Science Foundation of China (61403290, 11301408, 11401454), the Foundation for Youths of Shaanxi Province (2014JQ1020), the Foundation of Baoji City (2013R7-3), and the Foundation of Baoji University of Arts and Sciences (ZK15081).
Abstract: Learning Bayesian network structure is one of the most exciting challenges in machine learning. Discovering a correct skeleton of a directed acyclic graph (DAG) is the foundation for dependency analysis algorithms for this problem. Considering the unreliability of high-order conditional independence (CI) tests, and to improve the efficiency of a dependency analysis algorithm, the key steps are to use as few CI tests as possible and to reduce the sizes of conditioning sets as much as possible. For these reasons, and inspired by the PC algorithm, we present an algorithm, named fast and efficient PC (FEPC), for learning the adjacent neighbourhood of every variable. FEPC implements the CI tests in three kinds of orders, which reduces the number of high-order CI tests significantly. Compared with current algorithm proposals, the experimental results show that FEPC has better accuracy with fewer CI tests and smaller conditioning sets. The highest reduction in the number of CI tests achieved by FEPC compared with the PC algorithm is 83.3%.
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 19831010 and 10001008) and the National Excellent Youth Science Foundation of China.
Abstract: Simpson's paradox reminds people that statistical inference in a low-dimensional space may seriously distort the reality in a higher-dimensional one. To study the paradox with respect to Yule's measure, this paper discusses simple collapsibility, strong collapsibility and consecutive collapsibility, and presents necessary and sufficient conditions for them. These conditions are of great importance for observational and experimental designs, eliminating confounding bias, categorizing discrete variables and so on.
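A small numerical example shows what non-collapsibility looks like. The sketch uses Yule's Q, one common form of Yule's association measure for a 2x2 table, on the assumption that it is close to the measure studied in the paper; the counts are illustrative only.

```python
def yule_q(a, b, c, d):
    """Yule's Q for the 2x2 table [[a, b], [c, d]]: (ad - bc) / (ad + bc)."""
    return (a * d - b * c) / (a * d + b * c)


# Rows: treatment 1 / treatment 2; columns: success / failure.
# One table per level of a third (stratifying) variable.
stratum_1 = (81, 6, 234, 36)
stratum_2 = (192, 71, 55, 25)

# Collapsing over the third variable means adding the tables cell by cell.
pooled = tuple(x + y for x, y in zip(stratum_1, stratum_2))

q_1 = yule_q(*stratum_1)
q_2 = yule_q(*stratum_2)
q_pooled = yule_q(*pooled)
```

Here the association is positive within each stratum but negative in the pooled table, so the measure is not collapsible over the third variable for these counts. Collapsibility conditions characterise when such a reversal cannot occur.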