Conditional independence is a basic assumption in many fields of statistics. Testing its validity is of great importance and has been studied extensively in the literature. However, existing methods focus on fully observed data, and none appears to account for missing data. Motivated by this, this paper develops two test statistics for that setting, based on inverse probability weighting and augmented inverse probability weighting. The asymptotic distributions of the proposed statistics are derived under the null hypothesis. Simulation studies indicate that both test statistics perform well in terms of size and power.
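The inverse probability weighting idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's test statistic: it only shows how complete cases are reweighted by the inverse of their probability of being observed, here to estimate a mean. The function name `ipw_mean`, the toy data, and the assumption that the observation probabilities are known are all hypothetical choices for illustration.

```python
def ipw_mean(values, observe_probs):
    """Normalized (Hajek-type) IPW estimate of a mean.

    values: list of floats, with None marking a missing observation.
    observe_probs: assumed-known probability that each value is observed.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for y, p in zip(values, observe_probs):
        if y is None:
            continue  # missing cases contribute nothing directly
        weighted_sum += y / p      # each observed case stands in for 1/p cases
        weight_total += 1.0 / p
    return weighted_sum / weight_total


# Hypothetical toy data: the third value is missing.
values = [1.0, 2.0, None, 4.0]
observe_probs = [0.5, 1.0, 0.5, 1.0]
estimate = ipw_mean(values, observe_probs)
```

The observed value with probability 0.5 receives weight 2, so it also represents the similar case that went missing; an unweighted complete-case average ignores this.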
The original Bell inequality was obtained in a statistical derivation assuming three mutually cross-correlated random variables (four in the later version). Because observation destroys the particles, physically realizing three variables from an experiment that produces two particles per trial requires two separate trial runs. One assumed variable (for particle 1) occurs at a fixed instrument setting in both runs, while a second variable (for particle 2) occurs at alternative instrument settings in the two runs. Since measurements on the two particles in each trial are themselves correlated, measurements from independent realizations at mutually exclusive settings on particle 2 are conditionally independent, i.e., conditionally dependent on particle 1, through probability. This situation is realized by the variables Bell defined using entangled particle pairs. Two correlations have the form that Bell computed from entanglement, but a third correlation, from conditionally independent measurements, has a different form. When the correlations are computed using quantum probabilities, the Bell inequality is satisfied without recourse to assumptions of non-locality or non-reality.
In social network analysis, logistic regression models have been widely used to establish the relationship between the response variable and covariates. However, such models often require the network relationships to be mutually independent after controlling for a set of covariates. To assess the validity of this assumption, we propose test statistics, under the logistic regression setting, for three important social network drivers: reciprocity, centrality, and transitivity. The asymptotic distributions of these test statistics are obtained. Extensive simulation studies demonstrate their finite-sample performance and usefulness.
When learning the structure of a Bayesian network, the search space expands significantly as the network size and the number of nodes increase, leading to a noticeable decrease in algorithm efficiency. Traditional constraint-based methods typically rely on the results of conditional independence tests. However, excessive reliance on these test results can lead to increased computational complexity and inaccurate results, especially for large-scale networks, where performance bottlenecks are particularly evident. To overcome these challenges, we propose a Markov blanket discovery algorithm based on constrained local neighborhoods for constructing undirected independence graphs. The method uses Markov blanket discovery to refine the constraints on the initial search space and sets an appropriate constraint radius, thereby reducing the initial computational cost and narrowing the initial solution range. Specifically, the method first determines the local neighborhood space to limit the search range, reducing the number of possible graph structures that need to be considered. This improves the accuracy of the search space constraints and significantly reduces the number of conditional independence tests. By performing conditional independence tests within the local neighborhood of each node, the method avoids comprehensive tests across the entire network, greatly reducing computational complexity. The constraint radius further improves computational efficiency while preserving accuracy. Simulation results show that this method has significant advantages in obtaining the structure of undirected independence graphs: it maintains an accuracy of over 96% while reducing the number of conditional independence tests by at least 50%. This improvement is due to the effective constraint on the search space and the fine control of computational cost.
The conditional kernel correlation is proposed to measure the relationship between two random variables given covariates for multivariate data. Relying on the framework of reproducing kernel Hilbert spaces, we define the conditional kernel covariance and the conditional kernel correlation. We also provide their sample estimators and asymptotic properties, which allow us to construct a conditional independence test. According to the numerical results, the proposed test is more effective than the existing one under the scenarios considered. A real data set is further analyzed to illustrate the efficacy of the proposed method.
Resource potential assessment is a field of the geosciences that can take great advantage of GIS technology as a substitute for traditional working methods. The gold resource potential of the eastern Kunlun Mountains, Qinghai Province, China, was assessed by combining the weights-of-evidence model with GIS spatial analysis. All data sets used in this paper were derived from an established multi-source geological spatial database containing geological, geophysical, geochemical, and remote sensing data. Three multi-class variables (structural intersection, Indosinian K-feldspar granite, and regional fault) were used in proximity analysis to examine their spatial association with known gold deposits. A prospectivity map was produced by the weights-of-evidence model from seven binary evidential maps, all of which had passed a conditional independence test. The study area was divided into three target zones of high, moderate, and low potential; the high- and moderate-potential areas together accounted for 20% of the total area and contained 32 of the 43 gold deposits. The results show that the gold resource potential assessment in the eastern Kunlun Mountains has relatively high precision.
This paper discusses the application of the model to predicting hydrothermal Cu, Ag, Au, and Pb-Zn occurrences in northwestern Yunnan. Geochemical, lineament, and lithology data were the selected recognition criteria. These criteria were evaluated against 75 known hydrothermal occurrences: the geochemical data had the largest weights of the three (W^+ = 1.2097, W^- = -0.7481), while lineament and lithology had (W^+ = 0.7424, W^- = -0.4496) and (W^+ = 0.3787, W^- = -0.6243), respectively. The application was successful in that the predicted results cover about 70% of the known deposits and also identify previously unknown prospective areas.
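The W^+ and W^- values quoted above follow the notation of weights-of-evidence modelling. As a hedged sketch (the abstract does not give the underlying cell counts or the exact variant of the method), the standard definitions are W^+ = ln[P(B|D)/P(B|not D)] and W^- = ln[P(not B|D)/P(not B|not D)] for a binary evidence pattern B and deposits D, with contrast C = W^+ - W^-. The function name and the counts passed to it below are hypothetical; only the three weight pairs are taken from the abstract.

```python
import math


def weights_of_evidence(dep_in, dep_out, nondep_in, nondep_out):
    """Return (W+, W-, contrast) for one binary evidence layer.

    dep_in / dep_out: unit cells with a deposit inside / outside the pattern.
    nondep_in / nondep_out: unit cells without a deposit inside / outside it.
    """
    deposits = dep_in + dep_out
    non_deposits = nondep_in + nondep_out
    w_plus = math.log((dep_in / deposits) / (nondep_in / non_deposits))
    w_minus = math.log((dep_out / deposits) / (nondep_out / non_deposits))
    return w_plus, w_minus, w_plus - w_minus


# Hypothetical counts, for illustration only.
w_plus, w_minus, contrast = weights_of_evidence(30, 10, 200, 760)

# Contrasts implied by the weights reported in the abstract.
reported = {
    "geochemical": (1.2097, -0.7481),
    "lineament": (0.7424, -0.4496),
    "lithology": (0.3787, -0.6243),
}
contrasts = {name: wp - wm for name, (wp, wm) in reported.items()}
```

A larger contrast indicates a stronger spatial association between the evidence layer and the known occurrences; by this measure the geochemical layer again ranks first.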
Learning Bayesian network (BN) structure from data is an NP-hard problem and remains one of the most exciting challenges in machine learning. In this work, a novel algorithm is presented that combines ideas from local learning, constraint-based, and search-and-score techniques in a principled and effective way. It first reconstructs the junction tree of a BN and then performs a K2-scoring greedy search to orient the local edges in the cliques of the junction tree. Theoretical and experimental results show that the proposed algorithm can handle networks with a large number of variables. A comparison with the well-known K2 algorithm is also presented.
We consider the problems of semi-graphoid inference and of independence implication from a set of conditional-independence statements. Based on ideas from R. Hemmecke et al. [Combin. Probab. Comput., 2008, 17:239-257], we present algebraic-geometry characterizations of these two problems and propose two corresponding algorithms. These algorithms can be realized with any computer algebra system when the number of variables is small.
In this paper, a module-level fault diagnosis method is presented that takes a multi-port device or subnetwork as the basic unit. The fault model in this method closely resembles actual conditions, so it has practical meaning. The equations of module-level fault diagnosis are derived, and the testability problem for module-fault diagnosis is discussed in general. The paper then gives several topological conditions for module-fault testability, which apply to a general nonreciprocal network through the introduction of a generalized independent path.
The rapid advancement of single-cell technologies has shed new light on the complex mechanisms of cellular heterogeneity. However, compared to bulk RNA sequencing (RNA-seq), single-cell RNA-seq (scRNA-seq) suffers from higher noise and lower coverage, which brings new computational difficulties. Based on statistical independence, the cell-specific network (CSN) can quantify the overall associations between genes for each cell, yet it suffers from overestimation related to indirect effects. To overcome this problem, we propose the c-CSN method, which constructs the conditional cell-specific network (CCSN) for each cell. c-CSN measures the direct associations between genes by eliminating the indirect associations, and it can be used for cell clustering and dimension reduction on a network basis for single cells. Intuitively, each CCSN can be viewed as a transformation from less "reliable" gene expression to more "reliable" gene–gene associations in a cell. Based on CCSN, we further design network flow entropy (NFE) to estimate the differentiation potency of a single cell. A number of scRNA-seq datasets were used to demonstrate the advantages of our approach: 1) one direct association network is generated for each cell; 2) most existing scRNA-seq methods designed for gene expression matrices are also applicable to c-CSN-transformed degree matrices; 3) CCSN-based NFE helps resolve the direction of differentiation trajectories by quantifying the potency of each cell. c-CSN is publicly available at https://github.com/LinLi-0909/c-CSN.
Detection and clarification of cause-effect relationships among variables is an important problem in time series analysis. This paper provides a method that employs both mutual information and conditional mutual information to identify the causal structure of multivariate time series causal graphical models. A three-step procedure is developed to learn the contemporaneous and the lagged causal relationships of time series causal graphs. In contrast to conventional constraint-based algorithms, the proposed algorithm is nonparametric and does not assume any particular kind of distribution. These properties are especially appealing for inferring time series causal graphs when prior knowledge about the data model is not available. Simulations and a case analysis demonstrate the effectiveness of the method.
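To make the role of conditional mutual information concrete, here is a minimal sketch of a plug-in estimate of I(X;Y|Z) for discrete samples, using I(X;Y|Z) = sum over (x,y,z) of p(x,y,z) ln[p(x,y,z) p(z) / (p(x,z) p(y,z))]. This is an illustrative assumption, not the estimator used in the paper, which is not specified in the abstract and may target continuous data. A value near zero is consistent with X and Y being conditionally independent given Z.

```python
from collections import Counter
import math


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X;Y|Z) in nats from paired discrete samples."""
    n = len(xs)
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    cmi = 0.0
    for (x, y, z), n_xyz in c_xyz.items():
        # The ratio of empirical probabilities reduces to a ratio of counts.
        ratio = n_xyz * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)])
        cmi += (n_xyz / n) * math.log(ratio)
    return cmi


# Chain X -> Z -> Y with deterministic links: Z screens off X from Y.
xs = [0, 0, 1, 1]
cmi_chain = conditional_mutual_information(xs, xs, xs)

# Z is constant, so conditioning on it cannot explain the X-Y dependence.
cmi_dependent = conditional_mutual_information(xs, xs, [0, 0, 0, 0])
```

In a constraint-based procedure, an estimate like this would be compared with a threshold or a permutation-based null distribution to decide whether an edge is kept; that decision rule is not shown here.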
The Bayesian network is a popular approach to representing and reasoning about uncertain knowledge. Structure learning is the first step in learning a Bayesian network, and score-based methods are among the most popular ways of learning the structure. In most cases, the score of a Bayesian network is defined as the sum of a log-likelihood score and a complexity score given by a penalty function. If the penalty function is set unreasonably, it may hurt the performance of the structure search. Bayesian network structure learning is thus essentially a bi-objective optimization problem. However, existing bi-objective structure learning algorithms can only be applied to small-scale networks. To this end, this paper proposes a bi-objective evolutionary Bayesian network structure learning algorithm via skeleton constraint (BBS) for medium-scale networks. To boost search performance, BBS introduces the random order prior (ROP) initial operator. ROP generates a skeleton to constrain the search space, which is the key to expanding the scale of structure learning problems. Acyclic structures are then guaranteed by adding variable orders to the initial skeleton. After that, BBS designs Pareto-rank-based crossover and skeleton-guided mutation operators, which operate on the skeleton obtained by ROP to make the search more targeted. Finally, BBS provides a strategy for choosing the final solution. The experimental results show that BBS finds structures closer to the ground truth than single-objective structure learning methods. Furthermore, compared with existing bi-objective structure learning methods, BBS is scalable and can be applied to medium-scale Bayesian network datasets. On the educational problem of discovering the factors that influence students' academic performance, BBS provides higher-quality solutions and offers flexibility in solution selection compared with widely used Bayesian network structure learning methods.
Detection and clarification of cause-effect relationships among variables is an important problem in time series analysis. Traditional causality inference methods have the salient limitation that the model must be linear with Gaussian noise. Although additive model regression can effectively infer the nonlinear causal relationships of additive nonlinear time series, it requires the contemporaneous causal relationships among variables to be linear and is not always valid for testing conditional independence relations. This paper provides a nonparametric method that employs both mutual information and conditional mutual information to identify the causal structure of a class of nonlinear time series models, extending additive nonlinear time series to nonlinear structural vector autoregressive models. An algorithm is developed to learn the contemporaneous and the lagged causal relationships of variables. Simulations demonstrate the effectiveness of the proposed method.
The conditional independence structure of a common probability measure is a structural model. In this paper, we solve an open problem posed by Studeny [Probabilistic Conditional Independence Structures, Theme 9, p. 206]. A new approach is proposed to decompose a directed acyclic graph and its optimal properties are studied. We interpret this approach from the perspective of the decomposition of the corresponding conditional independence model and provide an algorithm for identifying the maximal prime subgraphs in a directed acyclic graph.
Simpson's paradox is a reminder that statistical inference in a low-dimensional space can seriously distort the reality of a higher-dimensional one. To study the paradox with respect to Yule's measure, this paper discusses simple collapsibility, strong collapsibility, and consecutive collapsibility, and presents necessary and sufficient conditions for each. These conditions are of great importance for observational and experimental design, eliminating confounding bias, categorizing discrete variables, and so on.
Learning Bayesian network structure is one of the most exciting challenges in machine learning. Discovering a correct skeleton of a directed acyclic graph (DAG) is the foundation of dependency analysis algorithms for this problem. Given the unreliability of high-order conditional independence (CI) tests, the key steps in improving the efficiency of a dependency analysis algorithm are to use as few CI tests as possible and to keep the conditioning sets as small as possible. For these reasons, and inspired by the PC algorithm, we present an algorithm named fast and efficient PC (FEPC) for learning the adjacent neighbourhood of every variable. FEPC carries out the CI tests in three kinds of orders, which significantly reduces the number of high-order CI tests. Experimental results show that, compared with current algorithms, FEPC achieves better accuracy with fewer CI tests and smaller conditioning sets. The largest reduction in the number of CI tests achieved by FEPC relative to the PC algorithm is 83.3%.
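For context, the skeleton phase of the standard PC algorithm, which FEPC is said to be inspired by, can be sketched as follows. This is not FEPC itself, whose three test orderings are not described in the abstract. The CI test is passed in as a function; the oracle `chain_oracle` and the three-node example are hypothetical and encode the chain A - B - C, in which A and C are independent given B.

```python
from itertools import combinations


def pc_skeleton(nodes, ci_test, max_order=None):
    """Skeleton phase of the PC algorithm.

    ci_test(x, y, cond) must return True when x and y are judged
    conditionally independent given the set cond.
    """
    adj = {v: set(nodes) - {v} for v in nodes}  # start from the complete graph
    sepset = {}
    limit = len(nodes) - 2 if max_order is None else max_order
    for order in range(limit + 1):  # size of the conditioning set
        for x in nodes:
            for y in sorted(adj[x]):
                candidates = sorted(adj[x] - {y})
                if len(candidates) < order:
                    continue
                for cond in combinations(candidates, order):
                    if ci_test(x, y, set(cond)):
                        # Remove the edge and remember what separated the pair.
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(cond)
                        break
    return adj, sepset


def chain_oracle(x, y, cond):
    """Hypothetical perfect CI oracle for the chain A - B - C."""
    return {x, y} == {"A", "C"} and "B" in cond


adj, sepset = pc_skeleton(["A", "B", "C"], chain_oracle)
```

Because conditioning sets are drawn only from current neighbours and grow one element at a time, low-order tests prune edges before any high-order test is attempted, which is the property the abstract's concern about unreliable high-order CI tests builds on.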
Inferring gene regulatory networks (GRNs) is a challenging task in bioinformatics. In this paper, an algorithm, PCHMS, is introduced to infer GRNs. The method applies the path consistency (PC) algorithm based on the conditional mutual information test (PCA-CMI). In PC-based algorithms, the separator set is determined in order to detect dependency between variables. The PCHMS algorithm attempts to select this set in a smart way. For this purpose, the edges of the resulting skeleton are directed using the PC algorithm's direction rule and the mutual information test (MIT) score. The separator set is then selected according to the directed network by considering a suitable sequential order of genes. The effectiveness of the method is benchmarked on several networks from the DREAM challenge and on the widely used SOS DNA repair network of Escherichia coli. The results show that PCHMS improves the precision of learning the structure of GRNs in comparison with current popular approaches.
Funding (conditional independence testing with missing data): Supported by the Fundamental Research Funds for the Central Universities (17CX02035A); the NNSF of China (11601197, 11461029, 61563018); the China Postdoctoral Science Foundation (2016M600511, 2017T100475); the NSF of Jiangxi Province (20171ACB21030, 20161BAB201024, 20161ACB200009); and the Key Science Fund Project of the Jiangxi Provincial Education Department (GJJ150439).
Funding (Markov blanket discovery for undirected independence graphs): Supported by the National Natural Science Foundation of China (62262016, 61961160706, 62231010); the 14th Five-Year Plan Civil Aerospace Technology Preliminary Research Project (D040405); and the National Key Laboratory Foundation 2022-JCJQ-LB-006 (Grant No. 6142411212201).
Funding (conditional kernel correlation): Partially supported by the Knowledge Innovation Program of Hubei Province (No. 2019CFB810); NSFC (No. 12325110); and the CAS Project for Young Scientists in Basic Research (No. YSBR-034).
Funding (gold resource potential, eastern Kunlun Mountains): Under the auspices of the National High-tech R&D Program of China (No. 2007AA12Z227) and the National Natural Science Foundation of China (No. 40701146).
Funding (junction-tree-based BN structure learning): Supported by the National Natural Science Foundation of China (60974082, 61075055) and the Fundamental Research Funds for the Central Universities (K50510700004).
Acknowledgements and funding (semi-graphoid inference): The authors wish to thank the referees for very helpful comments, which greatly improved the presentation of this paper. This work was partially supported by the National Natural Science Foundation of China (Grant No. 11025102), the Program for Changjiang Scholars and Innovative Research Team in University, and the Jilin Project (20100401).
Funding: Supported by the National Key R&D Program of China (Grant No. 2017YFA0505500), the National Natural Science Foundation of China (Grant Nos. 31771476 and 31930022), and the Shanghai Municipal Science and Technology Major Project, China (Grant No. 2017SHZDZX01).
Abstract: The rapid advancement of single-cell technologies has shed new light on the complex mechanisms of cellular heterogeneity. However, compared to bulk RNA sequencing (RNA-seq), single-cell RNA-seq (scRNA-seq) suffers from higher noise and lower coverage, which brings new computational difficulties. Based on statistical independence, the cell-specific network (CSN) is able to quantify the overall associations between genes for each cell, yet it suffers from overestimation related to indirect effects. To overcome this problem, we propose the c-CSN method, which can construct the conditional cell-specific network (CCSN) for each cell. The c-CSN method measures the direct associations between genes by eliminating the indirect associations. c-CSN can be used for cell clustering and dimension reduction on a network basis of single cells. Intuitively, each CCSN can be viewed as the transformation from less "reliable" gene expression to more "reliable" gene–gene associations in a cell. Based on CCSN, we further design network flow entropy (NFE) to estimate the differentiation potency of a single cell. A number of scRNA-seq datasets were used to demonstrate the advantages of our approach. 1) One direct association network is generated for one cell. 2) Most existing scRNA-seq methods designed for gene expression matrices are also applicable to c-CSN-transformed degree matrices. 3) CCSN-based NFE helps resolve the direction of differentiation trajectories by quantifying the potency of each cell. c-CSN is publicly available at https://github.com/LinLi-0909/c-CSN.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60972150, 10926197, and 61201323.
Abstract: Detection and clarification of cause-effect relationships among variables is an important problem in time series analysis. This paper provides a method that employs both mutual information and conditional mutual information to identify the causal structure of multivariate time series causal graphical models. A three-step procedure is developed to learn the contemporaneous and the lagged causal relationships of time series causal graphs. Unlike conventional constraint-based algorithms, the proposed algorithm does not assume any particular family of distributions and is nonparametric. These properties are especially appealing for inference of time series causal graphs when prior knowledge about the data model is not available. Simulations and case analysis demonstrate the effectiveness of the method.
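As a rough illustration of the quantity such methods rely on, the following sketch computes a plug-in estimate of the conditional mutual information I(X;Y|Z) from discrete samples. It is not the paper's three-step procedure: the function name `conditional_mutual_information`, the restriction to discrete data, and the simple frequency-count estimator are all assumptions made for the example.

```python
from collections import Counter
from math import log


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X;Y|Z) in nats from paired discrete samples.

    Hypothetical helper for illustration only: it uses empirical
    frequencies, so it is biased for small samples.
    """
    n = len(xs)
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    cmi = 0.0
    for (x, y, z), n_xyz in c_xyz.items():
        # p(x,y,z) * log[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ], written with counts
        ratio = n_xyz * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)])
        cmi += (n_xyz / n) * log(ratio)
    return cmi


# X and Y are both copies of Z, so they are independent given Z.
z = [0, 0, 1, 1]
cmi_given_z = conditional_mutual_information(z, z, z)

# With a constant conditioning variable the estimate reduces to plain
# mutual information; here X = Y is a fair binary variable.
x = [0, 1, 0, 1]
mi_xy = conditional_mutual_information(x, x, [0, 0, 0, 0])
```

An estimate near zero is evidence for conditional independence; in practice a threshold or a permutation test would be needed to decide, which this sketch leaves out.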
Funding: Supported by the Fundamental Research Funds for the Central Universities, the Science and Technology Commission of Shanghai Municipality (No. 19511120601), the Scientific and Technological Innovation 2030 Major Projects (No. 2018AAA0100902), the CCF-AFSG Research Fund (No. CCF-AFSG RF20220205), and the "Chenguang Program" sponsored by Shanghai Education Development Foundation and Shanghai Municipal Education Commission (No. 21CGA32).
Abstract: The Bayesian network is a popular approach to uncertain knowledge representation and reasoning. Structure learning is the first step in learning a Bayesian network, and score-based methods are among the most popular ways of learning the structure. In most cases, the score of a Bayesian network is defined as the sum of the log-likelihood score and a complexity score given by a penalty function. If the penalty function is set unreasonably, it may hurt the performance of structure search. Thus, Bayesian network structure learning is essentially a bi-objective optimization problem. However, the existing bi-objective structure learning algorithms can only be applied to small-scale networks. To this end, this paper proposes a bi-objective evolutionary Bayesian network structure learning algorithm via skeleton constraint (BBS) for medium-scale networks. To boost the performance of searching, BBS introduces the random order prior (ROP) initial operator. ROP generates a skeleton to constrain the search space, which is the key to expanding the scale of structure learning problems. Then, acyclic structures are guaranteed by adding the orders of variables in the initial skeleton. After that, BBS designs the Pareto rank based crossover and skeleton guided mutation operators. The operators operate on the skeleton obtained in ROP to make the search more targeted. Finally, BBS provides a strategy to choose the final solution. The experimental results show that BBS can always find a structure which is closer to the ground truth compared with the single-objective structure learning methods. Furthermore, compared with the existing bi-objective structure learning methods, BBS is scalable and can be applied to medium-scale Bayesian network datasets.
On the educational problem of discovering the influencing factors of students' academic performance, BBS provides higher quality solutions and offers flexibility in solution selection compared with the widely used Bayesian network structure learning methods.
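The bi-objective view described above can be sketched in a few lines. The code below is not the BBS algorithm; it only evaluates the two competing objectives (maximum log-likelihood and number of free parameters) for a candidate structure on discrete data and checks Pareto dominance. The names `structure_objectives` and `dominates`, the data layout, and the toy data are assumptions made for the example.

```python
from collections import Counter
from math import log


def structure_objectives(data, parents):
    """Return (log-likelihood, number of free parameters) for a structure.

    data: list of dicts mapping variable name -> discrete value.
    parents: dict mapping each variable to a tuple of its parents.
    Hypothetical helper; arities are taken from the values seen in data.
    """
    loglik = 0.0
    n_params = 0
    for var, pa in parents.items():
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        pa_counts = Counter(tuple(row[p] for p in pa) for row in data)
        for (pa_val, _), n in joint.items():
            # maximum-likelihood term: n * log p(var | parents)
            loglik += n * log(n / pa_counts[pa_val])
        arity = len({row[var] for row in data})
        pa_configs = 1
        for p in pa:
            pa_configs *= len({row[p] for row in data})
        n_params += (arity - 1) * pa_configs
    return loglik, n_params


def dominates(a, b):
    """True if a is at least as good as b on both objectives and differs."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b


# Toy data in which B is a copy of A.
data = [{"A": 0, "B": 0}, {"A": 1, "B": 1}] * 2
empty = structure_objectives(data, {"A": (), "B": ()})
with_edge = structure_objectives(data, {"A": (), "B": ("A",)})
```

Here the structure with the edge A -> B fits better but needs more parameters, so neither candidate dominates the other. A single penalized score would have to pick one of them through its penalty weight, whereas a bi-objective search keeps both on the Pareto front.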
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60972150 and 10926197.
Abstract: Detection and clarification of cause-effect relationships among variables is an important problem in time series analysis. Traditional causality inference methods have a salient limitation: the model must be linear with Gaussian noise. Although additive model regression can effectively infer the nonlinear causal relationships of additive nonlinear time series, it requires the contemporaneous causal relationships of variables to be linear and is not always valid for testing conditional independence relations. This paper provides a nonparametric method that employs both mutual information and conditional mutual information to identify the causal structure of a class of nonlinear time series models, which extends the additive nonlinear time series to nonlinear structural vector autoregressive models. An algorithm is developed to learn the contemporaneous and the lagged causal relationships of variables. Simulations demonstrate the effectiveness of the proposed method.
Abstract: The conditional independence structure of a common probability measure is a structural model. In this paper, we solve an open problem posed by Studeny [Probabilistic Conditional Independence Structures, Theme 9, p. 206]. A new approach is proposed to decompose a directed acyclic graph, and its optimal properties are studied. We interpret this approach from the perspective of the decomposition of the corresponding conditional independence model and provide an algorithm for identifying the maximal prime subgraphs in a directed acyclic graph.
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 19831010 and 10001008) and the National Excellent Youth Science Foundation of China.
Abstract: Simpson's paradox reminds people that statistical inference in a low-dimensional space may seriously distort the reality in a higher-dimensional one. To study the paradox with respect to Yule's measure, this paper discusses simple collapsibility, strong collapsibility and consecutive collapsibility, and presents necessary and sufficient conditions for them. These conditions are of great importance for observational and experimental designs, eliminating confounding bias, categorizing discrete variables and so on.
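A small numerical sketch shows what failure of collapsibility looks like. It assumes that "Yule's measure" refers to Yule's Q, Q = (ad - bc)/(ad + bc), for a 2x2 table [[a, b], [c, d]]; the counts are illustrative and are not taken from the paper.

```python
def yule_q(a, b, c, d):
    """Yule's Q for the 2x2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / (a * d + b * c)


# Rows: treatment A / treatment B; columns: success / failure.
# Illustrative counts for two strata of a background variable.
stratum_1 = (81, 6, 234, 36)
stratum_2 = (192, 71, 55, 25)

# Collapsing over the background variable adds the tables cell by cell.
collapsed = tuple(x + y for x, y in zip(stratum_1, stratum_2))

q_1 = yule_q(*stratum_1)
q_2 = yule_q(*stratum_2)
q_collapsed = yule_q(*collapsed)
```

Both strata show a positive association between treatment A and success, yet the collapsed table shows a negative one. The measure is therefore not collapsible over this background variable, which is exactly the situation that collapsibility conditions are meant to rule out.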
Funding: Supported by the National Natural Science Foundation of China (61403290, 11301408, 11401454), the Foundation for Youths of Shaanxi Province (2014JQ1020), the Foundation of Baoji City (2013R7-3), and the Foundation of Baoji University of Arts and Sciences (ZK15081).
Abstract: Learning Bayesian network structure is one of the most exciting challenges in machine learning. Discovering a correct skeleton of a directed acyclic graph (DAG) is the foundation of dependency analysis algorithms for this problem. Because high-order conditional independence (CI) tests are unreliable, the key steps for improving the efficiency of a dependency analysis algorithm are to use as few CI tests as possible and to keep the conditioning sets as small as possible. For these reasons, and inspired by the PC algorithm, we present an algorithm, named fast and efficient PC (FEPC), for learning the adjacent neighbourhood of every variable. FEPC implements the CI tests in three kinds of orders, which significantly reduces the number of high-order CI tests. Compared with current algorithm proposals, the experimental results show that FEPC has better accuracy with fewer CI tests and smaller conditioning sets. The highest reduction in the number of CI tests achieved by FEPC compared with the PC algorithm is 83.3%.
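For orientation, the skeleton phase of the standard PC algorithm, which FEPC is said to build on, can be sketched as follows. This is not FEPC itself: the three test orderings are not reproduced here. The function name `pc_skeleton`, the `ci_test(x, y, cond_set)` callback, and the oracle used in the usage example are assumptions made for the sketch.

```python
from itertools import combinations


def pc_skeleton(nodes, ci_test, max_order=None):
    """Standard PC skeleton search with a user-supplied CI test.

    ci_test(x, y, cond_set) must return True when x and y are judged
    conditionally independent given cond_set. Returns the remaining
    edges, the separating sets, and the number of CI tests performed.
    """
    adj = {v: set(nodes) - {v} for v in nodes}
    sepset = {}
    n_tests = 0
    order = 0
    while max_order is None or order <= max_order:
        progressed = False
        for x in nodes:
            for y in sorted(adj[x]):
                others = sorted(adj[x] - {y})
                if len(others) < order:
                    continue  # not enough neighbours for this order
                progressed = True
                for cond in combinations(others, order):
                    n_tests += 1
                    if ci_test(x, y, set(cond)):
                        # remove the edge and remember what separated it
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(cond)
                        break
        if not progressed:
            break
        order += 1
    edges = {frozenset((x, y)) for x in nodes for y in adj[x]}
    return edges, sepset, n_tests


# Usage with an oracle for the chain A -> B -> C, where the only
# independence is A _||_ C given B.
def chain_oracle(x, y, cond):
    return {x, y} == {"A", "C"} and "B" in cond


edges, sepset, n_tests = pc_skeleton(["A", "B", "C"], chain_oracle)
```

The returned `n_tests` counter makes the cost that the abstract discusses visible: every extra order of conditioning adds tests, so limiting `max_order` or choosing the test order carefully reduces the count.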
Abstract: Inferring gene regulatory networks (GRNs) is a challenging task in bioinformatics. In this paper, an algorithm, PCHMS, is introduced to infer GRNs. This method applies the path consistency (PC) algorithm based on the conditional mutual information test (PCA-CMI). In PC-based algorithms, the separator set is determined in order to detect the dependency between variables. The PCHMS algorithm attempts to select this set in a smart way. For this purpose, the edges of the resulting skeleton are directed based on the PC algorithm direction rule and the mutual information test (MIT) score. Then the separator set is selected according to the directed network by considering a suitable sequential order of genes. The effectiveness of this method is benchmarked on several networks from the DREAM challenge and the widely used SOS DNA repair network of Escherichia coli. Results show that applying the PCHMS algorithm improves the precision of learning the structure of GRNs in comparison with current popular approaches.