Funding: Beijing Natural Science Foundation (Grant No. Z200001); National Natural Science Foundation of China (Grant Nos. 11871001, 11971478 and 11971001); the Fundamental Research Funds for the Central Universities (Grant No. 2019NTSS18).
Abstract: Testing independence between random vectors X and Y is an essential task in statistical inference. One class of tests is based on the minimal spanning tree of X and Y. The main idea is to generate the minimal spanning tree for one random vector X and, for each edge in that tree, to calculate the corresponding rank number based on the other random vector Y. The test statistics are then constructed from these rank numbers. However, the existing statistics are not symmetric in X and Y, so the power obtained from the minimal spanning tree of X differs from that obtained from the minimal spanning tree of Y. In addition, the conclusion drawn from the minimal spanning tree of X might conflict with that drawn from the minimal spanning tree of Y. To solve these problems, we propose several symmetric independence tests for X and Y. The exact distributions of the test statistics are investigated for small sample sizes, and their asymptotic properties are also studied. A permutation method is introduced for obtaining critical values of the statistics. Numerical analysis demonstrates that the proposed methods are more efficient than the existing methods.
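The permutation step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not give the spanning-tree statistics, so the absolute Pearson correlation is used here as a stand-in statistic that happens to be symmetric in its two arguments. The names `abs_pearson` and `permutation_p_value` are hypothetical.

```python
import random


def abs_pearson(xs, ys):
    """Stand-in statistic (an assumption): |Pearson correlation|, symmetric in xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_p_value(xs, ys, statistic, n_permutations=999, seed=0):
    """Estimate a p-value by re-pairing ys with xs at random.

    Shuffling ys breaks any dependence between the samples, so the shuffled
    statistics approximate the null distribution of the statistic.
    """
    rng = random.Random(seed)
    observed = statistic(xs, ys)
    shuffled = list(ys)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if statistic(xs, shuffled) >= observed:
            at_least_as_large += 1
    # Counting the observed pairing itself keeps the estimate strictly positive.
    return (at_least_as_large + 1) / (n_permutations + 1)


if __name__ == "__main__":
    xs = [float(i) for i in range(30)]
    ys = [2.0 * x + 1.0 for x in xs]  # perfectly dependent on xs
    print(permutation_p_value(xs, ys, abs_pearson))
```

Any other statistic with the same two-argument signature could be passed in place of `abs_pearson`; a critical value at level alpha would be the corresponding upper quantile of the shuffled statistics.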
Funding: Supported by the State Key Laboratory of Freshwater Ecology and Biotechnology (Nos. 2014FB14, 2011FBZ14); the Hubei Province (No. 2001AA201A05); the National Basic Research Program of China (973 Program) (No. 2008CB418006); the Chinese Academy of Sciences (No. KZCX1-SW-12); the Youth Innovation Association of Chinese Academy of Sciences (No. 2014312).
Abstract: Next to excessive nutrient loading, intensive aquaculture is one of the major anthropogenic impacts threatening lake ecosystems. In China, particularly in the shallow lakes of the mid-lower Changjiang (Yangtze) River, continuous overstocking of the Chinese mitten crab (Eriocheir sinensis) could deteriorate water quality and exhaust natural resources. A series of crab yield models and a general optimum-stocking rate model have been established, which seek to benefit both crab culture and the environment. In this research, independent investigations were carried out to evaluate the crab yield models and modify the optimum-stocking model. Low percentage errors (average 47%, median 36%) between observed and calculated crab yields were obtained. Specific values were defined for adult crab body mass (135 g/ind.) and recapture rate (18% and 30% in lakes with submerged macrophyte biomass above and below 1 000 g/m^2, respectively) to modify the optimum-stocking model. Analysis based on the modified optimum-stocking model indicated that the actual stocking rates in most lakes were much higher than the calculated optimum-stocking rates. This implies that, for most lakes, the current stocking rates should be greatly reduced to maintain healthy lake ecosystems.
Funding: National Natural Science Foundation of China (Grant Nos. 12271286, 11931001 and 11771241).
Abstract: Cui and Zhong (2019) (Computational Statistics & Data Analysis, 139, 117–133) proposed a test based on the mean variance (MV) index to test independence between a categorical random variable Y with R categories and a continuous random variable X. They proved the asymptotic normality of the MV test statistic when R diverges to infinity, which brings many merits to the MV test, including making it more convenient for independence testing when R is large. This paper considers a new test called the integral Pearson chi-square (IPC) test, whose test statistic can be viewed as a modified MV test statistic. A central limit theorem for martingale differences is used to show that the asymptotic null distribution of the standardized IPC test statistic is also normal when R diverges, so the IPC test shares many merits with the MV test. As an application of this theoretical finding, the IPC test is extended to test independence between continuous random variables. The finite-sample performance of the proposed test is assessed by Monte Carlo simulations, and a real data example is presented for illustration.
Funding: Supported by National Natural Science Foundation of China (Grant Nos. 11101181, 11171057, 11171058 and 11071035); Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20110061120005); NECT-11-0616; PCSIRT; the Fundamental Research Funds for the Central Universities.
Abstract: This paper proposes the corrected likelihood ratio test (LRT) and a large-dimensional trace criterion to test the independence of two large sets of multivariate variables of dimensions p1 and p2 when the dimension p = p1 + p2 and the sample size n tend to infinity simultaneously and proportionally. Both theoretical and simulation results demonstrate that the traditional χ² approximation of the LRT performs poorly when the dimension p is large relative to the sample size n, while the corrected LRT and the large-dimensional trace criterion behave well whether the dimension is small or large relative to the sample size. Moreover, the trace criterion can be used in the case p > n, where the corrected LRT is infeasible because it is no longer well defined.
Funding: Funded by the National Natural Science Foundation of China (Grant No. 10472077).
Abstract: Based on data on phytoplankton and environmental factors in the Bohai Bay, the dependence between phytoplankton concentration and environmental factors is analysed using the linear correlation coefficient, the rank correlation coefficient and the Hoeffding test of independence. The results show that wind speed, air pressure, surface temperature, field pH, salinity, DO, silicate and NO3- have a great impact on the concentration of phytoplankton.
Abstract: In this thesis, we construct test statistics for the association test and the independence test in high dimensions, respectively, and study their theoretical properties under some regularity conditions. We also propose a nonparametric variable screening procedure for the sparse additive model with multivariate response in ultra-high dimensions and establish some of its screening properties.
Funding: National Natural Science Foundation of China (Grant Nos. 11901006 and 11601008); Natural Science Foundation of Anhui Province (Grant No. 1908085QA06).
Abstract: In this paper, we propose a new numerical scheme for the coupled Stokes-Darcy model with the Beavers-Joseph-Saffman interface condition. We use the weak Galerkin method to discretize the Stokes equation and the mixed finite element method to discretize the Darcy equation. A discrete inf-sup condition is proved and optimal error estimates are derived. Numerical experiments validate the theoretical analysis.
Abstract: Pseudo-random number generators have always been important in experimental design, computer simulation, cryptography and statistical analysis. This paper presents a method of comparing the degree of independence exhibited by various random number generators. A procedure based on the largest (in modulus) non-unit eigenvalue of the observed Markov transition matrix is used to assess the 'randomness' of a random number generator.
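The eigenvalue criterion in this abstract can be illustrated in its simplest case. The sketch below is an assumption-laden toy, not the paper's procedure: outputs in [0, 1) are reduced to two states by a threshold (the paper's state space is not given in the abstract), and the function name `second_eigenvalue_two_state` is hypothetical. For a 2x2 row-stochastic matrix the eigenvalues are 1 and p00 + p11 - 1, so the non-unit eigenvalue has a closed form. Values near 0 are consistent with successive outputs being independent at this coarse level; values near 1 or -1 indicate sticky or alternating behaviour.

```python
import random


def second_eigenvalue_two_state(samples, threshold=0.5):
    """Non-unit eigenvalue of the observed 2-state Markov transition matrix.

    Each sample is mapped to state 0 (below threshold) or 1 (at or above it),
    transitions between consecutive samples are counted, and rows are
    normalised to estimate P. The eigenvalues of P are 1 and p00 + p11 - 1.
    """
    states = [1 if x >= threshold else 0 for x in samples]
    counts = [[0, 0], [0, 0]]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    row_totals = [sum(row) for row in counts]
    if min(row_totals) == 0:
        raise ValueError("each state must be left at least once")
    p00 = counts[0][0] / row_totals[0]
    p11 = counts[1][1] / row_totals[1]
    return p00 + p11 - 1.0


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [rng.random() for _ in range(20000)]
    print(second_eigenvalue_two_state(draws))
```

With more than two states, the same idea would need a general eigenvalue routine applied to the estimated transition matrix.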
Funding: Supported by the Fundamental Research Funds for the Central Universities; the Science and Technology Commission of Shanghai Municipality (No. 19511120601); the Scientific and Technological Innovation 2030 Major Projects (No. 2018AAA0100902); the CCF-AFSG Research Fund (No. CCF-AFSG RF20220205); the "Chenguang Program" sponsored by Shanghai Education Development Foundation and Shanghai Municipal Education Commission (No. 21CGA32).
Abstract: The Bayesian network is a popular approach to representing and reasoning about uncertain knowledge. Structure learning is the first step in learning a Bayesian network, and score-based methods are among the most popular ways of learning the structure. In most cases, the score of a Bayesian network is defined as the sum of a log-likelihood score and a complexity score weighted by a penalty function. If the penalty function is set unreasonably, it may hurt the performance of the structure search. Thus, Bayesian network structure learning is essentially a bi-objective optimization problem. However, the existing bi-objective structure learning algorithms can only be applied to small-scale networks. To this end, this paper proposes a bi-objective evolutionary Bayesian network structure learning algorithm via skeleton constraint (BBS) for medium-scale networks. To boost search performance, BBS introduces the random order prior (ROP) initial operator. ROP generates a skeleton to constrain the search space, which is the key to expanding the scale of structure learning problems. Acyclic structures are then guaranteed by adding the orders of variables in the initial skeleton. After that, BBS designs Pareto-rank-based crossover and skeleton-guided mutation operators, which operate on the skeleton obtained in ROP to make the search more targeted. Finally, BBS provides a strategy for choosing the final solution. The experimental results show that BBS can always find a structure that is closer to the ground truth than those found by single-objective structure learning methods. Furthermore, compared with the existing bi-objective structure learning methods, BBS is scalable and can be applied to medium-scale Bayesian network datasets. On the educational problem of discovering the factors that influence students' academic performance, BBS provides higher-quality solutions and offers more flexibility in solution selection than the widely used Bayesian network structure learning methods.
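The two objectives named in this abstract, log-likelihood and complexity, can be made concrete for a single node of a discrete network. The sketch below is illustrative only and is not the BBS algorithm: the function names, the dictionary-per-row data layout and the free-parameter count as the complexity measure are all assumptions. It returns the two objectives separately, and also shows the usual single-objective combination through a penalty weight.

```python
from collections import Counter
from math import log


def node_objectives(rows, child, parents):
    """Return (log_likelihood, n_free_parameters) for one node given its parents.

    rows is a list of dicts mapping variable names to discrete values.
    The log-likelihood uses maximum-likelihood conditional frequencies.
    """
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in rows)
    log_likelihood = sum(
        n * log(n / parent_counts[config]) for (config, _), n in joint.items()
    )
    child_levels = len({r[child] for r in rows})
    parent_configs = 1
    for p in parents:
        parent_configs *= len({r[p] for r in rows})
    return log_likelihood, (child_levels - 1) * parent_configs


def penalized_score(log_likelihood, n_parameters, penalty_weight):
    """Single-objective score: fit minus a weighted complexity penalty."""
    return log_likelihood - penalty_weight * n_parameters


if __name__ == "__main__":
    data = [{"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 1, "B": 1}, {"A": 1, "B": 1}]
    for parent_set in ([], ["A"]):
        fit, size = node_objectives(data, "B", parent_set)
        print(parent_set, fit, size, penalized_score(fit, size, 0.5 * log(len(data))))
```

Adding a parent never lowers the log-likelihood but raises the parameter count, which is why the choice of `penalty_weight` (here the BIC-style value 0.5 log N) decides which structure wins, and why the two quantities can instead be treated as separate objectives.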
Funding: Supported by the National Natural Science Foundation of China (61403290, 11301408, 11401454); the Foundation for Youths of Shaanxi Province (2014JQ1020); the Foundation of Baoji City (2013R7-3); the Foundation of Baoji University of Arts and Sciences (ZK15081).
Abstract: Learning Bayesian network structure is one of the most exciting challenges in machine learning. Discovering a correct skeleton of a directed acyclic graph (DAG) is the foundation of dependency analysis algorithms for this problem. Because high-order conditional independence (CI) tests are unreliable, and to improve the efficiency of a dependency analysis algorithm, the key steps are to use as few CI tests as possible and to reduce the sizes of the conditioning sets as much as possible. For these reasons, and inspired by the PC algorithm, we present an algorithm named fast and efficient PC (FEPC) for learning the adjacent neighbourhood of every variable. FEPC carries out the CI tests in three kinds of orders, which significantly reduces the number of high-order CI tests. Compared with current algorithms, the experimental results show that FEPC achieves better accuracy with fewer CI tests and smaller conditioning sets. The highest reduction in the number of CI tests achieved by FEPC relative to the PC algorithm is 83.3%.
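For context, the skeleton phase of the baseline PC algorithm that FEPC is compared against can be sketched as follows. This is the textbook PC idea, not FEPC itself, whose three test orders are not described in the abstract. The CI test is passed in as a callable `is_independent(x, y, conditioning_set)`, which is an assumed interface; the function also counts how many CI tests are run, the quantity the abstract reports reducing.

```python
from itertools import combinations


def pc_skeleton(variables, is_independent, max_order=None):
    """Learn an undirected skeleton by removing edges between independent pairs.

    Starts from the complete graph and tests each adjacent pair (x, y) against
    conditioning sets of growing size drawn from the current neighbours of x.
    Returns (adjacency, number_of_ci_tests).
    """
    adjacency = {v: set(variables) - {v} for v in variables}
    n_tests = 0
    limit = len(variables) - 2 if max_order is None else max_order
    for order in range(limit + 1):
        for x in variables:
            for y in sorted(adjacency[x]):  # sorted() copies, so removal is safe
                candidates = sorted(adjacency[x] - {y})
                for conditioning in combinations(candidates, order):
                    n_tests += 1
                    if is_independent(x, y, set(conditioning)):
                        adjacency[x].discard(y)
                        adjacency[y].discard(x)
                        break
    return adjacency, n_tests


if __name__ == "__main__":
    # Hypothetical oracle for the chain A -> B -> C: A and C separate given B.
    def chain_oracle(x, y, conditioning):
        return {x, y} == {"A", "C"} and "B" in conditioning

    skeleton, tests = pc_skeleton(["A", "B", "C"], chain_oracle)
    print(skeleton, tests)
```

In practice the oracle would be replaced by a statistical CI test on data, and tests with large conditioning sets become unreliable, which is the motivation the abstract gives for limiting them.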
Funding: Supported by National Natural Science Foundation of China (No. 61673283).
Abstract: This paper describes a brain-inspired simultaneous localization and mapping (SLAM) system for a mobile robot that uses oriented FAST (features from accelerated segment test) and rotated BRIEF (binary robust independent elementary features), known together as ORB, extracted from an RGB (red, green, blue) sensor. The core SLAM system, dubbed RatSLAM, can construct a cognitive map using raw odometry and the visual scenes along the path traveled. Unlike the existing RatSLAM system, which uses only a simple vector to represent the features of a visual image, this paper employs an efficient and very fast descriptor method, ORB, to extract features from RGB images. Experiments show that these features are suitable for recognizing sequences of familiar visual scenes. Thus, when loop closure errors are detected, the descriptive features help to correct the pose estimate by driving loop closure and localization in a map correction algorithm. The efficiency and robustness of our method are also demonstrated by comparison with different visual processing algorithms.
Abstract: Inferring gene regulatory networks (GRNs) is a challenging task in bioinformatics. In this paper, an algorithm, PCHMS, is introduced to infer GRNs. The method applies the path consistency (PC) algorithm based on the conditional mutual information test (PCA-CMI). In PC-based algorithms, a separator set is determined to detect the dependency between variables. The PCHMS algorithm attempts to select this set in a smart way. For this purpose, the edges of the resulting skeleton are directed based on the PC algorithm direction rule and the mutual information test (MIT) score. The separator set is then selected according to the directed network by considering a suitable sequential order of genes. The effectiveness of this method is benchmarked on several networks from the DREAM challenge and on the widely used SOS DNA repair network of Escherichia coli. Results show that applying the PCHMS algorithm improves the precision of learning the structure of GRNs in comparison with current popular approaches.