Abstract: Aiming to provide an appropriate number K of clusters, we propose a new criterion function, the H criterion function, and prove three of its properties. We validate the performance of the H criterion function on one artificial dataset and three real-world datasets; the results are nearly consistent with those of a previous method. The proposed nonparametric criterion is intuitive and simple, and its computational cost is acceptable.
Funding: Supported by the NNSF of China (10371015, 10329102) and the Science Foundation for Young Teachers of Northeast Normal University (20060101).
Abstract: In this article, using likelihood score theory extended to nuisance parameters, we derive a new homogeneity score test for comparing linkage disequilibrium across several strata. Power and sample size formulae are also obtained.
Funding: Supported by the National Natural Science Foundation of China (No. 11471068).
Abstract: The generalized estimating equations (GEE) approach is perhaps one of the most widely used methods for longitudinal data analysis. Although the GEE method yields consistent estimators even when the working correlation structure is misspecified, their efficiency can be severely affected. In this paper, we propose a new two-step estimation method in which the correlation matrix is assumed to be a linear combination of some known working matrices. Asymptotic properties of the new estimators are developed. Simulation studies are conducted to examine the performance of the proposed estimators. We illustrate the methodology with an epilepsy data set.
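To make the modeling assumption concrete, here is an illustrative instance for three repeated measurements. The basis matrices below (the identity $I_3$, the exchangeable pattern $J_3 - I_3$ with $J_3$ the all-ones matrix, and the first off-diagonal band $B_3$) are hypothetical choices of ours, not necessarily the working matrices used in the paper; fixing $a_1 = 1$ keeps a unit diagonal.

```latex
R(\mathbf{a}) = \sum_{k=1}^{K} a_k M_k, \qquad
R(\mathbf{a}) = a_1 I_3 + a_2 (J_3 - I_3) + a_3 B_3
= \begin{pmatrix}
a_1 & a_2 + a_3 & a_2 \\
a_2 + a_3 & a_1 & a_2 + a_3 \\
a_2 & a_2 + a_3 & a_1
\end{pmatrix}
```

Under this parameterization, $a_3 = 0$ gives an exchangeable structure and $a_2 = 0$ gives a 1-dependent (banded) structure, so a single fit can move between the two.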
Funding: Supported by the National Natural Science Foundation of China (Grant No. 10828102), a Changjiang Visiting Professorship, the Training Fund of Northeast Normal University's Scientific Innovation Project (Grant No. NENU-STC07002), and the National Institutes of Health of the USA (Grant No. R01GM080503-01A1).
Abstract: We use functional principal component analysis (FPCA) to model and predict weight growth in children. In particular, we examine how the approach can help discern the growth patterns of underweight children relative to their normal-weight counterparts, and whether a commonly used transformation to normality plays any constructive role in a predictive model based on FPCA. Our work supplements the conditional growth charts developed by Wei and He (2006) by constructing a predictive growth model based on a small number of principal component scores computed from an individual's past measurements.
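As a rough illustration of the FPCA step only, the sketch below estimates the mean curve, the leading eigenfunction, and the principal component scores from curves discretized on a common, equally spaced grid. The common-grid assumption, the hypothetical function name `leading_fpc`, and the use of plain power iteration are ours; real growth data are usually irregularly observed and need smoothing, and this is not the authors' predictive model.

```python
def leading_fpc(curves, iters=500):
    """Discretized FPCA: mean curve, leading eigenfunction, and FPC scores.

    Assumes every curve is observed on the same equally spaced grid, so the
    covariance operator is approximated by the sample covariance matrix.
    """
    n, t = len(curves), len(curves[0])
    mean = [sum(c[k] for c in curves) / n for k in range(t)]
    centered = [[c[k] - mean[k] for k in range(t)] for c in curves]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(t)]
           for a in range(t)]
    # Power iteration for the leading eigenvector (assumes a gap between the
    # top two eigenvalues and a start vector not orthogonal to the top one).
    phi = [1.0] * t
    for _ in range(iters):
        nxt = [sum(cov[a][b] * phi[b] for b in range(t)) for a in range(t)]
        norm = sum(x * x for x in nxt) ** 0.5
        phi = [x / norm for x in nxt]
    # FPC score of each curve: projection of the centered curve onto phi.
    scores = [sum(r[k] * phi[k] for k in range(t)) for r in centered]
    return mean, phi, scores
```

In a predictive setting, one would compute such scores from the part of a child's curve observed so far and use them as predictors; how that is done in the paper is not reproduced here.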
Funding: Supported by the National Natural Science Foundation of China (Grant No. 10871037).
Abstract: Ranked-set sampling (RSS) often provides more efficient inference than simple random sampling (SRS). In this article, we propose a systematic nonparametric technique, RSS-EL, for hypothesis testing and interval estimation with balanced RSS data using empirical likelihood (EL). We detail the approach for interval estimation and hypothesis testing in one-sample problems, two-sample problems, and general estimating equations. In all three cases, RSS is shown to provide more efficient inference than SRS of the same size. Moreover, the RSS-EL method does not require the easily violated assumptions needed by existing rank-based nonparametric methods for RSS data, such as perfect ranking, an identical ranking scheme in the two groups, and a location shift between the two population distributions. The merit of the RSS-EL method is also demonstrated through simulation studies.
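The efficiency gain of RSS comes from how units are selected. The sketch below, which assumes perfect ranking and a hypothetical standard normal population, draws balanced RSS samples and compares the Monte Carlo variance of the sample mean with that of SRS of the same size. It illustrates RSS itself, not the RSS-EL procedure; the names `balanced_rss_sample` and `compare_mean_variance` are hypothetical.

```python
import random
import statistics


def standard_normal(rng):
    # Illustrative population: N(0, 1).
    return rng.gauss(0.0, 1.0)


def balanced_rss_sample(draw, set_size, cycles, rng):
    """Draw a balanced ranked-set sample, assuming perfect ranking.

    In each cycle, for rank i = 1..set_size, draw a fresh set of
    `set_size` units, rank them, and measure only the i-th smallest.
    """
    sample = []
    for _ in range(cycles):
        for i in range(set_size):
            ranked_set = sorted(draw(rng) for _ in range(set_size))
            sample.append(ranked_set[i])
    return sample


def compare_mean_variance(set_size=3, cycles=4, reps=2000, seed=1):
    """Monte Carlo variance of the sample mean under RSS vs. SRS of equal size."""
    rng = random.Random(seed)
    n = set_size * cycles
    rss_means, srs_means = [], []
    for _ in range(reps):
        rss = balanced_rss_sample(standard_normal, set_size, cycles, rng)
        rss_means.append(statistics.fmean(rss))
        srs_means.append(statistics.fmean(standard_normal(rng) for _ in range(n)))
    return statistics.variance(rss_means), statistics.variance(srs_means)
```

With set size 3 and a normal population, the RSS variance typically comes out at roughly half the SRS variance; the exact ratio depends on the seed and the number of replicates.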
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11001045, 10926107 and 11271084); the Specialized Research Fund for the Doctoral Program of Higher Education of MOE (Grant No. 20090043120008); the Training Fund of NENU's Scientific Innovation Project of Northeast Normal University (Grant No. NENU-STC08009); the Program for Changjiang Scholars and Innovative Research Team in University; the Programme for Cultivating Innovative Students in Key Disciplines of Fudan University; the 973 Program Project (Grant No. 2010CB327900); the Doctoral Program of the Ministry of Education (Grant No. 20090071110003); and the Shanghai Science & Technology Committee and Shanghai Education Committee (Dawn Project).
Abstract: In this paper, we investigate the effective condition numbers of the generalized Sylvester equation $(AX - YB,\ DX - YE) = (C, F)$, where $A, D \in \mathbb{R}^{m\times m}$, $B, E \in \mathbb{R}^{n\times n}$, and $C, F \in \mathbb{R}^{m\times n}$. We apply the small-sample statistical method for fast condition estimation of the generalized Sylvester equation, which requires $O(m^2 n + m n^2)$ flops, compared with $O(m^3 + n^3)$ flops for the generalized Schur and generalized Hessenberg-Schur methods for solving the generalized Sylvester equation. Numerical examples illustrate the sharpness of our perturbation bounds.
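One standard way to see the structure of the problem is to vectorize the pair. Using $\operatorname{vec}(AXB) = (B^{\top} \otimes A)\operatorname{vec}(X)$, where vec stacks columns, the two matrix equations become a single $2mn \times 2mn$ linear system. This reformulation is textbook background, not necessarily the one used in the paper.

```latex
\underbrace{\begin{pmatrix}
I_n \otimes A & -(B^{\top} \otimes I_m) \\
I_n \otimes D & -(E^{\top} \otimes I_m)
\end{pmatrix}}_{Z}
\begin{pmatrix} \operatorname{vec}(X) \\ \operatorname{vec}(Y) \end{pmatrix}
=
\begin{pmatrix} \operatorname{vec}(C) \\ \operatorname{vec}(F) \end{pmatrix}
```

The pair has a unique solution exactly when $Z$ is nonsingular, and a dense factorization of $Z$ would cost $O(m^3 n^3)$ flops, which is why Schur-based solvers and cheap condition estimators matter.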
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 10501005, 10701021) and Northeast Normal University (Grant No. NENU-STC07001).
Abstract: Most previous research on portfolio analysis assumes that short-selling is allowed. However, the no-short-selling case is also important, because short-selling is prohibited in the stock markets of some countries. This paper gives necessary and sufficient conditions and proposes an optimal algorithm for Markowitz's mean-variance model and the Sharpe ratio with no short-selling. The algorithm makes it easier to obtain efficient frontiers when short-selling is not allowed.
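For intuition about the no-short-selling constraint, the sketch below computes the long-only global minimum-variance portfolio (one point on the efficient frontier) by projected gradient descent onto the probability simplex. This is a generic technique swapped in for illustration, not the optimal algorithm proposed in the paper; the function names and the step-size rule are our assumptions.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:
            theta = t  # the last j satisfying this condition gives the threshold
    return [max(x - theta, 0.0) for x in v]


def min_variance_long_only(cov, steps=5000):
    """Minimize w' cov w subject to w >= 0 and sum(w) = 1 (no short-selling)."""
    n = len(cov)
    # Step 1/L, where L = 2 * (max absolute row sum) bounds the Lipschitz
    # constant 2 * lambda_max(cov) of the gradient 2 * cov @ w.
    step = 1.0 / (2.0 * max(sum(abs(c) for c in row) for row in cov))
    w = [1.0 / n] * n
    for _ in range(steps):
        grad = [2.0 * sum(cov[i][j] * w[j] for j in range(n)) for i in range(n)]
        w = project_to_simplex([w[i] - step * grad[i] for i in range(n)])
    return w
```

With the covariance matrix `[[1.0, 1.5], [1.5, 4.0]]`, the unconstrained minimum-variance weights are $(1.25, -0.25)$, which require shorting the second asset; the long-only solution moves to the corner $(1, 0)$.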
Abstract: The conditional independence structure of a common probability measure is a structural model. In this paper, we solve an open problem posed by Studeny [Probabilistic Conditional Independence Structures, Theme 9, p. 206]. We propose a new approach to decomposing a directed acyclic graph and study its optimal properties. We interpret this approach from the perspective of decomposing the corresponding conditional independence model, and we provide an algorithm for identifying the maximal prime subgraphs of a directed acyclic graph.
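In much of the literature, decompositions of a DAG into maximal prime subgraphs are built on its moral graph, obtained by joining every pair of parents that share a child and then dropping edge directions. The sketch below shows only this standard preprocessing step; it is not the new decomposition approach of the paper, and the dictionary-of-parents input format and the name `moralize` are our assumptions.

```python
from itertools import combinations


def moralize(parents):
    """Return the moral graph of a DAG as an adjacency dict of sets.

    `parents` maps each node to the set of its parents. Moralization drops
    edge directions and "marries" every pair of parents sharing a child.
    """
    nodes = set(parents)
    for ps in parents.values():
        nodes |= set(ps)
    adj = {v: set() for v in nodes}
    for child, ps in parents.items():
        for p in ps:  # keep each original edge, now undirected
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(ps, 2):  # marry co-parents
            adj[a].add(b)
            adj[b].add(a)
    return adj
```

For the v-structure a → c ← b, the moral graph gains the edge a – b, which is what links the DAG to an undirected decomposition.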
Funding: The authors wish to thank the referees for very helpful comments, which greatly improved the presentation of this paper. This work was partially supported by the National Natural Science Foundation of China (Grant No. 11025102), the Program for Changjiang Scholars and Innovative Research Team in University, and the Jilin Project (20100401).
Abstract: We consider the problems of semi-graphoid inference and of independence implication from a set of conditional independence statements. Based on ideas from R. Hemmecke et al. [Combin. Probab. Comput., 2008, 17:239-257], we present algebraic-geometric characterizations of these two problems and propose two corresponding algorithms. These algorithms can be implemented in any computer algebra system when the number of variables is small.
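For very small variable sets, semi-graphoid inference can also be checked by brute force: close the given statements under the semi-graphoid axioms (symmetry, decomposition, weak union, contraction) and test membership. The sketch below does exactly that; it is a swapped-in naive technique, not the algebraic-geometry algorithms of the paper, and it scales poorly with the number of variables. The triple encoding and function names are hypothetical.

```python
from itertools import combinations


def nonempty_subsets(s):
    items = sorted(s)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def semigraphoid_closure(statements):
    """Close CI triples (A, B, C), read 'A is independent of B given C', under
    the semi-graphoid axioms. Parts must be pairwise disjoint, A and B nonempty."""
    closure = {tuple(frozenset(x) for x in s) for s in statements}
    changed = True
    while changed:
        changed = False
        new = set()
        for a, b, c in closure:
            new.add((b, a, c))  # symmetry
            for d in nonempty_subsets(b):
                rest = b - d
                if rest:
                    new.add((a, rest, c))      # decomposition
                    new.add((a, rest, c | d))  # weak union
        for a, b, c in closure:
            for a2, d, c2 in closure:
                # contraction: I(A,B|C u D) and I(A,D|C) imply I(A, B u D | C)
                if a2 == a and d and d <= c and c2 == c - d:
                    new.add((a, b | d, c2))
        if not new <= closure:
            closure |= new
            changed = True
    return closure
```

A statement is semi-graphoid implied by the input exactly when its triple appears in the returned closure.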
Funding: The authors thank two anonymous referees for their constructive comments and suggestions. This work was supported by grant R01 DA016750-09 from the National Institute on Drug Abuse. Zhu's work was also supported by the National Natural Science Foundation of China (Grant No. 11001044), the Fundamental Research Funds for the Central Universities (11CXPY007, 10JCXK001), the Natural Science Foundation of Jilin Province (Grant No. 201215007), the Scientific Research Foundation for Returned Scholars, MOE of China, and the Program for Changjiang Scholars and Innovative Research Team in University. The Framingham Heart Study project is conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with Boston University (N01 HC25195). The Framingham data used for the analyses described in this manuscript were obtained through dbGaP (phs000128.v3.p3).
Abstract: In genetic studies of complex diseases, particularly mental illnesses and behavioral disorders, two distinct characteristics have emerged in some data sets. First, genetic data sets are collected with a large number of phenotypes that are potentially related to the complex disease under study. Second, each phenotype is measured on the same subject repeatedly over time. In this study, we present a nonparametric regression approach that analyzes multivariate, repeatedly measured phenotypes jointly using multivariate adaptive regression splines for analysis of longitudinal data (MASAL), which makes it possible to identify genes, as well as gene-gene and gene-environment interactions (with time treated as an environmental factor), associated with the phenotypes of interest. Furthermore, we propose a permutation test to assess the associations between the phenotypes and selected markers. Through simulation, we demonstrate that the proposed approach has advantages over existing methods that examine each longitudinal phenotype separately or that compress the phenotypes into single-time-point summary values. Application of the proposed method to the Framingham Heart Study illustrates that the use of multivariate longitudinal phenotypes enhanced the significance of the association test.
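The permutation idea can be illustrated in its simplest form: a single phenotype, a single marker coded 0/1/2, and the absolute Pearson correlation as the test statistic. The sketch below is that simplified, hypothetical version; the paper's test operates on MASAL fits of multivariate longitudinal phenotypes, which this does not attempt to reproduce.

```python
import random
import statistics


def pearson(x, y):
    # Plain Pearson correlation; undefined if either input is constant.
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_pvalue(phenotype, marker, n_perm=999, seed=0):
    """Permutation p-value for |corr(phenotype, marker)|.

    Shuffling the marker breaks any association while keeping both margins;
    the +1 terms count the observed data as one of the permutations.
    """
    rng = random.Random(seed)
    observed = abs(pearson(phenotype, marker))
    shuffled = list(marker)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(phenotype, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

With repeated measures, shuffling would need to permute whole subjects rather than individual observations so that within-subject correlation is preserved.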
Funding: Supported by the National Natural Science Foundation of China (Nos. 10431010, 10501005) and the Science Foundation for Young Teachers of NENU (No. 20070103).
Abstract: This paper studies non-convex programming problems. In statistical inference, many constrained estimation problems can be expressed as convex programming problems. However, in many practical problems the objective functions are not convex. In this paper, we define a semi-convex objective function and discuss the corresponding non-convex programming problems. A two-step iterative algorithm, called the alternating iterative method, is proposed for solving such problems. The method is illustrated by three examples of constrained estimation problems given in Sasabuchi et al. (Biometrika, 72, 465-472 (1983)), Shi N. Z. (J. Multivariate Anal., 50, 282-293 (1994)) and El Barmi H. and Dykstra R. (Ann. Statist., 26, 1878-1893 (1998)).
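The paper's definition of a semi-convex function is not reproduced here, so the sketch below only illustrates the general alternating idea on a stand-in problem: rank-one least squares, $\min_{u,v} \sum_{ij}(M_{ij} - u_i v_j)^2$, which is non-convex jointly but convex, with a closed-form minimizer, in each block. The example problem and the name `rank_one_als` are our assumptions, not one of the three examples in the paper.

```python
def rank_one_als(M, iters=100):
    """Alternately minimize sum_ij (M[i][j] - u[i] * v[j])**2 over u, then v.

    Each half-step is an ordinary least-squares problem with a closed-form
    solution. Assumes M is not the zero matrix (otherwise a norm hits zero).
    """
    rows, cols = len(M), len(M[0])
    v = [1.0] * cols
    for _ in range(iters):
        vv = sum(x * x for x in v)
        u = [sum(M[i][j] * v[j] for j in range(cols)) / vv for i in range(rows)]
        uu = sum(x * x for x in u)
        v = [sum(M[i][j] * u[i] for i in range(rows)) / uu for j in range(cols)]
    return u, v
```

Each half-step can only decrease the objective, which is the basic reason such alternating schemes are stable even when the joint problem is non-convex.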
Abstract: The method of generalized estimating equations (GEE), introduced by K. Y. Liang and S. L. Zeger, has been widely used to analyze longitudinal data. Recently, this method has been criticized for failing to protect against misspecification of working correlation models, which in some cases leads to loss of efficiency or infeasibility of solutions. In this paper, we present a new method, called weighted estimating equations (WEE), for estimating the correlation parameters. The new estimates of the correlation parameters are obtained as the solutions of these weighted estimating equations. For some commonly assumed correlation structures, we show that there exists a unique feasible solution to these weighted estimating equations regardless of whether the correlation structure is correctly specified. The new feasible estimates of the correlation parameters are consistent when the working correlation structure is correctly specified. Simulation results suggest that the new method works well in finite samples.
Funding: The second author is supported by the National Natural Science Foundation of China (Grant No. 10871036).
Abstract: Sample observations are often subject to rounding but are modeled as though they were unrounded. This paper examines the impact of rounding errors on parameter estimation with multi-layer ranked set sampling. It shows that rounding errors seriously distort the behavior of the covariance matrix estimate and lead to inconsistent estimation. Taking this into account, we present a new approach to estimation for this model and establish the strong consistency and asymptotic normality of the proposed estimators. Simulation experiments show that our estimates based on rounded multi-layer ranked set sampling are always more efficient than those based on rounded simple random sampling.
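A univariate analogue shows how rounding distorts a variance estimate. Under the classical approximation behind Sheppard's correction, rounding to a grid of width $h$ inflates the variance by about $h^2/12$. The sketch below simulates this for a hypothetical normal population; it is background only, not the estimator proposed for multi-layer ranked set sampling, and the name `rounding_variance_demo` is hypothetical.

```python
import random
import statistics


def rounding_variance_demo(h=1.0, n=200_000, sigma=1.0, seed=7):
    """Compare sample variances of exact and rounded N(0, sigma^2) data."""
    rng = random.Random(seed)
    exact = [rng.gauss(0.0, sigma) for _ in range(n)]
    # Round each observation to the nearest multiple of h.
    rounded = [h * round(x / h) for x in exact]
    raw_var = statistics.variance(rounded)
    corrected = raw_var - h * h / 12.0  # Sheppard's correction
    return statistics.variance(exact), raw_var, corrected
```

The naive variance of the rounded data stays biased upward no matter how large the sample is, which is the univariate version of the inconsistency described above.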