Additive hazards model with random effects is proposed for modelling the correlated failure time data when focus is on comparing the failure times within clusters and on estimating the correlation between failure time...Additive hazards model with random effects is proposed for modelling the correlated failure time data when focus is on comparing the failure times within clusters and on estimating the correlation between failure times from the same cluster, as well as the marginal regression parameters. Our model features that, when marginalized over the random effect variable, it still enjoys the structure of the additive hazards model. We develop the estimating equations for inferring the regression parameters. The proposed estimators are shown to be consistent and asymptotically normal under appropriate regularity conditions. Furthermore, the estimator of the baseline hazards function is proposed and its asymptotic properties are also established. We propose a class of diagnostic methods to assess the overall fitting adequacy of the additive hazards model with random effects. We conduct simulation studies to evaluate the finite sample behaviors of the proposed estimators in various scenarios. Analysis of the Diabetic Retinopathy Study is provided as an illustration for the proposed method.展开更多
Recurrent event data often arises in biomedical studies, and individuals within a cluster might not be independent. We propose a semiparametric additive rates model for clustered recurrent event data, wherein the cova...Recurrent event data often arises in biomedical studies, and individuals within a cluster might not be independent. We propose a semiparametric additive rates model for clustered recurrent event data, wherein the covariates are assumed to add to the unspecified baseline rate. For the inference on the model parameters, estimating equation approaches are developed, and both large and finite sample properties of the proposed estimators are established.展开更多
This paper investigates the characteristics of a clinical dataset using a combination of feature selection and classification methods to handle missing values and understand the underlying statistical characteristics ...This paper investigates the characteristics of a clinical dataset using a combination of feature selection and classification methods to handle missing values and understand the underlying statistical characteristics of a typical clinical dataset. Typically, when a large clinical dataset is presented, it consists of challenges such as missing values, high dimensionality, and unbalanced classes. These pose an inherent problem when implementing feature selection and classification algorithms. With most clinical datasets, an initial exploration of the dataset is carried out, and those attributes with more than a certain percentage of missing values are eliminated from the dataset. Later, with the help of missing value imputation, feature selection and classification algorithms, prognostic and diagnostic models are developed. This paper has two main conclusions: 1) Despite the nature of clinical datasets, and their large size, methods for missing value imputation do not affect the final performance. What is crucial is that the dataset is an accurate representation of the clinical problem and those methods of imputing missing values are not critical for developing classifiers and prognostic/diagnostic models. 2) Supervised learning has proven to be more suitable for mining clinical data than unsupervised methods. It is also shown that non-parametric classifiers such as decision trees give better results when compared to parametric classifiers such as radial basis function networks(RBFNs).展开更多
基金Supported by National Natural Science Foundation of China(Grant Nos.11171263,11201350 and 11371299)Doctoral Fund of Ministry of Education of China(Grant Nos.20110141110004 and 20110141120004)Fundamental Research Funds for the Central Universities
文摘Additive hazards model with random effects is proposed for modelling the correlated failure time data when focus is on comparing the failure times within clusters and on estimating the correlation between failure times from the same cluster, as well as the marginal regression parameters. Our model features that, when marginalized over the random effect variable, it still enjoys the structure of the additive hazards model. We develop the estimating equations for inferring the regression parameters. The proposed estimators are shown to be consistent and asymptotically normal under appropriate regularity conditions. Furthermore, the estimator of the baseline hazards function is proposed and its asymptotic properties are also established. We propose a class of diagnostic methods to assess the overall fitting adequacy of the additive hazards model with random effects. We conduct simulation studies to evaluate the finite sample behaviors of the proposed estimators in various scenarios. Analysis of the Diabetic Retinopathy Study is provided as an illustration for the proposed method.
基金supported by International Cooperation Projects (2010DFA31790) of Chinese Ministry of Science and Technologythe fund of Central China Normal University for Ph.D students (No. 2009023)+2 种基金supported by the National Natural Science Foundation of China Grants(No. 10731010, 10971015 and 11021161)the National Basic Research Program of China (973 Program) (No.2007CB814902)Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics& Systems Science, Chinese Academy of Sciences (No. 2008DP173182)
文摘Recurrent event data often arises in biomedical studies, and individuals within a cluster might not be independent. We propose a semiparametric additive rates model for clustered recurrent event data, wherein the covariates are assumed to add to the unspecified baseline rate. For the inference on the model parameters, estimating equation approaches are developed, and both large and finite sample properties of the proposed estimators are established.
文摘This paper investigates the characteristics of a clinical dataset using a combination of feature selection and classification methods to handle missing values and understand the underlying statistical characteristics of a typical clinical dataset. Typically, when a large clinical dataset is presented, it consists of challenges such as missing values, high dimensionality, and unbalanced classes. These pose an inherent problem when implementing feature selection and classification algorithms. With most clinical datasets, an initial exploration of the dataset is carried out, and those attributes with more than a certain percentage of missing values are eliminated from the dataset. Later, with the help of missing value imputation, feature selection and classification algorithms, prognostic and diagnostic models are developed. This paper has two main conclusions: 1) Despite the nature of clinical datasets, and their large size, methods for missing value imputation do not affect the final performance. What is crucial is that the dataset is an accurate representation of the clinical problem and those methods of imputing missing values are not critical for developing classifiers and prognostic/diagnostic models. 2) Supervised learning has proven to be more suitable for mining clinical data than unsupervised methods. It is also shown that non-parametric classifiers such as decision trees give better results when compared to parametric classifiers such as radial basis function networks(RBFNs).