Researchers in bioinformatics, biostatistics and other related fields seek biomarkers for many purposes, including risk assessment, disease diagnosis and prognosis, which can be formulated as a patient classification....Researchers in bioinformatics, biostatistics and other related fields seek biomarkers for many purposes, including risk assessment, disease diagnosis and prognosis, which can be formulated as a patient classification. In this paper, a new method of using a tree regression to improve logistic classification model is introduced in biomarker data analysis. The numerical results show that the linear logistic model can be significantly improved by a tree regression on the residuals. Although the classification problem of binary responses is discussed in this research, the idea is easy to extend to the classification of multinomial responses.展开更多
The generalized linear model is an indispensable tool for analyzing non-Gaussian response data, with both canonical and non-canonical link functions comprehensively used. When missing values are present, many existing...The generalized linear model is an indispensable tool for analyzing non-Gaussian response data, with both canonical and non-canonical link functions comprehensively used. When missing values are present, many existing methods in the literature heavily depend on an unverifiable assumption of the missing data mechanism, and they fail when the assumption is violated. This paper proposes a missing data mechanism that is as generally applicable as possible, which includes both ignorable and nonignorable missing data cases, as well as both scenarios of missing values in response and covariate.Under this general missing data mechanism, the authors adopt an approximate conditional likelihood method to estimate unknown parameters. The authors rigorously establish the regularity conditions under which the unknown parameters are identifiable under the approximate conditional likelihood approach. For parameters that are identifiable, the authors prove the asymptotic normality of the estimators obtained by maximizing the approximate conditional likelihood. Some simulation studies are conducted to evaluate finite sample performance of the proposed estimators as well as estimators from some existing methods. Finally, the authors present a biomarker analysis in prostate cancer study to illustrate the proposed method.展开更多
文摘Researchers in bioinformatics, biostatistics and other related fields seek biomarkers for many purposes, including risk assessment, disease diagnosis and prognosis, which can be formulated as a patient classification. In this paper, a new method of using a tree regression to improve logistic classification model is introduced in biomarker data analysis. The numerical results show that the linear logistic model can be significantly improved by a tree regression on the residuals. Although the classification problem of binary responses is discussed in this research, the idea is easy to extend to the classification of multinomial responses.
基金supported by the Chinese 111 Project B14019the US National Science Foundation under Grant Nos.DMS-1305474 and DMS-1612873the US National Institutes of Health Award UL1TR001412
文摘The generalized linear model is an indispensable tool for analyzing non-Gaussian response data, with both canonical and non-canonical link functions comprehensively used. When missing values are present, many existing methods in the literature heavily depend on an unverifiable assumption of the missing data mechanism, and they fail when the assumption is violated. This paper proposes a missing data mechanism that is as generally applicable as possible, which includes both ignorable and nonignorable missing data cases, as well as both scenarios of missing values in response and covariate.Under this general missing data mechanism, the authors adopt an approximate conditional likelihood method to estimate unknown parameters. The authors rigorously establish the regularity conditions under which the unknown parameters are identifiable under the approximate conditional likelihood approach. For parameters that are identifiable, the authors prove the asymptotic normality of the estimators obtained by maximizing the approximate conditional likelihood. Some simulation studies are conducted to evaluate finite sample performance of the proposed estimators as well as estimators from some existing methods. Finally, the authors present a biomarker analysis in prostate cancer study to illustrate the proposed method.