Funding: Supported by the National Natural Science Foundation of China (Grant No. 10828102), a Changjiang Visiting Professorship, the Training Fund of Northeast Normal University's Scientific Innovation Project (Grant No. NENU-STC07002), and the National Institutes of Health of the USA (Grant No. R01GM080503-01A1).
Abstract: We use functional principal component analysis (FPCA) to model and predict weight growth in children. In particular, we examine how the approach can help discern growth patterns of underweight children relative to their normal-weight counterparts, and whether a commonly used transformation to normality plays any constructive role in a predictive model based on FPCA. Our work supplements the conditional growth charts developed by Wei and He (2006) by constructing a predictive growth model based on a small number of principal component scores computed from an individual's past measurements.
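For readers unfamiliar with the principal component scores this abstract refers to, the following is a minimal sketch of FPCA for curves observed on a common, equally spaced grid. It is not the authors' method: the function name `fpca_dense`, its arguments, and the dense-grid assumption are all hypothetical choices made for illustration, and real growth data are usually sparse and irregular, which calls for smoothing-based estimators instead.

```python
import numpy as np

def fpca_dense(curves, grid, n_components=2):
    """Illustrative FPCA on a dense, equally spaced grid (hypothetical helper).

    curves: array of shape (n_subjects, n_points), one row per curve.
    grid:   array of shape (n_points,), the common observation times.
    """
    dt = grid[1] - grid[0]                    # grid spacing (assumed constant)
    mean = curves.mean(axis=0)                # pointwise mean curve
    centered = curves - mean
    # Sample covariance surface evaluated on the grid
    cov = centered.T @ centered / (curves.shape[0] - 1)
    # Eigen-decomposition of the discretized covariance operator
    eigvals, eigvecs = np.linalg.eigh(cov * dt)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals = eigvals[order]
    # Rescale so that each eigenfunction has unit L2 norm: sum(phi**2) * dt == 1
    eigfuns = eigvecs[:, order].T / np.sqrt(dt)
    # Scores are the (Riemann-sum) inner products of centered curves with eigenfunctions
    scores = centered @ eigfuns.T * dt
    return mean, eigvals, eigfuns, scores
```

Under these assumptions, each curve is summarized by a handful of scores, and the sample variance of the k-th score equals the k-th estimated eigenvalue.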
Abstract: In recent years, functional data have been widely used in finance, medicine, biology and other fields. Existing clustering methods can solve problems in finite-dimensional spaces, but they are difficult to apply directly to functional data. In this paper, we propose a new unsupervised clustering algorithm based on adaptive weights. Without requiring initialization parameters, we use entropy-type penalty terms and a fuzzy partition matrix to find the optimal number of clusters. At the same time, we introduce a measure based on adaptive weights to reflect the difference in information content between different clustering metrics. Simulation experiments show that the proposed algorithm achieves higher purity than some existing algorithms.
Funding: Supported by the National Natural Science Foundation under Grant No. 50875247 and the Shanxi Province Natural Science Foundation under Grant No. 2009011026-1.
Abstract: Particle swarm optimization (PSO) is an optimization algorithm based on the principle of swarm intelligence. In this paper, a modified PSO is applied to kernel principal component analysis (KPCA) to find an optimal kernel function parameter. We first take into account both the within-class scatter and the between-class scatter of the sample features. Then, a fitness function for the kernel function parameter is constructed, and the particle swarm optimization algorithm with adaptive acceleration (CPSO) is applied to optimize it. The method is used for gearbox condition recognition, and the result is compared with the recognition results based on principal component analysis (PCA). The results show that KPCA optimized by CPSO can effectively recognize fault conditions of the gearbox by reducing the blind set-up of the kernel function parameter, and that its fault recognition results outperform those of PCA. We conclude that KPCA based on CPSO has an advantage in nonlinear feature extraction for mechanical failure and is helpful for fault condition recognition in complicated machines.
Funding: Supported by the National Natural Science Foundation of China (project numbers 11771146, 11831008, 81530086, 11771145), the National Social Science Foundation Key Program (17ZDA091), the 111 Project (B14019), the Program of Shanghai Subject Chief Scientist (14XD1401600), and the China Postdoctoral Science Foundation (2018M630393).
Abstract: The unified weighting scheme for the local-linear smoother in analysing functional data can deal with data that are dense, sparse or of neither type. In this paper, we focus on the convergence rate of functional principal component analysis using this method. Almost sure asymptotic consistency and rates of convergence for the estimators of eigenvalues and eigenfunctions are established. We also provide the convergence rate of the variance estimator of the measurement error. Based on the results, the number of observations within each curve can be of any rate relative to the sample size, which is consistent with earlier conclusions about the asymptotic properties of the mean and covariance estimators.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12261007) and the Natural Science Foundation of Guangxi Province (Grant No. 2020GXNSFAA297225).
Abstract: In this paper, we consider the clustering of bivariate functional data where each random surface consists of a set of curves recorded repeatedly for each subject. A k-centres surface clustering method based on marginal functional principal component analysis is proposed for bivariate functional data, and a novel clustering criterion is presented in which both the random surface and its partial derivative functions in two directions are considered. In addition, we consider two other clustering methods: k-centres surface clustering based on product functional principal component analysis and on double functional principal component analysis. Simulation results indicate that the proposed methods perform well in terms of both the correct classification rate and the adjusted Rand index. The approaches are further illustrated through an empirical analysis of human mortality data.
Funding: This research was supported by the National Natural Science Foundation of China under Grant Nos. 11861074, 11731011, 11731015 and 12261051, and the Applied Basic Research Project of Yunnan Province under Grant No. 2019FB138.
Abstract: Existing methods for analyzing semi-functional linear models usually assume that random errors are either not serially correlated or serially correlated with a known order. However, in some applications, these assumptions on the random errors may be unreasonable or questionable. To this end, this paper aims at testing error correlation in a semi-functional linear model (SFLM). Based on the empirical likelihood approach, the authors construct an empirical likelihood ratio statistic to test the serial correlation of random errors and to identify the order of autocorrelation if serial correlation is present. The proposed test statistic does not need to estimate the variance, as it is data adaptive and possesses the nonparametric version of Wilks' theorem. Simulation studies are conducted to investigate the performance of the proposed test procedure. The proposed test method is illustrated with two real examples.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11771032), the Natural Science Foundation of Shanxi Province of China (Grant No. 201901D111279), and the Research Grants Council of the Hong Kong Special Administrative Region (Grant Nos. 14301918 and 14302519).
Abstract: As extensions of means, expectiles capture all the distributional information of a random variable. Expectile regression is computationally friendlier because the asymmetric least squares loss function is differentiable everywhere. This regression also enables effective estimation of the expectiles of a response variable when potential explanatory variables are given. In this study, we propose the partial functional linear expectile regression model. The slope function and constant coefficients are estimated by using the functional principal component basis. The convergence rate of the slope function and the asymptotic normality of the parameter vector are established. To inspect the effect of the parametric component on the response variable, we develop Wald-type and expectile rank score tests and establish their asymptotic properties. The finite-sample performance of the proposed estimators and test statistics is evaluated through a simulation study. Results indicate that the proposed estimators are comparable to competing estimation methods and that the newly proposed expectile rank score test is useful. The methodologies are illustrated by using two real data examples.
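To make the asymmetric least squares loss in this abstract concrete, here is a minimal sketch of the loss and of a sample expectile computed by iteratively reweighted averaging. It only covers the unconditional (no-covariate) case and is not the partial functional linear model proposed in the paper; the function names and the stopping rule are illustrative assumptions.

```python
import numpy as np

def expectile_loss(residuals, tau):
    """Asymmetric least squares loss: weight tau on non-negative residuals,
    1 - tau on negative ones (hypothetical helper for illustration)."""
    residuals = np.asarray(residuals, dtype=float)
    weights = np.where(residuals < 0, 1.0 - tau, tau)
    return np.sum(weights * residuals ** 2)

def sample_expectile(y, tau, n_iter=100, tol=1e-10):
    """Sample tau-expectile via iteratively reweighted means.

    Setting the derivative of the loss to zero gives a weighted mean whose
    weights depend on the current estimate, so we iterate to a fixed point.
    """
    y = np.asarray(y, dtype=float)
    m = y.mean()                               # tau = 0.5 gives the ordinary mean
    for _ in range(n_iter):
        w = np.where(y < m, 1.0 - tau, tau)    # asymmetric weights at current m
        m_new = np.sum(w * y) / np.sum(w)      # weighted-mean update
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m
```

Because the loss is a smooth weighted quadratic, the update is a closed-form weighted mean at every step, which is the computational convenience the abstract alludes to.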
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11771032, 11571340 and 11701020, the Science and Technology Project of Beijing Municipal Education Commission under Grant Nos. KM201710005032 and KM201910005015, and the International Research Cooperation Seed Fund of Beijing University of Technology under Grant No. 006000514118553.
Abstract: This paper investigates hypothesis testing for the parametric component in partial functional linear regression models. Based on a rank score function, the authors develop a rank test using functional principal component analysis and establish the asymptotic properties of the resulting test under the null and local alternative hypotheses. A simulation study shows that the proposed test procedure has good size and power in finite samples. The authors also present an illustration by fitting the Berkeley Growth Data and testing the effect of gender on the height of children.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11861014, 11561006 and 11971404), the Natural Science Foundation of Guangxi Province (Grant No. 2018GXNSFAA281145), the Humanity and Social Science Youth Foundation of the Ministry of Education of China (Grant No. 19YJC910010), and the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, USA.
Abstract: To better describe and understand the time dynamics in functional data analysis, it is often desirable to recover the partial derivatives of the random surface. A novel approach based on marginal functional principal component analysis is proposed to derive a representation for the partial derivatives. To obtain the Karhunen-Loève expansion of the partial derivatives, an adaptive estimation procedure is explored. Asymptotic results for the proposed estimates are established. Simulation studies show that the proposed methods perform well in finite samples. An application to human mortality data reveals informative time dynamics in mortality rates.
Abstract: Emerging integrative analysis of genomic and anatomical imaging data, which has not been well developed, provides invaluable information for the holistic discovery of the genomic structure of disease and has the potential to open a new avenue for discovering novel disease susceptibility genes that cannot be identified when the two data types are analyzed separately. A key issue for the success of imaging and genomic data analysis is how to reduce their dimensions. Most previous methods for imaging information extraction and RNA-seq data reduction do not explore imaging spatial information and often ignore gene expression variation at the genomic positional level. To overcome these limitations, we extend functional principal component analysis from one dimension to two dimensions (2DFPCA) for representing imaging data and develop a multiple functional linear model (MFLM) in which functional principal scores of images are taken as multiple quantitative traits and the RNA-seq profile across a gene is taken as a functional predictor for assessing the association of gene expression with images. The developed method has been applied to image and RNA-seq data from ovarian cancer and kidney renal clear cell carcinoma (KIRC) studies. We identified 24 and 84 genes whose expression was associated with imaging variations in the ovarian cancer and KIRC studies, respectively. Our results showed that many genes significantly associated with images were not differentially expressed, but revealed their morphological and metabolic functions. The results also demonstrated that the peaks of the estimated regression coefficient function in the MFLM often allowed the discovery of splicing sites and multiple isoforms of gene expression.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11771146, 11831008, 81530086, 11771145 and 11871252, the 111 Project (B14019), and the Program of Shanghai Subject Chief Scientist under Grant No. 14XD1401600.
Abstract: Motivated by a medical study that attempts to analyze the relationship between growth curves and other variables and to measure the association among multiple growth curves, the authors develop a functional multiple-outcome model to decompose the total variation of multiple functional outcomes into variation explained by independent variables with time-varying coefficient functions, by latent factors, and by noise. The latent factors are the hidden common factors that influence the multiple outcomes and are found through the combined functional principal component analysis approach. Through the coefficients of the latent factors, one may further explore the association among the multiple outcomes. This method is applied to multivariate growth data of infants from a real medical study in Shanghai and produces interpretable results. Convergence rates for the proposed estimates of the varying coefficient and covariance functions of the model are derived under mild conditions.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 11771032.
Abstract: Working with partially observed functional data has recently attracted rapidly increasing attention, since there are many applications in which each functional curve may be observed only on a subset of a common domain, and this incompleteness makes most existing methods for functional data analysis ineffective. In this paper, motivated by the appealing characteristics of conditional quantile regression, the authors consider functional linear quantile regression, assuming the explanatory functions are observed partially on dense but discrete point grids of some random subintervals of the domain. A functional principal component analysis (FPCA) based estimator is proposed for the slope function, and the convergence rate of the estimator is investigated. In addition, the finite-sample performance of the proposed estimator is evaluated through simulation studies and a real data application.