In recent years, functional data has been widely used in finance, medicine, biology and other fields. Existing clustering methods solve problems in finite-dimensional spaces, but they are difficult to apply directly to functional data. In this paper, we propose a new unsupervised clustering algorithm based on adaptive weights. Without requiring initialization parameters, we use entropy-type penalty terms and a fuzzy partition matrix to find the optimal number of clusters. At the same time, we introduce a measure based on adaptive weights to reflect the difference in information content between different clustering metrics. Simulation experiments show that the proposed algorithm achieves higher purity than some competing algorithms.
Human life depends on good air quality. Steady advances in practically every aspect of contemporary human life have harmed air quality. Everyday industrial, transportation, and household activities release dangerous contaminants into our surroundings. This study investigated two years' worth of air quality data from two Indian cities for outlier detection. Studies on air pollution have used many types of methodologies, typically treating the gases as a vector whose components are the gas concentration values for each observation performed. In our technique, we use curves to represent the monthly average of daily gas emissions. The approach, which is based on functional depth, was used to find outliers in the gas emissions of Delhi and Kolkata, and the outcomes were compared to those from the traditional method. In the evaluation and comparison of these models' performances, the functional approach performed well.
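The depth-based screening described in the air-quality abstract above can be illustrated with a minimal sketch. The abstract does not say which functional depth is used, so the modified band depth here is an assumption, and the curves are synthetic rather than Delhi or Kolkata measurements. The function name `modified_band_depth` is hypothetical.

```python
import numpy as np
from itertools import combinations

def modified_band_depth(curves):
    """Modified band depth of each curve, using bands built from pairs of curves.

    curves: array of shape (n_curves, n_points), all observed on the same grid.
    For every pair of curves, each curve scores the fraction of grid points
    at which it lies inside the band spanned by that pair.
    """
    n = curves.shape[0]
    depth = np.zeros(n)
    pairs = list(combinations(range(n), 2))
    for j, k in pairs:
        lower = np.minimum(curves[j], curves[k])
        upper = np.maximum(curves[j], curves[k])
        inside = (curves >= lower) & (curves <= upper)
        depth += inside.mean(axis=1)
    return depth / len(pairs)

# Synthetic data (an assumption for illustration, not real emission curves):
# 12 similar curves that cross each other, plus one curve shifted far upward.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
a = rng.normal(scale=0.2, size=(12, 1))
b = rng.normal(scale=0.2, size=(12, 1))
normal_curves = np.sin(2 * np.pi * t) + a * np.cos(2 * np.pi * t) + b * np.sin(4 * np.pi * t)
outlier = np.sin(2 * np.pi * t) + 3.0
curves = np.vstack([normal_curves, outlier])

depths = modified_band_depth(curves)
suspect = int(np.argmin(depths))  # the least central curve is the outlier candidate
```

A curve with low depth is far from the centre of the sample, so ranking curves by depth gives a natural list of outlier candidates. How many of them to flag (a fixed cutoff, a quantile, or a bootstrap threshold) is a separate choice that this sketch leaves open.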
In this paper, we consider the clustering of bivariate functional data where each random surface consists of a set of curves recorded repeatedly for each subject. A k-centres surface clustering method based on marginal functional principal component analysis is proposed for bivariate functional data, and a novel clustering criterion is presented in which both the random surface and its partial derivatives in two directions are considered. In addition, we consider two other clustering methods: k-centres surface clustering based on product functional principal component analysis and on double functional principal component analysis. Simulation results indicate that the proposed methods perform well in terms of both the correct classification rate and the adjusted Rand index. The approaches are further illustrated through an empirical analysis of human mortality data.
We propose a methodology for testing two-sample means in high-dimensional functional data that requires no decaying pattern on the eigenvalues of the functional data. To the best of our knowledge, we are the first to consider and address such a problem. Specifically, we devise a confidence region for the mean curve difference between two samples, which directly establishes a rigorous inferential procedure based on the multiplier bootstrap. In addition, the proposed test permits the functional observations in each sample to have mutually different distributions and arbitrary correlation structures. This distribution-free and correlation-free property is desirable, but it leads to a more challenging scenario for theoretical development. Other desirable properties include allowance for highly unequal sample sizes, a data dimension growing exponentially in the sample sizes, and consistent power behavior under fairly general alternatives. The proposed test is shown to converge uniformly to the prescribed significance level, and its finite-sample performance is evaluated via a simulation study and an application to electroencephalography data.
We propose a two-sample test for the mean functions of functional data when the number of bases is much larger than the sample size. The novel test is based on U-statistics, which avoids the need to estimate the covariance operator accurately in the high-dimensional situation. We further prove the asymptotic normality of our test statistic under both the null hypothesis and a local alternative hypothesis. An extensive simulation study shows that the proposed test works well in comparison with several other methods in the high-dimensional situation. An application to a data set of egg-laying trajectories of Mediterranean fruit flies demonstrates the applicability of the method.
Chlorophyll-a (Chl-a) concentration is a primary indicator for marine environmental monitoring. The spatio-temporal variations of sea surface Chl-a concentration in the Yellow Sea (YS) and the East China Sea (ECS) in 2001-2020 were investigated by reconstructing the MODIS Level 3 products with the data interpolation empirical orthogonal function (DINEOF) method. The results reconstructed by interpolating the combined MODIS daily + 8-day datasets were found to be better than those obtained by interpolating daily or 8-day data alone. Chl-a concentration in the YS and the ECS reached its maximum in spring, when blooms occurred, decreased in summer and autumn, and increased in late autumn and early winter. By performing empirical orthogonal function (EOF) decomposition of the reconstructed data fields and correlation analysis with several potential environmental factors, we found that the sea surface temperature (SST) plays a significant role in the seasonal variation of Chl-a, especially during spring and summer. The increase of SST in spring and the upper-layer nutrients mixed up during the preceding winter might favor the occurrence of spring blooms. The high SST throughout the summer would strengthen the vertical stratification and prevent nutrient supply from deep water, resulting in low surface Chl-a concentrations. The sea surface Chl-a concentration in the YS was found to have decreased significantly from 2012 to 2020, which was possibly related to the Pacific Decadal Oscillation (PDO).
This paper deals with the conditional density estimator of a real response variable given a functional random variable (i.e., one taking values in an infinite-dimensional space). Specifically, we focus on the functional index model, an approach that represents a good compromise between nonparametric and parametric models. Under general conditions and when the variables are independent, we give the quadratic error and the asymptotic normality of the estimator obtained by the local linear method, based on the single-index structure. Finally, we complement these theoretical advances with simulation studies showing the practical results of the local linear method, the good finite-sample behaviour of the estimator, and the use of Monte Carlo methods to create a functional pseudo-confidence area.
Background: The accurate estimation of temporal patterns of influenza may help in utilizing hospital resources and guiding influenza surveillance. This paper proposes functional data analysis (FDA) to improve the prediction of temporal patterns of influenza. Methods: We illustrate FDA methods using the weekly Influenza-like Illness (ILI) activity level data from the U.S. We propose to use Fourier basis functions to transform the discrete weekly data into smoothed functional ILI activities. Functional analysis of variance (FANOVA) is used to examine the regional differences in temporal patterns and the impact of the states' political orientation. Results: The ILI activity has a very distinct peak at the beginning and end of the year. There are significant differences in the average level of ILI activities among geographic regions. However, the temporal patterns in terms of the peak and flat time are quite consistent across regions. The geographic and temporal patterns of ILI activities also depend on the political make-up of states. The states affiliated with Republicans had higher ILI activities than those affiliated with Democrats across the whole year. The influence of political party affiliation on the temporal pattern differs considerably among geographic regions. Conclusions: Functional data analysis can help to reveal the temporal variability in average ILI levels, the rate of change in ILI levels, and the effect of geographical regions. Consideration should be given to wider application of FDA to generate more accurate estimates in public health and biomedical research.
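The Fourier-basis smoothing step mentioned in the influenza abstract above can be sketched as a least-squares fit of a few harmonics to weekly values. This is only an illustration under stated assumptions: the 52-week period, the number of harmonics, the function names, and the synthetic series are all hypothetical choices, not details taken from the paper.

```python
import numpy as np

def fourier_design(t, period, n_harmonics):
    """Design matrix: a constant column followed by sine/cosine pairs."""
    cols = [np.ones_like(t, dtype=float)]
    for k in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * k / period
        cols.append(np.sin(w * t))
        cols.append(np.cos(w * t))
    return np.column_stack(cols)

def smooth_fourier(t, y, period=52.0, n_harmonics=3):
    """Least-squares Fourier coefficients and the fitted smooth curve."""
    X = fourier_design(t, period, n_harmonics)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X @ coef

# Synthetic weekly series with a peak around the turn of the year
# (an assumption for illustration, not real ILI data).
rng = np.random.default_rng(0)
weeks = np.arange(52, dtype=float)
truth = 2.0 + 1.5 * np.cos(2.0 * np.pi * weeks / 52.0)
observed = truth + rng.normal(scale=0.3, size=weeks.size)

coef, fitted = smooth_fourier(weeks, observed)
exact_coef, _ = smooth_fourier(weeks, truth)  # noiseless check of the basis
```

A Fourier basis is a natural choice for data with an annual cycle because every basis function is periodic, so the fitted curve joins up smoothly across the year boundary. The number of harmonics controls the trade-off between smoothness and fidelity to the weekly values.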
The classification of functional data has drawn much attention in recent years. The main challenge is representing infinite-dimensional functional data by finite-dimensional features while using those features to achieve better classification accuracy. In this paper, we propose a mean-variance-based (MV) feature weighting method for classifying functional data or functional curves. In the feature extraction stage, each sample curve is approximated by B-splines, so that the features become the coefficients of the spline basis. After that, a feature weighting approach based on statistical principles is introduced by jointly considering the between-class differences and within-class variations of the coefficients. We also introduce a scaling parameter to adjust the gap between the weights of features. The new feature weighting approach can adaptively enhance noteworthy local features while mitigating the impact of confusing features. Algorithms for feature-weighted K-nearest neighbor and support vector machine classifiers are both provided. Moreover, the new approach can be integrated into existing functional data classifiers, such as the generalized functional linear model and functional linear discriminant analysis, resulting in more accurate classification. The performance of the mean-variance-based classifiers is evaluated by simulation studies and real data. The results show that the new feature weighting approach significantly improves the classification accuracy for complex functional data.
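The idea of weighting basis coefficients by between-class difference relative to within-class variation can be sketched as follows. The abstract above does not give the MV weighting formula, so the Fisher-type ratio used here is a stand-in assumption, as are the function names, the `gamma` scaling exponent, and the two-class synthetic coefficients. The sketch starts from a coefficient matrix and does not perform the B-spline fit itself.

```python
import numpy as np

def fisher_type_weights(coefs, labels, gamma=1.0, eps=1e-12):
    """Normalized weights for basis coefficients in a two-class problem.

    coefs:  array (n_samples, n_features) of basis coefficients.
    labels: array of 0/1 class labels.
    gamma:  hypothetical scaling exponent; larger values widen the gap
            between high and low weights.
    """
    c0 = coefs[labels == 0]
    c1 = coefs[labels == 1]
    between = (c0.mean(axis=0) - c1.mean(axis=0)) ** 2
    within = c0.var(axis=0) + c1.var(axis=0) + eps
    score = (between / within) ** gamma
    return score / score.sum()

def weighted_distance(u, v, weights):
    """Weighted Euclidean distance, usable inside a nearest-neighbor classifier."""
    return float(np.sqrt(np.sum(weights * (u - v) ** 2)))

# Synthetic coefficients (illustration only): four features, of which
# only the first separates the two classes.
rng = np.random.default_rng(2)
class0 = rng.normal(size=(50, 4))
class1 = rng.normal(size=(50, 4))
class1[:, 0] += 3.0
coefs = np.vstack([class0, class1])
labels = np.array([0] * 50 + [1] * 50)

w1 = fisher_type_weights(coefs, labels, gamma=1.0)
w2 = fisher_type_weights(coefs, labels, gamma=2.0)
```

Raising the exponent concentrates more of the total weight on the most discriminative coefficient, which is one simple way to realize a parameter that adjusts the gap between feature weights.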
Fuzzy clustering theory is widely used in data mining for full-face tunnel boring machines (TBM). However, traditional fuzzy clustering algorithms based on an objective function have difficulty clustering functional data effectively. We propose a new fuzzy clustering algorithm, the FCM-ANN algorithm. The algorithm replaces the clustering prototype of the FCM algorithm with the predicted value of an artificial neural network. As a result, the algorithm not only supports clustering based on the traditional similarity criterion but can also cluster functional data effectively. In this paper, we first use the t-test as an evaluation index and apply the FCM-ANN algorithm to synthetic datasets for validity testing. The algorithm is then applied to TBM operation data and combined with cross-validation to predict the tunneling speed. The predicted results are evaluated by RMSE and R^2. From the experimental results on the synthetic datasets, we obtain the relationship among the membership threshold, the number of samples, the number of attributes and the noise, so that the datasets can be adjusted effectively. Applying the FCM-ANN algorithm to the TBM operation data predicts the tunneling speed accurately. The FCM-ANN algorithm improves on the traditional fuzzy clustering algorithm and can be used not only for predicting the tunneling speed of TBM but also for clustering or prediction of other functional data.
It is well known that nonparametric estimation of the regression function is highly sensitive to the presence of even a small proportion of outliers in the data. To address the problem of atypical observations when the covariates of the nonparametric component are functional, robust estimates for the regression parameter and the regression operator are introduced. The main purpose of the paper is to consider data-driven methods for selecting the number of neighbors in order to make the proposed procedures fully automatic. We use the k Nearest Neighbors (kNN) procedure to construct the kernel estimator of the proposed robust model. Under some regularity conditions, we state consistency results for kNN functional estimators that are uniform in the number of neighbors (UINN). Furthermore, a simulation study and an empirical application to a real data analysis of gasoline octane predictions are carried out to illustrate the higher predictive performance and the usefulness of the kNN approach.
The estimation of conditional extreme quantiles in incomplete data frameworks is of growing interest. In particular, the estimation of the extreme value index in a censorship framework has been the subject of many investigations in which finite-dimensional covariate information is considered. In this paper, the estimation of the conditional extreme quantile of a heavy-tailed distribution is discussed when some functional random covariate (i.e., valued in some infinite-dimensional space) information is available and the scalar response variable is right-censored. A Weissman-type estimator of conditional extreme quantiles is proposed and its asymptotic normality is established under mild assumptions. A simulation study is conducted to assess the finite-sample behavior of the proposed estimator, and a comparison with two simple estimation strategies is provided.
A variety of factors affect air quality, making it a difficult issue. Air quality refers to how clean the air is in a given area. It is challenging for conventional approaches to correctly identify aberrant values or outliers because this sort of data fluctuates significantly under the influence of climate change and the environment. With accelerating industrial expansion and rising population density in Kolkata City, air pollution is continuously rising. This study involves two phases: imputation of missing values in the first phase and detection of outliers in the second, using Statistical Process Control (SPC) and Functional Data Analysis (FDA). The efficacy of the proposed outlier identification methodology is assessed on working days and non-working days for the variables NO₂, SO₂, and O₃, recorded over one consecutive year in Kolkata, India. The results show how the functional data approach outshines traditional outlier detection methods. The outcomes show that functional data analysis fluctuates more than the other two approaches after imputation, and that the suggested outlier detector is well suited to the precise detection of outliers in highly variable data.
Motivated by a medical study that attempts to analyze the relationship between growth curves and other variables and to measure the association among multiple growth curves, the authors develop a functional multiple-outcome model that decomposes the total variation of multiple functional outcomes into variation explained by independent variables with time-varying coefficient functions, by latent factors, and by noise. The latent factors are the hidden common factors that influence the multiple outcomes and are found through the combined functional principal component analysis approach. Through the coefficients of the latent factors, one may further explore the association of the multiple outcomes. This method is applied to the multivariate growth data of infants in a real medical study in Shanghai and produces interpretable results. Convergence rates for the proposed estimates of the varying coefficient and covariance functions of the model are derived under mild conditions.
To better describe and understand the time dynamics in functional data analysis, it is often desirable to recover the partial derivatives of the random surface. A novel approach based on marginal functional principal component analysis is proposed to derive a representation for the partial derivatives. To obtain the Karhunen-Loève expansion of the partial derivatives, an adaptive estimation is explored. Asymptotic results for the proposed estimates are established. Simulation studies show that the proposed methods perform well in finite samples. An application to human mortality data reveals informative time dynamics in mortality rates.
The problem of predicting continuous scalar outcomes from functional predictors has received considerable interest in recent years in many fields, especially in the food industry. The k-nearest neighbor (k-NN) method of Near-Infrared Reflectance (NIR) analysis is practical, relatively easy to implement, and is becoming one of the most popular methods for assessing food quality based on NIR data. The k-NN method is often called the k-nearest neighbor classifier when it is used for classifying categorical variables, and k-nearest neighbor regression when it is applied to predicting non-categorical variables. The objective of this paper is to use the functional NIR spectroscopy approach to predict some chemical components with modern statistical models based on the kernel and k-nearest neighbor procedures. Three NIR spectroscopy datasets are used as examples, namely the cookie dough, sugar, and Tecator data. Specifically, we propose three models for this kind of data, Functional Nonparametric Regression, Functional Robust Regression, and Functional Relative Error Regression, each with both kernel and k-NN approaches, and compare them. The experimental results show that the k-NN predictor is more efficient than the kernel predictor. The predictive power of the two methods was compared on several real data sets.
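The contrast between kernel and k-NN prediction from curves, as described in the abstract above, can be sketched with a Nadaraya-Watson type estimator in which the only difference is how the bandwidth is chosen. This is a minimal illustration under assumptions: the L2 distance on a common grid, the quadratic kernel, the function names, and the synthetic curves are not taken from the paper, and the robust and relative-error variants are not covered.

```python
import numpy as np

def l2_distances(curves, new_curve, dt):
    """Approximate L2 distance (Riemann sum) from each row of `curves` to `new_curve`."""
    return np.sqrt(np.sum((curves - new_curve) ** 2, axis=1) * dt)

def quadratic_kernel(u):
    """Kernel supported on [0, 1): 1 - u^2 inside, zero outside."""
    return np.where((u >= 0.0) & (u < 1.0), 1.0 - u ** 2, 0.0)

def weighted_mean(y, w):
    total = w.sum()
    return float(np.sum(w * y) / total) if total > 0 else float("nan")

def kernel_predict(curves, y, new_curve, dt, h):
    """Kernel estimator with a fixed, global bandwidth h."""
    d = l2_distances(curves, new_curve, dt)
    return weighted_mean(y, quadratic_kernel(d / h))

def knn_predict(curves, y, new_curve, dt, k):
    """k-NN estimator: the bandwidth adapts to the new curve.

    The bandwidth is the distance to the (k+1)-th nearest curve, so the
    k nearest curves receive positive weight. Requires k < len(curves).
    """
    d = l2_distances(curves, new_curve, dt)
    h = np.sort(d)[k]
    if h == 0.0:
        return float(np.mean(y[d == 0.0]))
    return weighted_mean(y, quadratic_kernel(d / h))

# Synthetic curves (illustration only, not NIR spectra): the response is
# the amplitude of a sine curve.
t = np.linspace(0.0, 1.0, 101)
dt = t[1] - t[0]
amplitudes = np.linspace(0.0, 1.0, 21)
curves = amplitudes[:, None] * np.sin(2.0 * np.pi * t)
y = amplitudes.copy()
new_curve = 0.5 * np.sin(2.0 * np.pi * t)

knn_pred = knn_predict(curves, y, new_curve, dt, k=5)
kernel_pred = kernel_predict(curves, y, new_curve, dt, h=0.1)
far_pred = kernel_predict(curves, y, 5.0 * np.sin(2.0 * np.pi * t), dt, h=0.01)
```

With a fixed bandwidth, a new curve that lies far from the training sample can end up with no neighbors at all, whereas the k-NN bandwidth always adapts to the local density of curves. That location-adaptive behaviour is one commonly cited reason for preferring k-NN in this setting.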
In this paper, we consider composite quantile regression for the partial functional linear regression model with polynomial spline approximation. Under some mild conditions, the convergence rates of the estimators and of the mean squared prediction error, and the asymptotic normality of the parameter vector, are obtained. Simulation studies demonstrate that the proposed estimation method is robust and works much better than the least-squares-based method when there are outliers in the dataset or the random error follows a heavy-tailed distribution. Finally, we apply the proposed methodology to a spectroscopic data set to illustrate its usefulness in practice.
The increasing richness of data encourages a comprehensive understanding of economic and financial activities, where variables of interest may include not only scalar (point-like) indicators, but also functional (curve-like) and compositional (pie-like) ones. In many research topics, the variables are also collected chronologically across individuals, which falls into the paradigm of longitudinal analysis. The complicated nature of the data, however, increases the difficulty of modeling these variables under the classic longitudinal framework. In this study, we investigate the linear mixed-effects model (LMM) for such complex data. Different types of variables are first consistently represented using the corresponding basis expansions so that the classic LMM can then be applied to them, which generalizes the theoretical framework of LMM to complex data analysis. A number of simulation studies indicate the feasibility and effectiveness of the proposed model. We further illustrate its practical utility in a real data study on the Chinese stock market and show that the proposed method can enhance the performance and interpretability of the regression for complex data with diversified characteristics.
This paper presents a robust estimation procedure using modal regression for the partial functional linear regression model, which combines the common linear model with the functional linear regression model. The outstanding merit of the new method is that it is robust against outliers and heavy-tailed error distributions while performing no worse than the least-squares-based estimation method for normal errors. The slope function is fitted by B-splines. Under suitable conditions, the authors obtain the convergence rates and asymptotic normality of the estimators. Finally, simulation studies and a real data example are conducted to examine the finite-sample performance of the proposed method. Both the simulation results and the real data analysis confirm that the newly proposed method works very well.
Working with partially observed functional data has attracted rapidly increasing attention, since there are many applications in which each functional curve may be observed only on a subset of a common domain, and this incompleteness makes most existing methods for functional data analysis ineffective. In this paper, motivated by the appealing characteristics of conditional quantile regression, the authors consider functional linear quantile regression, assuming the explanatory functions are observed partially on dense but discrete point grids of some random subintervals of the domain. A functional principal component analysis (FPCA) based estimator is proposed for the slope function, and the convergence rate of the estimator is investigated. In addition, the finite-sample performance of the proposed estimator is evaluated through simulation studies and a real data application.
Funding (bivariate functional data clustering study): Supported by the National Natural Science Foundation of China (Grant No. 12261007) and the Natural Science Foundation of Guangxi Province (Grant No. 2020GXNSFAA297225).
Funding (high-dimensional functional two-sample mean test study): Supported by the National Natural Science Foundation of China (Grant No. 11901313), the Fundamental Research Funds for the Central Universities, the Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin, and the Key Laboratory of Pure Mathematics and Combinatorics.
Funding (U-statistic two-sample test study): Supported by the National Natural Science Foundation of China (Grant Nos. 11671268 and 12271370), the Guangdong Basic and Applied Basic Research Foundation (Grant No. 2020A1515010821), the Fundamental Research Funds for the Central Universities (Grant No. 12619624), and the Research Start-up Fund for New Young Teachers of Capital University of Economics and Business (Grant No. 00592254417068).
Funding (Chl-a concentration study): Supported by the Fundamental Research Funds for the Central Universities (Nos. 202341017, 202313024).
文摘Chlorophyll-a(Chl-a)concentration is a primary indicator for marine environmental monitoring.The spatio-temporal variations of sea surface Chl-a concentration in the Yellow Sea(YS)and the East China Sea(ECS)in 2001-2020 were investigated by reconstructing the MODIS Level 3 products with the data interpolation empirical orthogonal function(DINEOF)method.The reconstructed results by interpolating the combined MODIS daily+8-day datasets were found better than those merely by interpolating daily or 8-day data.Chl-a concentration in the YS and the ECS reached its maximum in spring,with blooms occurring,decreased in summer and autumn,and increased in late autumn and early winter.By performing empirical orthogonal function(EOF)decomposition of the reconstructed data fields and correlation analysis with several potential environmental factors,we found that the sea surface temperature(SST)plays a significant role in the seasonal variation of Chl a,especially during spring and summer.The increase of SST in spring and the upper-layer nutrients mixed up during the last winter might favor the occurrence of spring blooms.The high sea surface temperature(SST)throughout the summer would strengthen the vertical stratification and prevent nutrients supply from deep water,resulting in low surface Chl-a concentrations.The sea surface Chl-a concentration in the YS was found decreased significantly from 2012 to 2020,which was possibly related to the Pacific Decadal Oscillation(PDO).
Abstract: This paper deals with the conditional density estimator of a real response variable given a functional random variable (i.e., one that takes values in an infinite-dimensional space). Specifically, we focus on the functional index model, an approach that represents a good compromise between nonparametric and parametric models. Under general conditions, and when the variables are independent, we give the quadratic error and asymptotic normality of the estimator obtained by the local linear method based on the single-index structure. Finally, we complete these theoretical advances with some simulation studies showing both the practical results of the local linear method and the good finite-sample behaviour of the estimator and of the Monte Carlo methods used to create functional pseudo-confidence areas.
Funding: The authors acknowledge the Canadian Institute for Health Research (CIHR), the Children's Hospital Research Institute of Manitoba (CHRIM) Foundation, and the Visual and Automated Disease Analytics (VADA) graduate training program of the Natural Sciences and Engineering Research Council of Canada (NSERC) for providing the funding opportunities to conduct this research.
Abstract: Background: The accurate estimation of temporal patterns of influenza may help in utilizing hospital resources and guiding influenza surveillance. This paper proposes functional data analysis (FDA) to improve the prediction of temporal patterns of influenza. Methods: We illustrate FDA methods using the weekly Influenza-like Illness (ILI) activity level data from the U.S. We propose to use the Fourier basis function for transforming discrete weekly data to the smoothed functional ILI activities. Functional analysis of variance (FANOVA) is used to examine the regional differences in temporal patterns and the impact of states' political orientation. Results: The ILI activity has a very distinct peak at the beginning and end of the year. There are significant differences in the average level of ILI activities among geographic regions. However, the temporal patterns in terms of the peak and flat time are quite consistent across regions. The geographic and temporal patterns of ILI activities also depend on the political make-up of states. The states affiliated with Republicans had higher ILI activities than those affiliated with Democrats across the whole year. The influence of political party affiliation on the temporal pattern is quite different among geographic regions. Conclusions: Functional data analysis can help us to reveal the temporal variability in average ILI levels, the rate of change in ILI levels, and the effect of geographical regions. Consideration should be given to wider application of FDA to generate more accurate estimates in public health and biomedical research.
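The Fourier-basis smoothing step mentioned in the Methods can be sketched as follows. This is a minimal illustration, assuming the weekly values lie on an equally spaced grid covering exactly one period (for example 52 weeks); on such a grid the Fourier basis is orthogonal, so the least-squares coefficients reduce to simple projections. The function name `fourier_smooth` and the choice of the number of harmonics are assumptions, not details taken from the paper.

```python
import math


def fourier_smooth(y, n_harmonics):
    """Least-squares Fourier fit of values observed on an equally spaced
    grid t_j = j / n, j = 0..n-1, covering one period.

    Returns (a0, coefs, curve) where coefs[k-1] = (a_k, b_k) and
    curve(s) evaluates the smoothed function at s in [0, 1).
    """
    n = len(y)
    if n_harmonics < 0 or 2 * n_harmonics >= n:
        raise ValueError("need 0 <= n_harmonics < n / 2")
    t = [j / n for j in range(n)]
    a0 = sum(y) / n
    coefs = []
    for k in range(1, n_harmonics + 1):
        # Projection onto cos and sin of frequency k (discrete orthogonality).
        a_k = 2.0 / n * sum(
            yj * math.cos(2 * math.pi * k * tj) for yj, tj in zip(y, t)
        )
        b_k = 2.0 / n * sum(
            yj * math.sin(2 * math.pi * k * tj) for yj, tj in zip(y, t)
        )
        coefs.append((a_k, b_k))

    def curve(s):
        return a0 + sum(
            a_k * math.cos(2 * math.pi * k * s)
            + b_k * math.sin(2 * math.pi * k * s)
            for k, (a_k, b_k) in enumerate(coefs, start=1)
        )

    return a0, coefs, curve
```

Using few harmonics yields a smooth seasonal curve; unequally spaced or missing weeks would require a general least-squares solve instead of these projections.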
Funding: The National Social Science Foundation of China (Grant No. 22BTJ035).
Abstract: The classification of functional data has drawn much attention in recent years. The main challenge is representing infinite-dimensional functional data by finite-dimensional features while utilizing those features to achieve better classification accuracy. In this paper, we propose a mean-variance-based (MV) feature weighting method for classifying functional data or functional curves. In the feature extraction stage, each sample curve is approximated by B-splines to transfer features to the coefficients of the spline basis. After that, a feature weighting approach based on statistical principles is introduced by comprehensively considering the between-class differences and within-class variations of the coefficients. We also introduce a scaling parameter to adjust the gap between the weights of features. The new feature weighting approach can adaptively enhance noteworthy local features while mitigating the impact of confusing features. The algorithms for feature-weighted K-nearest neighbor and support vector machine classifiers are both provided. Moreover, the new approach can be well integrated into existing functional data classifiers, such as the generalized functional linear model and functional linear discriminant analysis, resulting in a more accurate classification. The performance of the mean-variance-based classifiers is evaluated by simulation studies and real data. The results show that the new feature weighting approach significantly improves the classification accuracy for complex functional data.
Funding: Supported by the National Key R&D Program of China (Grant Nos. 2018YFB1700704 and 2018YFB1702502) and by the Study on the Key Management and Privacy Preservation in VANET, the Innovation Foundation of Science and Technology of Dalian (2018J12GX045).
Abstract: Fuzzy clustering theory is widely used in data mining for full-face tunnel boring machines (TBM). However, the traditional fuzzy clustering algorithm based on an objective function has difficulty clustering functional data effectively. We propose a new fuzzy clustering algorithm, namely the FCM-ANN algorithm. The algorithm replaces the clustering prototype of the FCM algorithm with the predicted value of an artificial neural network. As a result, the algorithm not only supports clustering based on the traditional similarity criterion but can also cluster functional data effectively. In this paper, we first use the t-test as an evaluation index and apply the FCM-ANN algorithm to synthetic datasets for validity testing. Then the algorithm is applied to TBM operation data and combined with the cross-validation method to predict the tunneling speed. The predicted results are evaluated by RMSE and R^(2). According to the experimental results on the synthetic datasets, we obtain the relationship among the membership threshold, the number of samples, the number of attributes and the noise. Accordingly, the datasets can be effectively adjusted. Applying the FCM-ANN algorithm to the TBM operation data can accurately predict the tunneling speed. The FCM-ANN algorithm improves on the traditional fuzzy clustering algorithm and can be used not only for predicting the tunneling speed of a TBM but also for clustering or prediction of other functional data.
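For orientation, the standard fuzzy c-means (FCM) membership update that such an algorithm builds on can be sketched as below. The sketch takes the distances from one sample to each cluster prototype as given; in an FCM-ANN-style variant those distances would presumably be measured against the neural network's predicted values rather than centroids, but that substitution is an assumption here and is not implemented. The function name `fcm_memberships` and the default fuzzifier `m=2.0` are illustrative choices.

```python
def fcm_memberships(distances, m=2.0):
    """Standard FCM membership update for a single sample.

    distances: non-negative distances from the sample to each prototype.
    m: fuzzifier (> 1). Returns memberships that sum to 1.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    zeros = [i for i, d in enumerate(distances) if d == 0.0]
    if zeros:
        # The sample coincides with one or more prototypes:
        # share full membership equally among them.
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(distances))]
    power = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** power for d_k in distances)
        for d_i in distances
    ]
```

Closer prototypes receive larger memberships, and increasing `m` makes the partition fuzzier.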
Abstract: It is well known that the nonparametric estimation of the regression function is highly sensitive to the presence of even a small proportion of outliers in the data. To address the problem of atypical observations when the covariates of the nonparametric component are functional, robust estimates for the regression parameter and regression operator are introduced. The main purpose of the paper is to consider data-driven methods of selecting the number of neighbors in order to make the proposed procedures fully automatic. We use the k Nearest Neighbors procedure (kNN) to construct the kernel estimator of the proposed robust model. Under some regularity conditions, we state consistency results for kNN functional estimators, which are uniform in the number of neighbors (UINN). Furthermore, a simulation study and an empirical application to a real data analysis of octane gasoline predictions are carried out to illustrate the higher predictive performance and the usefulness of the kNN approach.
Abstract: The study of the estimation of conditional extreme quantiles in incomplete data frameworks is of growing interest. In particular, the estimation of the extreme value index in a censoring framework has been the subject of many investigations when finite-dimensional covariate information is considered. In this paper, the estimation of the conditional extreme quantile of a heavy-tailed distribution is discussed when some functional random covariate (i.e. valued in some infinite-dimensional space) information is available and the scalar response variable is right-censored. A Weissman-type estimator of conditional extreme quantiles is proposed and its asymptotic normality is established under mild assumptions. A simulation study is conducted to assess the finite-sample behavior of the proposed estimator, and a comparison with two simple estimation strategies is provided.
Abstract: A variety of factors affect air quality, making it a difficult issue. The level of clean air in a certain area is referred to as air quality. It is challenging for conventional approaches to correctly discover aberrant values or outliers due to the significant fluctuation of this sort of data, which is influenced by climate change and the environment. With accelerating industrial expansion and rising population density in Kolkata City, air pollution is continuously rising. This study involves two phases: first, imputation of missing values, and second, detection of outliers using Statistical Process Control (SPC) and Functional Data Analysis (FDA). It assesses the efficacy of the proposed outlier identification methodology on working days and non-working days for the variables NO₂, SO₂, and O₃, which were recorded for a year in a row in Kolkata, India. The results show how the functional data approach outshines traditional outlier detection methods. The outcomes show that functional data analysis vibrates more than the other two approaches after imputation, and the suggested outlier detector is absolutely appropriate for the precise detection of outliers in highly variable data.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11771146, 11831008, 81530086, 11771145, 11871252, the 111 Project (B14019), and the Program of Shanghai Subject Chief Scientist under Grant No. 14XD1401600.
Abstract: Motivated by a medical study that attempts to analyze the relationship between growth curves and other variables and to measure the association among multiple growth curves, the authors develop a functional multiple-outcome model to decompose the total variation of multiple functional outcomes into variation explained by independent variables with time-varying coefficient functions, by latent factors and by noise. The latent factors are the hidden common factors that influence the multiple outcomes and are found through the combined functional principal component analysis approach. Through the coefficients of the latent factors one may further explore the association of the multiple outcomes. This method is applied to the multivariate growth data of infants in a real medical study in Shanghai and produces interpretable results. Convergence rates for the proposed estimates of the varying coefficient and covariance functions of the model are derived under mild conditions.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11861014, 11561006 and 11971404), the Natural Science Foundation of Guangxi Province (Grant No. 2018GXNSFAA281145), the Humanity and Social Science Youth Foundation of the Ministry of Education of China (Grant No. 19YJC910010), and the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, USA.
Abstract: To better describe and understand the time dynamics in functional data analysis, it is often desirable to recover the partial derivatives of the random surface. A novel approach is proposed based on marginal functional principal component analysis to derive the representation for partial derivatives. To obtain the Karhunen-Loève expansion of the partial derivatives, an adaptive estimation is explored. Asymptotic results of the proposed estimates are established. Simulation studies show that the proposed methods perform well in finite samples. Application to the human mortality data reveals informative time dynamics in mortality rates.
Funding: funding this work through the Research Groups Program under Grant Number R.G.P.1/189/41. I.M.A. and M.K.A. received the grant.
Abstract: The problem of predicting continuous scalar outcomes from functional predictors has received high levels of interest in recent years in many fields, especially in the food industry. The k-nearest neighbor (k-NN) method of Near-Infrared Reflectance (NIR) analysis is practical, relatively easy to implement, and becoming one of the most popular methods for assessing food quality based on NIR data. The k-NN method is often called the k-nearest neighbor classifier when it is used for classifying categorical variables, and k-nearest neighbor regression when it is applied to predicting noncategorical variables. The objective of this paper is to use the functional NIR spectroscopy approach to predict some chemical components with some modern statistical models based on the kernel and k-nearest neighbor procedures. Three NIR spectroscopy datasets are used as examples, namely the cookie dough, sugar, and Tecator data. Specifically, we propose three models for this kind of data, namely Functional Nonparametric Regression, Functional Robust Regression, and Functional Relative Error Regression, each with both kernel and k-NN approaches, in order to compare them. The predictive power of the k-NN method was compared with that of the kernel method on these real data sets, and the experimental results show the higher efficiency of the k-NN predictor over the kernel predictor.
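The idea of k-NN regression with a functional predictor can be sketched in a few lines. The version below is a deliberate simplification: curves are assumed to be sampled on a common equally spaced grid, proximity is measured by a discretized L2 distance, and the prediction is the unweighted mean of the k nearest responses (the estimators studied in the paper may use kernel weights and other semi-metrics). The names `l2_distance` and `knn_functional_regression` are hypothetical.

```python
import math


def l2_distance(curve_a, curve_b, dt=1.0):
    """Discretized L2 distance between two curves on a common grid
    with spacing dt (Riemann-sum approximation)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(curve_a, curve_b)) * dt)


def knn_functional_regression(train_curves, train_y, new_curve, k, dt=1.0):
    """Predict a scalar response for new_curve as the plain average of the
    responses of its k nearest training curves (simplified k-NN sketch)."""
    if not 1 <= k <= len(train_curves):
        raise ValueError("k must be between 1 and the number of training curves")
    ranked = sorted(
        (l2_distance(curve, new_curve, dt), y)
        for curve, y in zip(train_curves, train_y)
    )
    return sum(y for _, y in ranked[:k]) / k
```

Because the neighborhood is defined by the number of neighbors rather than a fixed bandwidth, it adapts to how densely the training curves surround the new curve.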
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11671096, 11690013, 11731011 and 12071267) and the Natural Science Foundation of Shanxi Province, China (Grant No. 201901D111279).
Abstract: In this paper, we consider composite quantile regression for the partial functional linear regression model with polynomial spline approximation. Under some mild conditions, the convergence rates of the estimators and of the mean squared prediction error, and the asymptotic normality of the parameter vector, are obtained. Simulation studies demonstrate that the proposed new estimation method is robust and works much better than the least-squares-based method when there are outliers in the dataset or the random error follows a heavy-tailed distribution. Finally, we apply the proposed methodology to a spectroscopic data set to illustrate its usefulness in practice.
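The objective minimized in composite quantile regression can be written down compactly: a shared slope vector is combined with one intercept per quantile level, and the check losses are summed over all levels. The sketch below evaluates that objective for a generic design; here each row of `X` is assumed to already contain the scalar covariates together with the spline coefficients standing in for the functional part, which is an assumption about the setup rather than the paper's exact formulation. The names `check_loss` and `cqr_objective` are illustrative.

```python
def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def cqr_objective(X, y, slopes, intercepts):
    """Composite quantile regression objective.

    X: list of covariate rows; y: responses; slopes: shared coefficients;
    intercepts: one intercept b_k per quantile level tau_k = k / (K + 1),
    k = 1..K, where K = len(intercepts).
    """
    K = len(intercepts)
    taus = [k / (K + 1) for k in range(1, K + 1)]
    total = 0.0
    for tau, b_k in zip(taus, intercepts):
        for row, y_i in zip(X, y):
            fitted = b_k + sum(x * beta for x, beta in zip(row, slopes))
            total += check_loss(y_i - fitted, tau)
    return total
```

Minimizing this function over `slopes` and `intercepts` (for example with a linear programming solver) gives the composite quantile estimate; because the check loss grows only linearly in the residual, large outliers have less influence than under squared error.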
Funding: This research was financially supported by the Natural Science Foundation of China (Nos. 71420107025, 11701023).
Abstract: The increasing richness of data encourages a comprehensive understanding of economic and financial activities, where variables of interest may include not only scalar (point-like) indicators, but also functional (curve-like) and compositional (pie-like) ones. In many research topics, the variables are also chronologically collected across individuals, which falls into the paradigm of longitudinal analysis. The complicated nature of data, however, increases the difficulty of modeling these variables under the classic longitudinal framework. In this study, we investigate the linear mixed-effects model (LMM) for such complex data. Different types of variables are first consistently represented using the corresponding basis expansions so that the classic LMM can then be conducted on them, which generalizes the theoretical framework of LMM to complex data analysis. A number of simulation studies indicate the feasibility and effectiveness of the proposed model. We further illustrate its practical utility in a real data study on the Chinese stock market and show that the proposed method can enhance the performance and interpretability of the regression for complex data with diversified characteristics.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11671096, 11690013, 11731011.
Abstract: This paper presents a robust estimation procedure using modal regression for the partial functional linear regression, which combines the common linear model with the functional linear regression model. The outstanding merit of the new method is that it is robust against outliers or heavy-tailed error distributions while performing no worse than the least-squares-based estimation method for normal error cases. The slope function is fitted by B-splines. Under suitable conditions, the authors obtain the convergence rates and asymptotic normality of the estimators. Finally, simulation studies and a real data example are conducted to examine the finite sample performance of the proposed method. Both the simulation results and the real data analysis confirm that the newly proposed method works very well.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 11771032.
Abstract: Working with partially observed functional data has recently attracted rapidly increasing attention, since there are many applications in which each functional curve may be observed only on a subset of a common domain, and this incompleteness makes most existing methods for functional data analysis ineffective. In this paper, motivated by the appealing characteristics of conditional quantile regression, the authors consider functional linear quantile regression, assuming the explanatory functions are observed partially on dense but discrete point grids of some random subintervals of the domain. A functional principal component analysis (FPCA) based estimator is proposed for the slope function, and the convergence rate of the estimator is investigated. In addition, the finite sample performance of the proposed estimator is evaluated through simulation studies and a real data application.