Model averaging has received much attention in recent years. This paper considers semiparametric model averaging for high-dimensional longitudinal data. To minimize the prediction error, the authors estimate the model weights using a leave-subject-out cross-validation procedure. Asymptotic optimality of the proposed method is proved in the sense that leave-subject-out cross-validation achieves the lowest possible prediction loss asymptotically. Simulation studies show that the proposed model averaging method performs much better than some commonly used model selection and averaging methods.
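The weight-selection idea in the abstract above can be sketched on toy data. This is a minimal illustration, not the paper's procedure: it assumes the leave-subject-out predictions of two candidate models are already available (here as hypothetical lists `pred1` and `pred2`) and searches a grid for the averaging weight that minimizes the cross-validated squared prediction loss.

```python
# Sketch of leave-subject-out cross-validation for model-averaging weights.
# pred1/pred2 stand in for leave-subject-out predictions of two candidate
# models; a real application would refit each model with one subject held out.

def lso_cv_loss(w, y, pred1, pred2):
    """Squared prediction loss of the weighted average w*pred1 + (1-w)*pred2."""
    return sum((yi - (w * p1 + (1 - w) * p2)) ** 2
               for yi, p1, p2 in zip(y, pred1, pred2))

def choose_weight(y, pred1, pred2, grid=101):
    """Grid search on [0, 1] for the weight minimizing the LSO-CV loss."""
    return min((i / (grid - 1) for i in range(grid)),
               key=lambda w: lso_cv_loss(w, y, pred1, pred2))

# Toy example: candidate 1 predicts well, candidate 2 is useless, so the
# selected weight should be close to 1.
y     = [1.0, 2.0, 3.0, 4.0]
pred1 = [1.1, 1.9, 3.0, 4.1]
pred2 = [0.0, 0.0, 0.0, 0.0]
w = choose_weight(y, pred1, pred2)
```

The grid search stands in for the quadratic-programming step a real implementation would use when there are many candidate models.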
High-dimensional longitudinal data arise frequently in biomedical and genomic research. It is important to select relevant covariates when the dimension of the parameters diverges as the sample size increases. We consider the problem of variable selection in high-dimensional linear models with longitudinal data. A new variable selection procedure is proposed using the smooth-threshold generalized estimating equation and quadratic inference functions (SGEE-QIF) to incorporate correlation information. The proposed procedure automatically eliminates inactive predictors by setting the corresponding parameters to zero, and simultaneously estimates the nonzero regression coefficients by solving the SGEE-QIF. The procedure avoids a convex optimization problem and is flexible and easy to implement. We establish the asymptotic properties in a high-dimensional framework where the number of covariates increases as the number of clusters increases. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed variable selection procedure.
High-dimensional and incomplete (HDI) matrices are generated in all kinds of big-data-related practical applications. A latent factor analysis (LFA) model is capable of conducting efficient representation learning on an HDI matrix, and its hyper-parameter adaptation can be implemented through a particle swarm optimizer (PSO) to meet scalability requirements. However, conventional PSO suffers from premature convergence, which leads to accuracy loss in the resulting LFA model. To address this issue, this study merges the information of each particle's state migration into its evolution process, following the principle of a generalized momentum method, to improve its search ability, thereby building a state-migration particle swarm optimizer (SPSO), whose theoretical convergence is rigorously proved in this study. SPSO is then incorporated into an LFA model for efficient hyper-parameter adaptation without accuracy loss. Experiments on six HDI matrices indicate that an SPSO-incorporated LFA model outperforms state-of-the-art LFA models in prediction accuracy for the missing data of an HDI matrix, with competitive computational efficiency. Hence, SPSO ensures efficient and reliable hyper-parameter adaptation in an LFA model, and thus practical and accurate representation learning for HDI matrices.
In this paper, we introduce the censored composite conditional quantile coefficient (cCCQC) to rank the relative importance of each predictor in high-dimensional censored regression. The cCCQC takes advantage of all useful information across quantiles and can effectively detect nonlinear effects, including interactions and heterogeneity. Furthermore, the proposed screening method based on cCCQC is robust to the existence of outliers and enjoys the sure screening property. Simulation results demonstrate that the proposed method performs competitively on survival datasets with high-dimensional predictors, particularly when the variables are highly correlated.
The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently affected by high dimensionality and noise. However, most relevant studies are based on complete data. This paper studies the optimal estimation of high-dimensional covariance matrices from missing and noisy samples under the norm. First, a model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator, and the minimax upper bound is derived. After that, the minimax lower bound is derived, and it is concluded that the proposed estimator is rate-optimal. Finally, numerical simulations are performed. The results show that for missing samples with sub-Gaussian noise, if the true covariance matrix is sparse, the hard thresholding estimator outperforms the traditional estimation method.
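The two ingredients named in the abstract above, a generalized sample covariance computed from incomplete observations and a hard-thresholding step, can be sketched as follows. This is an illustrative sketch only: entries are estimated from pairwise-complete observations, and the threshold `tau` is a free parameter here, not the paper's rate-optimal choice.

```python
# Hedged sketch: pairwise-complete sample covariance under missing data,
# followed by hard thresholding of small off-diagonal entries.

def pairwise_cov(data):
    """data: list of samples, each a list with None marking missing entries.
    Returns the generalized sample covariance from pairwise-complete pairs.
    (No guard against a pair of columns never being jointly observed.)"""
    p = len(data[0])
    means = []
    for j in range(p):
        vals = [x[j] for x in data if x[j] is not None]
        means.append(sum(vals) / len(vals))
    cov = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(p):
            pairs = [(x[j], x[k]) for x in data
                     if x[j] is not None and x[k] is not None]
            cov[j][k] = sum((a - means[j]) * (b - means[k])
                            for a, b in pairs) / len(pairs)
    return cov

def hard_threshold(cov, tau):
    """Zero out off-diagonal entries with absolute value below tau."""
    return [[c if (j == k or abs(c) >= tau) else 0.0
             for k, c in enumerate(row)] for j, row in enumerate(cov)]

data = [[1.0, 2.0], [2.0, None], [3.0, 4.0], [4.0, 5.0]]
cov = pairwise_cov(data)
```

Hard thresholding keeps the diagonal intact and sparsifies the rest, which is why it exploits sparsity of the true covariance matrix.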
In this article, a partially linear single-index model for longitudinal data is investigated. Generalized penalized spline least squares estimates of the unknown parameters are suggested. All parameters can be estimated simultaneously by the proposed method while the features of longitudinal data are taken into account. The existence, strong consistency, and asymptotic normality of the estimators are proved under suitable conditions. A simulation study is conducted to investigate the finite sample performance of the proposed method. Our approach can also be used to study the pure single-index model for longitudinal data.
In this paper, we discuss two tests for varying dispersion of binomial data in the framework of nonlinear logistic models with random effects, which are widely used in analyzing longitudinal binomial data. The first is an individual test, with power calculation, for varying dispersion through testing the randomness of cluster effects, which extends Dean (1992) and Commenges et al. (1994). The second is a composite test for varying dispersion through simultaneously testing the randomness of cluster effects and the equality of random-effect means. The score test statistics are constructed and expressed in simple, easy-to-use matrix formulas. The authors illustrate the test methods using the insecticide data (Giltinan, Capizzi & Malani (1988)).
The performance of conventional similarity measurement methods is seriously affected by the curse of dimensionality of high-dimensional data. The reason is that the data differences in sparse and noisy dimensions account for a large proportion of the similarity, obscuring the true dissimilarities between results. A similarity measurement method for high-dimensional data based on a normalized net lattice subspace is proposed. The data range of each dimension is divided into several intervals, and the components in different dimensions are mapped onto the corresponding intervals. Only components in the same or adjacent intervals are used to calculate the similarity. To validate this method, three data types are used, and seven common similarity measurement methods are compared. The experimental results indicate that the relative difference of the method increases with the dimensionality and is approximately two or three orders of magnitude higher than that of the conventional methods. In addition, the similarity range of this method in different dimensions is [0, 1], which makes it suitable for similarity analysis after dimensionality reduction.
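The interval-mapping rule described above can be sketched concretely. This is an illustrative reading of the idea, with hypothetical function names and a simple per-dimension contribution; the paper's normalization details are not reproduced.

```python
# Sketch of the net-lattice subspace idea: each dimension's range is split
# into intervals; only components falling in the same or adjacent interval
# contribute to the similarity, so sparse/noisy dimensions are ignored.

def interval_index(v, lo, hi, n_bins):
    """Map a value in [lo, hi] to its interval index."""
    if hi == lo:
        return 0
    idx = int((v - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)          # clamp the right endpoint

def lattice_similarity(x, y, bounds, n_bins=4):
    """Similarity in [0, 1]: average closeness over dimensions whose two
    components lie in the same or adjacent interval; others contribute 0."""
    total = 0.0
    for xi, yi, (lo, hi) in zip(x, y, bounds):
        bi = interval_index(xi, lo, hi, n_bins)
        bj = interval_index(yi, lo, hi, n_bins)
        if abs(bi - bj) <= 1:            # same or adjacent interval
            total += 1.0 - abs(xi - yi) / (hi - lo)
    return total / len(x)

s = lattice_similarity([0.2, 0.8], [0.25, 0.75], [(0.0, 1.0), (0.0, 1.0)])
```

Because each per-dimension contribution lies in [0, 1], the overall similarity also lies in [0, 1] regardless of the dimensionality, matching the range property claimed in the abstract.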
This paper investigates the problem of collecting multidimensional data throughout time (i.e., longitudinal studies) for the fundamental task of frequency estimation under Local Differential Privacy (LDP) guarantees. Contrary to frequency estimation of a single attribute, the multidimensional aspect demands particular attention to the privacy budget. Besides, when collecting user statistics longitudinally, privacy progressively degrades. Indeed, the "multiple" settings in combination (i.e., many attributes and several collections throughout time) impose several challenges, for which this paper proposes the first solution for frequency estimates under LDP. To tackle these issues, we extend the analysis of three state-of-the-art LDP protocols (Generalized Randomized Response (GRR), Optimized Unary Encoding (OUE), and Symmetric Unary Encoding (SUE)) for both longitudinal and multidimensional data collections. While the known literature uses OUE and SUE for two rounds of sanitization (a.k.a. memoization), i.e., L-OUE and L-SUE, respectively, we analytically and experimentally show that starting with OUE and then using SUE provides higher data utility (i.e., L-OSUE). Also, for attributes with small domain sizes, we propose Longitudinal GRR (L-GRR), which provides higher utility than the other protocols based on unary encoding. Last, we propose a new solution named Adaptive LDP for LOngitudinal and Multidimensional FREquency Estimates (ALLOMFREE), which randomly samples a single attribute to be sent with the whole privacy budget and adaptively selects the optimal protocol, i.e., either L-GRR or L-OSUE. As shown in the results, ALLOMFREE consistently and considerably outperforms the state-of-the-art L-SUE and L-OUE protocols in the quality of the frequency estimates.
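The GRR building block used above is standard and small enough to sketch. The following shows one round of GRR sanitization and the corresponding unbiased frequency estimator; the longitudinal memoization and multidimensional budget splitting that the paper analyzes are not reproduced here.

```python
# Minimal sketch of Generalized Randomized Response (GRR) for one round of
# LDP frequency estimation over a k-ary domain.
import math
import random

def grr_perturb(value, domain, eps, rng=random):
    """Report the true value with prob p = e^eps / (e^eps + k - 1),
    otherwise a uniformly chosen other value from the domain."""
    k = len(domain)
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])

def grr_estimate(counts, n, k, eps):
    """Unbiased frequency estimates from observed report counts."""
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    q = (1 - p) / (k - 1)
    return {v: (c / n - q) / (p - q) for v, c in counts.items()}

# With a very large epsilon the mechanism is nearly truthful, so the
# estimator recovers the raw frequencies.
est = grr_estimate({0: 30, 1: 70}, n=100, k=2, eps=50.0)
```

The debiasing step `(c/n - q) / (p - q)` is what corrects for the noise injected by the perturbation; as epsilon shrinks, `p - q` shrinks and the estimates get noisier, which is exactly the budget tension the paper's longitudinal setting has to manage.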
Local arterials can be significantly impacted by diversions from adjacent work zones. These diversions often occur on unofficial detour routes due to guidance received on personal navigation devices. Often, these routes do not have sufficient sensing or communication equipment to obtain infrastructure-based traffic signal performance measures, so other data sources are required to identify locations being significantly affected by diversions. This paper examines the network impact caused by the start of an 18-month closure of the I-65/70 interchange (North Split), which usually serves approximately 214,000 vehicles per day in Indianapolis, IN. In anticipation of some proportion of the public diverting from official detour routes to local streets, a connected vehicle monitoring program was established to provide daily performance measures for over 100 intersections in the area without the need for vehicle sensing equipment. This study reports on 13 of the most impacted signals on an alternative arterial to identify the locations and times of day where operations are most degraded, so that decision makers have quantitative information to make informed adjustments to the system. Individual vehicle movements at the studied locations are analyzed to estimate changes in volume, split failures, downstream blockage, arrivals on green, and travel times. Over 130,000 trajectories were analyzed over an 11-week period. Weekly afternoon peak period volumes increased by approximately 455%, split failures increased 3%, downstream blockage increased 10%, arrivals on green decreased 16%, and travel time increased 74%. The analysis performed in this paper will serve as a framework for any agency that wants to assess traffic signal performance at hundreds of locations with little or no existing sensing or communication infrastructure, in order to prioritize tactical retiming and/or longer-term infrastructure investments.
In this paper we reparameterize covariance structures in longitudinal data analysis through the modified Cholesky decomposition of the covariance matrix itself. Based on this modified Cholesky decomposition, the within-subject covariance matrix is decomposed into a unit lower triangular matrix involving moving average coefficients and a diagonal matrix involving innovation variances, which are modeled as linear functions of covariates. Then, we propose a penalized maximum likelihood method for variable selection in joint mean and covariance models based on this decomposition. Under certain regularity conditions, we establish the consistency and asymptotic normality of the penalized maximum likelihood estimators of the model parameters. Simulation studies are undertaken to assess the finite sample performance of the proposed variable selection procedure.
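The decomposition described above is the LDL^T factorization: the within-subject covariance Sigma is written as L·D·L^T with L unit lower triangular (moving-average coefficients) and D diagonal (innovation variances). A small pure-Python sketch, for illustration on a toy matrix:

```python
# Modified Cholesky (LDL^T) factorization of a small positive-definite
# covariance matrix: sigma = L * diag(d) * L^T, L unit lower triangular.

def modified_cholesky(sigma):
    n = len(sigma)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for i in range(n):
        for j in range(i):
            # moving-average coefficient l_ij
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (sigma[i][j] - s) / d[j]
        # innovation variance d_i
        d[i] = sigma[i][i] - sum(L[i][k] ** 2 * d[k] for k in range(i))
    return L, d

sigma = [[4.0, 2.0, 2.0],
         [2.0, 5.0, 3.0],
         [2.0, 3.0, 6.0]]
L, d = modified_cholesky(sigma)
```

Because the entries of L are unconstrained and the entries of D only need to be positive, both can be modeled as (log-)linear functions of covariates without risking an invalid covariance matrix, which is the point of the reparameterization.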
The parameter estimation and the coefficient of contamination for regression models with repeated measures are studied when the response variables are contaminated by another random variable sequence. Under suitable conditions, it is proved that the estimators established in this paper are strongly consistent.
For regression models with longitudinal data, we combine a robust estimating equation with the elemental empirical likelihood method and propose an efficient robust estimator, where the robust estimating equation is based on a bounded score function and a covariate-dependent weight function. This method reduces the influence of outliers in the response variables and covariates on parameter estimation, takes into account the correlation within the data, and improves the efficiency of estimation. Simulation results show that the proposed method is robust and efficient.
In longitudinal data analysis, our primary interest is in the estimation of regression parameters for the marginal expectations of the longitudinal responses, and the longitudinal correlation parameters are of secondary interest. The joint likelihood function for longitudinal data is challenging, particularly due to correlated responses. Marginal models, such as generalized estimating equations (GEEs), have received much attention; they are based on assumptions about the first two moments of the data and a working correlation structure. Confidence regions and hypothesis tests are constructed based on asymptotic normality. This approach is sensitive to misspecification of the variance function and the working correlation structure, which may yield inefficient and inconsistent estimates leading to wrong conclusions. To overcome this problem, we propose an empirical likelihood (EL) procedure based on a set of estimating equations for the parameter of interest and discuss its characteristics and asymptotic properties. We also provide an algorithm based on EL principles for the estimation of the regression parameters and the construction of their confidence regions. We have applied the proposed method to two case examples.
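The EL machinery referred to above is easiest to see in its simplest instance, the empirical likelihood ratio for a scalar mean; the estimating-equation version for GEE-type parameters follows the same pattern with the estimating function replacing `x_i - mu`. A hedged sketch:

```python
# -2 log empirical likelihood ratio for H0: E[X] = mu, computed via
# Newton's method on the Lagrange multiplier lam that solves
#   sum_i (x_i - mu) / (1 + lam * (x_i - mu)) = 0.
import math

def el_log_ratio(x, mu, iters=50):
    z = [xi - mu for xi in x]
    lam = 0.0
    for _ in range(iters):
        g  = sum(zi / (1 + lam * zi) for zi in z)
        gp = -sum(zi * zi / (1 + lam * zi) ** 2 for zi in z)
        lam -= g / gp                    # Newton step (no safeguarding here)
    return 2 * sum(math.log(1 + lam * zi) for zi in z)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
stat = el_log_ratio(x, 2.5)
```

Under the null, the statistic is asymptotically chi-squared, so a confidence region for `mu` is the set where `el_log_ratio(x, mu)` stays below the chi-squared quantile; no variance estimate or working correlation enters, which is the appeal noted in the abstract. (The unguarded Newton iteration is a simplification and can fail for `mu` far from the sample range.)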
Logic regression is an adaptive regression method which searches for Boolean (logic) combinations of binary variables that best explain the variability in the outcome, and thus it reveals interaction effects which are associated with the response. In this study, we extended logic regression to longitudinal data with a binary response and proposed a "Transition Logic Regression Method" to find interactions related to the response. In this method, interaction effects over time are found by a simulated annealing algorithm with AIC (Akaike Information Criterion) as the score function of the model. Also, first- and second-order Markov dependence is allowed, to capture the correlation among successive observations of the same individual in a longitudinal binary response. The performance of the method was evaluated in a simulation study under various conditions. The proposed method was used to find interactions of SNPs and other risk factors related to low HDL over time in data on 329 participants of the longitudinal TLGS study.
In this article, a robust generalized estimating equation for the analysis of partial linear mixed models for longitudinal data is used. The authors approximate the nonparametric function by a regression spline. Under some regularity conditions, the asymptotic properties of the estimators are obtained. To avoid the computation of a high-dimensional integral, a robust Monte Carlo Newton-Raphson algorithm is used. Simulations are carried out to study the performance of the proposed robust estimators, and the authors also study their robustness and efficiency by simulation. Finally, two real longitudinal data sets are analyzed.
Objective: To analyze longitudinal binary data using generalized linear models, taking the correlation between repeated measures into account, and to give a general method for analyzing longitudinal binary data. Methods: The generalized estimating equations (GEE) proposed by Zeger and Liang were used. For seven covariance structures, a method was given for estimating the regression and correlation parameters. Results: The regression and correlation parameters were estimated simultaneously. A set of programs was written, and an example is illustrated. Conclusion: Longitudinal data often occur in medical research and clinical trials. To handle the correlation between repeated measures, it is necessary to use special methods to cope with this kind of data.
We consider the problem of variable selection for single-index random effects models with longitudinal data. An automatic variable selection procedure is developed using smooth-thresholding. The proposed method shares some of the desired features of existing variable selection methods: the resulting estimator enjoys the oracle property; the procedure avoids a convex optimization problem and is flexible and easy to implement. Moreover, we use a penalized weighted deviance criterion for a data-driven choice of the tuning parameters. Simulation studies are carried out to assess the performance of our method, and a real dataset is analyzed for further illustration.
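The smooth-thresholding idea used in this and the earlier SGEE-QIF abstract can be sketched in its simplest form. This is an assumption-laden illustration: the shrinkage quantity `delta_j = min(1, lam / |b_j|^(1+tau))` is the usual smooth-threshold choice, not necessarily the exact form used in the paper, and the one-shot shrinkage below stands in for the iterative estimating-equation solve.

```python
# Sketch of smooth-thresholding: small coefficients are driven exactly to
# zero (delta_j reaches 1), large ones are only mildly shrunk, which is how
# the procedure selects variables without a convex optimization step.

def smooth_threshold(beta, lam, tau=1.0):
    """Apply delta_j = min(1, lam / |beta_j|^(1+tau)) and return
    (1 - delta_j) * beta_j for each coefficient."""
    out = []
    for b in beta:
        if b == 0.0:
            out.append(0.0)
            continue
        delta = min(1.0, lam / abs(b) ** (1.0 + tau))
        out.append((1.0 - delta) * b)
    return out

shrunk = smooth_threshold([2.0, 0.1, -3.0], lam=0.05)
```

Here the tiny coefficient 0.1 is eliminated outright while 2.0 and -3.0 survive nearly intact; `lam` and `tau` play the role of the tuning parameters chosen by the penalized weighted deviance criterion.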
In this article, we propose a generalized empirical likelihood inference for the parametric component in semiparametric generalized partially linear models with longitudinal data. Based on the extended score vector, a generalized empirical likelihood ratio function is defined, which incorporates the within-cluster correlation while avoiding direct estimation of the nuisance parameters in the correlation matrix. We show that the proposed statistics are asymptotically chi-squared under some suitable conditions, and hence they can be used to construct confidence regions for the parameters. In addition, the maximum empirical likelihood estimates of the parameters and the corresponding asymptotic normality are obtained. Simulation studies demonstrate the performance of the proposed method.
When longitudinal data contain outliers, the classical least-squares approach is known not to be robust. To address this issue, the exponential squared loss (ESL) function with a tuning parameter has been investigated for longitudinal data. However, to our knowledge, no paper has investigated a robust estimation procedure against outliers within the framework of mean-covariance regression analysis for longitudinal data using the ESL function. In this paper, we propose a robust estimation approach for the mean parameters and the generalized autoregressive parameters with longitudinal data based on the ESL function. The proposed estimators can be shown to be asymptotically normal under certain conditions. Moreover, we develop an iteratively reweighted least squares (IRLS) algorithm to compute the parameter estimates, and a balance between robustness and efficiency can be achieved by choosing appropriate data-adaptive tuning parameters. Simulation studies and real data analysis are carried out to illustrate the finite sample performance of the proposed approach.
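The ESL function and the weights it induces in an IRLS scheme are simple to write down. A minimal sketch, using the standard form rho(t) = 1 - exp(-t^2/gamma); the full mean-covariance IRLS loop of the paper is not reproduced:

```python
# Exponential squared loss and its induced IRLS weight. The weight is
# proportional to psi(t)/t = (2/gamma) * exp(-t^2/gamma), so observations
# with large residuals are smoothly downweighted toward zero.
import math

def esl_loss(t, gamma=1.0):
    """rho(t) = 1 - exp(-t^2 / gamma); bounded by 1, unlike squared loss."""
    return 1.0 - math.exp(-t * t / gamma)

def irls_weight(t, gamma=1.0):
    """IRLS weight (up to the constant 2/gamma) for residual t."""
    return math.exp(-t * t / gamma)
```

Small `gamma` makes the downweighting aggressive (more robust, less efficient), large `gamma` recovers near-least-squares behavior; that trade-off is what the data-adaptive tuning parameter controls.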
Funding: supported by the Ministry of Science and Technology of China under Grant No. 2016YFB0502301, the Academy for Multidisciplinary Studies of Capital Normal University, and the National Natural Science Foundation of China under Grant Nos. 11971323 and 11529101.
Funding: supported in part by the National Natural Science Foundation of China (62372385, 62272078, 62002337), the Chongqing Natural Science Foundation (CSTB2022NSCQ-MSX1486, CSTB2023NSCQ-LZX0069), and the Deanship of Scientific Research at King Abdulaziz University, Jeddah, Saudi Arabia (RG-12-135-43).
Funding: Outstanding Youth Foundation of Hunan Provincial Department of Education (Grant No. 22B0911).
Funding: supported by the National Natural Science Foundation of China (10571008), the Natural Science Foundation of Henan (092300410149), and the Core Teacher Foundation of Henan (2006141).
Funding: supported by NNSFC (19631040), NSSFC (04BTJ002), and a grant for post-doctoral fellows in SELF.
Funding: supported by the National Natural Science Foundation of China (No. 61502475) and the Importation and Development of High-Caliber Talents Project of the Beijing Municipal Institutions (No. CIT&TCD201504039).
Funding: supported by the Agence Nationale de la Recherche (ANR) (contract "ANR-17-EURE-0002"), by the Region of Bourgogne Franche-Comté CADRAN Project, and by the European Research Council (ERC) project HYPATIA under the European Union's Horizon 2020 research and innovation programme, Grant agreement n. 835294.
Abstract: This paper investigates the problem of collecting multidimensional data throughout time (i.e., longitudinal studies) for the fundamental task of frequency estimation under Local Differential Privacy (LDP) guarantees. Contrary to frequency estimation of a single attribute, the multidimensional aspect demands particular attention to the privacy budget. Besides, when collecting user statistics longitudinally, privacy progressively degrades. Indeed, the "multiple" settings in combination (i.e., many attributes and several collections throughout time) impose several challenges, for which this paper proposes the first solution for frequency estimates under LDP. To tackle these issues, we extend the analysis of three state-of-the-art LDP protocols (Generalized Randomized Response, GRR; Optimized Unary Encoding, OUE; and Symmetric Unary Encoding, SUE) to both longitudinal and multidimensional data collections. While the known literature uses OUE and SUE for two rounds of sanitization (a.k.a. memoization), i.e., L-OUE and L-SUE, respectively, we show analytically and experimentally that starting with OUE and then applying SUE provides higher data utility (i.e., L-OSUE). Also, for attributes with small domain sizes, we propose Longitudinal GRR (L-GRR), which provides higher utility than the other protocols based on unary encoding. Last, we propose a new solution named Adaptive LDP for LOngitudinal and Multidimensional FREquency Estimates (ALLOMFREE), which randomly samples a single attribute to be sent with the whole privacy budget and adaptively selects the optimal protocol, i.e., either L-GRR or L-OSUE. As shown in the results, ALLOMFREE consistently and considerably outperforms the state-of-the-art L-SUE and L-OUE protocols in the quality of the frequency estimates.
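The GRR building block mentioned above is standard and can be sketched in a few lines: a user keeps the true value with probability p = e^ε/(e^ε + k − 1) and otherwise reports a uniformly random other value, and the aggregator unbiases the observed frequencies. This is a single-round, single-attribute sketch, not the paper's longitudinal L-GRR variant.

```python
import math
import random
from collections import Counter

def grr_perturb(value, domain, eps, rng):
    """Generalized Randomized Response: keep the true value with
    probability p, otherwise report a uniformly random other value."""
    k = len(domain)
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])

def grr_estimate(reports, domain, eps):
    """Unbias the observed frequencies: f = (obs - q) / (p - q),
    where q is the probability of reporting any specific other value."""
    k, n = len(domain), len(reports)
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    q = 1.0 / (math.exp(eps) + k - 1)
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}
```

Note that the estimated frequencies sum to one by construction, since (1 − kq) = (p − q).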
Abstract: Local arterials can be significantly impacted by diversions from adjacent work zones. These diversions often occur on unofficial detour routes due to guidance received on personal navigation devices. Often, these routes do not have sufficient sensing or communication equipment to obtain infrastructure-based traffic signal performance measures, so other data sources are required to identify locations being significantly affected by diversions. This paper examines the network impact caused by the start of an 18-month closure of the I-65/70 interchange (North Split), which usually serves approximately 214,000 vehicles per day in Indianapolis, IN. In anticipation of some proportion of the public diverting from official detour routes to local streets, a connected vehicle monitoring program was established to provide daily performance measures for over 100 intersections in the area without the need for vehicle sensing equipment. This study reports on 13 of the most impacted signals on an alternative arterial to identify the locations and times of day where operations are most degraded, so that decision makers have quantitative information to make informed adjustments to the system. Individual vehicle movements at the studied locations are analyzed to estimate changes in volume, split failures, downstream blockage, arrivals on green, and travel times. Over 130,000 trajectories were analyzed in an 11-week period. Weekly afternoon peak period volumes increased by approximately 455%, split failures increased 3%, downstream blockage increased 10%, arrivals on green decreased 16%, and travel time increased 74%. The analysis performed in this paper will serve as a framework for any agency that wants to assess traffic signal performance at hundreds of locations with little or no existing sensing or communication infrastructure, in order to prioritize tactical retiming and/or longer-term infrastructure investments.
Abstract: In this paper we reparameterize covariance structures in longitudinal data analysis through the modified Cholesky decomposition. Based on this decomposition, the within-subject covariance matrix is decomposed into a unit lower triangular matrix involving moving average coefficients and a diagonal matrix involving innovation variances, which are modeled as linear functions of covariates. We then propose a penalized maximum likelihood method for variable selection in joint mean and covariance models based on this decomposition. Under certain regularity conditions, we establish the consistency and asymptotic normality of the penalized maximum likelihood estimators of the model parameters. Simulation studies are undertaken to assess the finite sample performance of the proposed variable selection procedure.
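The decomposition Σ = L D Lᵀ, with L unit lower triangular (moving average coefficients below the diagonal) and D diagonal (innovation variances), can be computed from an ordinary Cholesky factor by rescaling its columns. A minimal numerical sketch (our own helper, not the authors' code):

```python
import numpy as np

def modified_cholesky(sigma):
    """Decompose a covariance matrix as sigma = L @ D @ L.T with L unit
    lower triangular and D diagonal (innovation variances)."""
    c = np.linalg.cholesky(sigma)      # sigma = c @ c.T, c lower triangular
    d = np.diag(c) ** 2                # innovation variances
    L = c / np.diag(c)                 # rescale columns to a unit diagonal
    return L, np.diag(d)
```

Modeling the below-diagonal entries of L and the log of the entries of D as linear functions of covariates is what makes the resulting covariance estimate automatically positive definite.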
Abstract: Parameter estimation and the coefficient of contamination are studied for regression models with repeated measures whose response variables are contaminated by another random variable sequence. Under suitable conditions, the estimators established in the paper are proved to be strongly consistent.
Abstract: For regression models with longitudinal data, we combine a robust estimating equation with the empirical likelihood method and propose an efficient robust estimator, where the robust estimating equation is based on a bounded score function and a covariate-dependent weight function. This method reduces the influence of outliers in the response variables and covariates on parameter estimation, accounts for the correlation within the data, and improves the efficiency of estimation. Simulation results show that the proposed method is both robust and efficient.
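The abstract does not specify which bounded score function is used; Huber's ψ is a common choice, so the following sketch uses it as an assumption. The implied downweighting w(r) = ψ(r)/r shows how outliers get limited influence.

```python
import numpy as np

def huber_score(r, c=1.345):
    """Bounded score function psi(r): linear for small residuals,
    clipped at +/- c so outliers exert limited influence.
    (Huber's psi is our assumption; the paper only says 'bounded'.)"""
    return np.clip(r, -c, c)

def robust_weights(r, c=1.345):
    """Downweighting implied by the bounded score: w(r) = psi(r)/r,
    i.e. weight 1 for small residuals and c/|r| for large ones."""
    r = np.asarray(r, float)
    return np.where(np.abs(r) <= c, 1.0, c / np.abs(r))
```

With c = 1.345 the estimator is conventionally tuned to retain roughly 95% efficiency at the Gaussian model while bounding the effect of gross outliers.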
Abstract: In longitudinal data analysis, our primary interest is in the estimation of regression parameters for the marginal expectations of the longitudinal responses; the longitudinal correlation parameters are of secondary interest. The joint likelihood function for longitudinal data is challenging, particularly due to correlated responses. Marginal models, such as generalized estimating equations (GEEs), have received much attention; they rely only on assumptions about the first two moments of the data and a working correlation structure, with confidence regions and hypothesis tests constructed from asymptotic normality. This approach is sensitive to misspecification of the variance function and the working correlation structure, which may yield inefficient and inconsistent estimates, leading to wrong conclusions. To overcome this problem, we propose an empirical likelihood (EL) procedure based on a set of estimating equations for the parameter of interest and discuss its characteristics and asymptotic properties. We also provide an algorithm based on EL principles for the estimation of the regression parameters and the construction of its confidence region. We have applied the proposed method in two case examples.
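The core empirical likelihood computation can be illustrated in the simplest one-dimensional case (the mean of a sample): maximize Π wᵢ subject to Σ wᵢ = 1 and Σ wᵢ(xᵢ − μ) = 0, which reduces to a scalar root-finding problem in the Lagrange multiplier λ. This is a minimal sketch without safeguards, for illustration only; the paper's setting uses vector estimating equations.

```python
import numpy as np

def el_log_ratio(x, mu, n_iter=50):
    """-2 log empirical likelihood ratio for the mean of a sample:
    solve sum(z_i / (1 + lam * z_i)) = 0 for lam by Newton's method,
    where z_i = x_i - mu, then evaluate 2 * sum(log(1 + lam * z_i))."""
    z = np.asarray(x, float) - mu
    lam = 0.0
    for _ in range(n_iter):
        denom = 1.0 + lam * z
        g = np.sum(z / denom)                 # estimating equation in lam
        gprime = -np.sum(z ** 2 / denom ** 2)
        lam -= g / gprime
    return 2.0 * np.sum(np.log(1.0 + lam * z))
```

By Owen's theorem the statistic is asymptotically chi-squared with one degree of freedom under the true mean, which is what allows data-driven confidence regions without estimating a variance.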
Abstract: Logic regression is an adaptive regression method that searches for Boolean (logic) combinations of binary variables that best explain the variability in the outcome, thereby revealing interaction effects associated with the response. In this study, we extend logic regression to longitudinal data with a binary response and propose a "transition logic regression method" to find interactions related to the response. In this method, interaction effects over time are found by a simulated annealing algorithm with AIC (Akaike Information Criterion) as the score function of the model. First- and second-order Markov dependence is allowed, to capture the correlation among successive observations of the same individual in the longitudinal binary response. The performance of the method is evaluated in a simulation study under various conditions. The proposed method is used to find interactions of SNPs and other risk factors related to low HDL over time in data on 329 participants of the longitudinal TLGS study.
Funding: Supported by the Natural Science Foundation of China (10371042, 10671038).
Abstract: In this article, a robust generalized estimating equation is used for the analysis of a partially linear mixed model for longitudinal data. The authors approximate the nonparametric function by a regression spline. Under some regularity conditions, the asymptotic properties of the estimators are obtained. To avoid the computation of a high-dimensional integral, a robust Monte Carlo Newton-Raphson algorithm is used. Simulations are carried out to study the performance of the proposed robust estimators, including their robustness and efficiency. Finally, two real longitudinal data sets are analyzed.
Abstract: Objective: To analyze longitudinal binary data using generalized linear models, taking the correlation between repeated measures into account, and to give a general method for analyzing such data. Methods: The generalized estimating equations (GEE) proposed by Zeger and Liang were used. For seven covariance structures, a method was given for estimating the regression and correlation parameters. Results: Regression and correlation parameters were estimated simultaneously. A set of programs was completed and an example is illustrated. Conclusion: Longitudinal data often occur in medical research and clinical trials. To handle the correlation between repeated measures, special methods such as GEE are necessary.
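The GEE iteration for binary responses with a logit link and an exchangeable working correlation can be sketched from scratch in NumPy. This is a minimal illustration of the Liang-Zeger scheme (moment estimate of the common correlation, then a Fisher-scoring update), not the authors' program; all names are our own.

```python
import numpy as np

def gee_logistic_exchangeable(X, y, groups, n_iter=25):
    """GEE sketch for binary longitudinal data: logit link,
    exchangeable working correlation, Fisher-scoring updates."""
    beta = np.zeros(X.shape[1])
    ids = np.unique(groups)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        resid = (y - mu) / np.sqrt(mu * (1 - mu))
        # moment estimate of the common within-cluster correlation
        num, den = 0.0, 0
        for g in ids:
            r = resid[groups == g]
            num += r.sum() ** 2 - (r ** 2).sum()
            den += len(r) * (len(r) - 1)
        alpha = num / den
        lhs = np.zeros((X.shape[1], X.shape[1]))
        rhs = np.zeros(X.shape[1])
        for g in ids:
            m = groups == g
            Xg, yg, mug = X[m], y[m], mu[m]
            A = np.diag(mug * (1 - mug))     # variance function
            ni = m.sum()
            R = (1 - alpha) * np.eye(ni) + alpha * np.ones((ni, ni))
            V = np.sqrt(A) @ R @ np.sqrt(A)  # working covariance
            D = A @ Xg                       # d mu / d beta for the logit link
            Vi = np.linalg.inv(V)
            lhs += D.T @ Vi @ D
            rhs += D.T @ Vi @ (yg - mug)
        beta = beta + np.linalg.solve(lhs, rhs)
    return beta
```

In practice one would use a tested implementation such as `statsmodels`' `GEE` class rather than hand-rolled code; the sketch is only meant to show where the working correlation enters the estimating equation.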
Abstract: We consider the problem of variable selection for single-index random effects models with longitudinal data. An automatic variable selection procedure is developed using smooth-thresholding. The proposed method shares the desired features of existing variable selection methods: the resulting estimator enjoys the oracle property, and the procedure avoids the convex optimization problem while remaining flexible and easy to implement. Moreover, we use the penalized weighted deviance criterion for a data-driven choice of the tuning parameters. Simulation studies are carried out to assess the performance of our method, and a real dataset is analyzed for further illustration.
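A smooth-threshold rule of the Ueki type is one common construction: a data-driven factor δⱼ = min(1, λ/|β̂ⱼ|^(1+τ)) reaches 1 for small coefficients (setting them exactly to zero) and stays below 1 for large ones. Whether the paper uses exactly this form is our assumption; the sketch below only illustrates the thresholding behavior.

```python
import numpy as np

def smooth_threshold(beta, lam, tau=1.0):
    """Smooth-threshold sketch (Ueki-style rule, assumed here):
    delta_j = min(1, lam / |beta_j|^(1+tau)); delta_j = 1 kills
    coefficient j, delta_j < 1 leaves it to be re-estimated."""
    beta = np.asarray(beta, float)
    delta = np.minimum(1.0, lam / np.abs(beta) ** (1.0 + tau))
    return delta, np.where(delta >= 1.0, 0.0, beta)
```

Because δⱼ is a smooth function of the initial estimate away from the threshold, the selection step can be folded directly into the estimating equation, which is why no convex optimization is needed.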
Abstract: In this article, we propose a generalized empirical likelihood inference for the parametric component in semiparametric generalized partially linear models with longitudinal data. Based on the extended score vector, a generalized empirical likelihood ratio function is defined, which incorporates the within-cluster correlation while avoiding direct estimation of the nuisance parameters in the correlation matrix. We show that the proposed statistics are asymptotically chi-squared under suitable conditions, so they can be used to construct confidence regions for the parameters. In addition, the maximum empirical likelihood estimates of the parameters and the corresponding asymptotic normality are obtained. Simulation studies demonstrate the performance of the proposed method.
Abstract: When longitudinal data contain outliers, the classical least-squares approach is known not to be robust. To address this, the exponential squared loss (ESL) function with a tuning parameter has been investigated for longitudinal data. However, to our knowledge, no existing work investigates a robust estimation procedure against outliers within the framework of mean-covariance regression analysis for longitudinal data using the ESL function. In this paper, we propose a robust estimation approach for the mean parameters and the generalized autoregressive parameters with longitudinal data based on the ESL function. The proposed estimators can be shown to be asymptotically normal under certain conditions. Moreover, we develop an iteratively reweighted least squares (IRLS) algorithm to calculate the parameter estimates, and a balance between robustness and efficiency can be achieved by choosing appropriate data-adaptive tuning parameters. Simulation studies and a real data analysis illustrate the finite sample performance of the proposed approach.
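The ESL function itself, and the IRLS weights it induces, are simple to state. The loss 1 − exp(−r²/γ) is bounded in [0, 1), so a gross outlier contributes at most 1 no matter how large its residual; the corresponding weight exp(−r²/γ) vanishes for outliers. A minimal sketch (γ plays the role of the paper's tuning parameter):

```python
import numpy as np

def esl_loss(r, gamma=1.0):
    """Exponential squared loss: bounded in [0, 1), so large residuals
    cannot dominate the fit the way squared error does."""
    return 1.0 - np.exp(-np.asarray(r, float) ** 2 / gamma)

def esl_weights(r, gamma=1.0):
    """IRLS weights proportional to exp(-r^2/gamma): close to 1 for
    small residuals, essentially 0 for gross outliers."""
    return np.exp(-np.asarray(r, float) ** 2 / gamma)
```

A large γ makes the loss behave like (scaled) squared error, recovering least-squares efficiency, while a small γ downweights outliers aggressively; this is the robustness-efficiency trade-off the tuning parameter controls.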