In statistics and machine learning communities, the last fifteen years have witnessed a surge of high-dimensional models backed by penalized methods and other state-of-the-art variable selection techniques.The high-di...In statistics and machine learning communities, the last fifteen years have witnessed a surge of high-dimensional models backed by penalized methods and other state-of-the-art variable selection techniques.The high-dimensional models we refer to differ from conventional models in that the number of all parameters p and number of significant parameters s are both allowed to grow with the sample size T. When the field-specific knowledge is preliminary and in view of recent and potential affluence of data from genetics, finance and on-line social networks, etc., such(s, T, p)-triply diverging models enjoy ultimate flexibility in terms of modeling, and they can be used as a data-guided first step of investigation. However, model selection consistency and other theoretical properties were addressed only for independent data, leaving time series largely uncovered. On a simple linear regression model endowed with a weakly dependent sequence, this paper applies a penalized least squares(PLS) approach. Under regularity conditions, we show sign consistency, derive finite sample bound with high probability for estimation error, and prove that PLS estimate is consistent in L_2 norm with rate (s log s/T)~1/2.展开更多
基金supported by Natural Science Foundation of USA (Grant Nos. DMS1206464 and DMS1613338)National Institutes of Health of USA (Grant Nos. R01GM072611, R01GM100474 and R01GM120507)
文摘In statistics and machine learning communities, the last fifteen years have witnessed a surge of high-dimensional models backed by penalized methods and other state-of-the-art variable selection techniques.The high-dimensional models we refer to differ from conventional models in that the number of all parameters p and number of significant parameters s are both allowed to grow with the sample size T. When the field-specific knowledge is preliminary and in view of recent and potential affluence of data from genetics, finance and on-line social networks, etc., such(s, T, p)-triply diverging models enjoy ultimate flexibility in terms of modeling, and they can be used as a data-guided first step of investigation. However, model selection consistency and other theoretical properties were addressed only for independent data, leaving time series largely uncovered. On a simple linear regression model endowed with a weakly dependent sequence, this paper applies a penalized least squares(PLS) approach. Under regularity conditions, we show sign consistency, derive finite sample bound with high probability for estimation error, and prove that PLS estimate is consistent in L_2 norm with rate (s log s/T)~1/2.