本文研究分位数回归的组变量选择问题。基于分位数回归和贝叶斯统计推断方法,通过引入系数的组“spike and slab”先验分布,提出了分位数回归的贝叶斯组变量选择方法,并给出易于实施的Gibbs后验抽样算法。进一步,本文还将所建立的贝叶...本文研究分位数回归的组变量选择问题。基于分位数回归和贝叶斯统计推断方法,通过引入系数的组“spike and slab”先验分布,提出了分位数回归的贝叶斯组变量选择方法,并给出易于实施的Gibbs后验抽样算法。进一步,本文还将所建立的贝叶斯组变量选择方法应用到变点检测中,变点的数量和位置的探测准确率较高。数值模拟和两个实例分析验证了所提方法的有效性。展开更多
The multiple determination tasks of chemical properties are a classical problem in analytical chemistry. The major problem is concerned in to find the best subset of variables that better represents the compounds. The...The multiple determination tasks of chemical properties are a classical problem in analytical chemistry. The major problem is concerned in to find the best subset of variables that better represents the compounds. These variables are obtained by a spectrophotometer device. This device measures hundreds of correlated variables related with physicocbemical properties and that can be used to estimate the component of interest. The problem is the selection of a subset of informative and uncorrelated variables that help the minimization of prediction error. Classical algorithms select a subset of variables for each compound considered. In this work we propose the use of the SPEA-II (strength Pareto evolutionary algorithm II). We would like to show that the variable selection algorithm can selected just one subset used for multiple determinations using multiple linear regressions. For the case study is used wheat data obtained by NIR (near-infrared spectroscopy) spectrometry where the objective is the determination of a variable subgroup with information about E protein content (%), test weight (Kg/HI), WKT (wheat kernel texture) (%) and farinograph water absorption (%). The results of traditional techniques of multivariate calibration as the SPA (successive projections algorithm), PLS (partial least square) and mono-objective genetic algorithm are presents for comparisons. For NIR spectral analysis of protein concentration on wheat, the number of variables selected from 775 spectral variables was reduced for just 10 in the SPEA-II algorithm. The prediction error decreased from 0.2 in the classical methods to 0.09 in proposed approach, a reduction of 37%. The model using variables selected by SPEA-II had better prediction performance than classical algorithms and full-spectrum partial least-squares.展开更多
The varying-coefficient model is flexible and powerful for modeling the dynamic changes of regression coefficients. We study the problem of variable selection and estimation in this model in the sparse, high- dimensio...The varying-coefficient model is flexible and powerful for modeling the dynamic changes of regression coefficients. We study the problem of variable selection and estimation in this model in the sparse, high- dimensional case. We develop a concave group selection approach for this problem using basis function expansion and study its theoretical and empirical properties. We also apply the group Lasso for variable selection and estimation in this model and study its properties. Under appropriate conditions, we show that the group least absolute shrinkage and selection operator (Lasso) selects a model whose dimension is comparable to the underlying mode], regardless of the large number of unimportant variables. In order to improve the selection results, we show that the group minimax concave penalty (MCP) has the oracle selection property in the sense that it correctly selects important variables with probability converging to one under suitable conditions. By comparison, the group Lasso does not have the oracle selection property. In the simulation parts, we apply the group Lasso and the group MCP. At the same time, the two approaches are evaluated using simulation and demonstrated on a data example.展开更多
文摘本文研究分位数回归的组变量选择问题。基于分位数回归和贝叶斯统计推断方法,通过引入系数的组“spike and slab”先验分布,提出了分位数回归的贝叶斯组变量选择方法,并给出易于实施的Gibbs后验抽样算法。进一步,本文还将所建立的贝叶斯组变量选择方法应用到变点检测中,变点的数量和位置的探测准确率较高。数值模拟和两个实例分析验证了所提方法的有效性。
文摘The multiple determination tasks of chemical properties are a classical problem in analytical chemistry. The major problem is concerned in to find the best subset of variables that better represents the compounds. These variables are obtained by a spectrophotometer device. This device measures hundreds of correlated variables related with physicocbemical properties and that can be used to estimate the component of interest. The problem is the selection of a subset of informative and uncorrelated variables that help the minimization of prediction error. Classical algorithms select a subset of variables for each compound considered. In this work we propose the use of the SPEA-II (strength Pareto evolutionary algorithm II). We would like to show that the variable selection algorithm can selected just one subset used for multiple determinations using multiple linear regressions. For the case study is used wheat data obtained by NIR (near-infrared spectroscopy) spectrometry where the objective is the determination of a variable subgroup with information about E protein content (%), test weight (Kg/HI), WKT (wheat kernel texture) (%) and farinograph water absorption (%). The results of traditional techniques of multivariate calibration as the SPA (successive projections algorithm), PLS (partial least square) and mono-objective genetic algorithm are presents for comparisons. For NIR spectral analysis of protein concentration on wheat, the number of variables selected from 775 spectral variables was reduced for just 10 in the SPEA-II algorithm. The prediction error decreased from 0.2 in the classical methods to 0.09 in proposed approach, a reduction of 37%. The model using variables selected by SPEA-II had better prediction performance than classical algorithms and full-spectrum partial least-squares.
基金supported by National Natural Science Foundation of China(GrantNos.71271128 and 11101442)the State Key Program of National Natural Science Foundation of China(GrantNo.71331006)+2 种基金National Center for Mathematics and Interdisciplinary Sciences(NCMIS)Shanghai Leading Academic Discipline Project A,in Ranking Top of Shanghai University of Finance and Economics(IRTSHUFE)Scientific Research Innovation Fund for PhD Studies(Grant No.CXJJ-2011-434)
文摘The varying-coefficient model is flexible and powerful for modeling the dynamic changes of regression coefficients. We study the problem of variable selection and estimation in this model in the sparse, high- dimensional case. We develop a concave group selection approach for this problem using basis function expansion and study its theoretical and empirical properties. We also apply the group Lasso for variable selection and estimation in this model and study its properties. Under appropriate conditions, we show that the group least absolute shrinkage and selection operator (Lasso) selects a model whose dimension is comparable to the underlying mode], regardless of the large number of unimportant variables. In order to improve the selection results, we show that the group minimax concave penalty (MCP) has the oracle selection property in the sense that it correctly selects important variables with probability converging to one under suitable conditions. By comparison, the group Lasso does not have the oracle selection property. In the simulation parts, we apply the group Lasso and the group MCP. At the same time, the two approaches are evaluated using simulation and demonstrated on a data example.