Abstract
This paper examines the scope of application of the sufficient dimension reduction algorithms SIR, SAVE, and CP-SAVE, and makes these algorithms more robust in two ways. First, a hybrid of SIR and SAVE is constructed to combine the strengths of both, so that it adapts to a wider range of data types and link functions. Second, when the observed data are contaminated, the robust mean and covariance estimated by the soft pruning method replace the traditional estimators, yielding robust sufficient dimension reduction algorithms. Numerical experiments show the following. When the link function is symmetric about the mean of the independent variables, the first-order algorithm SIR reduces dimension poorly, but it is relatively robust to the distribution of the independent variables and to the number of slices. Compared with SIR, the second-order algorithms SAVE and CP-SAVE have more stringent requirements and are sensitive to both the number of slices and the distribution of the independent variables, but they can find directions that SIR cannot detect. When the independent variables follow a heavy-tailed distribution, CP-SAVE usually outperforms SAVE. The SIR and SAVE hybrid adapts better to the distribution of the independent variables and to the link function, and can improve dimension reduction in a variety of settings. The soft pruning robust estimator is robust to the truncation parameter, and it is suggested that the truncation parameter be set slightly larger than the proportion of outliers. Compared with robust SAVE, robust SIR only needs a robust mean estimated within each slice, so its conditions are looser and more realistic, and it is recommended as the first choice.
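The first-order and second-order kernels contrasted in the abstract can be illustrated with a minimal NumPy sketch. It uses the textbook SIR and SAVE kernel matrices on standardized predictors, and it assumes the hybrid is a convex combination of the two kernels with a weight `alpha`; the abstract does not state how the paper builds its hybrid, so `alpha`, the function names, and the equal-count slicing are illustrative assumptions, not the authors' implementation. CP-SAVE and the soft pruning estimators are not covered here.

```python
import numpy as np

def standardize(X):
    """Center and whiten X so the sample covariance of Z is the identity."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return (X - mu) @ inv_sqrt, inv_sqrt

def sir_save_kernels(X, y, n_slices=10):
    """Return the SIR and SAVE kernel matrices computed on the whitened data."""
    Z, inv_sqrt = standardize(X)
    n, p = Z.shape
    M_sir = np.zeros((p, p))
    M_save = np.zeros((p, p))
    # Slices of near-equal size, formed by sorting the response.
    for idx in np.array_split(np.argsort(y), n_slices):
        w = len(idx) / n                      # slice proportion
        m = Z[idx].mean(axis=0)               # within-slice mean (first order)
        V = np.cov(Z[idx], rowvar=False)      # within-slice covariance (second order)
        M_sir += w * np.outer(m, m)
        D = np.eye(p) - V
        M_save += w * D @ D
    return M_sir, M_save, inv_sqrt

def hybrid_directions(X, y, d, alpha=0.5, n_slices=10):
    """Estimate d directions from alpha * M_sir + (1 - alpha) * M_save.

    The convex-combination form and the weight alpha are assumptions made
    for illustration; alpha=1 gives plain SIR and alpha=0 gives plain SAVE.
    """
    M_sir, M_save, inv_sqrt = sir_save_kernels(X, y, n_slices)
    M = alpha * M_sir + (1 - alpha) * M_save
    evals, evecs = np.linalg.eigh(M)
    top = evecs[:, np.argsort(evals)[::-1][:d]]   # leading eigenvectors
    B = inv_sqrt @ top                            # map back to the original scale
    return B / np.linalg.norm(B, axis=0)
```

With a response such as `y = x1**2 + noise`, which is symmetric about the predictor mean, the within-slice means are close to zero, so `alpha=1` (SIR) carries little signal while `alpha=0` (SAVE) can still pick up the direction through the within-slice covariances. This matches the qualitative behaviour the abstract reports.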
Authors
王丙参
魏艳华
张宝学
WANG Bingcan; WEI Yanhua; ZHANG Baoxue (School of Statistics, Capital University of Economics and Business, Beijing 100070)
Source
《系统科学与数学》
CSCD
PKU Core Journal (Peking University core journal list)
2023, No. 9, pp. 2388-2403 (16 pages)
Journal of Systems Science and Mathematical Sciences
Funding
Supported by the National Natural Science Foundation of China (12071308).
Keywords
Sufficient dimension reduction
Sliced inverse regression
Sliced average variance estimation
Hybrid algorithm
Robust estimation