Abstract
This paper introduces the basic concepts, calculation methods, two worked examples, and the SAS implementation of generalized principal component analysis. The basic concepts are synthetic data, quasi-synthetic data, partially synthetic data, and generalized principal component analysis. The calculation consists of logarithmic centralization, construction of the covariance matrix S, and computation of the eigenvalues and eigenvectors of S. The two examples are "the content of 5 components in a certain ore" and "household consumption data of farmers in 30 regions of China in 1993". Generalized principal component analysis of the quantitative data in both examples was carried out in SAS. A single principal component was enough to retain more than 85% of the information carried by the original variables, a substantial reduction in dimension. In Example 2, the generalized principal component results were also used to rank the regions and sort them into preliminary tiers.
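The three calculation steps named in the abstract (logarithmic centralization, the covariance matrix S, and its eigenvalues and eigenvectors) can be sketched roughly as follows. This is an illustrative Python sketch, not the authors' SAS program. It assumes that logarithmic centralization means subtracting each sample's mean log value from the log of each component, which the abstract does not spell out. The data matrix `X` is made up for illustration and is not taken from either example in the paper. Only the leading eigenpair is computed, by power iteration.

```python
import math

# Hypothetical compositions (each row sums to 100); NOT the paper's data.
X = [
    [60.0, 20.0, 10.0, 6.0, 4.0],
    [55.0, 22.0, 12.0, 7.0, 4.0],
    [50.0, 25.0, 13.0, 8.0, 4.0],
    [45.0, 27.0, 15.0, 8.0, 5.0],
    [40.0, 30.0, 16.0, 9.0, 5.0],
]

def log_center(rows):
    """Assumed form of logarithmic centralization: log(x_ij) minus the row mean of the logs."""
    out = []
    for row in rows:
        logs = [math.log(v) for v in row]
        mean_log = sum(logs) / len(logs)
        out.append([v - mean_log for v in logs])
    return out

def covariance(rows):
    """Sample covariance matrix (divisor n - 1) of the columns of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

def leading_eigenpair(S, iters=1000):
    """Largest eigenvalue and a unit eigenvector of symmetric S, by power iteration."""
    p = len(S)
    v = [float(i + 1) for i in range(p)]  # non-constant start vector
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    Sv = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
    lam = sum(a * b for a, b in zip(v, Sv)) / sum(a * a for a in v)  # Rayleigh quotient
    return lam, v

Z = log_center(X)
S = covariance(Z)
lam, v = leading_eigenpair(S)

# Share of total variance carried by the first generalized principal component.
share = lam / sum(S[i][i] for i in range(len(S)))

# Scores of each sample on the first component (the sign of v is arbitrary).
col_means = [sum(r[j] for r in Z) / len(Z) for j in range(len(v))]
scores = [sum((r[j] - col_means[j]) * v[j] for j in range(len(v))) for r in Z]

print(f"first component explains {share:.1%} of total variance")
```

Because each log-centered row sums to zero, S is singular and its rows also sum to zero, so at most p - 1 components carry variance. Scores like `scores` are the kind of quantity that could be used to rank samples, as the abstract describes for the regions in Example 2, although the sign convention would have to be fixed first.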
Authors
Hu Chunyan; Hu Liangping (Graduate School, Academy of Military Sciences PLA China, Beijing 100850, China; Specialty Committee of Clinical Scientific Research Statistics of World Federation of Chinese Medicine Societies, Beijing 100029, China)
Source
Sichuan Mental Health
2023, Issue S01, pp. 55-60 (6 pages)
Keywords
Synthetic data
Logarithmic centralization
Covariance matrix
Sample sorting
Generalized principal component