Abstract
Principal component analysis (PCA) is an effective statistical method for data compression and feature extraction. In classical PCA, every training sample contributes equally to the construction of the principal components. In many practical problems, however, training samples differ in significance and role: important samples deserve full attention, while the influence of unreliable samples (possibly outliers) should be restricted. This paper assigns a confidence weight to each training sample, treats the training data as fuzzy points in the sample space, and studies PCA based on fuzzy-point data. Numerical experiments on simulated data show that the method effectively controls the influence of outliers on the principal components, and that it also offers a feasible way to exploit prior information about the data.
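The abstract does not state how the confidence weights enter the computation, so the following is only a minimal sketch under one common assumption: each sample's weight scales its contribution to a weighted mean and a weighted covariance matrix, whose eigenvectors are taken as the principal axes. The function name `weighted_pca` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def weighted_pca(X, w, n_components=1):
    """Weighted PCA sketch (assumed formulation, not the paper's own).

    X: (n_samples, n_features) data matrix.
    w: (n_samples,) non-negative confidence weights.
    Returns the weighted mean, the principal axes (one per row),
    and the corresponding eigenvalues in descending order.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    if X.ndim != 2 or w.shape != (X.shape[0],):
        raise ValueError("X must be 2-D and w must have one weight per row")
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")

    w = w / w.sum()                      # normalise weights to sum to 1
    mean = w @ X                         # weighted mean of the samples
    Xc = X - mean                        # centre on the weighted mean
    cov = (Xc * w[:, None]).T @ Xc       # weighted covariance matrix

    eigvals, eigvecs = np.linalg.eigh(cov)           # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order].T, eigvals[order]


# Simulated data: 11 points on the x-axis plus one suspected outlier.
line = np.column_stack([np.linspace(-5.0, 5.0, 11), np.zeros(11)])
X = np.vstack([line, [0.0, 50.0]])

# Equal weights (classical PCA) versus a low confidence weight on the outlier.
_, axes_equal, _ = weighted_pca(X, np.ones(12))
_, axes_weighted, _ = weighted_pca(X, np.r_[np.ones(11), 0.01])
print("equal weights:   ", axes_equal[0])
print("down-weighted:   ", axes_weighted[0])
```

In this toy setting the single outlier dominates the first axis when all weights are equal, while a small confidence weight on it lets the first axis follow the bulk of the data. The sign of each axis is arbitrary, as with any eigenvector-based PCA.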
Source
Journal of Gansu Lianhe University (Natural Sciences)
2009, No. 5, pp. 5-8, 12 (5 pages in total)
Keywords
principal component analysis
fuzzy points data
principal axes