Abstract
Linear discriminant analysis (LDA) performs well in many practical applications, but it works poorly on high-dimensional datasets with missing observations. There are two reasons. First, LDA cannot accurately predict or impute the missing values. Second, in high dimensions the sample covariance matrix used by LDA is no longer a good estimator of the population covariance matrix. As a result, the computed discriminant function values deviate substantially. Building on results from random matrix theory and using a Lasso estimator of the population covariance matrix, this paper proposes an improved LDA classifier for high-dimensional datasets with missing observations. Simulation results on a variety of synthetic and real datasets show that the proposed method achieves higher classification accuracy than competing algorithms of the same kind.
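For context, the covariance issue mentioned in the abstract can be seen in the standard textbook form of the LDA discriminant. The notation below is an assumption introduced here for illustration (K classes, p features, n training samples, class means \mu_k, priors \pi_k); it is not taken from the paper, whose own estimator is not specified in this record.

```latex
% Textbook LDA discriminant for class k (notation assumed, not the paper's)
\delta_k(x) = x^{\top}\Sigma^{-1}\mu_k
            - \tfrac{1}{2}\,\mu_k^{\top}\Sigma^{-1}\mu_k
            + \log \pi_k,
\qquad k = 1,\dots,K

% Usual plug-in estimate: the pooled within-class sample covariance
S = \frac{1}{n-K}\sum_{k=1}^{K}\sum_{i:\,y_i=k}
    (x_i-\bar{x}_k)(x_i-\bar{x}_k)^{\top},
\qquad \operatorname{rank}(S) \le \min(p,\; n-K)
```

When p > n - K, the matrix S is singular, so the plug-in rule cannot be evaluated with S^{-1} directly and some regularized estimate of \Sigma is needed. The abstract indicates that a Lasso estimator fills this role in the proposed method.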
Authors
刘鹏
叶宾
LIU Peng; YE Bin (Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116, China; School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China)
Source
《常州大学学报(自然科学版)》
CAS
2020, No. 2, pp. 31-37 (7 pages)
Journal of Changzhou University: Natural Science Edition
Funding
Supported by the Xuzhou Applied Basic Research Program (KC18069).
Keywords
linear discriminant analysis
missing data
high-dimensional data
Lasso estimation