Abstract
Most existing feature extraction algorithms use the variance contribution rate, computed from the eigenvalues of the sample correlation matrix, as the criterion for assessing extraction performance. However, the variance contribution rate reflects only the properties of those eigenvalues and does not measure information content. This paper introduces Shannon information entropy into the extraction algorithm: a class probability and a class information function are defined, and the number of extracted feature dimensions is determined by the cumulative information contribution rate, so that extraction performance can be evaluated from an information-theoretic standpoint. Combining this theory with factor analysis (FA), an entropy-based FA feature extraction algorithm is established, in which the information contribution rate determines the number of principal factors to extract. Finally, case studies verify the effectiveness of the theory.
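The abstract's selection rule can be illustrated with a minimal sketch. The exact definitions of the paper's class probability and class information function are not given here, so the formulation below is an assumption: class probability `p_j` is taken as the j-th correlation-matrix eigenvalue normalized by the eigenvalue sum, class information as the Shannon term `-p_j ln p_j`, and the number of factors as the smallest `k` whose cumulative information contribution rate reaches a chosen threshold. The function name and threshold value are illustrative, not from the paper.

```python
import numpy as np

def info_contribution_dims(X, threshold=0.85):
    """Select the number of factors via a cumulative information
    contribution rate (hedged sketch; p_j and the 0.85 threshold
    are assumptions, not the paper's exact definitions)."""
    # Correlation matrix of the sample (features as columns)
    R = np.corrcoef(X, rowvar=False)
    # Eigenvalues in descending order; clip to avoid log(0)
    eigvals = np.linalg.eigvalsh(R)[::-1]
    eigvals = np.clip(eigvals, 1e-12, None)
    # "Class probability": normalized eigenvalues
    p = eigvals / eigvals.sum()
    # "Class information": Shannon information -p * ln(p)
    info = -p * np.log(p)
    # Cumulative information contribution rate in [0, 1]
    cum = np.cumsum(info) / info.sum()
    # Smallest k whose cumulative rate reaches the threshold
    k = int(np.searchsorted(cum, threshold) + 1)
    return k, cum
```

Compared with the usual variance contribution rate (cumulative normalized eigenvalues), this criterion weights each component by its entropy term rather than by raw variance, which is the information-theoretic angle the abstract describes.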
Source
《模式识别与人工智能》 (Pattern Recognition and Artificial Intelligence)
EI
CSCD
Peking University Core Journals (北大核心)
2011, No. 3, pp. 327-331 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (No. 60975039, 41074003), the National 973 Program of China (No. 2007CB311004), and the Jiangsu Province Basic Research Program (Natural Science Foundation) (No. BK2009093)
Keywords
Information Function, Shannon Entropy, Feature Extraction, Variance Contribution Rate