Abstract
Building on an analysis of attribute reduction methods for fluorescence data of early colorectal cancer, this paper proposes a rough principal component attribute reduction method based on tolerant-relation information entropy. First, to handle the incompleteness of the data, a rough set model based on the tolerant relation is established. Next, an information entropy that decreases monotonically as the amount of information decreases is introduced, and a tolerant relation rough set model based on this entropy is built to perform a preliminary attribute reduction. Finally, the model is combined with principal component analysis (PCA) to form a rough PCA method based on tolerant-relation information entropy, which reduces the data dimension while extracting data features. Experiments on fluorescence spectra of early colorectal cancer show that the method effectively reduces the dimensionality of the fluorescence spectrum data, extracts the features that affect medical diagnosis, and lowers the complexity of subsequent data processing.
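The three stages described in the abstract (tolerant relation, entropy-based reduction, PCA) can be illustrated with a minimal sketch. Everything concrete here is an assumption, not the paper's definition: attributes are taken to be discretized, missing values are encoded as NaN, two objects are treated as tolerant when they agree on every attribute or at least one of the two values is missing, the entropy is assumed to be E(B) = (1/|U|) Σ (1 − |S_B(x)|/|U|), the reduction is a simple greedy backward elimination, and missing values are mean-imputed before PCA. The function names and the toy matrix `X` are hypothetical.

```python
import numpy as np

def tolerance_classes(data, attrs):
    """Tolerance class of each object on the attribute subset `attrs`.

    Assumption: objects are tolerant if, on every attribute, their values
    are equal or at least one of the two values is missing (NaN).
    """
    sub = data[:, attrs]
    classes = []
    for i in range(data.shape[0]):
        compatible = (sub == sub[i]) | np.isnan(sub) | np.isnan(sub[i])
        classes.append(np.flatnonzero(compatible.all(axis=1)))
    return classes

def tolerance_entropy(data, attrs):
    """Assumed entropy: mean of (1 - |S_B(x)| / |U|) over all objects.

    It shrinks as tolerance classes grow, i.e. as information is lost.
    """
    n = data.shape[0]
    sizes = np.array([len(c) for c in tolerance_classes(data, attrs)])
    return float(np.sum((1.0 - sizes / n) / n))

def entropy_reduct(data):
    """Greedy backward elimination: drop an attribute if entropy is unchanged."""
    attrs = list(range(data.shape[1]))
    full = tolerance_entropy(data, attrs)
    for a in list(attrs):
        trial = [b for b in attrs if b != a]
        if trial and np.isclose(tolerance_entropy(data, trial), full):
            attrs = trial
    return attrs

def pca_scores(data, n_components):
    """PCA via SVD on mean-imputed, centered data (imputation is an assumption)."""
    filled = np.where(np.isnan(data), np.nanmean(data, axis=0), data)
    centered = filled - filled.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Hypothetical toy data: 5 objects, 4 discretized attributes, one missing value.
X = np.array([
    [1, 0, 1, 2],
    [1, 0, np.nan, 2],
    [0, 1, 0, 2],
    [0, 1, 0, 2],
    [1, 1, 1, 2],
], dtype=float)

reduct = entropy_reduct(X)                      # attributes kept by the rough set stage
scores = pca_scores(X[:, reduct], n_components=1)  # PCA on the reduced attributes
print(reduct, scores.shape)
```

The rough set stage removes attributes that do not change the tolerance classes, so PCA then operates on a smaller matrix; the order of elimination affects which reduct a greedy pass finds.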
Source
《东南大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2009, Issue S1, pp. 199-203 (5 pages)
Journal of Southeast University:Natural Science Edition
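Funding
National High Technology Research and Development Program of China (863 Program), Grant No. 2007AA022008
Natural Science Foundation of Hunan Province, Grant No. 06JJ5143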
Keywords
tolerant relation
rough set
principal component analysis
attribute reduction