Abstract
Condition attributes differ across samples in how they are distributed and in the subjective characteristics they reflect, so each sample is a local mapping of the real situation. This paper integrates information theory into the rough set learning process. It establishes a correspondence between sample knowledge and information in rough set theory and gives a method for obtaining a reduced decision table through attribute reduction. Continuous attributes are handled with a post-discretization strategy, which dynamically balances discretization efficiency against information loss. The paper proposes the concept of conditional mutual information between relative values to measure the relevance among condition attributes within a single sample. This lets existing data be fully exploited when processing incomplete information systems. Even when prior knowledge is insufficient, new rules can be constructed through active learning and added to the knowledge base. The approach thus extends the application scope of rough set theory. Experimental results on UCI machine learning datasets and analysis of individual instances show that the proposed algorithm is reasonable and effective.
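The abstract does not define "conditional mutual information between relative values," so the sketch below uses two textbook quantities instead. These are the empirical conditional mutual information I(X; Y | Z) and the classical rough-set dependency degree γ_C(D) = |POS_C(D)| / |U| that attribute reduction relies on. It is a minimal illustration, not the paper's method. The function names, the list-of-tuples table encoding, and the column-index arguments are all hypothetical choices made here.

```python
import math
from collections import Counter


def conditional_mutual_information(rows, x, y, z):
    """Empirical I(X; Y | Z) in bits (textbook definition, NOT the paper's
    relative-value variant). rows: list of tuples of discrete values;
    x, y: column indices; z: a column index or a tuple of indices
    (an empty tuple gives plain mutual information I(X; Y))."""
    n = len(rows)
    zc = tuple(z) if isinstance(z, (tuple, list)) else (z,)

    def zkey(row):
        return tuple(row[c] for c in zc)

    c_xyz = Counter((row[x], row[y], zkey(row)) for row in rows)
    c_xz = Counter((row[x], zkey(row)) for row in rows)
    c_yz = Counter((row[y], zkey(row)) for row in rows)
    c_z = Counter(zkey(row) for row in rows)

    cmi = 0.0
    for (xv, yv, zv), c in c_xyz.items():
        # p(x,y,z) * log2[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ], written with counts
        cmi += (c / n) * math.log2(c_z[zv] * c / (c_xz[(xv, zv)] * c_yz[(yv, zv)]))
    return cmi


def dependency_degree(rows, cond, dec):
    """Classical rough-set dependency gamma_C(D) = |POS_C(D)| / |U|:
    the fraction of objects whose C-equivalence class has one decision value."""
    decisions_per_class = {}
    for row in rows:
        key = tuple(row[c] for c in cond)
        decisions_per_class.setdefault(key, set()).add(row[dec])
    # an object is in the positive region if its class is decision-consistent
    pos = sum(1 for row in rows
              if len(decisions_per_class[tuple(row[c] for c in cond)]) == 1)
    return pos / len(rows)


# Toy decision table with columns (a, b, d), where d = a XOR b.
table = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(conditional_mutual_information(table, 0, 2, ()))  # I(a; d)
print(conditional_mutual_information(table, 0, 2, 1))   # I(a; d | b)
print(dependency_degree(table, [0, 1], 2), dependency_degree(table, [0], 2))
```

On this XOR-style table, attribute `a` alone carries no information about `d` (0 bits, γ = 0). Once `b` is known, however, `a` determines `d` completely (1 bit, γ = 1). A measure that scores attributes one at a time would miss this kind of interaction.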
Source
Computer Engineering and Applications
CSCD
Peking University Core Journal
2008, No. 22, pp. 39-42 (4 pages)
Funding
National Natural Science Foundation of China (No. 60275026)
Keywords
rough set
conditional mutual information between relative values
active learning