Abstract
1. Introduction
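Discovering useful knowledge from a given data set has long been an important research topic in fields such as learning from examples and knowledge discovery in databases [1,2]. Generally speaking [3], the simpler a rule is, the stronger its generalization ability and the higher its classification accuracy. Consequently, in recent years, research on algorithms for inducing simple and general rules from given examples, i.e., the most general complex problem, has gradually become a hot topic in these fields. However, most existing rule induction algorithms assume ideal, noise-free data, whereas noisy data are unavoidable in real application domains [4,5]. As a result, existing algorithms have failed to produce satisfactory results and are even difficult to apply in practice, which makes rule acquisition in real-world domains harder. Noisy data can generally be divided into three forms [6]: noise from erroneous individual attribute values, noise from unknown attribute values, and noise from redundant attribute values. Whether a rule induction algorithm can handle these three kinds of noise effectively is the key to whether it can be applied successfully in real-world domains.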
In this paper, the concept of the Extension Matrix Set, derived from the Extension Matrix, is proposed. A new algorithm based on the Extension Matrix Set, the Noise-Tolerated Heuristic Algorithm for Most General Complex (NMGC), is designed and implemented. To induce most general complexes, information entropy and the Mexican hat function are used as the attribute selection criterion and the termination function, respectively. Experimental results on real-world databases show that more general rules and high precision can both be achieved, which implies that NMGC can be applied effectively to real-world databases.
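The abstract names information entropy as the attribute selection criterion but does not give the formulas or say how they are applied to the Extension Matrix Set. As a rough illustration only, the sketch below computes a standard entropy-based information gain on a hypothetical toy data set and picks the attribute with the largest gain. It also defines the standard unnormalized Mexican hat (Ricker) function, which is assumed here to be the function the abstract refers to. The function names, the toy data, and the choice of information gain are assumptions and not the NMGC implementation; how NMGC uses the function to decide termination is not stated in the source and is not modeled.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the examples on attribute index `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def mexican_hat(t):
    """Unnormalized Mexican hat (Ricker) function: (1 - t^2) * exp(-t^2 / 2)."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)


# Hypothetical toy data: each row is (outlook, windy); labels are the classes.
rows = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
labels = ["pos", "pos", "neg", "neg"]

# Choose the attribute whose split removes the most class uncertainty.
best_attr = max(range(2), key=lambda a: information_gain(rows, labels, a))
```

In this toy data the first attribute separates the two classes completely, so it has the larger gain. The Mexican hat function peaks at `t = 0`, crosses zero at `t = ±1`, and is negative beyond that; a sign change of this kind is one plausible basis for a stopping test, although the source does not confirm that NMGC uses it this way.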
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journals
2002, No. 8, pp. 79-81 (3 pages)
Funding
973 Program fund