Abstract
This paper analyzes several traditional attribute-oriented induction (AOI) algorithms in data mining and identifies two shortcomings: (1) they cannot handle unbalanced concept hierarchies; (2) they do not account for the effect of the actual data distribution on the final generalized rules. We therefore propose a sampling-based concept hierarchy mining algorithm. It first samples the dataset and makes a preliminary adjustment to the concept hierarchy, then scans the entire data file and uses the scan statistics to adjust the hierarchy again, and finally obtains the generalized rules from statistics on the leaves of the adjusted hierarchy. The algorithm not only overcomes the shortcomings of the traditional algorithms but also has optimal time complexity O(h) and space complexity O(c).
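The abstract does not state the rule used to adjust the concept hierarchy, so the following is only a minimal sketch of the sample, adjust, scan, adjust, count-leaves flow, not the paper's method. Everything in it is an assumption made for illustration: the toy `PARENT` hierarchy, the names `lift`, `adjust`, and `mine`, and the rule that a concept covering less than `min_share` of the records is merged into its parent.

```python
import random
from collections import Counter

# Hypothetical toy hierarchy: child concept -> parent concept ("ANY" is the root).
PARENT = {
    "Beijing": "China", "Shanghai": "China",
    "Vancouver": "Canada", "Toronto": "Canada",
    "China": "ANY", "Canada": "ANY",
}

def is_under(node, ancestor):
    """True if `ancestor` is `node` itself or lies on its path to the root."""
    while node is not None:
        if node == ancestor:
            return True
        node = PARENT.get(node)
    return False

def lift(value, cut):
    """Climb from a concept to its nearest ancestor-or-self in the current cut."""
    while value not in cut:
        value = PARENT[value]
    return value

def adjust(cut, counts, min_share):
    """Assumed rule: a cut node with share below min_share is merged into its parent."""
    total = sum(counts.values()) or 1
    promoted = {PARENT[n] for n in cut
                if n in PARENT and counts[n] / total < min_share}
    merged = cut | promoted
    # Drop every node that now sits strictly below a promoted parent.
    return {n for n in merged
            if not any(n != p and is_under(n, p) for p in promoted)}

def mine(rows, sample_size, min_share, seed=0):
    cut = set(PARENT) - set(PARENT.values())           # start at the leaves
    sample = random.Random(seed).sample(rows, min(sample_size, len(rows)))
    cut = adjust(cut, Counter(lift(v, cut) for v in sample), min_share)
    full = Counter(lift(v, cut) for v in rows)          # single full scan
    cut = adjust(cut, full, min_share)
    final = Counter()
    for concept, n in full.items():                     # re-aggregate, no rescan
        final[lift(concept, cut)] += n
    return {concept: n / len(rows) for concept, n in final.items()}
```

In this sketch the leaves of the adjusted hierarchy need not sit at the same depth, so frequent cities can stay as they are while rare ones are reported at the country level. The returned shares play the role of the generalized rules.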
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
Peking University Core Journals (北大核心)
2001, No. 3, pp. 57-63 (7 pages)
Keywords
Data mining
Attribute-oriented induction algorithm
Concept hierarchy
Database