Abstract
Discovering association rules is an important data mining task. This paper reviews several sequential and parallel algorithms for it. Among them, the IDD algorithm is an efficient and scalable parallel algorithm for discovering association rules. However, its efficiency drops sharply as the number of processors increases, because the load becomes imbalanced. We resolve this problem by introducing approximation algorithms. We present two such algorithms, one online and one offline, and prove their performance ratios. Finally, we analyze the complexity of the improved IDD algorithm.
Source
Computer Science (《计算机科学》)
Indexed in: CSCD, Peking University Core Journals
2004, No. 2, pp. 132-134, 166 (4 pages)
Funding
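Directional Research Project Fund of the Knowledge Innovation Program, Chinese Academy of Sciences (project title: Large-scale digital object application environment and its parallel simulation; grant no. KGCX2-JG-09)
Experimental Technology Project Fund of the Xichang Satellite Launch Center, General Armaments Department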
Keywords
Database
Knowledge discovery
Parallel data mining efficiency
Association rules
Data set
Data-driven
Computer
Data mining, Parallel processing, Association rules, Load balance, Scalability, Approximate algorithm, Online algorithm, Offline algorithm