Abstract
Association rule mining often generates a large number of rules, which makes it difficult for users to analyze and make use of them. The problem is particularly acute for data sets whose attributes are highly correlated. To facilitate exploratory analysis, the number of rules can be reduced substantially by techniques such as mining association rules under item constraints, post-pruning, or clustering and generalizing (summarizing) the rules. This paper proposes two algorithms to address the problem: ADRR, which deletes redundant association rules, and ACAR, which clusters association rules. Using properties of sets, the paper proves that the mined association rules contain many redundant rules that can be deleted, and ADRR prunes them accordingly. ACAR then defines the distance between rules in a new way, based on the correlation between items, and applies the idea of the DBSCAN clustering algorithm to produce a cluster structure suited to exploratory analysis. Finally, the proposed algorithms are implemented and tested on a real-life database; the experimental results show that the method is practical, effective, and efficient.
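The two-stage idea in the abstract (prune redundant rules, then cluster the remainder with a density-based method) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's ADRR or ACAR: the abstract does not give the redundancy criterion or the correlation-based distance, so the sketch assumes that a rule is redundant when a more general rule with at least the same confidence exists, and it substitutes a Jaccard distance over the rules' item sets for the correlation-based distance. The function names, the sample rules, and the `eps`/`min_pts` values are all hypothetical.

```python
# Illustrative sketch only; the criteria below are assumptions, not the paper's definitions.

def prune_redundant(rules):
    """Keep a rule unless another rule has a subset antecedent, a superset
    consequent, and confidence at least as high (assumed redundancy test)."""
    kept = []
    for i, (a, c, conf) in enumerate(rules):
        redundant = any(
            j != i and a2 <= a and c <= c2 and (a2, c2) != (a, c) and conf2 >= conf
            for j, (a2, c2, conf2) in enumerate(rules)
        )
        if not redundant:
            kept.append((a, c, conf))
    return kept


def rule_distance(r1, r2):
    """Jaccard distance between the item sets of two rules
    (a stand-in for the correlation-based distance the abstract mentions)."""
    items1 = r1[0] | r1[1]
    items2 = r2[0] | r2[1]
    return 1.0 - len(items1 & items2) / len(items1 | items2)


def dbscan(points, dist, eps, min_pts):
    """Plain DBSCAN over an arbitrary distance function; -1 marks noise."""
    NOISE = -1
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # core point: keep expanding from it
                queue.extend(nb)
        cluster += 1
    return labels


# Hypothetical rules: (antecedent, consequent, confidence)
rules = [
    (frozenset({"bread"}), frozenset({"butter"}), 0.80),
    (frozenset({"bread", "milk"}), frozenset({"butter"}), 0.70),
    (frozenset({"butter"}), frozenset({"bread"}), 0.75),
    (frozenset({"beer"}), frozenset({"chips"}), 0.90),
    (frozenset({"chips"}), frozenset({"beer"}), 0.85),
    (frozenset({"beer", "chips"}), frozenset({"salsa"}), 0.60),
]

kept = prune_redundant(rules)
labels = dbscan(kept, rule_distance, eps=0.5, min_pts=2)
for rule, label in zip(kept, labels):
    print(label, sorted(rule[0]), "->", sorted(rule[1]), rule[2])
```

In this sketch the rule {bread, milk} → {butter} is dropped because {bread} → {butter} is more general and at least as confident, and the remaining rules fall into a bread/butter group and a beer/chips group. A distance built from item correlations, as the paper proposes, would replace `rule_distance` without changing the clustering step.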
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journal (北大核心)
2006, Issue 1, pp. 110-113 (4 pages)
Funding
Supported by the Jiangsu Provincial Key Laboratory Open Fund (KJS03064).
Keywords
association rules (关联规则)
correlation (相关性)
clustering (聚类)