Abstract
Traditional correlation rule mining mostly relies on the Apriori algorithm: the support-confidence framework is used to find high-probability patterns, and the resulting itemsets are then tested for correlation to obtain correlation rules. This approach overlooks association rules with low support but high correlation. To address this, earlier work proposed mining positively correlated item pairs with the Phi correlation coefficient, which avoids this shortcoming of Apriori and can find item pairs with low support but high correlation. Although that algorithm prunes the search space, its running time improves little on large data sets, and some of the mined item pairs may be meaningless. This paper therefore proposes an association rule mining algorithm based on a new interestingness model. It uses an upper bound on the interestingness of the supersets of an item (or item pair) to prune the search space, which greatly reduces running time, and it filters out meaningless item pairs using a redundancy constraint. Compared with the algorithm based on the Phi correlation coefficient, the new algorithm runs significantly faster and removes redundant item pairs from the results, improving both mining efficiency and accuracy.
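To make the Phi-based baseline mentioned in the abstract concrete, here is a minimal Python sketch of mining positively correlated item pairs with an upper-bound prune. It is not the paper's interestingness model, whose definition the abstract does not give. The function names, the toy transactions, and the threshold are all assumptions made for illustration; the bound used is the standard one obtained by setting the joint support to its maximum possible value, min(supp(A), supp(B)).

```python
from itertools import combinations
from math import sqrt


def phi(supp_a, supp_b, supp_ab):
    """Phi correlation coefficient of two items, computed from their supports."""
    denom = sqrt(supp_a * supp_b * (1 - supp_a) * (1 - supp_b))
    return (supp_ab - supp_a * supp_b) / denom


def phi_upper_bound(supp_a, supp_b):
    """Largest phi the pair could reach, using only the single-item supports.

    Obtained by substituting supp_ab = min(supp_a, supp_b) into phi().
    """
    hi, lo = max(supp_a, supp_b), min(supp_a, supp_b)
    return sqrt(lo / hi) * sqrt((1 - hi) / (1 - lo))


def correlated_pairs(transactions, threshold):
    """Return (a, b, phi) for every item pair whose phi is >= threshold."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    supp = {i: sum(i in t for t in transactions) / n for i in items}
    # Items present in every transaction make phi undefined (zero denominator).
    items = [i for i in items if supp[i] < 1.0]

    result = []
    for a, b in combinations(items, 2):
        # Prune: skip the joint-support scan when the bound is already too low.
        if phi_upper_bound(supp[a], supp[b]) < threshold:
            continue
        supp_ab = sum(a in t and b in t for t in transactions) / n
        value = phi(supp[a], supp[b], supp_ab)
        if value >= threshold:
            result.append((a, b, value))
    return result


# Hypothetical toy data: "a" and "b" are rare (support 0.2) but always co-occur.
transactions = [
    {"a", "b", "c"}, {"a", "b"}, {"c"}, {"c"}, {"c", "d"},
    {"c", "d"}, {"c"}, {"c"}, {"c"}, {"d"},
]
pairs = correlated_pairs(transactions, threshold=0.5)
```

On this toy data a support threshold above 0.2 would discard the pair ("a", "b") before any correlation test, whereas the Phi-based search keeps it; pairs such as ("a", "c") are rejected from the bound alone, without counting their joint support.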
Source
Computer Simulation (《计算机仿真》)
CSCD
Peking University Core Journal
2016, No. 8, pp. 223-228, 447 (7 pages in total)
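Funding
Special Scientific Research Project for Public Welfare Industry (Meteorology) (GYHY201306070)
Nanjing University of Information Science and Technology Undergraduate Practice and Innovation Training Program (201510300202)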
Keywords
Interest
Correlation pairs
Positive correlation
Association rules
Redundancy