Abstract
When association rules are generated from frequent itemsets, using lift to test whether the antecedent and the consequent of a rule are positively correlated avoids some uninteresting rules. However, this neither guarantees that the items within the antecedent or within the consequent are positively correlated, nor reduces the time spent mining frequent itemsets. If the antecedent or the consequent contains negatively correlated items, the rule can still be uninteresting. To address these problems, this paper introduces the concept of positively correlated frequent itemsets, based on mathematical expectation, and improves an algorithm that mines frequent itemsets directly in the FP-tree so that it mines positively correlated frequent itemsets, which effectively resolves the problems above. Experiments show that the algorithm greatly reduces the number of generated frequent itemsets and markedly lowers the time needed to mine them. The algorithm performs well on large datasets, especially dense ones.
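The abstract does not state the paper's formal definitions, so the following is only a minimal sketch under stated assumptions. Lift uses the standard formula lift(X ⇒ Y) = sup(X ∪ Y) / (sup(X) · sup(Y)). The "positively correlated" test is a hypothetical reading of "based on mathematical expectation": an itemset must have observed support greater than the product of its item supports (its expected support under independence), and every sub-itemset with at least two items must pass the same test. The search is a plain level-wise (Apriori-style) enumeration, not the paper's FP-tree algorithm; the function names and the toy transactions are invented for illustration.

```python
from fractions import Fraction
from itertools import combinations


def support(itemset, transactions):
    """Relative support: share of transactions that contain every item of itemset."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return Fraction(hits, len(transactions))


def lift(antecedent, consequent, transactions):
    """Standard lift of X => Y: sup(X u Y) / (sup(X) * sup(Y)); > 1 means positive."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / (
        support(antecedent, transactions) * support(consequent, transactions)
    )


def expected_support(itemset, transactions):
    """Support the itemset would have if its items occurred independently."""
    expected = Fraction(1)
    for item in itemset:
        expected *= support({item}, transactions)
    return expected


def mine_positive(transactions, min_sup):
    """Level-wise (Apriori-style) search, NOT the paper's FP-tree algorithm.

    Hypothetical criterion: an itemset with >= 2 items is kept only if it is
    frequent, its observed support exceeds its expected support, and all of its
    (k-1)-item subsets were kept, so every sub-itemset is positively correlated.
    """
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items if support({i}, transactions) >= min_sup]
    result = list(level)
    k = 2
    while level:
        kept = set(level)
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = []
        for cand in candidates:
            # Prune: a rejected subset means cand cannot qualify either.
            if not all(frozenset(s) in kept for s in combinations(cand, k - 1)):
                continue
            sup = support(cand, transactions)
            if sup >= min_sup and sup > expected_support(cand, transactions):
                level.append(cand)
        result.extend(level)
        k += 1
    return result


# Toy data (invented for illustration): a and b are negatively correlated,
# yet the rule {a, b} => {c} has lift 4.
transactions = [
    {"a", "b", "c"}, {"a", "b", "c"},
    {"a"}, {"a"}, {"a"},
    {"b"}, {"b"}, {"b"},
]

print(lift({"a", "b"}, {"c"}, transactions))  # 4
print(sorted("".join(sorted(s)) for s in mine_positive(transactions, Fraction(1, 5))))
# ['a', 'ac', 'b', 'bc', 'c']
```

In this toy data, a lift test alone would accept {a, b} ⇒ {c} even though sup({a, b}) = 1/4 is below its expected support 25/64. With the same minimum support of 1/5, plain support filtering keeps seven itemsets (a, b, c, ab, ac, bc, abc), while the sketch keeps five. Requiring every sub-itemset to pass makes the criterion anti-monotone, so pruning a rejected itemset's supersets is safe; this is one way such a constraint could cut mining time, not necessarily the paper's mechanism.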
Source
Journal of Computer Applications (《计算机应用》)
CSCD (Chinese Science Citation Database)
Peking University Core Journal
2007, No. 1, pp. 108-110, 142 (4 pages)
Funding
Natural Science Foundation of Henan Province (project no. 0211050100)
Keywords
association rules
frequent itemsets
FP-tree
positive correlation