Abstract
Association pattern mining is an important branch of data mining research that aims to discover associations or correlations among itemsets. However, the traditional mining approach based on the support-confidence framework has two shortcomings: first, it generates too many patterns (both frequent itemsets and rules); second, some of the mined rules are uninteresting or useless to the user, or even misleading. It is therefore necessary to prune such useless patterns effectively during mining. This paper introduces chi-squared analysis into the correlation measurement of patterns, using the chi-squared test to measure the correlation between itemsets and between the antecedent and consequent of a rule, as an effective pruning method. Analysis of the results shows that adding the chi-squared test on top of the support measure effectively prunes uncorrelated patterns, thereby reducing the number of frequent itemsets and rules and remarkably shrinking the search space of the algorithm.
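The pruning idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm, which is not given here. It assumes the simplest case of two single items, builds their 2x2 contingency table from a list of transactions, and keeps the pair only if the chi-squared statistic exceeds the critical value 3.841 (1 degree of freedom, 0.05 significance level). The function names, the significance level, and the toy transactions are all hypothetical choices made for illustration.

```python
from typing import Hashable, Sequence, Set

# Critical value of the chi-squared distribution with 1 degree of freedom
# at the 0.05 significance level. The significance level is an assumption;
# the abstract does not state which threshold the authors used.
CHI2_CRITICAL_DF1_P05 = 3.841


def chi_squared_2x2(transactions: Sequence[Set[Hashable]],
                    a: Hashable, b: Hashable) -> float:
    """Chi-squared statistic for the presence/absence of items a and b."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if a in t)
    n_b = sum(1 for t in transactions if b in t)
    n_ab = sum(1 for t in transactions if a in t and b in t)

    # Observed counts: rows = a present / absent, columns = b present / absent.
    observed = [
        [n_ab, n_a - n_ab],
        [n_b - n_ab, n - n_a - n_b + n_ab],
    ]
    row_totals = [n_a, n - n_a]
    col_totals = [n_b, n - n_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n if n else 0.0
            if expected == 0:
                # An item that is always or never present carries no
                # correlation information, so treat the pair as uncorrelated.
                return 0.0
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def is_correlated(transactions: Sequence[Set[Hashable]],
                  a: Hashable, b: Hashable,
                  critical: float = CHI2_CRITICAL_DF1_P05) -> bool:
    """Keep the pair only if independence is rejected at the chosen level."""
    return chi_squared_2x2(transactions, a, b) > critical


# Toy data (made up for illustration): 20 transactions each.
correlated = ([{"bread", "milk"}] * 8 + [{"bread"}] * 2
              + [{"milk"}] * 2 + [{"eggs"}] * 8)
independent = ([{"bread", "milk"}] * 5 + [{"bread"}] * 5
               + [{"milk"}] * 5 + [{"eggs"}] * 5)

keep_correlated = is_correlated(correlated, "bread", "milk")
keep_independent = is_correlated(independent, "bread", "milk")
```

On the toy data, the first pair has a statistic of about 7.2 and is kept, while the second has a statistic of 0 and would be pruned even though both itemsets have nonzero support. A fuller implementation would apply the same test to larger itemsets and to the antecedent and consequent of each candidate rule, as the abstract describes.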
Source
Modern Computer (《现代计算机》)
2005, No. 11, pp. 21-24 (4 pages)
Funding
Natural Science Research Project of Anhui Provincial Higher Education Institutions (2005KJ305ZC)