Abstract
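Association rule mining is an important topic in data mining and knowledge discovery. However, the academic community has no agreed standard for deciding whether an association rule holds, that is, for measuring interestingness. Existing interestingness measures include the support-confidence framework, lift, and chi-square analysis. Each of these traditional measures has its own limitations: some lack an objective criterion, some lack a statistical basis, and some can measure only positive relationships. To overcome these problems, this paper proposes a new interestingness measure based on statistical inference and compares it with the traditional measures. It proves the asymptotic distribution of the measure formula, identifies the advantages of the new method, and verifies its properties empirically. Applied to association rule mining, the method gives an objective criterion for deciding whether a rule holds and can identify both positive and negative associations. It is also convenient and practical to apply, which gives it certain advantages over existing methods.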
Discovering association rules is an important topic in the field of data mining and knowledge discovery. However, no unified interestingness measure has yet been proposed; such a measure is the criterion that decides whether a rule holds. Many papers have discussed interestingness measures such as support-confidence, lift, and chi-square analysis. Each traditional measure has its flaws, such as the lack of an objective criterion, the lack of a statistical basis, and the inability to capture negative relationships. This paper therefore proposes a new interestingness measure based on statistical inference. The new measure is compared with the traditional ones, and the asymptotic distribution of the new measure formula is proved. In conclusion, the new method is superior to the traditional ones in three respects: it provides an objective criterion, its definition is comprehensive, and it is convenient to apply.
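The abstract names three traditional baselines: support-confidence, lift, and chi-square. The sketch below computes all three for a rule X → Y from a 2×2 contingency table, which makes the stated limitations concrete. It is not the paper's new statistical-inference measure, because the abstract does not give that formula. The `RuleCounts` type, the function names, and the counts in the usage example are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class RuleCounts:
    """Hypothetical container of transaction counts for a rule X -> Y."""
    n: int     # total number of transactions
    n_x: int   # transactions containing X
    n_y: int   # transactions containing Y
    n_xy: int  # transactions containing both X and Y


def support(c: RuleCounts) -> float:
    # Fraction of all transactions that contain both X and Y.
    return c.n_xy / c.n


def confidence(c: RuleCounts) -> float:
    # Estimated P(Y | X).
    return c.n_xy / c.n_x


def lift(c: RuleCounts) -> float:
    # P(Y | X) / P(Y): > 1 suggests positive, < 1 negative association.
    return (c.n * c.n_xy) / (c.n_x * c.n_y)


def chi_square(c: RuleCounts) -> float:
    # Pearson chi-square statistic on the 2x2 table (no continuity correction).
    observed = [
        [c.n_xy, c.n_x - c.n_xy],                           # X present
        [c.n_y - c.n_xy, c.n - c.n_x - c.n_y + c.n_xy],     # X absent
    ]
    row_totals = [c.n_x, c.n - c.n_x]
    col_totals = [c.n_y, c.n - c.n_y]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / c.n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    # Hypothetical data: 100 transactions, X in 20, Y in 90, both in 15.
    rule = RuleCounts(n=100, n_x=20, n_y=90, n_xy=15)
    print("support   ", support(rule))
    print("confidence", confidence(rule))
    print("lift      ", lift(rule))
    print("chi-square", chi_square(rule))
```

With these made-up counts, confidence is 0.75, which a support-confidence threshold might accept. Lift, however, is about 0.83 (below 1), so X and Y are in fact negatively associated. The chi-square statistic is 6.25, which exceeds 3.841, the 5% critical value with one degree of freedom. The association is therefore significant, but the statistic alone does not say in which direction it runs. The example illustrates the kinds of shortcomings the abstract attributes to these measures.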
Source
《情报学报》 (Journal of the China Society for Scientific and Technical Information)
CSSCI
Peking University Core Journal (北大核心)
2011, No. 5, pp. 503-507 (5 pages)
Funding
National Natural Science Foundation of China (grant no. 60979016)
Keywords
data mining
association rules
interestingness measure
statistical inference