Abstract
Association rule mining is an important branch of data mining research, and discovering frequent itemsets or itemsequences is a key phase of it. Many algorithms for discovering frequent itemsets have been proposed in the literature over the past decade or so. Recent research has focused on efficient discovery of frequent itemsets in large datasets, in particular on reducing the number of database scans, improving memory utilization, and lowering I/O costs. This paper proposes an algorithm called DFISP for discovering frequent itemsequences. It is based on a data-partitioning scan strategy and needs only two passes over the database to generate the frequent itemsequence sets. Experimental results show that DFISP is stable and efficient: as the database grows, it keeps memory usage within an acceptable range while achieving satisfactory execution efficiency.
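The abstract does not give DFISP's internals, so the following is only a minimal sketch of the general two-pass, partition-based idea for plain itemsets (not ordered itemsequences), and it should not be read as the authors' algorithm. It relies on one fact: an itemset that is frequent in the whole database must be frequent in at least one partition. The names `local_frequent`, `two_pass_frequent_itemsets`, `min_ratio`, `n_partitions`, and `max_size` are hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import combinations
from math import ceil


def local_frequent(partition, min_ratio, max_size):
    """Return the itemsets that are frequent inside one partition.

    Brute-force enumeration of subsets up to max_size items; this is an
    illustrative simplification, not an efficient candidate generator.
    """
    threshold = ceil(min_ratio * len(partition))
    counts = Counter()
    for transaction in partition:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return {itemset for itemset, n in counts.items() if n >= threshold}


def two_pass_frequent_itemsets(transactions, min_ratio, n_partitions=2, max_size=3):
    """Find frequent itemsets with two scans over `transactions`."""
    if not transactions:
        return {}
    size = ceil(len(transactions) / n_partitions)
    partitions = [transactions[i:i + size]
                  for i in range(0, len(transactions), size)]

    # Pass 1: each partition is small enough to process in memory;
    # the union of locally frequent itemsets is the global candidate set.
    candidates = set()
    for part in partitions:
        candidates |= local_frequent(part, min_ratio, max_size)

    # Pass 2: count every candidate against the full database.
    counts = Counter()
    for transaction in transactions:
        items = set(transaction)
        for candidate in candidates:
            if items.issuperset(candidate):
                counts[candidate] += 1

    threshold = ceil(min_ratio * len(transactions))
    return {c: n for c, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
    print(two_pass_frequent_itemsets(db, min_ratio=0.5))
```

In this sketch the first pass never needs more than one partition in memory at a time, which is the usual motivation for partitioned scanning; the second pass removes candidates that were frequent only locally.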
Source
《计算机工程与应用》
CSCD
Peking University Core Journal (北大核心)
2004, No. 7, pp. 19-21, 202 (4 pages in total)
Computer Engineering and Applications
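Funding
National Natural Science Foundation of China (No. 60173014)
Beijing Natural Science Foundation (No. 4022003)
Funded by the Beijing Municipal Education Commission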
Keywords
Data Mining
Association Rules
Itemsequences (Itemsequence Sets)
Data-Partitioning Scan