Abstract
To reduce the generation of useless candidate sequences and to ensure that the mined patterns meet user requirements, constraint-based frequent sequential pattern mining has become an important new research direction in data mining. As one kind of tough constraint, however, the average value constraint remains a difficult problem in constraint-based frequent sequential pattern mining, mainly because it is hard to exploit for pruning during the mining process. To address this, a pruning strategy based on the degree of satisfaction of the average value constraint is proposed, and an effective frequent sequential pattern mining algorithm is designed on the basis of the prefix-growth method. The time efficiency and pruning performance of the algorithm are analyzed and verified experimentally. The results show that the proposed method is effective.
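Funding
Supported by the Natural Science Foundation of Anhui Province (050460402)
and the Scientific Research Project of the Anhui Provincial Department of Education (2006sk010)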
Keywords
sequential pattern
average value constraint
pruning