Comparison among Data Incremental Approaches in Association Rule Mining
Abstract With the rapid development of e-commerce, transaction data is growing explosively and commodity categories change constantly. Obtaining frequent itemsets and association rules efficiently, accurately, and in real time therefore has practical value for selling and recommending goods. Many incremental mining algorithms have been proposed to handle dynamic changes in transaction data, but only a few studies address incremental changes in attributes. This paper designs an incremental algorithm for updating frequent itemsets and association rules when new kinds of commodities are added. In real seller scenarios, commodity categories tend to grow in one of two ways: one commodity at a time, called one-by-one addition, or several commodities at once, called batch addition. Two mining sub-algorithms, add One By One and add All, are proposed for these two modes, and e-commerce sellers can choose the one that fits their situation. Extensive experiments on a real commodity transaction dataset compare the two sub-algorithms and the classical Apriori algorithm in terms of mining results and running time. The results show that (1) the two sub-algorithms produce identical results; and (2) in the best case, the average running time of add One By One is 2.93 times less than that of add All, and add One By One is about 12.85 times faster than Apriori.
Source 《数码设计》 (Peak Data Science), 2017, No. 2, pp. 28-32, 5 pages
Funding National Natural Science Foundation of China, grant 61379089
Keywords incremental association rule; data addition mode; time efficiency
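The abstract does not give the internals of add One By One or add All, so the following is only a minimal sketch of the general idea of attribute-incremental mining, not the authors' algorithm. It assumes the set of transactions stays fixed while a new item appears in some of them, so the supports of previously mined itemsets remain valid and only itemsets containing the new item need to be counted. All function names and the toy data are hypothetical.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def add_one_item(frequent, new_item, transactions, min_count):
    """Update `frequent` (dict: frozenset -> count) after one new item appears.

    Assumption: old supports are unchanged, so only itemsets that
    contain `new_item` are counted.
    """
    updated = dict(frequent)
    single = frozenset([new_item])
    count = support_count(single, transactions)
    if count < min_count:
        return updated  # no itemset containing new_item can be frequent
    updated[single] = count
    # Anti-monotonicity: X | {new_item} can be frequent only if X is frequent.
    for old in frequent:
        candidate = old | single
        c = support_count(candidate, transactions)
        if c >= min_count:
            updated[candidate] = c
    return updated


def add_one_by_one(frequent, new_items, transactions, min_count):
    """Apply the single-item update once per new item, in order."""
    for item in new_items:
        frequent = add_one_item(frequent, item, transactions, min_count)
    return frequent


def brute_force(transactions, min_count):
    """Reference miner that counts every possible itemset (toy sizes only)."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            c = support_count(s, transactions)
            if c >= min_count:
                result[s] = c
    return result


# Toy data: "tea" and "jam" are the newly added commodities.
transactions = [
    frozenset({"bread", "milk", "tea"}),
    frozenset({"bread", "milk"}),
    frozenset({"bread", "tea", "jam"}),
    frozenset({"milk", "tea", "jam"}),
]
old_items = {"bread", "milk"}
min_count = 2

# Frequent itemsets mined before the new commodities existed.
old_view = [t & old_items for t in transactions]
old_frequent = brute_force(old_view, min_count)

incremental = add_one_by_one(old_frequent, ["tea", "jam"], transactions, min_count)
full = brute_force(transactions, min_count)
print(incremental == full)
```

The final comparison checks the incremental update against a full re-mining of the same data, mirroring the abstract's check that the compared algorithms return identical results. A batch variant would generate candidates for all new items in one level-wise pass instead of looping item by item.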