Abstract
E-commerce retail data are shaped by many factors, and the intrinsic associations among sales attributes are obscured by numerous irrelevant non-sales attributes. Association rules in sales data are therefore hard to find without error: traditional association mining algorithms get lost in the massive number of relationships in the data and generate many useless associations, which makes mining slow and inefficient. This paper proposes a model based on a multidimensional cube for mining key data from massive retail data sets. Based on the properties of frequent itemsets, a probability estimation method is used to selectively evaluate the candidate itemsets that are scanned. When frequent itemsets are generated, candidates are screened by their estimated probability and by the dependencies among candidate itemsets, and the minimum support and minimum confidence thresholds are then applied to produce frequent itemsets and, from them, association rules. Simulations on sales data and customer access data from an e-commerce website show that the improved algorithm achieves good access speed, which verifies its effectiveness.
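The mining step summarized in the abstract (screen candidate itemsets, then apply minimum support and minimum confidence thresholds to obtain frequent itemsets and association rules) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not specify the probability estimator or the cube structure, so the sketch substitutes the standard Apriori subset check as the candidate screen. The function names `frequent_itemsets` and `association_rules` and the toy transactions are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c / n for s, c in counts.items() if c / n >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        next_level = {}
        for cand in candidates:
            # Screening step (stand-in for the paper's probability estimate):
            # skip any candidate that has an infrequent (k-1)-subset.
            if any((cand - {i}) not in current for i in cand):
                continue
            # Only surviving candidates are counted against the data.
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                next_level[cand] = support
        result.update(next_level)
        current = next_level
        k += 1
    return result


def association_rules(itemsets, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in itemsets.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for items in combinations(itemset, r):
                antecedent = frozenset(items)
                # Every subset of a frequent itemset is itself in `itemsets`.
                confidence = support / itemsets[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, support, confidence))
    return rules


# Toy transactions (hypothetical data, not from the paper).
baskets = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]
found = frequent_itemsets(baskets, min_support=0.5)
rules = association_rules(found, min_confidence=0.9)
```

The screen runs before any candidate is counted against the transactions, which is where an estimate-based filter of the kind the abstract describes would save scanning time. A real probability estimate could replace the subset check, at the cost of possibly discarding itemsets that are in fact frequent.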
Source
Computer Simulation (《计算机仿真》)
CSCD
PKU Core Journals (北大核心)
2013, No. 10, pp. 399-402 (4 pages)
Keywords
Data mining
Massive data
Star schema