Abstract
Data streams are generated at high speed, flow continuously, and vary unpredictably, so a data stream algorithm must analyze the data accurately and in real time within limited storage and extract useful knowledge from it. Within an allowed error bound, this paper proposes AECFP, an efficient algorithm for mining frequent items in data streams. AECFP records arriving items in a data structure built on samples of frequent items, so that samples can be stored quickly. When the sample space is full, it quickly deletes the non-frequent item that has the smallest count and is the oldest, while retaining the other frequent items that have the same support count. When a user queries for frequent items, the algorithm mines them from the stream quickly, accurately, and in real time, and adapts to fluctuations in the data. Experiments show that the algorithm processes items quickly, keeps storage consumption low, and preserves the accuracy of the mined frequent items.
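The abstract gives only an outline of AECFP, so the following is a minimal sketch of the general idea rather than the authors' algorithm: a fixed-capacity counter table that, when full, evicts the entry with the smallest count and breaks ties by evicting the oldest entry. The class name `BoundedFrequentItems`, the parameters `capacity`, `support`, and `epsilon`, and the query threshold `(support - epsilon) * n` are all assumptions made for illustration, and this simplified eviction rule does not by itself reproduce the error guarantee claimed in the paper.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    count: int       # occurrences counted since this entry was created
    first_seen: int  # stream position at which this entry was created


class BoundedFrequentItems:
    """Hypothetical bounded-memory summary of a stream (not the paper's AECFP)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity  # maximum number of tracked items
        self.n = 0                # total number of items seen so far
        self.entries = {}         # item -> Entry

    def add(self, item):
        self.n += 1
        entry = self.entries.get(item)
        if entry is not None:
            entry.count += 1
            return
        if len(self.entries) >= self.capacity:
            # Evict the smallest count; among equal counts, the oldest entry.
            victim = min(
                self.entries,
                key=lambda k: (self.entries[k].count, self.entries[k].first_seen),
            )
            del self.entries[victim]
        self.entries[item] = Entry(count=1, first_seen=self.n)

    def frequent(self, support, epsilon):
        # Assumed query rule: report items whose tracked count reaches
        # (support - epsilon) * n, in the style of epsilon-approximate counting.
        threshold = (support - epsilon) * self.n
        return {k: e.count for k, e in self.entries.items() if e.count >= threshold}


summary = BoundedFrequentItems(capacity=3)
for item in "a b a c a d a b".split():
    summary.add(item)
print(summary.frequent(support=0.4, epsilon=0.1))  # prints {'a': 4}
```

In this run the table holds three items. When `d` arrives, `b` and `c` both have count 1, and `b` is evicted because it entered the table first. The item `a` keeps its full count of 4 because it is never the minimum.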
Source
Journal of Guilin University of Electronic Technology (《桂林电子科技大学学报》)
2009, No. 6, pp. 480-482 (3 pages)
Keywords
data streams
data mining
frequent item
ε-approximate
non-frequent item