Abstract
Mining frequent patterns from data streams is an active research topic in data mining. Data streams are continuous, unordered, unbounded, and real-time, which places higher demands on the time and space performance of mining algorithms. Vibration (oscillation) of pattern frequency in a data stream forces existing algorithms to maintain their synopsis data structure frequently, which degrades both time and space efficiency. This paper constructs SP-tree, a synopsis data structure with high space efficiency, and defines a vibration factor χ to quantify vibration information. On this basis it proposes SPDS, an efficient algorithm for mining frequent patterns over offline data streams, which effectively reduces the impact of data vibration on algorithm performance. When processing a newly arrived dataset, the algorithm adopts a divide-and-conquer separation-and-mapping strategy that further improves time efficiency, and it also improves the count accuracy of some patterns in query results.
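The abstract does not define χ or the layout of SP-tree, so the following is only a minimal sketch of the general problem it describes, not the SPDS algorithm. It shows one common way to keep patterns whose support oscillates around the threshold from being repeatedly inserted into and deleted from a synopsis: retain any pattern whose support is at least `min_support - slack`, and report only those at or above `min_support`. The class name `RelaxedStreamMiner`, the `slack` parameter, and the flat `Counter` used in place of a tree are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(batch, max_len=2):
    """Count every itemset of size <= max_len occurring in a batch of transactions."""
    counts = Counter()
    for transaction in batch:
        items = sorted(set(transaction))  # canonical order, duplicates removed
        for k in range(1, max_len + 1):
            counts.update(combinations(items, k))
    return counts


class RelaxedStreamMiner:
    """Hypothetical batch-wise miner with a relaxed retention threshold.

    This is NOT the paper's SP-tree or vibration factor; `slack` is an
    illustrative stand-in for tolerating frequency oscillation.
    """

    def __init__(self, min_support, slack, max_len=2):
        self.min_support = min_support  # relative support required to report
        self.slack = slack              # tolerance below min_support for retention
        self.max_len = max_len
        self.seen = 0                   # transactions processed so far
        self.counts = Counter()         # synopsis: retained pattern -> count

    def add_batch(self, batch):
        """Merge a newly arrived batch, then prune clearly infrequent patterns."""
        self.seen += len(batch)
        self.counts.update(count_itemsets(batch, self.max_len))
        keep = (self.min_support - self.slack) * self.seen
        self.counts = Counter(
            {p: c for p, c in self.counts.items() if c >= keep}
        )

    def frequent(self):
        """Return retained patterns whose count meets the full threshold."""
        need = self.min_support * self.seen
        return {p: c for p, c in self.counts.items() if c >= need}
```

A pattern that dips slightly below `min_support` stays in the synopsis and keeps its accumulated count, so it does not have to be rebuilt from zero when it becomes frequent again. The cost is approximation: a pattern pruned earlier and reintroduced later is undercounted.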
Source
《计算机科学》
CSCD
Peking University Core Journal (北大核心)
2009, No. 7, pp. 247-251, 291 (6 pages)
Computer Science
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60675030)
Keywords
Data mining
Data stream
Frequent pattern (FP)
Vibration factor