摘要
针对当前关于数据流加权最大频繁项集WMFI(weighted maximal frequent itemsets)的研究无法有效地处理频繁阈值和加权频繁阈值不一致情况下WMFI的挖掘问题,提出了完全加权最大频繁项集FWM FI(full w eighted maximal frequent itemsets)的概念.为了减少naive算法在处理滑动窗口下完全加权最大频繁项集挖掘时存在的冗余运算,提出了FWMFI-SW(FWMFI mining based on sliding window over data stream)算法.所提出的算法通过基于频繁约束条件的优化策略减少了naive算法中M ax W优化策略的无效调用次数;采用编辑距离比率作为WMFP-SW-tree的重构判别函数,可以有效减少该树的重构次数.实验结果表明FWMFI-SW算法是有效的,且比naive算法更有时间优势.
Aiming at the problem that none of current researches on the WMFI ( weightedmaximal frequent itemsets) over data stream emphasizes the WMFI mining on the condition thatthe frequent threshold is not equal with the weighted frequent threshold, the concept of FWMFI(full weighted maximal frequent itemsets) was firstly promoted in this work. In order to reduceredundant operations existing in the naive algorithm which is used to handle the FWMFI miningbased on sliding window over data stream, the FWMFI - SW ( FWMFI mining based on slidingwindow over data stream) algorithm was proposed. The mining optimization strategy was adoptedbased on the frequent character to reduce the unnecessary call about the MaxW optimizationstrategy in the naive algorithm. In addition, the edit distance ratio was taken as reconstruction9udge function to decide whether the updated WMFP - SW - tree should be reconstructed as thewindow slides. The extensive experiments showed that the FWMFI - SW algorithm is effective,and outperforms the naive algorithm in running time.
出处
《东北大学学报(自然科学版)》
EI
CAS
CSCD
北大核心
2016年第7期931-936,共6页
Journal of Northeastern University(Natural Science)
基金
国家自然科学基金资助项目(60903159
61173153
61402096)
中央高校基本科研业务费专项资金资助项目(N110818001
N100218001
N130504007
N120104001)
沈阳市科技计划项目(1091176-1-00)
国家高技术研究发展计划项目(2015AA016005)
关键词
数据流
滑动窗口
编辑距离比率
加权最大频繁项集
重构判别函数
data stream
sliding window
edit distance ratio
weighted maximal frequentitemsets
reconstruction judge function