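Abstract
Frequent pattern mining underlies much of the work in data stream mining. Existing algorithms can mine approximate frequent patterns from data streams effectively, but because stream data is uncertain, continuous, and massive, they have not been able to keep time and space costs within an acceptable range. This paper uses a hash table as the storage structure for the synopsis data and introduces the notion of the interestingness of association rules. On that basis it proposes MIFS-HT (mining interesting frequent itemsets with hash table), a frequent pattern mining algorithm for data streams that both reduces the time and space complexity of existing algorithms and increases the practical value of the results. Experimental results show that MIFS-HT is an efficient data stream frequent pattern mining algorithm: it outperforms algorithms such as FPStream and LossyCounting, and its mining results are more meaningful in practice.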
Frequent itemset mining, a fundamental task in data stream mining, has attracted growing attention from researchers. Because data streams are uncertain, continuous, and very large, many mining algorithms have difficulty handling such dynamic data. In this paper, a hash table and the interest degree of association rules are introduced: the former serves as the synopsis data structure, and the latter incorporates the attention of customers. On this basis, a new frequent itemset mining algorithm named MIFS-HT (mining interesting frequent itemsets with hash table) is proposed. Compared with lossy counting and a similar algorithm called mining frequent item sets over data streams by matrix (MISM for short), the results show that MIFS-HT is more efficient in both time and space.
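For orientation, the sketch below shows the textbook Lossy Counting baseline that the abstract compares against, restricted to single items and using a Python dict as the hash-table synopsis. It is not MIFS-HT, whose details the abstract does not give; the function names `lossy_count` and `frequent_items` and the parameters `epsilon` and `support` are illustrative assumptions.

```python
import math


def lossy_count(stream, epsilon):
    """Textbook Lossy Counting over single items (baseline, not MIFS-HT).

    Returns the synopsis (item -> [count, max_undercount]) and the
    number of items seen. Names and parameters are hypothetical.
    """
    width = math.ceil(1 / epsilon)  # bucket width
    counts = {}                     # hash table used as the synopsis
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)  # id of the current bucket
        if item in counts:
            counts[item][0] += 1
        else:
            # A new entry may have been missed in up to bucket - 1 buckets.
            counts[item] = [1, bucket - 1]
        if n % width == 0:
            # At each bucket boundary, drop entries that cannot be frequent.
            stale = [k for k, (f, d) in counts.items() if f + d <= bucket]
            for key in stale:
                del counts[key]
    return counts, n


def frequent_items(counts, n, support, epsilon):
    """Report items whose stored count reaches (support - epsilon) * n."""
    return {k for k, (f, _) in counts.items() if f >= (support - epsilon) * n}
```

A call such as `lossy_count(["a", "b", "a", "c"], 0.25)` keeps only entries that survive pruning at the bucket boundary, which is what bounds the size of the synopsis.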
Source
《系统工程理论与实践》
EI
CSSCI
CSCD
Peking University Core Journal
2012, No. 12, pp. 2764-2773 (10 pages)
Systems Engineering-Theory & Practice
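Funding
National Natural Science Foundation of China (71071141)
Specialized Research Fund for the Doctoral Program of Higher Education (20103326110001)
Key Project of the Zhejiang Provincial Natural Science Foundation (Z1091224)
Modern Commerce and Trade Center of Zhejiang Gongshang University (11JDSM02Z)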