摘要
频繁闭项集(Frequent Closed Items,FCI)是一种表示事物之间关联关系的有效方式,它能克服频繁项集(Frequent Items,FI)信息冗余的缺点。FCI挖掘算法研究旨在以更优的时空效率,在原始数据集中找到所有的FCI。相关研究成果重在关注时间效率的提升,但空间效率欠佳。提出一种高空间压缩率数据结构——邻接比特压缩表(Compressed Adjacency Byte table,Cab-table),将项集与交易集压缩到剔除全部0之后的比特表中,使空间高度压缩。基于此数据结构的频繁闭项集挖掘算法(Cab-Miner),采用运算栈与检索栈来实现非递归方式的频繁闭项集挖掘,相较于之前普遍采用递归方式的算法,在理论上可使空间占用率由O(L*N+M)降为O(3N)。基于公开数据集与真实数据集的实验表明,上述算法在原始数据集压缩,以及运算内存消耗上,都有较优的表现,尤其在处理真实数据集时,空间表现极佳。另外在某些属性的数据集上也表现出优越的时间效率。
Frequent closed items(FCI)is an effective data structure for expressing the relationships among things,which can overcome the defect of frequent items(FI)in information redundancy.The purpose of study in frequent closed items mining is to find all frequent closed items in the original dataset with higher spatiotemporal efficiency.Unfortunately,lots of studies focused on the time efficiency improving but lost sight of spatial optimization.We proposed a data structure(Cab-table:Compressed Adjacency Byte table)for data compression efficiently,which can compress item set and transaction set into byte table without zero.Furthermore,we proposed a FCI mining algorithm with Cab-table,called Cab-Miner.In the algorithm we designed one retrieve stack and one operation stack to realize the non-recursive FCI mining,which is different from most of other algorithms.Compared to the recursive algorithm,our algorithm's space efficiency is O(2N+M)instead of O(LN+M).Several experiments were carried out with public data and real data,then we proved that our algorithm has better space occupation performance in initial data set compression and operation memory consumption,especially when the data set is collected from real scenes.Additionally,Cab-Miner also consumes lower time process in some data set with special properties.
作者
杨博超
吴美璇
胡浩
朱敏
YANG Bo-chao;WU Mei-xuan;HU Hao;ZHU Min(College of Computer Science and Technology,Sichuan University,Chengdu Sichuan 610065,China)
出处
《计算机仿真》
2024年第1期415-424,共10页
Computer Simulation
关键词
频繁闭项集
邻接比特压缩表
非递归算法
高空间效率
Frequent closed items
Compressed adjacency byte table
Non-recursive algorithm
High spatial efficiency