Abstract
Sequential pattern mining consumes a great deal of computation time and storage space, so reducing the amount of computation and the storage overhead is the key problem for sequential pattern mining algorithms. The PrefixSpan algorithm generates no candidate sequences, and an appropriately applied Bitmap data structure avoids repeated scans of the database. Building on both observations, this paper proposes BM-PrefixSpan, an improved sequential pattern mining algorithm. It designs and constructs a PFPBM (Prefix of First Position on BitMap) table that records, for each item in a sequence, the position in the bitmap at which that item first appears; whenever a new item appears in a sequence, it is recorded in the PFPBM table. Experimental results show that BM-PrefixSpan combines the advantages of the PrefixSpan and SPAM algorithms and mines sequential patterns faster and more effectively than the algorithms it was compared against.
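The abstract describes the PFPBM table only at a high level, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes SPAM-style vertical bitmaps (one bit per itemset of each sequence, stored as plain 0/1 lists rather than packed bit vectors), interprets the PFPBM table as "index of the first set bit for each item in each sequence", and grows patterns PrefixSpan-style using sequence extensions only (itemset extensions such as <(ab)> are omitted for brevity). All function names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

SeqDB = List[List[Set[str]]]  # each sequence is a list of itemsets
Bitmaps = Dict[str, List[List[int]]]


def build_bitmaps(db: SeqDB) -> Bitmaps:
    """SPAM-style vertical bitmaps: bitmaps[item][sid][pos] == 1 iff `item`
    occurs in itemset `pos` of sequence `sid`."""
    items = sorted({i for seq in db for itemset in seq for i in itemset})
    return {item: [[1 if item in itemset else 0 for itemset in seq] for seq in db]
            for item in items}


def first_set_bit_after(bits: List[int], start: int) -> Optional[int]:
    """Index of the first 1 strictly after `start` (start=-1 scans from 0)."""
    for p in range(start + 1, len(bits)):
        if bits[p]:
            return p
    return None


def build_first_position_table(bitmaps: Bitmaps) -> Dict[str, List[Optional[int]]]:
    """Hypothetical PFPBM-like table: first occurrence of each item in each
    sequence, read off the bitmap (None if the item never occurs)."""
    return {item: [first_set_bit_after(bits, -1) for bits in per_seq]
            for item, per_seq in bitmaps.items()}


def mine(db: SeqDB, min_sup: int) -> Dict[Tuple[str, ...], int]:
    """Frequent patterns <(i1)(i2)...> with support >= min_sup (S-extensions only)."""
    assert min_sup >= 1, "min_sup must be at least 1"
    bitmaps = build_bitmaps(db)
    first_pos = build_first_position_table(bitmaps)
    patterns: Dict[Tuple[str, ...], int] = {}

    def grow(prefix: Tuple[str, ...], positions: Dict[int, int]) -> None:
        # positions[sid] = itemset index where the prefix's last item was
        # matched earliest; the projected suffix starts right after it, so
        # projection is a bitmap lookup rather than a database rescan.
        for item in sorted(bitmaps):
            new_pos = {}
            for sid, pos in positions.items():
                p = first_set_bit_after(bitmaps[item][sid], pos)
                if p is not None:
                    new_pos[sid] = p
            if len(new_pos) >= min_sup:
                pattern = prefix + (item,)
                patterns[pattern] = len(new_pos)
                grow(pattern, new_pos)

    # Length-1 prefixes are projected directly from the first-position table.
    for item in sorted(bitmaps):
        start = {sid: p for sid, p in enumerate(first_pos[item]) if p is not None}
        if len(start) >= min_sup:
            patterns[(item,)] = len(start)
            grow((item,), start)
    return patterns


if __name__ == "__main__":
    db = [
        [{"a"}, {"a", "b"}, {"c"}],
        [{"a", "d"}, {"c"}],
        [{"b"}, {"a"}, {"c"}],
    ]
    print(mine(db, 2))
    # {('a',): 3, ('a', 'c'): 3, ('b',): 2, ('b', 'c'): 2, ('c',): 3}
```

Recording the earliest match position is what keeps the search exact: for single-item elements, greedily matching each item at its first possible itemset is sufficient to decide whether a sequence contains the pattern.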
Source
Journal of Guangdong University of Technology (《广东工业大学学报》)
CAS
2013, No. 4, pp. 49-54 (6 pages)
Funding
Key Laboratory Fund of the Ministry of Education (110411)
Natural Science Foundation of Guangdong Province (10451009001004804, 9151009001000007)
Science and Technology Planning Project of Guangdong Province (2012B091000173)
Science and Technology Planning Project of Guangzhou (2012J5100054, 2013J4500028)
Science and Technology Planning Project of Shaoguan (2010CXY/C05)
Keywords
sequential pattern
PrefixSpan (Prefix-projected Sequential Pattern Mining)
SPAM (Sequential PAttern Mining)
bitmap
data mining