Abstract
An improved algorithm, TSPM, is proposed for mining sequential patterns under time constraints. It addresses the shortcomings of traditional sequential pattern mining methods: high time and space overhead, and very large result sets that lack focus. The algorithm uses a graph structure to represent frequent 2-sequences and needs to scan the database only once to map the information relevant to the mining task into the graph. The graph representation lets the mining process make full use of the order relations among items, which improves the efficiency of generating frequent sequences. In addition, the algorithm computes support from the positional information of sequences, which reduces the complexity of handling time constraints and avoids repeatedly testing whether a candidate sequence is contained in a data sequence. Experiments show that the algorithm outperforms traditional sequential pattern discovery algorithms in both time and space.
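The abstract gives no pseudocode, so the following is only a minimal sketch of the general idea it describes, not the TSPM algorithm itself. It scans the database once to record item positions, then derives frequent 2-sequences as graph edges and counts their support from those positions. The data layout (lists of `(time, item)` events), the function names, and the single `max_gap` constraint are all assumptions made for illustration.

```python
from collections import defaultdict


def build_position_index(db):
    """Single pass over the database: item -> {sequence id -> sorted timestamps}."""
    index = defaultdict(lambda: defaultdict(list))
    for sid, sequence in enumerate(db):
        for time, item in sequence:
            index[item][sid].append(time)
    for per_sequence in index.values():
        for times in per_sequence.values():
            times.sort()
    return index


def occurs_within_gap(times_a, times_b, max_gap):
    """True if some occurrence of b follows some occurrence of a within max_gap."""
    return any(0 < tb - ta <= max_gap for ta in times_a for tb in times_b)


def frequent_2_sequence_graph(db, min_support, max_gap):
    """Return {a: {b: support}} for every frequent 2-sequence a -> b.

    Support is counted from the stored positions only; the database
    is not rescanned and no sequence-containment test is performed.
    (Hypothetical helper, not taken from the paper.)
    """
    index = build_position_index(db)
    frequent_items = [
        item for item, per_sequence in index.items()
        if len(per_sequence) >= min_support
    ]
    graph = defaultdict(dict)
    for a in frequent_items:
        for b in frequent_items:
            shared = index[a].keys() & index[b].keys()
            support = sum(
                1 for sid in shared
                if occurs_within_gap(index[a][sid], index[b][sid], max_gap)
            )
            if support >= min_support:
                graph[a][b] = support
    return dict(graph)


# Toy database: three data sequences of (time, item) events.
db = [
    [(1, "a"), (2, "b"), (5, "c")],
    [(1, "a"), (3, "b")],
    [(1, "b"), (2, "a"), (9, "c")],
]
print(frequent_2_sequence_graph(db, min_support=2, max_gap=3))
```

In this sketch each edge `a -> b` of the resulting graph is one frequent 2-sequence. Longer patterns could then be grown by following edges, which is one way to exploit the item order that the abstract mentions.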
Source
《智能系统学报》
2007, No. 2, pp. 89-93 (5 pages)
CAAI Transactions on Intelligent Systems
Funding
Supported by the Natural Science Foundation of Anhui Province (Grant No. 050420207)
Keywords
data mining
sequential pattern
time constraint