Abstract
To overcome the complexity of storing semi-structured data, we propose a storage model for semi-structured data based on a dynamic tree. Schema extraction is performed on this model and incorporated into the Apriori algorithm: a minimum support threshold filters out unnecessary information, and the algorithm outputs the set of longest frequent paths, thereby extracting the semi-structured data. Experimental results show that the algorithm handles branches and cycles effectively at the same time and avoids infinite loops.
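The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea (level-wise, Apriori-style mining of frequent label paths with a minimum support threshold, keeping only the longest ones). It does not reproduce the paper's dynamic-tree model. The graph representation, the function names `label_paths` and `longest_frequent_paths`, and the cycle rule (never revisit a node already on the current path) are all assumptions made for illustration.

```python
# Illustrative sketch only; not the paper's dynamic-tree implementation.
# Assumed representation: each document is (graph, root), where graph maps
# a node id to a list of (edge_label, child_node) pairs.

def label_paths(graph, root):
    """Collect every label path from root, skipping edges that would
    revisit a node already on the current path (assumed cycle rule)."""
    paths = set()
    stack = [(root, (), frozenset([root]))]
    while stack:
        node, path, on_path = stack.pop()
        for label, child in graph.get(node, []):
            if child in on_path:
                continue  # back edge: following it would loop forever
            new_path = path + (label,)
            paths.add(new_path)
            stack.append((child, new_path, on_path | {child}))
    return paths

def longest_frequent_paths(docs, min_support):
    """Level-wise search: a path of length k+1 is a candidate only if its
    length-k prefix was frequent. Returns the frequent paths that are not
    a prefix of any longer frequent path."""
    per_doc = [label_paths(graph, root) for graph, root in docs]
    n = len(per_doc)
    frequent = set()
    k = 1
    candidates = {p for paths in per_doc for p in paths if len(p) == 1}
    while candidates:
        kept = {
            c for c in candidates
            if sum(c in paths for paths in per_doc) / n >= min_support
        }
        frequent |= kept
        k += 1
        candidates = {
            p for paths in per_doc for p in paths
            if len(p) == k and p[:-1] in kept
        }
    prefixes = {p[:-1] for p in frequent}
    return {p for p in frequent if p not in prefixes}

# Three small hypothetical documents; the first contains a cycle n2 -> r.
docs = [
    ({"r": [("a", "n1"), ("c", "n3")], "n1": [("b", "n2")], "n2": [("back", "r")]}, "r"),
    ({"r": [("a", "n1")], "n1": [("b", "n2")]}, "r"),
    ({"r": [("a", "n1"), ("c", "n3")]}, "r"),
]
result = longest_frequent_paths(docs, min_support=0.6)
print(sorted(result))  # [('a', 'b'), ('c',)]
```

In this sketch the branch at the root (labels `a` and `c`) yields two separate longest paths, and the back edge in the first document is skipped rather than followed, so the traversal terminates. Enumerating all simple paths can be expensive on large graphs; the sketch favors clarity over efficiency.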
Source
《吉林大学学报(信息科学版)》
CAS
2012, No. 5, pp. 540-543 (4 pages)
Journal of Jilin University (Information Science Edition)
Keywords
semi-structured data
data mining
frequent pattern mining
schema extraction