Abstract
Traditional frequent subtree mining algorithms output very large numbers of frequent subtrees that contain much redundant information. This makes the mined patterns an unclear and incomplete representation of the data and lowers the efficiency of subsequent processing. To address this, an embedded frequent subtree mining algorithm based on pattern growth is proposed. Labeled trees are first defined and the mining task is analyzed. Following the basic properties of pattern growth, the forest database is scanned to build a projected database for each frequent subtree pattern, which fixes the pattern-growth process and establishes the growth framework. A fusion-compression approach is then proposed: the nodes of all subtrees are traversed depth-first to build a fusion-compression tree, which performs data cleaning. From the cleaned data, topological sequences are constructed and topological encodings of trees and forests are defined. Given the database and the minimum support threshold as input, the queue of frequent subtrees is pruned using the covering theorem to complete the mining. Simulation results show that the proposed method mines richer and more complete information with higher mining efficiency.
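The abstract gives only an outline of the method. The Python sketch below illustrates the general pattern-growth idea for embedded subtrees. It is not the authors' projected-database, fusion-compression or covering-theorem procedure.

Patterns are ordered, labeled trees grown one node at a time by rightmost-path extension. A pattern is kept only while its support stays at or above the minimum support, where support is the number of trees in the forest that contain the pattern as an embedded subtree (labels match, ancestor-descendant relations and left-to-right order are preserved).

Several parts of the sketch are hypothetical assumptions:
- the nested-tuple tree format;
- the preorder (label, parent) pattern encoding, which stands in for the paper's topological encoding;
- the function names `flatten`, `embeds` and `mine`.

Support is counted by brute-force backtracking, which is exponential in the worst case.

```python
from typing import List, Tuple

# Hypothetical input format: a tree is (label, [child trees]) in left-to-right order.
# A pattern is a preorder list of (label, parent_index); the root has parent -1.
Pattern = List[Tuple[str, int]]


def flatten(tree):
    """Return preorder labels and, per node, the last preorder index in its subtree."""
    labels, end = [], []

    def visit(node):
        idx = len(labels)
        labels.append(node[0])
        end.append(idx)
        for child in node[1]:
            visit(child)
        end[idx] = len(labels) - 1

    visit(tree)
    return labels, end


def is_pattern_ancestor(pattern: Pattern, j: int, i: int) -> bool:
    """True if pattern node j is a proper ancestor of pattern node i."""
    p = pattern[i][1]
    while p != -1:
        if p == j:
            return True
        p = pattern[p][1]
    return False


def embeds(pattern: Pattern, labels, end) -> bool:
    """Backtracking test for an ordered embedded subtree: labels match, preorder
    order is preserved, and ancestor relations hold in both directions."""
    k, n = len(pattern), len(labels)
    image = [0] * k  # image[i] = tree node (preorder index) matched to pattern node i

    def assign(i: int, start: int) -> bool:
        if i == k:
            return True
        for t in range(start, n):
            if labels[t] != pattern[i][0]:
                continue
            # tree node a is an ancestor of t iff a < t <= end[a]
            if all(is_pattern_ancestor(pattern, j, i) == (image[j] < t <= end[image[j]])
                   for j in range(i)):
                image[i] = t
                if assign(i + 1, t + 1):
                    return True
        return False

    return assign(0, 0)


def mine(forest, minsup: int):
    """Pattern growth by rightmost-path extension with minimum-support pruning."""
    flat = [flatten(t) for t in forest]

    def support(pattern: Pattern) -> int:
        return sum(embeds(pattern, labels, end) for labels, end in flat)

    # Infrequent labels can never appear in a frequent pattern, so drop them up front.
    alphabet = sorted({lab for labels, _ in flat for lab in labels})
    alphabet = [lab for lab in alphabet if support([(lab, -1)]) >= minsup]
    results = []

    def grow(pattern: Pattern, sup: int) -> None:
        results.append((tuple(pattern), sup))
        rightmost, node = [], len(pattern) - 1  # rightmost path: last node up to root
        while node != -1:
            rightmost.append(node)
            node = pattern[node][1]
        for parent in rightmost:
            for lab in alphabet:
                child = pattern + [(lab, parent)]  # new node is last in preorder
                s = support(child)
                if s >= minsup:  # anti-monotone support: infrequent growth is pruned
                    grow(child, s)

    for lab in alphabet:
        grow([(lab, -1)], support([(lab, -1)]))
    return results


forest = [("a", [("b", [("c", [])]), ("c", [])]), ("a", [("c", [])]), ("b", [("c", [])])]
for pattern, sup in mine(forest, minsup=2):
    print(pattern, sup)
```

Removing the last preorder node of any ordered pattern gives its unique rightmost-extension parent, so each pattern is generated once. Support of embedded subtrees is anti-monotone, so an infrequent pattern is never extended. On the small forest above, `mine(forest, 2)` keeps five patterns: a, a→c, b, b→c and c. The pattern a→c is matched in the first tree both directly and through b, which is what distinguishes embedded from induced subtrees.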
Authors
WEI Zhao-xia
ZOU Qian-ying
(Jincheng College of Sichuan University, Chengdu, Sichuan 611731, China; Chengdu College, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China)
Source
Computer Simulation (《计算机仿真》)
Peking University Core Journal
2021, No. 3, pp. 249-252, 263 (5 pages)
Funding
Chengdu Science and Technology Bureau, 2018 Chengdu Technological Innovation R&D Project (2018-YFYF-00191-SN).
Keywords
Pattern growth
Embedded
Frequent sub-tree mining
Fusion compression
Covering theorem