Abstract
Frequent pattern mining is an important problem in data mining, covering transactions, sequences, trees, and graphs. Frequent subtree mining is widely used in bioinformatics, Web mining, and the analysis and mining of chemical compound structures. This paper presents an efficient pattern-growth algorithm for mining frequent induced subtrees in a forest of rooted, labeled, unordered trees. The algorithm uses a breadth-first canonical form to give each unordered tree a unique representation, and it uses rightmost-path extension to construct the complete pattern-growth space. For each pattern to be grown, it determines the growth points from the pattern's topology and builds a projected database for each growth point. This turns the problem of mining frequent subtrees into the problem of finding frequent nodes in the projected databases. Experiments show that the algorithm is more efficient than HybridTreeMiner, one of the fastest previously proposed methods.
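Two ideas from the abstract can be illustrated with a minimal Python sketch: giving an unordered tree a unique representation, and reducing one pattern-growth step to counting frequent node labels in a projected database. This is not the paper's algorithm. `canonical_form` uses a depth-first encoding with sorted child encodings (an AHU-style technique swapped in for the paper's breadth-first canonical form). `frequent_extensions` only grows a one-node pattern by a single child and does not model rightmost-path extension. The tree representation, the function names, the assumption that labels contain no parentheses, and the choice of per-tree support are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterator, List, Tuple

# Hypothetical representation: a rooted, labeled, unordered tree is a
# (label, children) pair; the order of the children list carries no meaning.
Tree = Tuple[str, List["Tree"]]


def canonical_form(tree: Tree) -> str:
    """Encode a tree so that isomorphic unordered trees get the same string.

    Depth-first encoding with sorted child encodings (AHU-style), used here
    as a stand-in for the paper's breadth-first canonical form.
    Assumes labels contain no parentheses.
    """
    label, children = tree
    encoded_children = sorted(canonical_form(child) for child in children)
    return label + "(" + "".join(encoded_children) + ")"


def iter_nodes(tree: Tree) -> Iterator[Tree]:
    """Yield every node of the tree in preorder."""
    yield tree
    for child in tree[1]:
        yield from iter_nodes(child)


def frequent_extensions(forest: List[Tree], root_label: str,
                        min_support: int) -> Dict[str, int]:
    """Grow the one-node pattern `root_label` by a single child.

    Assumed projected database of a tree: the set of labels of children of
    nodes labeled `root_label`. Support counts trees, not occurrences.
    Each frequent label x gives a frequent two-node induced subtree
    root_label -> x, so this growth step is just frequent-node counting.
    """
    support = Counter()
    for tree in forest:
        projected = {
            child[0]
            for node in iter_nodes(tree) if node[0] == root_label
            for child in node[1]
        }
        support.update(projected)  # each label counted at most once per tree
    return {label: n for label, n in support.items() if n >= min_support}


if __name__ == "__main__":
    t1: Tree = ("A", [("B", []), ("C", [("D", [])])])
    t2: Tree = ("A", [("C", [("D", [])]), ("B", [])])  # t1 with siblings swapped
    t3: Tree = ("A", [("B", [("C", [])])])
    print(canonical_form(t1))                          # A(B()C(D()))
    print(canonical_form(t1) == canonical_form(t2))    # True
    print(sorted(frequent_extensions([t1, t2, t3], "A", 2).items()))
    # [('B', 3), ('C', 2)]
```

To grow larger patterns, the same counting step would be repeated at each growth point, using projected databases built from the occurrences of the current pattern. The sketch omits that bookkeeping.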
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal (北大核心)
2006, No. 11, pp. 2104-2108 (5 pages)
Journal of Chinese Computer Systems
Keywords
knowledge discovery
data mining
frequent patterns
frequent subtrees