摘要
在挖掘无序树频繁模式的过程中,大多数的算法都是先产生候选者,再进行模式匹配判断它是否为频繁子树。产生候选者本身就需要消耗很大的空间来保存,并且要在复杂的树结构里做匹配也是件难事,它会影响整个挖掘过程的效率。为了尽量避免产生不必要的候选者,提高发现频繁模式的效率,基于对相关算法的研究,引进树投影资料库的概念,并在RootedTreeaVfiner算法的基础上,采用其模式延伸方法和广度优先标准型式概念,提出子树频繁度、频繁可延冲点串的概念,从而更有效系统地枚举所有的频繁模式树,并给出无序频繁子树挖掘算法FVTreeMiner。经系列实验结果证实了该算法合理、高效,并可以减少一定的内存开销和运行时间开销。
The most frequent unordered treec, mining algorithm, always enumerate some frequent pattern cheoser,and then to check the chooser is frequent or not. This process wastes a lot of memory, and is difficult to do matching, for increasing the frequent pattern mining' s efficiency. Under relational research, import Tree Projected Database, and based frequent unordered trees mining algorithm RootedTreeMinier, use its canonical forms and the other concept, for more efficient mining, project the new algorithm: FVTreeMiner. FVTreeMiner advice Subtree Frequent Set and Extended Nodes List, avoid enumerating useless pattern chooser. For long time' s research, a great of experiments result shows that the new algorithm use less memory and time than RootedTreeMiner.
出处
《计算机技术与发展》
2010年第5期9-12,共4页
Computer Technology and Development
基金
基金项目:国家自然科学基金(50604012)
关键词
无序树
标准型式
频繁子树
unordered tree
canonical forms
frequent subtree