Abstract
Traditional frequent-itemset mining algorithms are inefficient on dense or long datasets, such as gene expression datasets, and produce many redundant patterns. To address this, researchers have proposed the concept of closed patterns together with algorithms for mining them; studies show that mining closed patterns substantially reduces the number of itemsets and eliminates many redundant patterns. Targeting the characteristics of biological data, this paper proposes a novel algorithm, REMFOR, for mining frequent closed patterns. Building on the closed-pattern concept and row enumeration, REMFOR uses a vertical data structure and the fp-tree technique, constructing a row fp-tree over row sets to mine frequent closed patterns. A worked example and experiments show that the algorithm is correct and efficient.
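To make the closed-pattern and row-enumeration ideas concrete, the following is a minimal Python sketch. It is not the REMFOR algorithm: it builds no row fp-tree and simply enumerates row subsets by brute force, which is only practical for a handful of rows. The function name `closed_patterns_by_row_enumeration` and the toy data are assumptions made for illustration. It relies on the fact that the set of items shared by any group of rows is always a closed itemset.

```python
from itertools import combinations


def closed_patterns_by_row_enumeration(rows, min_support):
    """Return {closed itemset: support} for a list of rows (sets of items).

    Hypothetical brute-force illustration, not the paper's REMFOR:
    every subset of at least `min_support` rows is intersected, and
    each non-empty intersection is recorded as a frequent closed pattern.
    """
    row_sets = [frozenset(r) for r in rows]
    n = len(row_sets)
    closed = {}
    for k in range(max(min_support, 1), n + 1):
        for row_ids in combinations(range(n), k):
            # Items common to all chosen rows.
            common = frozenset.intersection(*(row_sets[i] for i in row_ids))
            if not common:
                continue
            # Support = number of rows that contain the common itemset.
            closed[common] = sum(1 for r in row_sets if common <= r)
    return closed


# Toy dataset (made up for illustration): 4 rows over items a-d.
rows = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"d"}]
result = closed_patterns_by_row_enumeration(rows, min_support=2)
for itemset, support in sorted(result.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

On this toy input there are seven frequent itemsets at minimum support 2 (a, b, c, ab, ac, bc, abc) but only two closed ones, {a, b} with support 3 and {a, b, c} with support 2, which is the kind of reduction the abstract refers to.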
Source
《计算机工程》
CAS
CSCD
PKU Core (Peking University Core Journals)
2007, No. 2, pp. 74-76 (3 pages)
Computer Engineering
Funding
National Natural Science Foundation of China (Grant No. 60433020)
Keywords
Data mining
Frequent itemsets
Closed pattern