Abstract
Frequent pattern mining usually produces far too many patterns, of which only a small number are used in real applications. Top-k frequent pattern mining limits the number of mined patterns by ranking them by frequency, which improves mining efficiency. This paper proposes TPN (Top-k-Patterns based on Nodesets), an algorithm for mining top-k frequent patterns. TPN represents patterns with the Nodeset data structure, compresses the data into a Poc-tree, and uses a top-k-rank table to recompute the minimum support, which limits the number of candidate patterns generated. Experiments compare TPN with ATFP and Top-k-FP-growth in terms of mining time on two datasets. The results show that TPN is more efficient.
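The idea of raising the minimum support from a rank table can be illustrated with a minimal Python sketch. This is not the TPN implementation: it uses plain transaction-ID sets in place of Nodesets and a Poc-tree, and it assumes that "top-rank-k" means keeping the patterns whose support is among the k highest distinct support values. The names `top_rank_k`, `table`, `offer` and `min_support` are hypothetical.

```python
def top_rank_k(transactions, k):
    """Return {support: [patterns]} for the k highest distinct support values.

    Illustrative sketch only: tidset intersection stands in for the
    Nodeset / Poc-tree machinery described in the abstract.
    """
    # Map each item to the set of transaction IDs that contain it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            tidsets.setdefault(item, set()).add(tid)

    table = {}  # rank table: support value -> patterns, at most k keys

    def min_support():
        # Threshold stays at 1 until k ranks are filled, then it is the
        # lowest support still held in the table. It never decreases.
        return min(table) if len(table) == k else 1

    def offer(pattern, support):
        # Record the pattern, evicting the lowest rank if the table overflows.
        table.setdefault(support, []).append(pattern)
        if len(table) > k:
            del table[min(table)]

    # Visit frequent items first so the threshold rises early.
    items = sorted(tidsets, key=lambda i: (-len(tidsets[i]), i))

    def extend(prefix, prefix_tids, start):
        for pos in range(start, len(items)):
            item = items[pos]
            tids = prefix_tids & tidsets[item] if prefix else tidsets[item]
            # Supersets cannot have higher support, so a pattern below the
            # current threshold is pruned together with all its extensions.
            if len(tids) < min_support():
                continue
            pattern = prefix | {item}
            offer(pattern, len(tids))
            extend(pattern, tids, pos + 1)

    extend(frozenset(), set(), 0)
    return table


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"], ["b", "c"]]
    for support, patterns in sorted(top_rank_k(data, 2).items(), reverse=True):
        print(support, [sorted(p) for p in patterns])
```

In this toy dataset the item supports are a: 4, b: 3, c: 3, and every 2-itemset has support 2. With k = 2 the threshold climbs to 3 once {b} is recorded, so the remaining 2-itemset candidate {b, c} is discarded without being extended.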
Authors
SUN Jun (孙俊)
ZHANG Xihuang (张曦煌)
School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2017, Issue 6, pp. 101-105 (5 pages)
Funding
National Natural Science Foundation of China (No. 61170120)