Abstract
The frequentness probability calculation in the PFIM algorithm limits its applicability, and mining with PFIM requires many database scans and generates a large number of candidate sets. To address these shortcomings, this paper proposes the EPFIM (efficient probabilistic frequent itemset mining) algorithm. First, the new method of calculating the frequentness probability makes it easier to update the frequentness probability of an itemset, so it can adapt to situations in which itemset probabilities change, such as data streams. Second, the uncertain database is stored in a probability matrix to reduce the number of database scans. In addition, exploiting the ordering of items and progressively deleting useless transactions further improves mining efficiency. Theoretical analysis and experimental results show that EPFIM performs better.
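The abstract does not give the EPFIM formulas, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes that items occur independently within a transaction and that transactions are independent of each other, and it uses the standard dynamic-programming recurrence for the support distribution of an itemset. All names (`itemset_probs`, `add_transaction`, `support_pmf`, `frequentness_probability`) and the sample matrix are hypothetical.

```python
from typing import List, Sequence


def itemset_probs(matrix: Sequence[Sequence[float]],
                  items: Sequence[str],
                  itemset: Sequence[str]) -> List[float]:
    """Probability that each transaction contains the whole itemset.

    Rows of `matrix` are transactions, columns are items (assumed layout).
    Items are assumed independent, so the probabilities are multiplied.
    """
    cols = [items.index(i) for i in itemset]
    probs = []
    for row in matrix:
        p = 1.0
        for c in cols:
            p *= row[c]
        probs.append(p)
    return probs


def add_transaction(pmf: List[float], p: float) -> List[float]:
    """Update the support distribution with one more transaction.

    pmf[k] is P(support == k) so far; the new transaction contains the
    itemset with probability p. This is the incremental step that makes
    updates cheap when new transactions arrive.
    """
    new = [0.0] * (len(pmf) + 1)
    for k, mass in enumerate(pmf):
        new[k] += mass * (1.0 - p)   # itemset absent: support unchanged
        new[k + 1] += mass * p       # itemset present: support + 1
    return new


def support_pmf(probs: Sequence[float]) -> List[float]:
    """Full support distribution, built one transaction at a time."""
    pmf = [1.0]                      # with no transactions, support is 0
    for p in probs:
        pmf = add_transaction(pmf, p)
    return pmf


def frequentness_probability(pmf: Sequence[float], min_sup: int) -> float:
    """P(support >= min_sup)."""
    return sum(pmf[min_sup:])


# Hypothetical uncertain database with two items and three transactions.
items = ["a", "b"]
matrix = [
    [0.5, 1.0],
    [0.5, 0.5],
    [1.0, 0.0],
]

pmf_a = support_pmf(itemset_probs(matrix, items, ["a"]))
print(frequentness_probability(pmf_a, 2))   # P(support of {a} >= 2)
```

In this sketch, a transaction whose probability for the itemset is 0 leaves the distribution unchanged, so such rows could be skipped for that itemset and all of its supersets. This is one plausible reading of "deleting useless transactions", stated here as an assumption.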
Source
《计算机应用研究》
CSCD
Peking University Core Journal (北大核心)
2012, No. 3, pp. 841-843 (3 pages)
Application Research of Computers
Funding
National Natural Science Foundation of China (61163015)
Ministry of Education "Chunhui Plan" Fund (Z2009-1-01024)
Keywords
uncertain databases
possible world
expected support
probabilistic frequent itemset