Abstract
Most frequent itemset mining directly determines the performance of text association rule mining algorithms and is both a focus and a difficulty of research in this area. This paper analyzes the shortcomings of existing most frequent itemset mining algorithms and improves the traditional inverted list. Based on the improved inverted list and set theory, and combined with a dynamic adjustment strategy for the minimum support threshold, it proposes a new Top-N most frequent itemset mining algorithm. In addition, several propositions and corollaries are given and applied in the algorithm to improve its performance. Experimental results show that the proposed algorithm outperforms the NApriori and IntvMatrix algorithms in both rule effectiveness rate and time performance.
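The abstract names the ingredients of the method (an inverted list, set-theoretic support counting, and a dynamically adjusted minimum support threshold) but not the details of the improved inverted list. The following is only a minimal sketch, under our own assumptions, of how those generic ingredients can fit together: each term maps to the set of document ids containing it, the support of an itemset is the size of the intersection of its members' id sets, and once N itemsets are held the threshold is raised to the weakest of them, so anti-monotonicity prunes every extension that can no longer enter the Top-N. The function names `build_inverted_list` and `top_n_frequent_itemsets`, the Eclat-style depth-first enumeration, and the toy documents are hypothetical illustrations, not the authors' algorithm.

```python
import heapq
from itertools import count


def build_inverted_list(docs):
    """Map each term to the set of document ids that contain it (its tid-set)."""
    inverted = {}
    for doc_id, terms in enumerate(docs):
        for term in set(terms):
            inverted.setdefault(term, set()).add(doc_id)
    return inverted


def top_n_frequent_itemsets(docs, n):
    """Return the n itemsets with the highest support as (support, itemset) pairs.

    Hypothetical sketch: support is the size of the intersection of tid-sets, and
    the minimum support threshold is raised dynamically to the n-th best support.
    """
    if n <= 0:
        return []
    inverted = build_inverted_list(docs)
    # Visit terms in descending support so the threshold rises as early as possible.
    items = sorted(inverted.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    heap = []  # min-heap of (support, tiebreak, itemset); holds at most n entries
    tiebreak = count()

    def threshold():
        # Dynamic minimum support: once n itemsets are held, a newcomer must beat the weakest.
        return heap[0][0] if len(heap) == n else 0

    def extend(prefix, prefix_tids, start):
        for i in range(start, len(items)):
            term, tids = items[i]
            # Set theory: support of prefix + (term,) is the size of the tid-set intersection.
            new_tids = prefix_tids & tids if prefix else tids
            support = len(new_tids)
            # Anti-monotonicity: no superset can exceed this support, so prune the branch.
            if support == 0 or support <= threshold():
                continue
            itemset = prefix + (term,)
            entry = (support, next(tiebreak), itemset)
            if len(heap) < n:
                heapq.heappush(heap, entry)
            else:
                heapq.heapreplace(heap, entry)  # evict the current weakest itemset
            extend(itemset, new_tids, i + 1)

    extend((), set(), 0)
    return sorted(((s, tuple(sorted(iset))) for s, _, iset in heap),
                  key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    docs = [
        ["rule", "mining", "text"],
        ["rule", "mining"],
        ["rule", "text"],
        ["mining", "text"],
        ["rule", "mining", "list"],
    ]
    print(top_n_frequent_itemsets(docs, 4))
    # [(4, ('mining',)), (4, ('rule',)), (3, ('mining', 'rule')), (3, ('text',))]
```

Pruning on `support <= threshold()` is safe because the threshold only ever rises, so a branch that cannot beat the current N-th itemset cannot beat any later one either. Ties at the N-th position are resolved by visiting order, which is one arbitrary choice among several.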
Source
Journal of University of Electronic Science and Technology of China
Indexed in: EI, CAS, CSCD, Peking University Core Journals
2010, No. 5, pp. 757-761, 773 (6 pages)
Funding
Sichuan Province Science and Technology Program Project (2008GZ0003)
Keywords
association rules
inverted list
frequent itemsets
set theory
support