Abstract
Mining the most frequent item-sets is both the focus and the main difficulty of text association rule mining, and it determines the performance of text association rule mining algorithms. To address the shortcomings of existing approaches to mining the most frequent item-sets, the traditional inverted list is improved and combined with a strategy for dynamically adjusting the minimum support threshold, yielding a new algorithm for mining the most frequent item-sets based on the improved inverted list and set theory. In addition, several propositions and corollaries are given and applied in the proposed algorithm to improve its performance. Finally, the proposed algorithm is verified experimentally. The results show that its rule effectiveness and time performance are both better than those of two commonly used algorithms for mining the most frequent item-sets, NApriori and IntvMatrix.
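The abstract does not spell out the improved inverted list or the threshold adjustment strategy, so the following is only a minimal sketch of the general idea it names: using an inverted list (term to document-id set) so that the support of an item-set is the size of a set intersection. The function names, the `max_size` parameter, and the brute-force candidate enumeration are assumptions made for illustration, not the paper's algorithm, and the dynamic minimum support adjustment is deliberately left out.

```python
from itertools import combinations


def build_inverted_list(docs):
    """Map each term to the set of ids of the documents that contain it."""
    inverted = {}
    for doc_id, terms in enumerate(docs):
        for term in set(terms):
            inverted.setdefault(term, set()).add(doc_id)
    return inverted


def support(itemset, inverted):
    """Support count = size of the intersection of the items' document-id sets."""
    postings = [inverted.get(item, set()) for item in itemset]
    return len(set.intersection(*postings)) if postings else 0


def frequent_itemsets(docs, min_support, max_size=3):
    """Enumerate item-sets up to max_size whose support meets min_support.

    Illustrative only: candidates are generated by brute force from the
    frequent single terms, with a fixed (not dynamically adjusted) threshold.
    """
    inverted = build_inverted_list(docs)
    items = sorted(t for t, ids in inverted.items() if len(ids) >= min_support)
    result = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            count = support(combo, inverted)
            if count >= min_support:
                result[combo] = count
    return result


# Hypothetical toy corpus: each document is a list of terms.
docs = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
found = frequent_itemsets(docs, min_support=2)
```

Because support is computed from document-id sets rather than by rescanning the corpus, each candidate costs only a few set intersections, which is the usual motivation for pairing inverted lists with set operations.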
Source
《兰州理工大学学报》
CAS
PKU Core Journal (北大核心)
2012, No. 4, pp. 85-88 (4 pages)
Journal of Lanzhou University of Technology
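Funding
National Natural Science Foundation of China (61103249)
Scientific Research Fund of the Sichuan Provincial Education Department (11ZB095)
Open Fund of the Artificial Intelligence Key Laboratory of Sichuan Province (2011RYY06)
National Cultivation Project of Sichuan University of Science and Engineering (2011PY05)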
Keywords
frequent item-sets
association rules
inverted list
set theory