Abstract
Rule redundancy is effectively resolved by using a penalty function to set the support threshold. For the niching genetic algorithm, a novel prime-factor chromosome encoding method is adopted and a maximal frequent item distribution table is introduced. The encoding represents each transaction, originally expressed as characters, by a single integer, which turns string operations into numerical operations and compresses the attribute items of the transaction database into one numeric item. With the maximal frequent item distribution table, the algorithm always mines in regions where maximal frequent items are dense, which effectively prunes the combinatorial search space. Experimental results show that the method achieves a compression ratio of more than 25% on the transaction database and improves efficiency by at least 47%.
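The abstract does not spell out the encoding, so the following is only a minimal sketch of one common reading of a prime-factor encoding: it assumes each distinct item is assigned its own prime and a transaction is stored as the product of its items' primes. Under that assumption an itemset is contained in a transaction exactly when its code divides the transaction's code, so subset tests become integer arithmetic. All names here (`first_primes`, `build_item_primes`, `encode`, `support`) and the sample data are hypothetical and not taken from the paper.

```python
from math import prod


def first_primes(n):
    """Return the first n primes by trial division (fine for small item sets)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def build_item_primes(transactions):
    """Assumed mapping: each distinct item gets its own prime, in sorted order."""
    items = sorted({item for t in transactions for item in t})
    return dict(zip(items, first_primes(len(items))))


def encode(itemset, item_primes):
    """Encode an itemset as the product of its items' primes."""
    return prod(item_primes[i] for i in itemset)


def support(candidate_code, encoded_db):
    """Fraction of transactions whose code is divisible by the candidate's code."""
    hits = sum(1 for code in encoded_db if code % candidate_code == 0)
    return hits / len(encoded_db)


# Illustrative toy database (not from the paper).
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "d"}, {"a", "b", "c", "d"}]
item_primes = build_item_primes(transactions)
encoded_db = [encode(t, item_primes) for t in transactions]

print(encoded_db)
print(support(encode({"a", "c"}, item_primes), encoded_db))
```

In this sketch each transaction is stored as one integer, which matches the abstract's description of replacing character-based transactions with a single numeric item. How the paper maps such integers onto chromosomes or applies the penalty function is not stated in the abstract and is not modeled here.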
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core (北大核心)
2008, No. 10, pp. 163-165 (3 pages)
Funding
National Natural Science Foundation of China (60573067)
Keywords
association rule
niching genetic algorithm
chromosome
crossover operation