Abstract
Information overload is a serious problem in modern society, and information retrieval is becoming increasingly important as a way to keep people from getting lost in large volumes of useless information. Automatic text classification is one of the most important tools in information retrieval. This article proposes a method for the automatic classification of Chinese text called the Association Rules Aided Genetic Computing Method (ARGCM). The main contributions are: (1) an Association Rules Aided Genetic Algorithm (ARGA) for text classification is proposed and implemented; (2) unlike previous work, the fitness function is coded with the help of association rules, which are mined by the AprioriARGACM algorithm proposed in the article; (3) a series of basic genetic procedures, such as AGACMRouletteSelection, AGACMXover and AGACMbinaryMutation, are implemented and tested; (4) experimental results show that the ARG algorithm performs considerably better than traditional algorithms: after 50 generations of evolution under the ARG algorithm, a B-Vector with a score as high as 3513.6 is obtained.
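For readers unfamiliar with the three genetic procedures named in the abstract, the following is a generic sketch of roulette-wheel selection, one-point crossover and binary mutation on bit vectors. It is not the authors' ARGCM implementation: the function names, the bit-vector encoding, and `toy_fitness` with its hypothetical `rule_weights` are all assumptions made for illustration. In the article the fitness function is coded with mined association rules, which are not reproduced here.

```python
import random


def roulette_select(population, scores, rng):
    """Pick one individual with probability proportional to its score.

    Assumes every score is non-negative and at least one is positive.
    """
    pick = rng.uniform(0, sum(scores))
    running = 0.0
    for individual, score in zip(population, scores):
        running += score
        if running > pick:
            return individual
    return population[-1]  # guard against floating-point round-off


def one_point_crossover(a, b, rng):
    """Swap the tails of two equal-length bit vectors (length >= 2)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def binary_mutation(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in bits]


def toy_fitness(bits, rule_weights):
    """Hypothetical stand-in fitness: sum of weights at the selected bits.

    `rule_weights` is an assumed list of non-negative numbers, one per bit.
    """
    return sum(w for bit, w in zip(bits, rule_weights) if bit)


def evolve(rule_weights, pop_size=20, generations=50, rate=0.02, seed=0):
    """Run a plain generational GA and return the best bit vector seen."""
    rng = random.Random(seed)
    n = len(rule_weights)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(population, key=lambda b: toy_fitness(b, rule_weights))
    for _ in range(generations):
        # A tiny offset keeps the roulette wheel valid if all scores are 0.
        scores = [toy_fitness(b, rule_weights) + 1e-9 for b in population]
        next_gen = []
        while len(next_gen) < pop_size:
            p1 = roulette_select(population, scores, rng)
            p2 = roulette_select(population, scores, rng)
            c1, c2 = one_point_crossover(p1, p2, rng)
            next_gen.append(binary_mutation(c1, rate, rng))
            next_gen.append(binary_mutation(c2, rate, rng))
        population = next_gen[:pop_size]
        candidate = max(population, key=lambda b: toy_fitness(b, rule_weights))
        if toy_fitness(candidate, rule_weights) > toy_fitness(best, rule_weights):
            best = candidate
    return best
```

Calling `evolve([3.0, 1.0, 4.0, 1.0, 5.0])` returns a list of five bits. The 50-generation default only mirrors the generation count mentioned in the abstract; scores from this toy fitness are not comparable to the 3513.6 reported there.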
Source
Journal of Sichuan University (Engineering Science Edition)
EI
CAS
CSCD
2004, No. 3, pp. 1-8 (8 pages)
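Funding
National Natural Science Foundation of China (60073046)
National Basic Research Program of China (973 Program) (2002CB111504)
Research Fund for the Doctoral Program of Higher Education (20020610007)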
Keywords
association rules
Chinese text classification
genetic algorithm
natural language processing
ARGCM (Association Rules Aided Computing Method)