
Research on a Method of Text Classification Rule Extraction Based on Genetic Algorithm and Entropy (基于遗传算法和信息熵的文本分类规则抽取方法研究) Cited by: 3
Abstract: To address the text classification problem in data mining, this paper proposes Genetic-Miner (GM), a text classification rule extraction algorithm based on a genetic algorithm and information entropy. The goal of GM is to discover classification rules in data sets. It first uses information entropy to generate the initial population and then extracts the corresponding rules with an optimized genetic algorithm. GM is compared with two well-known algorithms of the same kind, Ant-Miner and CN2, on six standard public-domain data sets. The experimental results show that GM clearly outperforms Ant-Miner and CN2 in both predictive accuracy and rule-list simplicity, and that it greatly improves the comprehensibility of the discovered knowledge.
Authors: 唐华 (Tang Hua), 曾碧卿 (Zeng Biqing)
Source: Acta Scientiarum Naturalium Universitatis Sunyatseni (《中山大学学报(自然科学版)》), indexed in CAS, CSCD, and the PKU Core list (北大核心); 2007, No. 5, pp. 18-21, 24 (5 pages)
Funding: National Natural Science Foundation of China (60573127)
Keywords: text classification rule; knowledge discovery; information entropy; genetic algorithm; data mining
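
The abstract says only that GM uses information entropy to generate the initial population; it does not say how. As a rough illustration of the underlying quantity, the sketch below computes the Shannon entropy of the class labels covered by each attribute-value term and ranks the terms from most to least class-pure. The function names (`entropy`, `term_entropy`, `rank_terms`) and the idea of seeding rules from low-entropy terms are assumptions made for this example, not the authors' published procedure.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def term_entropy(rows, labels, attr, value):
    """Class entropy among the rows where rows[i][attr] == value.

    Returns infinity when no row matches, so such terms rank last.
    """
    subset = [y for x, y in zip(rows, labels) if x.get(attr) == value]
    return entropy(subset) if subset else float("inf")


def rank_terms(rows, labels):
    """Sort all (attr, value) terms from lowest to highest class entropy.

    Hypothetical seeding heuristic: low-entropy terms cover examples that
    mostly share one class, so they are plausible starting conditions
    for candidate rules in an initial population.
    """
    terms = {(a, v) for x in rows for a, v in x.items()}
    return sorted(terms, key=lambda t: (term_entropy(rows, labels, *t), t))
```

On a toy data set, a term that perfectly separates the classes has entropy 0 and sorts ahead of a term that splits them evenly (entropy 1 bit). A genetic algorithm could then draw its first candidate rules preferentially from the head of this ranking, though whether GM does so cannot be confirmed from the abstract alone.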

