
Extraction Method of Text Classification Rule Based on Genetic Algorithm and Information Entropy (cited by: 1)
Abstract Text classification is an important technique in text data mining and has been widely applied in information management, search engines, recommender systems, and other fields. Most existing text classification methods are based on the vector space model; they are computationally expensive and difficult to apply to large-scale text data sets. To address this, we propose a text classification rule extraction method based on a genetic algorithm and information entropy. In this method, information entropy is used to assist the generation of the genetic algorithm's initial population. The effective integration of the genetic algorithm with information entropy greatly improves the classification efficiency of the hybrid method. Experimental results show that the method is applicable to large-scale text data sets, and that the extracted rules achieve high classification accuracy and fast classification speed.
Source 《微计算机信息》 (Control & Automation), PKU Core journal (北大核心), 2008, No. 27, pp. 268-270 (3 pages)
Keywords text classification; genetic algorithm; information entropy; text mining
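
The abstract states that information entropy assists the generation of the genetic algorithm's initial population, but it does not say how. The sketch below shows one plausible reading, not the authors' procedure: each term is scored by its information gain over the class labels, and the terms of each seed rule are drawn with probability proportional to that score. The function names (`entropy`, `information_gain`, `seed_population`), the representation of a rule as a set of terms, and the toy corpus in the usage note are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Drop in label entropy when documents are split on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


def seed_population(docs, labels, pop_size, rule_len, rng):
    """Build an initial population of rules, each a frozenset of terms.

    Assumption: terms are sampled without replacement, weighted by
    information gain, so informative terms appear more often in the seeds.
    """
    vocab = sorted(set().union(*docs))
    gains = {t: information_gain(docs, labels, t) for t in vocab}
    size = min(rule_len, len(vocab))
    population = []
    for _ in range(pop_size):
        remaining = list(vocab)
        # Small floor keeps zero-gain terms drawable and all weights positive.
        weights = [gains[t] + 1e-6 for t in remaining]
        rule = []
        for _ in range(size):
            i = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
            rule.append(remaining.pop(i))
            weights.pop(i)
        population.append(frozenset(rule))
    return population, gains
```

For example, with four toy documents `[{"goal", "match", "team"}, {"team", "coach", "goal"}, {"stock", "market", "bank"}, {"bank", "market", "rate"}]` labelled `["sports", "sports", "finance", "finance"]`, calling `seed_population(docs, labels, pop_size=5, rule_len=2, rng=random.Random(0))` returns five two-term rules that favour terms such as "goal" or "market", which separate the two classes perfectly. A genetic algorithm would then evolve these seeds through selection, crossover, and mutation; that part is outside this sketch.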

