Extraction Method of Text Classification Rule Based on Genetic Algorithm and Information Entropy (基于GA和信息熵的文本分类规则抽取方法)

Cited by: 1


Abstract: Text classification is an important technique in text data mining and has been widely applied in information management, search engines, recommendation systems, and other fields. Most existing text classification methods are based on the vector space model; they are computationally expensive and hard to apply to large text data sets. To address this, we propose a hybrid method for extracting text classification rules that combines a genetic algorithm with information entropy. In this method, information entropy is used to assist in generating the initial population of the genetic algorithm. The effective integration of the genetic algorithm and information entropy greatly improves the classification efficiency of the hybrid method. Experimental results show that the method is applicable to large text data sets, and that the extracted rules achieve high classification accuracy and high classification speed.
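The abstract does not specify the rule encoding or the exact entropy measure, so the following is only a minimal sketch of the general idea of entropy-assisted initialization. It assumes that documents are sets of terms, that a rule is a small set of terms, and that terms are ranked by information gain; the names `entropy`, `information_gain`, `seed_population`, `top_k`, and `rule_len` are hypothetical and do not come from the paper.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Entropy reduction obtained by splitting documents on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    remainder = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - remainder


def seed_population(docs, labels, pop_size, rule_len, top_k, rng):
    """Assumed seeding scheme: sample each initial rule from the top_k highest-gain terms."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    pool = ranked[:top_k]
    size = min(rule_len, len(pool))
    return [frozenset(rng.sample(pool, size)) for _ in range(pop_size)]


if __name__ == "__main__":
    # Toy corpus: each document is a set of terms.
    docs = [{"goal", "match"}, {"match", "team"},
            {"stock", "market"}, {"market", "bank"}]
    labels = ["sport", "sport", "finance", "finance"]
    population = seed_population(docs, labels, pop_size=5, rule_len=2,
                                 top_k=3, rng=random.Random(0))
    for rule in population:
        print(sorted(rule))
```

Seeding from high-gain terms, instead of sampling uniformly from the whole vocabulary, gives the genetic algorithm a starting population that is already biased toward discriminative terms; selection, crossover, and mutation would then operate on these rules as usual.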

Authors: 邹国平 (Zou Guoping), 彭梅香 (Peng Meixiang), 黄国兵 (Huang Guobing)

Affiliation: 新余高等专科学校

Source: 《微计算机信息》 (Control & Automation), PKU Core Journal, 2008, No. 27, pp. 268-270 (3 pages)

Keywords: text classification; genetic algorithm; information entropy; text mining

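Classification codes: TP391.41 [Automation and Computer Technology: Computer Application Technology]; TP18 [Automation and Computer Technology: Control Theory and Control Engineering]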


