An MLFM-MN short text classification algorithm based on TNG feature extension

Abstract: In massive short-text collections, features are sparse and the data dimension is high, so traditional text classification methods fall short of the desired classification speed and accuracy. To address this problem, we propose a multi-level fuzzy min-max neural network (MLFM-MN) short text classification algorithm based on Topic N-Gram (TNG) feature extension. First, an improved TNG model is used to build a feature extension library and to extend the features; the library can infer not only the word distribution but also the phrase distribution of each topic. Next, the topic tendency of each short text is calculated from its original features, and, according to that tendency, appropriate candidate words and phrases are selected from the feature extension library and added to the original text. Finally, the MLFM-MN algorithm classifies the extended texts, and precision, recall and F1 score are used to evaluate the classification results. The experimental results show that the proposed algorithm can significantly improve text classification performance.
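The feature extension step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy library `EXTENSION_LIBRARY`, the functions `topic_tendency` and `extend_text`, the `threshold` and `top_k` parameters, and the scoring rule (sum of per-topic probabilities of the matched terms, then normalised) are all assumptions made for illustration. In the paper the library is produced by an improved TNG model, whose details are not given here.

```python
# Hypothetical feature extension library: for each topic, candidate words
# and phrases with a probability under that topic. The values are toy
# numbers chosen for illustration, not taken from the paper.
EXTENSION_LIBRARY = {
    "sports": {"football": 0.30, "world cup": 0.25, "league": 0.20, "coach": 0.10},
    "finance": {"stock": 0.35, "interest rate": 0.25, "bank": 0.20, "fund": 0.10},
}


def topic_tendency(tokens, library):
    """Return a normalised score per topic for a tokenised short text.

    Assumed scoring rule: sum the topic probabilities of the tokens that
    appear in the library, then divide by the total over all topics.
    """
    scores = {
        topic: sum(dist.get(tok, 0.0) for tok in tokens)
        for topic, dist in library.items()
    }
    total = sum(scores.values())
    if total == 0:
        # No token matches any topic, so no tendency can be inferred.
        return {topic: 0.0 for topic in library}
    return {topic: score / total for topic, score in scores.items()}


def extend_text(tokens, library, threshold=0.5, top_k=2):
    """Append up to top_k candidate terms from each sufficiently likely topic."""
    tendency = topic_tendency(tokens, library)
    extended = list(tokens)
    present = set(tokens)
    for topic, weight in tendency.items():
        if weight < threshold:
            continue
        # Most probable candidates first.
        candidates = sorted(library[topic].items(), key=lambda kv: kv[1], reverse=True)
        added = 0
        for term, _prob in candidates:
            if added >= top_k:
                break
            if term not in present:
                extended.append(term)
                present.add(term)
                added += 1
    return extended


if __name__ == "__main__":
    print(extend_text(["football", "coach", "bank"], EXTENSION_LIBRARY))
```

In this sketch a text is extended only with terms from topics whose normalised tendency reaches `threshold`, so a text with no matching terms is left unchanged. The extended token list would then be vectorised and passed to the classifier.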

Authors: WEN Wu; LI Pei-qiang; GUO You-qing (School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065; Research Center of New Communication Technology Applications, Chongqing University of Posts and Telecommunications, Chongqing 400065; Chongqing Xinke Design Co., Ltd., Chongqing 401121, China)

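Affiliations: School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications; Research Center of New Communication Technology Applications, Chongqing University of Posts and Telecommunications; Chongqing Xinke Design Co., Ltd.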

Source: Computer Engineering & Science (《计算机工程与科学》), CSCD, Peking University Core Journal, 2019, No. 11, pp. 2071-2078 (8 pages)

Keywords: sparse feature; TNG model; fuzzy neural network; extension library; topic tendency

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]
