An MLFM-MN Short Text Classification Algorithm Based on TNG Feature Extension
Abstract: Massive short-text collections suffer from sparse features and high data dimensionality, so traditional text classification methods fall short of the desired classification speed and accuracy. To address this problem, we propose a multi-level fuzzy min-max neural network (MLFM-MN) short text classification algorithm based on Topic N-Gram (TNG) feature extension. First, an improved TNG model is used to build a feature extension library and to extend the features; the library can infer not only the word distribution but also the phrase distribution of each topic. Next, the topic tendency of each short text is computed from its original features, and appropriate candidate words and phrases are selected from the feature extension library according to that tendency and added to the original text. Finally, the MLFM-MN algorithm classifies the extended texts, and precision, recall, and F1 score are used to evaluate the classification results. Experimental results show that the proposed algorithm significantly improves text classification performance.
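The feature extension step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the library contents, the weight-sum scoring rule, and the names `EXTENSION_LIBRARY`, `topic_tendency`, `extend_text`, `top_k`, and `threshold` are all assumptions made for illustration. In the paper, the library is inferred by the improved TNG model rather than written by hand.

```python
# Hypothetical feature extension library: topic -> {word or phrase: weight}.
# The paper infers this from an improved TNG model; these values are made up.
EXTENSION_LIBRARY = {
    "sports": {"football": 0.30, "world cup": 0.25, "league": 0.20, "coach": 0.10},
    "finance": {"stock": 0.30, "interest rate": 0.25, "bank": 0.20, "market": 0.15},
}


def topic_tendency(tokens, library):
    """Score each topic by the summed weight of the tokens it knows,
    then normalize so the scores add up to 1 (assumed scoring rule)."""
    scores = {
        topic: sum(weights.get(tok, 0.0) for tok in tokens)
        for topic, weights in library.items()
    }
    total = sum(scores.values())
    if total == 0:
        # No token matches any topic: no tendency can be inferred.
        return {topic: 0.0 for topic in scores}
    return {topic: s / total for topic, s in scores.items()}


def extend_text(tokens, library, top_k=2, threshold=0.5):
    """Append up to top_k highest-weighted candidates from every topic
    whose tendency reaches the threshold (both values are assumptions)."""
    tendency = topic_tendency(tokens, library)
    extended = list(tokens)
    for topic, score in tendency.items():
        if score < threshold:
            continue
        # Candidates ordered from highest to lowest weight.
        candidates = sorted(
            library[topic].items(), key=lambda kv: kv[1], reverse=True
        )
        added = 0
        for term, _weight in candidates:
            if term in extended:
                continue
            extended.append(term)
            added += 1
            if added == top_k:
                break
    return extended


short_text = ["coach", "league", "bank"]
print(topic_tendency(short_text, EXTENSION_LIBRARY))
print(extend_text(short_text, EXTENSION_LIBRARY))
```

In this toy example the short text leans toward the hypothetical "sports" topic, so the two highest-weighted sports candidates not already present are appended. The extended token list would then be passed to the classifier (MLFM-MN in the paper).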
Authors: WEN Wu (文武); LI Pei-qiang (李培强); GUO You-qing (郭有庆) (School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065; Research Center of New Communication Technology Applications, Chongqing University of Posts and Telecommunications, Chongqing 400065; Chongqing Xinke Design Co. Ltd., Chongqing 401121, China)
Source: Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University Core Journals list, 2019, No. 11, pp. 2071-2078 (8 pages)
Keywords: sparse feature; TNG model; fuzzy neural network; extension library; topic tendency