
An Integrated Model Integrating LC-Transformer XL Text Classification
Abstract: To address data sparsity in text classification and the inability to capture longer-range dependencies between text segments, this paper proposes an integrated LC-Transformer XL model. High-frequency keywords are extracted from the text using the word-topic probability distributions of the LDA topic model. A CNN extracts local feature vectors, and the relative positional encoding and recurrence mechanism of the Transformer-XL model yield global semantic features. The local and global feature vectors are then fused and passed to a Softmax classifier to obtain the classification result. Experiments show that the model achieves an F1 score of 0.9318 and an accuracy of 94.15% on the THUCNews Chinese text dataset, indicating good performance on text classification tasks.
Authors: Ge Fuyong; Lei Jingsheng; Tang Xiaolan (Shanghai University of Electric Power, Shanghai 201300, China)
Affiliation: Shanghai University of Electric Power
Source: Computer Applications and Software (《计算机应用与软件》), Peking University Core Journal, 2023, No. 6, pp. 118-123, 132 (7 pages)
Funding: National Natural Science Foundation of China (61672337).
Keywords: text classification; LDA topic model; convolutional neural network; Transformer-XL; integrated model
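
The fusion step described in the abstract (local CNN features combined with global semantic features, then a Softmax classifier) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that fusion means simple concatenation, uses random weights in place of trained ones, and substitutes a mean-pooling placeholder for the Transformer-XL encoder. All function names, dimensions, and parameters below are hypothetical.

```python
import math
import random


def conv1d_max_pool(embeddings, kernels):
    """Local features: slide each kernel over the token sequence, apply ReLU,
    and keep the largest response (max-over-time pooling)."""
    features = []
    for kernel in kernels:  # each kernel is a list of `width` weight vectors
        width = len(kernel)
        best = 0.0  # starting at 0 acts as the ReLU floor
        for start in range(len(embeddings) - width + 1):
            window = embeddings[start:start + width]
            response = sum(
                w * x
                for k_vec, tok_vec in zip(kernel, window)
                for w, x in zip(k_vec, tok_vec)
            )
            best = max(best, response)
        features.append(best)
    return features


def mean_pool(embeddings):
    """Placeholder for the global encoder (NOT Transformer-XL):
    average the token vectors into one global vector."""
    n, dim = len(embeddings), len(embeddings[0])
    return [sum(tok[d] for tok in embeddings) / n for d in range(dim)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embeddings, kernels, weights, bias):
    """Fuse local and global vectors by concatenation (an assumption),
    then apply a linear layer followed by softmax."""
    local_vec = conv1d_max_pool(embeddings, kernels)
    global_vec = mean_pool(embeddings)
    fused = local_vec + global_vec  # list concatenation
    logits = [
        sum(w * f for w, f in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return fused, softmax(logits)


# Toy sizes and random parameters, chosen only for illustration.
rng = random.Random(0)
SEQ_LEN, EMB_DIM, N_KERNELS, KERNEL_WIDTH, N_CLASSES = 6, 4, 3, 2, 5
embeddings = [[rng.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(SEQ_LEN)]
kernels = [
    [[rng.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(KERNEL_WIDTH)]
    for _ in range(N_KERNELS)
]
weights = [
    [rng.uniform(-1, 1) for _ in range(N_KERNELS + EMB_DIM)]
    for _ in range(N_CLASSES)
]
bias = [0.0] * N_CLASSES

fused, probs = classify(embeddings, kernels, weights, bias)
predicted = max(range(N_CLASSES), key=lambda c: probs[c])
print("fused dimension:", len(fused), "predicted class index:", predicted)
```

Concatenation keeps the local and global features in separate coordinates, so the final linear layer can weight them independently. The paper may fuse the vectors differently.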
