Research on Chinese text classification based on ERNIE model (cited by: 5)
Abstract To address the problem that word vector representations cannot make full use of semantic information in deep-learning-based Chinese text classification, a Chinese text classification method based on the ERNIE (enhanced representation through knowledge integration) model is proposed. First, the ERNIE model is used to obtain a distributed text representation with richer semantics. Then, a deep convolutional neural network is introduced to further extract features from the contextual encodings and obtain a deeper representation of the text. Finally, a softmax classifier performs the Chinese text classification. Several groups of comparative experiments on three public Chinese data sets show that, compared with conventional classification models based on BERT (bidirectional encoder representations from transformers), the proposed model improves accuracy and F1 score by 6.34% and 4.82% on average, respectively, indicating that the ERNIE-based method can effectively improve Chinese text classification performance. The method classifies text more accurately on multi-domain Chinese text data sets and can provide a reference for subsequent research in natural language processing.
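The three stages named in the abstract (contextual encoding, convolutional feature extraction, softmax classification) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the encoder output is replaced by random vectors standing in for ERNIE token embeddings, a single convolution layer with max-over-time pooling stands in for the deep convolutional network, and all names, sizes, and weights (`classify`, `SEQ_LEN`, `DIM`, and so on) are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def conv1d_relu(tokens, kernel, bias):
    """Slide one filter (width x dim) over the token sequence, then apply ReLU.

    Assumes the sequence has at least as many tokens as the filter is wide.
    """
    width = len(kernel)
    outputs = []
    for start in range(len(tokens) - width + 1):
        acc = bias
        for offset in range(width):
            acc += sum(w * x for w, x in zip(kernel[offset], tokens[start + offset]))
        outputs.append(max(0.0, acc))
    return outputs


def classify(tokens, filters, biases, out_weights, out_bias):
    """Convolution -> max-over-time pooling -> linear layer -> softmax."""
    pooled = [max(conv1d_relu(tokens, k, b)) for k, b in zip(filters, biases)]
    logits = [
        sum(w * p for w, p in zip(row, pooled)) + b
        for row, b in zip(out_weights, out_bias)
    ]
    return softmax(logits)


def random_matrix(rng, rows, cols):
    """Untrained placeholder weights; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
SEQ_LEN, DIM, WIDTH, NUM_FILTERS, NUM_CLASSES = 12, 8, 3, 4, 3  # hypothetical sizes

# Stand-in for the encoder output: one DIM-dimensional vector per token.
token_vectors = random_matrix(rng, SEQ_LEN, DIM)
filters = [random_matrix(rng, WIDTH, DIM) for _ in range(NUM_FILTERS)]
biases = [0.0] * NUM_FILTERS
out_weights = random_matrix(rng, NUM_CLASSES, NUM_FILTERS)
out_bias = [0.0] * NUM_CLASSES

probs = classify(token_vectors, filters, biases, out_weights, out_bias)
predicted = max(range(NUM_CLASSES), key=probs.__getitem__)
print(predicted, [round(p, 3) for p in probs])
```

Because the weights are random, the predicted class carries no meaning; the sketch only shows how per-token vectors flow through convolution and pooling into a probability distribution over classes.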
Authors BI Yunshan; QIAN Yaguan; ZHANG Chaohua; PAN Jun; XU Qinghua (School of Sciences, Zhejiang University of Science and Technology, Hangzhou 310023, Zhejiang, China)
Source Journal of Zhejiang University of Science and Technology (《浙江科技学院学报》), CAS, 2021, No. 6, pp. 461-468, 476 (9 pages)
Funding Key R&D Project of the Ministry of Science and Technology (2018YFB2100400); National Natural Science Foundation of China (61902082)
Keywords natural language processing; text classification; deep learning; convolutional neural network; ERNIE