Abstract
In deep-learning-based Chinese text classification, word vector representations cannot fully exploit semantic information. To address this problem, a Chinese text classification method based on the ERNIE (enhanced representation through knowledge integration) model was proposed. First, the ERNIE model was used to obtain a distributed text representation with richer semantic expression. Then, a deep convolutional neural network was introduced to further extract features from the contextual encoding and obtain a deeper representation of the text. Finally, a softmax classifier was used to perform the classification. Several groups of comparative experiments were conducted on three public Chinese datasets. Compared with a conventional classification model based on BERT (bidirectional encoder representations from transformers), the proposed model improved accuracy and F1 score by 6.34% and 4.82% on average, respectively, indicating that the ERNIE-based method can effectively improve Chinese text classification performance. The method classifies texts more accurately on multi-domain Chinese datasets and can serve as a reference for subsequent research in natural language processing.
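The abstract describes a three-stage pipeline: an ERNIE encoder produces contextual token vectors, a convolutional network extracts features from them, and a softmax layer outputs class probabilities. The following is a minimal pure-Python sketch of the last two stages only, under loudly stated assumptions: the ERNIE output is replaced by random vectors (`hidden_states`), and a single convolution layer with max-over-time pooling stands in for the paper's deep CNN, whose depth and kernel settings are not given in this abstract. All function and variable names are hypothetical, and the weights are random rather than trained, so the sketch shows only the data flow, not the authors' implementation.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def conv1d_relu(seq: Matrix, kernels: List[Matrix], bias: Vector) -> Matrix:
    """Slide each kernel (width k, depth = hidden size) over the token sequence
    with stride 1 and no padding; return one ReLU feature map per kernel.
    Assumes len(seq) >= kernel width."""
    k = len(kernels[0])
    maps: Matrix = []
    for kern, b in zip(kernels, bias):
        fmap: Vector = []
        for t in range(len(seq) - k + 1):
            s = b
            for i in range(k):
                s += sum(w * x for w, x in zip(kern[i], seq[t + i]))
            fmap.append(max(0.0, s))
        maps.append(fmap)
    return maps


def max_over_time(maps: Matrix) -> Vector:
    """Keep the strongest response of each feature map across positions."""
    return [max(m) for m in maps]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax: subtract the largest logit first."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(seq: Matrix, kernels: List[Matrix], conv_bias: Vector,
             W: Matrix, b: Vector) -> Vector:
    """Encoder states -> convolution -> pooling -> linear layer -> softmax."""
    pooled = max_over_time(conv1d_relu(seq, kernels, conv_bias))
    logits = [sum(w * x for w, x in zip(row, pooled)) + bi
              for row, bi in zip(W, b)]
    return softmax(logits)


# Toy usage. ASSUMPTION: random vectors stand in for ERNIE hidden states,
# and all weights are random placeholders, not trained parameters.
random.seed(0)
seq_len, hidden, n_kernels, width, n_classes = 10, 8, 4, 3, 3
hidden_states = [[random.gauss(0, 1) for _ in range(hidden)]
                 for _ in range(seq_len)]
kernels = [[[random.gauss(0, 0.5) for _ in range(hidden)] for _ in range(width)]
           for _ in range(n_kernels)]
conv_bias = [0.0] * n_kernels
W = [[random.gauss(0, 0.5) for _ in range(n_kernels)] for _ in range(n_classes)]
b = [0.0] * n_classes

probs = classify(hidden_states, kernels, conv_bias, W, b)
predicted = max(range(n_classes), key=lambda c: probs[c])
print(probs, predicted)
```

Max-over-time pooling is used here because it turns a variable-length sequence into a fixed-size vector regardless of text length; a deeper CNN would stack further convolution layers before this pooling step.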
Authors
BI Yunshan (毕云杉)
QIAN Yaguan (钱亚冠)
ZHANG Chaohua (张超华)
PAN Jun (潘俊)
XU Qinghua (徐庆华)
School of Sciences, Zhejiang University of Science and Technology, Hangzhou 310023, Zhejiang, China
Source
Journal of Zhejiang University of Science and Technology (《浙江科技学院学报》)
CAS
2021, No. 6, pp. 461-468, 476 (9 pages)
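Funding
Key Research and Development Program of the Ministry of Science and Technology of China (2018YFB2100400)
National Natural Science Foundation of China (61902082)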
Keywords
natural language processing
text classification
deep learning
convolutional neural network
ERNIE