Abstract
Most current text classification methods rely on word vectors, typically trained with Word2vec, GloVe, or similar methods. These approaches train slowly when texts contain many distinct words, and their accuracy depends on the quality of word segmentation. Because Chinese characters and words differ considerably from English, a text classification method based on Bert character vectors is proposed. Bert is a general-purpose natural language processing model based on the Transformer, proposed by Google; it provides Chinese character-level word vectors, called character vectors. News texts are classified using Bert character vectors together with a convolutional neural network. While maintaining high accuracy, this method is more efficient than text classification based on word vectors.
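To make the described pipeline concrete, the following is a minimal sketch in Python (PyTorch with the HuggingFace transformers library), assuming the publicly available bert-base-chinese model as the source of character-level vectors and a standard TextCNN-style classifier on top. The class name BertCharCNN, the kernel sizes, filter count, number of news categories, and sample text are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertCharCNN(nn.Module):
    """Sketch of a Bert-character-vector + CNN news classifier (assumed hyperparameters)."""
    def __init__(self, num_classes=10, num_filters=128, kernel_sizes=(2, 3, 4)):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        hidden = self.bert.config.hidden_size  # 768 for bert-base-chinese
        # One 1-D convolution per kernel size, applied over the character sequence.
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden, num_filters, k) for k in kernel_sizes]
        )
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, input_ids, attention_mask):
        # Character-level vectors from Bert: (batch, seq_len, hidden).
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        x = out.last_hidden_state.transpose(1, 2)        # (batch, hidden, seq_len)
        # Convolve, apply ReLU, then global max-pool each feature map.
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))          # (batch, num_classes)

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertCharCNN(num_classes=10)
batch = tokenizer(["这是一条体育新闻。"], return_tensors="pt",
                  padding=True, truncation=True, max_length=128)
logits = model(batch["input_ids"], batch["attention_mask"])

Whether the Bert encoder is fine-tuned or kept frozen (training only the convolutional layers, which would favor the efficiency emphasized in the abstract) is not specified by the paper; either variant fits this sketch.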
Author
刘凯洋
LIU Kai-yang (Northeast Normal University, Changchun 130000, China)
Source
《电脑知识与技术》
2020, No. 1, pp. 187-188 (2 pages)
Computer Knowledge and Technology
Funding
Fundamental Research Funds for the Central Universities project "Implementing a Multifunctional Text Processor with Deep Learning" (Project No. 201910200111002)
Keywords
Bert
CNN (Convolutional Neural Networks)
text classification
character vector
news