Abstract
Traditional Chinese text classification methods struggle to cope with the massive volume of information on the Internet. To address this, a sentence-level convolutional neural network (CNN) scheme for Chinese news classification is proposed. An information extraction algorithm extracts a fixed-size news digest from news articles of varying length, which compresses the input while giving it a uniform format. During extraction, an improved TF-IDF algorithm raises the quality of the digest, and word2vec word embeddings are combined with a CNN to perform the classification. Compared with traditional methods, the word vector model overcomes the shortcomings of the traditional bag-of-words model. In addition, a sentence carries far more complete semantics than a single word, so classifying at the sentence level is more reliable. Comparative experiments show that the scheme performs well.
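The digest-extraction step can be illustrated with a minimal Python sketch. The sketch makes several assumptions:

- Articles are already split into sentences and segmented into words, for example by a Chinese word segmenter.
- Each sentence is scored by the mean plain TF-IDF weight of its words. The abstract does not describe the paper's improved TF-IDF variant, so this scoring is only a standard baseline. The smoothed IDF formula follows the scikit-learn default.
- The names idf_table and extract_digest are hypothetical.

The k top-scoring sentences are kept in their original order and padded to exactly k, so every article yields an input of the same size for a downstream sentence-level classifier.

```python
import math
from collections import Counter
from typing import Dict, List

Sentence = List[str]       # one pre-tokenized sentence (assumed output of a word segmenter)
Document = List[Sentence]  # one news article as a list of sentences


def idf_table(corpus: List[Document]) -> Dict[str, float]:
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        # count each word at most once per document
        df.update({w for sent in doc for w in sent})
    return {w: math.log((1 + n_docs) / (1 + c)) + 1.0 for w, c in df.items()}


def extract_digest(doc: Document, idf: Dict[str, float], k: int) -> List[Sentence]:
    """Keep the k highest-scoring sentences in article order, padded to exactly k."""
    tf = Counter(w for sent in doc for w in sent)
    total = sum(tf.values()) or 1

    def score(sent: Sentence) -> float:
        if not sent:
            return 0.0
        # mean plain TF-IDF weight of the sentence's words (baseline, not the paper's variant)
        return sum(tf[w] / total * idf.get(w, 1.0) for w in sent) / len(sent)

    ranked = sorted(range(len(doc)), key=lambda i: score(doc[i]), reverse=True)
    chosen = sorted(ranked[:k])                      # restore original sentence order
    digest = [doc[i] for i in chosen]
    digest += [[] for _ in range(k - len(digest))]  # pad short articles with empty sentences
    return digest


# Toy corpus of two already-segmented articles (illustrative data only)
corpus = [
    [["股市", "上涨"], ["今天", "天气", "晴"], ["股市", "指数", "创", "新高"]],
    [["球队", "获胜"], ["今天", "天气", "晴"]],
]
idf = idf_table(corpus)
digest = extract_digest(corpus[0], idf, k=2)
# digest == [['股市', '上涨'], ['股市', '指数', '创', '新高']]
```

The sentence shared by both articles ("今天 天气 晴") receives the lowest IDF weight and is dropped. A shorter article is padded with empty sentences rather than truncated, which keeps the digest shape fixed.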
Authors
ZENG Fan-feng
LI Yu-ke
XIAO Ke
Affiliation: College of Information Technology, North China University of Technology, Beijing 100144, China
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University core journal (北大核心)
2020, No. 4, pp. 978-982 (5 pages)
Funding
National Key Research and Development Program of China (2017YFB0802300)
Keywords
text classification
deep learning
convolutional neural network
word vector
TF-IDF algorithm
information extraction