Abstract
Text classification is a research hotspot in natural language processing (NLP), with applications such as public opinion detection and news classification. In recent years, artificial neural networks have performed well on many NLP tasks, and applying them to text classification has produced many results. Within deep-learning-based text classification, there are two main research directions: the numerical representation of text, and deep-learning-based classification methods. This paper systematically analyzes and summarizes the key word-embedding techniques used for text representation, as well as the working principles and research status of deep learning methods applied to text classification. In light of current technical developments, it also analyzes the shortcomings of existing text classification methods and their development trends.
Authors
DU Sijia (杜思佳), YU Haining (于海宁), ZHANG Hongli (张宏莉)
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Source
Chinese Journal of Network and Information Security (《网络与信息安全学报》)
2020, No. 4, pp. 1-13 (13 pages)
Funding
National Natural Science Foundation of China (61601146, 61732022)
Keywords
text classification
deep learning
artificial neural network
word embedding