Abstract
To address the low recognition accuracy of typical recurrent neural network methods in topic word extraction, which stems from their lack of context-dependent sentence-level information, we propose a topic word extraction method that combines TextRank with a bidirectional long short-term memory network and conditional random field (BiLSTM-CRF) model. First, TextRank extracts topic sentences from the news text. Next, a bidirectional long short-term memory (BiLSTM) network captures the forward and backward features of the text. Finally, a conditional random field (CRF) layer performs sentence-level sequence labeling to obtain the topic words. In experiments on multiple sports news datasets, the method improves the F1 value by about 0.8% to 5.1% over the BiLSTM control method and takes less time. The improved BiLSTM-CRF method can therefore significantly improve both the accuracy and the efficiency of topic word extraction.
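The first stage described in the abstract, topic sentence extraction with TextRank, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the regex tokenizer, the word-overlap similarity, and the parameters `top_k`, `damping`, and `iterations` are all assumptions chosen for illustration. Chinese news text would additionally need a word segmenter in place of the regex tokenizer.

```python
import math
import re


def sentence_similarity(a, b):
    """Word-overlap similarity between two token lists (assumed measure)."""
    # Guard against log(1) = 0 in the denominator for one-word sentences.
    if len(a) < 2 or len(b) < 2:
        return 0.0
    overlap = len(set(a) & set(b))
    return overlap / (math.log(len(a)) + math.log(len(b)))


def textrank_sentences(sentences, top_k=2, damping=0.85, iterations=50):
    """Return the top_k highest-scoring sentences in their original order."""
    # Hypothetical tokenizer: lowercase word characters only.
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(sentences)
    # Symmetric weighted graph over sentences, with no self-loops.
    weights = [
        [sentence_similarity(tokens[i], tokens[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    # Fixed number of power-iteration steps of the weighted PageRank update.
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_k])]


example = [
    "The team won the final match",
    "The final match was won by the home team",
    "Rain is expected tomorrow",
]
topic_sentences = textrank_sentences(example, top_k=2)
```

In the pipeline the abstract describes, the selected sentences would then be passed to the BiLSTM-CRF tagger; that stage is omitted here because the abstract gives no details of its configuration.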
Authors
江逸琪
赵彤洲
柴悦
高佩东
JIANG Yiqi; ZHAO Tongzhou; CHAI Yue; GAO Peidong (School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China)
Source
《武汉工程大学学报》
CAS
2020, No. 1, pp. 102-107 (6 pages in total)
Journal of Wuhan Institute of Technology
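Funding
National Natural Science Foundation of China (61601176)
Wuhan Institute of Technology Young and Middle-aged Talent Project (Q20191510)
Wuhan Institute of Technology Graduate Innovation Fund (CX2018195).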