Abstract
Automatic keyword extraction is an important task in natural language processing (NLP) that provides key technical support for applications such as personalized recommendation and online shopping. To address this task, a new keyword extraction method based on a bidirectional long short-term memory network with a conditional random field (BiLSTM-CRF) is proposed, in which keyword extraction is formulated as a sequence labeling problem. First, the method models the input text and represents it as low-dimensional dense vectors. Then, a classification algorithm predicts a tag for each word. Finally, a CRF layer decodes the whole tag sequence to produce the final labeling. Experiments on a large-scale real-world dataset show that the method improves performance by about 1 percentage point over the baseline system.
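The final decoding step can be illustrated with a minimal sketch. It is not the authors' implementation: the BIO tag set, the hand-written emission and transition scores, and the function names `viterbi_decode` and `tags_to_keywords` are all assumptions made for illustration. In the described method the per-word scores would come from the BiLSTM and the transition scores would be learned by the CRF layer.

```python
from typing import List, Sequence

# Assumed tag scheme (B = keyword start, I = keyword continuation, O = outside).
# The abstract does not state which tag set the paper uses.
TAGS = ["B", "I", "O"]


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence.

    emissions[t][j]   : score of tag j for the word at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    if not emissions:
        return []
    n_tags = len(emissions[0])
    score = list(emissions[0])          # best score of any path ending in each tag
    backpointers: List[List[int]] = []  # best previous tag for each (position, tag)
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag to recover the path.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


def tags_to_keywords(tokens: Sequence[str], tags: Sequence[str],
                     sep: str = " ") -> List[str]:
    """Group B/I-tagged tokens into keyword strings (use sep="" for Chinese)."""
    keywords: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                keywords.append(sep.join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:  # "O", or an "I" with no open keyword
            if current:
                keywords.append(sep.join(current))
            current = []
    if current:
        keywords.append(sep.join(current))
    return keywords


# Hypothetical scores for a five-word query; rows are words, columns are B, I, O.
tokens = ["cheap", "bluetooth", "headphones", "for", "sale"]
emissions = [
    [0.1, 0.0, 2.0],
    [1.0, 1.2, 0.0],
    [0.2, 2.0, 0.3],
    [0.0, 0.1, 2.0],
    [0.0, 0.0, 1.5],
]
transitions = [
    [0.0, 1.0, 0.0],    # from B
    [0.0, 0.5, 0.0],    # from I
    [0.0, -10.0, 0.5],  # from O: O -> I is heavily penalized
]

tags = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
print(tags)                            # ['O', 'B', 'I', 'O', 'O']
print(tags_to_keywords(tokens, tags))  # ['bluetooth headphones']
```

With these made-up scores, tagging each word independently would label "bluetooth" as I, because 1.2 is its highest emission score. Decoding the whole sequence assigns B instead, since the O to I transition is penalized. This is the kind of sequence-level consistency a CRF layer is meant to add on top of per-word classification.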
Authors
陈伟
吴友政
陈文亮
张民
CHEN Wei¹, WU You-zheng², CHEN Wen-liang¹, ZHANG Min¹ (¹School of Computer Sciences and Technology, Soochow University, Suzhou, Jiangsu 215006, China; ²IQIYI Artificial Intelligence Research Group, Beijing 100080, China)
Source
《计算机科学》
CSCD
Peking University Core Journals (北大核心)
2018, Issue B06, pp. 91-96, 113 (7 pages in total)
Computer Science
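Funding
National Natural Science Foundation of China (61572338)
Major Project of Natural Science Research of Jiangsu Higher Education Institutions (16KJA520001)
CCF-Tencent Research Fund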
Keywords
Natural language processing (自然语言处理)
Keyword extraction (关键词抽取)
Conditional random field (条件随机场)
Long short-term memory network (长短期记忆网络)