Lexical Substitution Based on Paraphrase Modeling (基于复述模型的词语替代方法)

Abstract: Lexical substitution (LS) aims to find appropriate substitutes for a target word in a sentence. LS methods based on the pretrained language model BERT generate substitute candidates directly from the target word's context. Because annotated data are scarce, researchers usually resort to unsupervised methods, which limits how well pretrained models can be applied to this task. Observing that existing large-scale paraphrase corpora contain a large number of word substitution rules, this paper proposes generating substitute candidates with a paraphrase model. Specifically, we first train a neural paraphrase model on a paraphrase corpus. We then propose a decoding strategy that focuses only on variations of the target word, and use it to generate substitutes from the paraphrase model. Finally, we rank the substitutes by using text generation evaluation metrics to measure how much each one changes the meaning of the original sentence. Evaluated on two widely used benchmarks, LS07 and CoInCo, the proposed method achieves the best results, with significant improvements over existing state-of-the-art lexical substitution methods.
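The generate-then-rank pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the word distribution at the target position, the `toy_similarity` scorer, and every function name below are hypothetical stand-ins for a trained neural paraphrase model and a real text generation evaluation metric, neither of which the abstract specifies.

```python
from typing import Callable, Dict, List, Tuple


def generate_candidates(position_probs: Dict[str, float], target: str, k: int = 3) -> List[str]:
    """Return the k most probable words at the target position, excluding the target itself.

    `position_probs` stands in for the distribution a paraphrase decoder would
    produce at the target position when the rest of the sentence is kept fixed
    (an assumption about how "focus only on the target word" could be realized).
    """
    ranked = sorted(
        (w for w in position_probs if w.lower() != target.lower()),
        key=lambda w: position_probs[w],
        reverse=True,
    )
    return ranked[:k]


def substitute(tokens: List[str], index: int, word: str) -> List[str]:
    """Replace the token at `index` with `word`, leaving the other tokens unchanged."""
    return tokens[:index] + [word] + tokens[index + 1:]


def rank_candidates(
    tokens: List[str],
    index: int,
    candidates: List[str],
    similarity: Callable[[List[str], List[str]], float],
) -> List[Tuple[str, float]]:
    """Score each candidate by how well the rewritten sentence preserves the original meaning."""
    scored = [(c, similarity(tokens, substitute(tokens, index, c))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hand-made relatedness scores: a placeholder for a real evaluation metric.
TOY_RELATEDNESS = {
    ("bright", "smart"): 0.9,
    ("bright", "clever"): 0.8,
    ("bright", "shiny"): 0.3,
}


def toy_similarity(original: List[str], rewritten: List[str]) -> float:
    """Toy metric: relatedness of the differing word pair (1.0 if nothing changed)."""
    diffs = [(a, b) for a, b in zip(original, rewritten) if a != b]
    if not diffs:
        return 1.0
    return min(TOY_RELATEDNESS.get(pair, 0.0) for pair in diffs)


tokens = ["she", "is", "a", "bright", "student"]
index = 3  # position of the target word "bright"
# Made-up probabilities for illustration only.
position_probs = {"bright": 0.40, "shiny": 0.25, "smart": 0.20, "clever": 0.10, "the": 0.05}

candidates = generate_candidates(position_probs, tokens[index], k=3)
ranking = rank_candidates(tokens, index, candidates, toy_similarity)
print(candidates)
print(ranking)
```

In this toy run the decoder-side probabilities favor "shiny", while the meaning-preservation score moves "smart" and "clever" ahead of it, which is the role the abstract assigns to the ranking step.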

Authors: QIANG Jipeng (强继朋); CHEN Yu (陈宇); LI Yang (李杨); LI Yun (李云); WU Xindong (吴信东) (School of Information Engineering, Yangzhou University, Yangzhou, Jiangsu 225127, China; Key Laboratory for Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, Hefei, Anhui 230009, China; College of Computer Science and Information Technology, Hefei University of Technology, Hefei, Anhui 230009, China)

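Affiliations: School of Information Engineering, Yangzhou University; Key Laboratory for Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education; College of Computer Science and Information Technology, Hefei University of Technology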

Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2023, No. 5, pp. 22-31, 43 (11 pages)

Funding: National Natural Science Foundation of China (62076217, 61703362); Yangzhou University "Qinglan Project" (青蓝工程).

Keywords: lexical substitution; paraphrase modeling; pretrained model

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]