Abstract
Reading comprehension questions from China's college entrance examination (Gaokao) are linguistically complex and hard to answer automatically, which makes them a particularly challenging target for machine reading comprehension. Essay questions are an important question type in these tests, and their answers are usually spread across several paragraphs of the reading material, a situation that existing extractive reading comprehension methods do not take into account. In addition, answer sentences are usually far fewer than non-answer sentences, so the data are imbalanced. To address both issues, a "paragraph filter and answer sentence extraction" framework is adopted: a relevance score between each paragraph and the question is computed, the paragraphs are ranked by that score, and the top k most relevant paragraphs are selected for each question. A data augmentation method is also used to expand the answer sentences in the dataset and offset their scarcity. Experimental results show that the proposed method reaches a recall of 51.76%, which is 11.37% higher than the baseline model RoBERTa.
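The paragraph-filtering step described in the abstract (score each paragraph against the question, rank, keep the top k) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how relevance is computed, so the bag-of-words cosine similarity used here is a stand-in, and the names `relevance` and `select_top_k` are hypothetical.

```python
import math
from collections import Counter


def relevance(question: str, paragraph: str) -> float:
    """Cosine similarity of word counts (an assumed stand-in scorer)."""
    q = Counter(question.lower().split())
    p = Counter(paragraph.lower().split())
    dot = sum(q[token] * p[token] for token in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in p.values())
    )
    return dot / norm if norm else 0.0


def select_top_k(question: str, paragraphs: list[str], k: int) -> list[int]:
    """Return the indices of the k highest-scoring paragraphs, in document order."""
    scored = [(relevance(question, p), i) for i, p in enumerate(paragraphs)]
    # Highest score first; ties are broken by original position.
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return sorted(i for _, i in ranked[:k])


paragraphs = [
    "the river flows past the old village",
    "the author recalls his mother and her garden",
    "prices of grain rose sharply that year",
    "his mother planted flowers in the garden every spring",
]
question = "how does the author describe his mother and the garden"
selected = select_top_k(question, paragraphs, k=2)
```

Only the selected paragraphs would then be passed to the answer sentence extraction stage, so answer sentences scattered across several paragraphs can still be reached while unrelated paragraphs are dropped.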
Authors
HE Wen-jing (贺文静)
ZHANG Hu (张虎)
School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
Source
Software Guide (《软件导刊》)
2022, Issue 8, pp. 1-6 (6 pages)
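Funding
National Natural Science Foundation of China (61806117)
National Key Basic Research Development Program of China (2018YFB1005103-3)
Natural Science Foundation of Shanxi Province (201901D111028)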
Keywords
answer sentence extraction
machine reading comprehension
essay question
natural language processing