Abstract
Multi-hop reading comprehension (MHRC) is an active and challenging research topic in natural language processing, with important and wide-ranging applications in text understanding, automatic question answering, and dialogue systems. To address the lack of MHRC research for Chinese, this paper constructs a Chinese multi-hop reading comprehension dataset for complex questions (Complex Chinese Machine Reading Comprehension, Complex CMRC) and proposes a Chinese MHRC method based on question decomposition. The method has two stages: question decomposition and question solving. In the first stage, a complex question decomposition method that combines the JointBERT model with rules is proposed: JointBERT jointly models question type identification and question fragment identification to obtain accurate question type and question fragment information, and specially designed decomposition rules then split the complex question into multiple simple sub-questions. In the second stage, a pre-trained BERT model solves the sub-questions iteratively to obtain the answer to the complex question. Question decomposition and question solving experiments were each conducted on the Complex CMRC dataset and achieved good results, verifying the effectiveness of the proposed method.
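The iterative solving stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `[ANS1]` placeholder convention, the function names, and the toy lookup reader are all assumptions made for illustration, and the sketch covers only chain-style (bridge) questions in which the last sub-question's answer is the final answer. In the paper, a BERT reader would take the place of `toy_reader`, and the sub-questions would come from the JointBERT-plus-rules decomposition rather than being written by hand.

```python
from typing import Callable, List

# Hypothetical marker meaning "the answer to sub-question k".
PLACEHOLDER = "[ANS{}]"


def solve_iteratively(sub_questions: List[str], reader: Callable[[str], str]) -> str:
    """Answer sub-questions in order, feeding earlier answers into later ones."""
    if not sub_questions:
        raise ValueError("at least one sub-question is required")
    answers: List[str] = []
    for sub_q in sub_questions:
        # Substitute every earlier answer that this sub-question refers to.
        for k, ans in enumerate(answers, start=1):
            sub_q = sub_q.replace(PLACEHOLDER.format(k), ans)
        answers.append(reader(sub_q))
    # Assumption: for chain-style questions the last hop answers the whole question.
    return answers[-1]


# Toy stand-in for a single-hop reader; a real system would query a model here.
TOY_FACTS = {
    "Who directed Film X?": "Director Y",
    "Where was Director Y born?": "City Z",
}


def toy_reader(question: str) -> str:
    return TOY_FACTS[question]


final_answer = solve_iteratively(
    ["Who directed Film X?", "Where was [ANS1] born?"],
    toy_reader,
)
print(final_answer)
```

Passing the reader as a callable keeps the iteration logic independent of the model, so the same loop would work with any single-hop question answering component.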
Authors
樊睿文
白宇
蔡东风
FAN Rui-wen; BAI Yu; CAI Dong-feng (Human-Computer Intelligence Research Center, Shenyang Aerospace University, Shenyang 110136, China)
Source
《沈阳航空航天大学学报》
2023, Issue 2, pp. 63-73 (11 pages)
Journal of Shenyang Aerospace University
Funding
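National Natural Science Foundation of China (Grant No. U1908216)
Youth Fund for Humanities and Social Sciences Research of the Ministry of Education (Grant No. 17YJCZH003)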
Keywords
multi-hop reading comprehension
complex question decomposition
pre-trained models
dataset construction
question solving