Abstract
This paper studies the Cail2020 legal multi-hop machine reading comprehension dataset and proposes TransformerG, a multi-hop reading comprehension model that uses an attention mechanism to fuse entity graph structures at different levels with textual information. The model combines the features of question nodes, question entity nodes, sentence nodes, and in-sentence entity nodes in the passage with the features of the text to predict the answer span. In addition, the paper proposes a sentence-level sliding window method that addresses the truncation of overly long text in pre-trained models. The TransformerG model was entered in the machine reading comprehension track of the "中国法研杯" judicial artificial intelligence challenge (the Cail2020 Competition), organized by the Technical Committee on Computational Linguistics of the Chinese Information Processing Society of China (CIPS-CL) and the Information Center of the Supreme People's Court, where it took second place.
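The abstract does not spell out how the sentence-level sliding window works, so the following is only a minimal sketch of one plausible reading: whole sentences are packed into windows that fit a token budget, and consecutive windows share a few sentences so that no sentence is cut in the middle. The function name `sentence_windows`, the `overlap` parameter, and the `count_tokens` callback are all assumptions made for illustration and are not taken from the paper; in practice `count_tokens` would be replaced by the pre-trained model's own tokenizer.

```python
from typing import Callable, List


def sentence_windows(
    sentences: List[str],
    max_tokens: int,
    overlap: int = 1,
    count_tokens: Callable[[str], int] = len,
) -> List[List[str]]:
    """Pack whole sentences into windows that each fit within max_tokens.

    Hypothetical illustration, not the paper's implementation.
    Consecutive windows share up to `overlap` trailing sentences.
    A sentence is never split; one that alone exceeds the budget
    becomes a window by itself.
    """
    if max_tokens <= 0 or overlap < 0:
        raise ValueError("max_tokens must be positive and overlap non-negative")

    windows: List[List[str]] = []
    n = len(sentences)
    start = 0
    while start < n:
        end = start
        used = 0
        # Always take the first sentence, then extend while the next one fits.
        while end < n and (
            end == start or used + count_tokens(sentences[end]) <= max_tokens
        ):
            used += count_tokens(sentences[end])
            end += 1
        windows.append(sentences[start:end])
        if end >= n:
            break
        # Step back by `overlap` sentences, but always advance by at least one.
        start = max(end - overlap, start + 1)
    return windows


if __name__ == "__main__":
    # Character counts stand in for token counts in this toy example.
    doc = ["aaaa", "bbbb", "cccc", "dddd"]
    for window in sentence_windows(doc, max_tokens=8, overlap=1):
        print(window)
```

Each window could then be encoded separately and the per-window answer predictions merged afterwards; how the paper actually merges them is not stated in the abstract.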
Authors
ZHU Siqi (朱斯琪)
GUO Yi (过弋)
WANG Yexiang (王业相)
YU Jun (余军)
TANG Qifeng (汤奇峰)
SHAO Zhiqing (邵志清)
Affiliations: Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China; Business Intelligence and Visualization Research Center, National Engineering Laboratory for Big Data Distribution and Exchange Technologies, Shanghai 200237, China; Shanghai Engineering Research Center of Big Data & Internet Audience, Shanghai 200072, China
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
PKU Core (北大核心)
2022, Issue 11, pp. 148-155, 168 (9 pages in total)
Funding
Scientific Research Program of the Science and Technology Commission of Shanghai Municipality (22DZ1204903, 22511104800).
Keywords
hierarchical graph structure
multi-hop machine reading comprehension
Cail2020 (法研杯)