Abstract
Multi-paragraph Chinese reading comprehension must account for three properties: the sparsity of evidence paragraphs, the diversity of Chinese semantics, and the validity of answer snippets. To address them, a Chinese multi-paragraph reading comprehension model, CMPReader, is proposed. CMPReader uses data augmentation to learn from paragraphs that contain no answer. It adds character-level encoding and Chinese part-of-speech tags to enrich the Chinese semantic representation. It also trains an answer verifier model on features of answer snippets to choose the valid answer. CMPReader is applied to the CIPS-SOGOU factoid question answering dataset, and experiments show that the average exact match (EM) score and the average F1 score both improve.
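The abstract reports exact match and F1 scores. For Chinese answers these are commonly computed at the character level, because Chinese text has no whitespace word boundaries. The sketch below illustrates that convention under stated assumptions: answers are normalized by removing whitespace and a small hypothetical punctuation set, each question has a single reference answer, and the function names (`normalize`, `exact_match`, `char_f1`, `evaluate`) are illustrative. It is not the official CIPS-SOGOU evaluation script, whose details are not given here.

```python
from collections import Counter

# Assumption: a small, hypothetical punctuation set stripped before comparison.
_PUNCT = set("，。！？、；：“”‘’（）《》,.!?;:'\"()")


def normalize(text: str) -> str:
    """Drop whitespace and the assumed punctuation characters."""
    return "".join(ch for ch in text if not ch.isspace() and ch not in _PUNCT)


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def char_f1(prediction: str, reference: str) -> float:
    """F1 over overlapping characters (multiset intersection)."""
    pred_chars = list(normalize(prediction))
    ref_chars = list(normalize(reference))
    common = Counter(pred_chars) & Counter(ref_chars)
    overlap = sum(common.values())
    if overlap == 0:
        # Also covers empty predictions, avoiding division by zero.
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(ref_chars)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions, references):
    """Average EM and character-level F1 over aligned question lists."""
    n = len(references)
    em = sum(exact_match(p, r) for p, r in zip(predictions, references)) / n
    f1 = sum(char_f1(p, r) for p, r in zip(predictions, references)) / n
    return em, f1


if __name__ == "__main__":
    # "北京市" vs "北京": no exact match, partial character overlap.
    print(evaluate(["北京市", "上海"], ["北京", "上海"]))
```

For the usage example, the first pair scores EM 0 and F1 ≈ 0.8 (precision 2/3, recall 1), and the second pair scores 1 on both. The averages are therefore EM 0.5 and F1 ≈ 0.9. Character-level F1 gives partial credit for answers whose boundaries differ slightly from the reference, which matters when the answer verifier has to choose among overlapping candidate snippets.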
Authors
ZHAO Junyao
PANG Liang
SU Lixin
LAN Yanyan
GUO Jiafeng
CHENG Xueqi
(Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100190)
Source
Pattern Recognition and Artificial Intelligence
EI
CSCD
Peking University Core Journal
2019, No. 2, pp. 161-168 (8 pages)
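Funding
National Key Research and Development Program of China (No. 2016QY02D0405)
National Natural Science Foundation of China (No. 61425016, 61472401, 61722211, 61872338, 61773362, 20180290)
Youth Innovation Promotion Association CAS (No. 20144310, 20160280)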
Keywords
Reading Comprehension
Question Answering
Data Augmentation