Abstract
Reading comprehension question answering (Q&A) in the Chinese college entrance examination (Gaokao) is considerably harder than general reading comprehension Q&A, and little training data is available for the task, so current deep learning methods do not achieve good results. To address these problems, this paper proposes a method for extracting answer candidate sentences in Gaokao reading comprehension that incorporates BERT semantic representations. First, an improved MMR algorithm filters the paragraphs; second, a fine-tuned BERT model produces semantic representations of the sentences; third, a SoftMax classifier extracts the answer candidate sentences; finally, the PageRank algorithm re-ranks the output. On the Chinese reading comprehension questions from the Beijing college entrance examinations of the past ten years, the method achieves a recall of 61.2% and an accuracy of 50.1%, which demonstrates its effectiveness.
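The filtering and re-ranking stages named in the abstract can be illustrated with a minimal sketch. It makes several assumptions that are not from the paper: it uses the standard MMR criterion rather than the authors' improved variant, bag-of-words cosine similarity over whitespace-tokenized text instead of BERT representations, and a PageRank variant that takes classifier scores as the teleport distribution. The names `cosine`, `mmr_select`, and `pagerank_rerank` are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def mmr_select(query, paragraphs, k=2, lam=0.7):
    """Standard MMR (not the paper's improved variant): greedily pick k
    paragraphs that are relevant to the query yet dissimilar to those
    already chosen. Assumes text is already tokenized by whitespace."""
    q = Counter(query.split())
    vecs = [Counter(p.split()) for p in paragraphs]
    selected = []
    candidates = list(range(len(paragraphs)))
    while candidates and len(selected) < k:

        def score(i):
            # Redundancy is the highest similarity to any selected paragraph.
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * cosine(vecs[i], q) - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


def pagerank_rerank(sim, prior, damping=0.85, iters=50):
    """Power-iteration PageRank over a sentence-similarity graph.
    Assumption: classifier scores (`prior`) serve as the teleport
    distribution; self-loops are ignored and nodes without outgoing
    weight simply pass on no rank."""
    n = len(sim)
    total = sum(prior)
    p = [x / total for x in prior]
    # Total outgoing edge weight of each node, excluding self-loops.
    out = [sum(sim[j][k] for k in range(n) if k != j) for j in range(n)]
    rank = p[:]
    for _ in range(iters):
        new = []
        for i in range(n):
            incoming = sum(
                sim[j][i] / out[j] * rank[j]
                for j in range(n)
                if j != i and out[j] > 0
            )
            new.append((1 - damping) * p[i] + damping * incoming)
        rank = new
    # Sentence indices ordered from highest to lowest rank.
    return sorted(range(n), key=lambda i: rank[i], reverse=True)
```

In this sketch, `lam` trades relevance against redundancy: values near 1 favor paragraphs closest to the question, while lower values favor paragraphs that differ from those already kept. In a pipeline like the one the abstract describes, `sim` and `prior` would presumably come from sentence representations and classifier probabilities rather than from hand-written values.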
Authors
杨陟卓
韩晖
张虎
钱揖丽
李茹
YANG Zhizhuo; HAN Hui; ZHANG Hu; QIAN Yili; LI Ru (School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China; Key Laboratory of Computation Intelligence and Chinese Information Processing, Shanxi University, Taiyuan, Shanxi 030006, China)
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journals (北大核心)
2022, Issue 5, pp. 59-66 (8 pages)
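Funding
National Key Research and Development Program of China (2018YFB1005103)
National Natural Science Foundation of China (61772324)
Shanxi Province Basic Research Program, General Project (20210302123469)
Shanxi Province "1331 Project".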
Keywords
reading comprehension of college entrance examination
automatic Q&A
paragraph evaluation
BERT
PageRank