Abstract
A question answering system for reading comprehension (QARC) automatically analyzes a passage of natural language text and generates an answer to each question from the information in the passage. The reading comprehension task is therefore a valuable tool for evaluating natural language understanding systems. However, the lack of a Chinese reading comprehension corpus (CRCC) has become the main obstacle to the research and development of Chinese QARC systems. This paper describes in detail the process of building a CRCC, including material selection, question writing, answer sentence annotation, corpus processing, and the evaluation mechanism. In particular, it describes the deep-processing technique of annotating the corpus on three layers (frame element, phrase type, and syntactic function) based on the Chinese FrameNet (CFN) knowledge base.
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal
2007, No. 6, pp. 29-35 (7 pages)
Funding
Supported by the National High Technology Research and Development Program of China (863 Program), Grant No. 2006AA01Z142
Keywords
computer application
Chinese information processing
question answering system for reading comprehension
Chinese reading comprehension corpus
Chinese FrameNet