Abstract
This paper first performs a fine-grained sentence-pattern analysis of questions based on the distance between the question word and the predicate, and then performs shallow parsing of the candidate answer sentences. From these analyses, a question feature set, an answer-sentence feature set, and a combined feature set are extracted as classification features, and a maximum entropy model and a support vector machine are used to train answer-extraction classifiers. Classifiers trained on different feature combinations were tested on five types of fact-based questions, and their F-measures reached 70.87% and 85.75%, respectively.
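As a rough illustration of the ideas named in the abstract, the sketch below builds a question feature from the distance between the question word and the predicate, merges question and answer-sentence features into one feature dictionary, and computes an F-measure from classification counts. It is not the authors' implementation: the function names, the feature keys, the `chunk_type` field, and the example sentence are all hypothetical, and the abstract does not specify how the actual features are encoded.

```python
def question_features(tokens, qword_idx, predicate_idx):
    """Hypothetical question features: the question word and its signed
    token distance to the predicate (positive if the predicate follows)."""
    return {
        "q_word": tokens[qword_idx],
        "q_pred_distance": predicate_idx - qword_idx,
    }


def combined_features(q_feats, a_feats):
    """Merge question and answer-sentence features into one dictionary.

    Keys are prefixed so the two sets cannot collide. One cross feature is
    added as an assumed example of a "combined" feature.
    """
    feats = {f"q:{key}": value for key, value in q_feats.items()}
    feats.update({f"a:{key}": value for key, value in a_feats.items()})
    feats["qa:word|chunk"] = f"{q_feats['q_word']}|{a_feats['chunk_type']}"
    return feats


def f_measure(tp, fp, fn, beta=1.0):
    """F-measure from true positives, false positives and false negatives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical question "谁 发明 了 电话" ("Who invented the telephone"):
# question word at index 0, predicate at index 1.
q = question_features(["谁", "发明", "了", "电话"], 0, 1)
a = {"chunk_type": "NP"}  # assumed output of a shallow parser
features = combined_features(q, a)
score = f_measure(tp=8, fp=2, fn=2)
```

A feature dictionary of this kind could then be vectorized and passed to any maximum entropy or support vector machine trainer. The choice of library is left open here because the abstract does not name one.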
Source
Acta Electronica Sinica (《电子学报》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
2008, No. 5, pp. 833-839 (7 pages)
Funding
National Natural Science Foundation of China (No. 60673109)
Keywords
Chinese question answering
syntactic analysis
answer extraction
maximum entropy model (MEM)
support vector machine (SVM)