Abstract
Fifty lexical and syntactic features were selected, and extensive feature-selection experiments were carried out. Based on the selected feature combinations, a method for recognizing Chinese sentential semantic types that combines C4.5 (a decision tree algorithm) and SVM is proposed. The method exploits the high precision of C4.5 on multiple sentential semantics and the high precision of SVM on simple and complex sentential semantics: the results recognized separately by C4.5 and SVM are fused to give the final sentential semantic type. In experiments on 4,500 sentences selected from the Beijing Forest Studio-Chinese Tag Corpus (BFS-CTC), the method reaches a recognition accuracy of 92.1% under ten-fold cross-validation.
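The fusion step can be illustrated with a minimal sketch. The abstract does not give the exact fusion rule, so the rule below is an assumption: the C4.5 prediction is taken when it is the multiple type (where C4.5 is reported to be more precise), and the SVM prediction is taken otherwise. The label strings and the function names `fuse`, `fuse_all`, and `accuracy` are hypothetical, and the two prediction lists stand in for the outputs of already trained classifiers.

```python
from typing import List

# Illustrative label names for the three sentential semantic types
# mentioned in the abstract (assumed, not taken from the paper).
SIMPLE, COMPLEX, MULTIPLE = "simple", "complex", "multiple"


def fuse(c45_label: str, svm_label: str) -> str:
    """Assumed fusion rule: trust C4.5 on 'multiple', otherwise trust SVM."""
    if c45_label == MULTIPLE:
        return MULTIPLE
    return svm_label


def fuse_all(c45_preds: List[str], svm_preds: List[str]) -> List[str]:
    """Fuse two aligned prediction lists sentence by sentence."""
    if len(c45_preds) != len(svm_preds):
        raise ValueError("prediction lists must have the same length")
    return [fuse(c, s) for c, s in zip(c45_preds, svm_preds)]


def accuracy(preds: List[str], gold: List[str]) -> float:
    """Fraction of sentences whose predicted type matches the gold type."""
    if len(preds) != len(gold):
        raise ValueError("prediction and gold lists must have the same length")
    if not gold:
        raise ValueError("gold list must not be empty")
    correct = sum(1 for p, g in zip(preds, gold) if p == g)
    return correct / len(gold)


if __name__ == "__main__":
    # Toy predictions, made up for illustration only.
    c45 = [MULTIPLE, SIMPLE, COMPLEX]
    svm = [SIMPLE, COMPLEX, COMPLEX]
    gold = [MULTIPLE, COMPLEX, SIMPLE]
    fused = fuse_all(c45, svm)
    print(fused, accuracy(fused, gold))
```

In a ten-fold cross-validation setting, a rule of this kind would be applied to the held-out fold of each split and the per-fold accuracies averaged.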
Source
《北京理工大学学报》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2012, No. 10, pp. 1036-1041 (6 pages)
Transactions of Beijing Institute of Technology
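Funding
National "242" Program Project (2005C48)
Basic Research Fund of Beijing Institute of Technology (20060142014)
Graduate Innovation Fund of Beijing Institute of Technology (GC200802)
Special Cultivation Fund for Major Projects, Science and Technology Innovation Program of Beijing Institute of Technology (2011CX01015)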
Keywords
natural language processing
semantic parsing
sentential semantic structure
sentential semantic type recognition (SSTR)