Abstract
A Chinese-English discourse-level parallel corpus supports discourse-based bilingual research. This paper constructs a Chinese-English parallel corpus in which the connectives (conjunctions) in the Chinese text and in its English translation are annotated and their discourse relations classified. The English connectives in this corpus are defined more broadly than those in monolingual English corpora and are more complex. On this corpus, lexical, syntactic, and positional features are extracted to recognize explicit discourse relations in the English text. Using a maximum entropy classifier, the experiments achieve an accuracy of 92.5% on connective recognition; with the English connective and its corresponding Chinese connective as features, relation classification for a given connective reaches an accuracy of 85.6%. The results provide a reference for future contrastive recognition of Chinese-English discourse relations.
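The abstract describes connective recognition as maximum entropy classification over lexical, syntactic, and positional features. A minimal sketch of that idea follows, assuming a multinomial logistic regression trained by batch gradient ascent. The feature templates, the toy candidates, and the labels `conn`/`other` are hypothetical illustrations, not the paper's feature set or data.

```python
import math
from collections import defaultdict

def featurize(token, prev_pos, sentence_initial):
    """Hypothetical templates: word form (lexical), POS of the previous
    token (syntactic), and whether the candidate opens the sentence (position)."""
    return {
        "word=" + token.lower(): 1.0,
        "prev_pos=" + prev_pos: 1.0,
        "initial=" + str(sentence_initial): 1.0,
    }

def predict_proba(weights, feats):
    """Softmax over per-class linear scores."""
    scores = {c: sum(w.get(f, 0.0) * v for f, v in feats.items())
              for c, w in weights.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

def predict(weights, feats):
    probs = predict_proba(weights, feats)
    return max(probs, key=probs.get)

def train_maxent(examples, labels, epochs=500, lr=0.5):
    """Maximize the conditional log-likelihood by batch gradient ascent."""
    classes = sorted(set(labels))
    weights = {c: defaultdict(float) for c in classes}
    for _ in range(epochs):
        grad = {c: defaultdict(float) for c in classes}
        for feats, gold in zip(examples, labels):
            probs = predict_proba(weights, feats)
            for c in classes:
                # Gradient term: observed indicator minus predicted probability.
                err = (1.0 if c == gold else 0.0) - probs[c]
                for f, v in feats.items():
                    grad[c][f] += err * v
        for c in classes:
            for f, g in grad[c].items():
                weights[c][f] += lr * g / len(examples)
    return weights

# Toy candidates (invented for illustration): (token, previous POS, sentence-initial?)
X = [
    featurize("But", "<s>", True),
    featurize("and", "NN", False),
    featurize("and", ",", False),
    featurize("because", ",", False),
    featurize("for", "VB", False),
    featurize("so", "<s>", True),
    featurize("so", "VBZ", False),
    featurize("once", "VBD", False),
]
y = ["conn", "other", "conn", "conn", "other", "conn", "other", "other"]

weights = train_maxent(X, y)
```

Calling `predict(weights, featurize("and", ",", False))` returns the label the toy model assigns to a candidate. Relation classification for a given connective could reuse the same functions, with the English connective and its aligned Chinese connective as features and relation types as labels.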
Authors
冯洪玉
李艳翠
冯文贺
FENG Hongyu; LI Yancui; FENG Wenhe (School of Information Engineering, Henan Institute of Science and Technology, Xinxiang 453003, China; Guangdong University of Foreign Studies, Guangzhou 510420, China)
Source
《河南科技学院学报(自然科学版)》
2019, No. 5, pp. 55-62 (8 pages)
Journal of Henan Institute of Science and Technology (Natural Science Edition)
Funding
National Natural Science Foundation of China (61502149)
Henan Province Science and Technology Program Project (182102210048)
Keywords
explicit discourse relation
conjunction recognition
classification