Abstract
Unlike sentences in Indo-European languages, Chinese sentences are often complex sentences made up of several clauses. Existing annotated corpora and labeling systems for Chinese semantic role labeling, however, pay little attention to this feature of modern Chinese. Because of data sparsity, there is still no effective method for identifying arguments that lie in a different clause from their verb, which directly hampers research on semantic role labeling of real Chinese text. This paper combines rule-based and statistical methods to identify such cross-clause arguments. A single basic rule first identifies most of a verb's arguments; the weak points of the rule are then located, and a statistical decision tree that combines multiple features is built to further improve identification accuracy. Experiments show that, for cross-clause arguments, the rule alone reaches an F-score of 65.3%, and adding the decision tree raises the F-score to 67.2%.
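The F-scores reported above can be read against the usual definition of the measure. The sketch below is only an illustration, assuming the balanced F1 (harmonic mean of precision and recall) computed over sets of labeled arguments; the abstract does not state which F variant or matching criterion the authors used. The function name `prf`, the triple layout, and the toy data are all hypothetical.

```python
def prf(gold, predicted):
    """Return (precision, recall, F1) for two collections of labeled arguments.

    Assumption: an argument counts as correct only on an exact match of the
    whole item; the paper's actual matching criterion is not given.
    """
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: (verb id, argument token span, role label) triples.
gold = {("v1", (0, 2), "A0"), ("v1", (5, 7), "A1"), ("v2", (0, 2), "A0")}
pred = {("v1", (0, 2), "A0"), ("v2", (0, 2), "A0"), ("v2", (8, 9), "A1")}

precision, recall, f1 = prf(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Under this definition, a change in F-score such as the one from 65.3% to 67.2% can come from better precision, better recall, or both; the abstract reports only the combined figure.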
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journals (北大核心)
2009, No. 16, pp. 40-42 (3 pages)
Funding
National Social Science Fund of China project (No. 07BYY050)
Keywords
semantic role labeling
cross-clause
argument
statistical decision tree