Abstract
Paraphrase recognition can be regarded as a sub-problem of textual entailment recognition. Traditional approaches judge whether two sentences are paraphrases by their similarity in term frequency or syntax. These cues are error-prone: sentences built from the same words can differ greatly in meaning, and an identical syntactic structure does not guarantee an identical meaning either. Based on the characteristics of news corpora, this paper proposes a method that introduces deep semantic role labeling (SRL) to help recognize paraphrases in the news domain. The method compensates for the shortcomings of traditional approaches with features extracted from semantic roles, a structured representation of meaning. It first identifies the semantic roles of all predicates in the two sentences under comparison, then computes the similarity between the corresponding semantic roles of the two sentences, and finally combines this with a traditional sentence-similarity measure to compute the overall similarity. Experiments show that the proposed method effectively improves paraphrase recognition.
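The three steps named in the abstract (label roles, compare corresponding roles, combine with a traditional similarity) can be illustrated with a minimal sketch. Everything specific here is an assumption, since the abstract does not give the formulas: semantic role frames are supplied by hand as dictionaries rather than produced by an SRL system, role fillers are compared with Jaccard overlap, the traditional measure is bag-of-words cosine, and the two scores are mixed with a hypothetical weight `alpha`.

```python
from collections import Counter
from math import sqrt


def jaccard(a, b):
    """Jaccard overlap of two token lists (assumed role-filler measure)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(tokens_a, tokens_b):
    """Bag-of-words cosine, standing in for the 'traditional' similarity."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def frame_similarity(frame_a, frame_b):
    """Average filler overlap over all role labels present in either frame."""
    roles = set(frame_a) | set(frame_b)
    if not roles:
        return 0.0
    total = sum(jaccard(frame_a.get(r, []), frame_b.get(r, [])) for r in roles)
    return total / len(roles)


def role_similarity(frames_a, frames_b):
    """Match each predicate frame to its best counterpart, in both directions."""
    if not frames_a or not frames_b:
        return 0.0
    fwd = [max(frame_similarity(fa, fb) for fb in frames_b) for fa in frames_a]
    bwd = [max(frame_similarity(fb, fa) for fa in frames_a) for fb in frames_b]
    return (sum(fwd) / len(fwd) + sum(bwd) / len(bwd)) / 2


def paraphrase_score(tokens_a, frames_a, tokens_b, frames_b, alpha=0.5):
    """Linear mix of role-based and word-based similarity (alpha is hypothetical)."""
    return alpha * role_similarity(frames_a, frames_b) + (1 - alpha) * cosine(tokens_a, tokens_b)


# Same words, different meaning: the word-based score alone cannot tell them apart.
tokens_1 = ["dog", "bites", "man"]
tokens_2 = ["man", "bites", "dog"]
frames_1 = [{"PRED": ["bites"], "A0": ["dog"], "A1": ["man"]}]  # hand-labeled
frames_2 = [{"PRED": ["bites"], "A0": ["man"], "A1": ["dog"]}]  # hand-labeled

print(round(cosine(tokens_1, tokens_2), 2))                                 # 1.0
print(round(paraphrase_score(tokens_1, frames_1, tokens_2, frames_2), 2))   # 0.67
```

In this toy pair the word-level cosine is 1.0 because both sentences use the same words, while the swapped agent and patient roles pull the combined score down, which is the kind of case the abstract says word-based measures misjudge.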
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
PKU Core (北大核心)
2010, Issue 5, pp. 3-9 (7 pages)
Funding
National Natural Science Foundation of China (Grant No. 60975053)
Keywords
paraphrase recognition
semantic role labeling
natural language processing