Abstract
Metaphor is pervasive in natural language, and its central role in thought and language is increasingly recognized; metaphor recognition is accordingly an important and challenging topic in natural language processing. Building on experiments with and analysis of prior work, this study finds that classifier-based metaphor recognition methods suffer from data sparsity: when the training corpus lacks data for the source-domain word to be recognized, classification performance degrades. To address this problem, this paper proposes a method for acquiring metaphor phrases that combines clustering with classification. The method first clusters the phrases that contain a source-domain word S, and then uses the clustering results as one type of feature for classification. Experiments show that a classifier trained with the cluster-derived features recognizes metaphor phrases well both when the training corpus contains data for the source-domain word and when it does not, and that it achieves high recall.
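The cluster-then-classify idea can be illustrated with a small sketch. The abstract does not state the phrase representation, the clustering algorithm, or the classifier used in the paper, so everything below is an assumption made for illustration: phrases are toy English triples of (source word, hand-written descriptors of the modified word, label), clustering is a single-pass Jaccard grouping, and the classifier is a simple vote counter. The names `greedy_cluster`, `assign`, `VoteClassifier`, and `build` are hypothetical.

```python
from collections import Counter, defaultdict

def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 if both are empty)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def assign(descriptors, seeds, threshold=0.5):
    """Return the id of the most similar cluster seed, or -1 if none is close enough."""
    best = max(range(len(seeds)), key=lambda c: jaccard(descriptors, seeds[c]), default=-1)
    if best == -1 or jaccard(descriptors, seeds[best]) < threshold:
        return -1
    return best

def greedy_cluster(items, threshold=0.5):
    """Single-pass clustering (an assumed stand-in, not the paper's algorithm)."""
    seeds, ids = [], []
    for descriptors in items:
        cid = assign(descriptors, seeds, threshold)
        if cid == -1:                      # nothing similar yet: open a new cluster
            seeds.append(descriptors)
            cid = len(seeds) - 1
        ids.append(cid)
    return seeds, ids

def features(source, cluster_id, use_cluster=True):
    """Feature list for one phrase: the source word, plus the cluster id if enabled."""
    feats = [("src", source)]
    if use_cluster:
        feats.append(("cluster", cluster_id))
    return feats

class VoteClassifier:
    """Toy classifier: each feature votes with the label counts seen in training."""
    def __init__(self):
        self.votes = defaultdict(Counter)

    def fit(self, X, y):
        for feats, label in zip(X, y):
            for f in feats:
                self.votes[f][label] += 1
        return self

    def predict(self, feats):
        total = Counter()
        for f in feats:
            total.update(self.votes.get(f, {}))
        return total.most_common(1)[0][0] if total else None  # None: no evidence

def build(train, use_cluster=True, threshold=0.5):
    """Cluster the training phrases, train the classifier, return a predict function."""
    seeds, ids = greedy_cluster([d for _, d, _ in train], threshold)
    X = [features(s, cid, use_cluster) for (s, _, _), cid in zip(train, ids)]
    clf = VoteClassifier().fit(X, [label for _, _, label in train])

    def predict(source, descriptors):
        cid = assign(descriptors, seeds, threshold)
        return clf.predict(features(source, cid, use_cluster))
    return predict

# Hypothetical toy data: (source word, descriptors of the modified word, label).
TRAIN = [
    ("ocean", {"abstract", "mental"}, "metaphor"),   # "ocean of knowledge"
    ("ocean", {"abstract", "mental"}, "metaphor"),   # "ocean of wisdom"
    ("ocean", {"physical", "liquid"}, "literal"),    # "ocean of water"
    ("river", {"physical", "liquid"}, "literal"),    # "river of oil"
    ("river", {"abstract", "mental"}, "metaphor"),   # "river of thought"
]

with_clusters = build(TRAIN)
baseline = build(TRAIN, use_cluster=False)

# "mountain" never appears as a source word in TRAIN (the data-sparsity case).
print(with_clusters("mountain", {"abstract", "mental"}))
print(baseline("mountain", {"abstract", "mental"}))
```

In this toy setting the baseline has no evidence for the unseen source word "mountain", while the cluster feature lets the classifier reuse what it learned from phrases with other source words. This mirrors the motivation stated in the abstract but says nothing about the paper's actual features or reported results.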
Authors
FU Jianhui (符建辉)
WANG Shi (王石)
CAO Cungen (曹存根)
(Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China)
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
PKU Core Journal (北大核心)
2018, Issue 2, pp. 22-28, 49 (8 pages in total)
Funding
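National Natural Science Foundation of China (30973713, 61035004, 61173063, 61203284, 91224006)
National Social Science Foundation of China (10AYY003)
Ministry of Science and Technology project (201303107)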
Keywords
metaphor phrase recognition
Chinese metaphor phrase
phrase clustering