Abstract
[Objective] To design a method for automatically computing the abstractness of Chinese words and to apply it to metaphor identification, a task in natural language understanding. [Methods] Word abstractness is computed with a logistic regression model from statistical learning theory. The features are word vectors obtained from a neural network language model, and the feature weight vector is learned from a constructed lexicon of abstract words. A Chinese metaphor identification algorithm based on word abstractness is proposed to test how well the method works in practice. [Results] In experimental comparisons with existing methods for computing word abstractness, the proposed method agrees more closely with human cognitive common sense, and it also achieves better accuracy on the metaphor identification task. [Limitations] Word vectors have some shortcomings as a representation of a word's degree of abstractness, and the size of the abstract word lexicon affects how well the feature weight vector is learned. [Conclusions] Computing word abstractness can be viewed as modeling the human ability to classify concepts by abstractness. The results of the proposed method fit human cognition well, and the experiments show that word abstractness can effectively improve metaphor identification.
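The scoring step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the 3-dimensional vectors, the seed words and their labels, and the gradient-descent settings are all made up for the example, and the record does not say how the paper trains its model or how large its lexicon is. In a real setting the vectors would come from a neural network language model and the labels from the abstract word lexicon.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(vectors, labels, lr=0.5, epochs=2000):
    """Fit a weight vector and bias by batch gradient descent on the log loss."""
    dim = len(vectors[0])
    n = len(vectors)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(vectors, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def abstractness(vector, w, b):
    """Score in (0, 1): the modeled probability that the word is abstract."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, vector)) + b)


# Hypothetical toy data: invented 3-d "word vectors" and hand-assigned labels
# (1 = abstract, 0 = concrete). These are assumptions for illustration only.
SEED_VECTORS = {
    "思想": [0.9, 0.1, 0.3],  # "thought"
    "自由": [0.8, 0.2, 0.1],  # "freedom"
    "理论": [0.7, 0.0, 0.4],  # "theory"
    "桌子": [0.1, 0.9, 0.2],  # "table"
    "石头": [0.2, 0.8, 0.3],  # "stone"
    "苹果": [0.0, 0.7, 0.1],  # "apple"
}
SEED_LABELS = {"思想": 1, "自由": 1, "理论": 1, "桌子": 0, "石头": 0, "苹果": 0}

words = list(SEED_VECTORS)
w, b = train_logistic([SEED_VECTORS[t] for t in words],
                      [SEED_LABELS[t] for t in words])
scores = {t: abstractness(SEED_VECTORS[t], w, b) for t in words}

for t in words:
    print(t, round(scores[t], 3))
```

Once the weights are learned from the seed lexicon, any word with a vector can be scored, including words outside the lexicon. That is what would let a downstream step, such as a metaphor identifier, compare the abstractness of the words in a phrase.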
Source
New Technology of Library and Information Service (《现代图书情报技术》)
CSSCI
2015, Issue 4, pp. 34-40 (7 pages)
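Funding
This work is one of the research outcomes of the following projects:
National Natural Science Foundation of China, Young Scientists Fund project "A Chinese Metaphor Computation Model Incorporating Embodied Cognition Mechanisms and Its Implementation" (No. 61103101)
National Natural Science Foundation of China, Young Scientists Fund project "Research on an Automatic Segmentation Algorithm for Chinese Sentence Groups Based on Markov Trees and DRT" (No. 61202281)
Ministry of Education Humanities and Social Sciences Research Youth Fund project "Research on Chinese Metaphor for Information Processing" (No. 10YJCZH052)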
Keywords
Word abstractness
Neural network language model
Metaphor identification