Abstract
Dictionaries are an important resource for Chinese natural language processing, supporting lexical, syntactic, and semantic analysis. This paper uses crowdsourcing to build a Chinese semantic relevance dictionary. Because the dictionary is obtained indirectly, through trigger-word association, it is also called a word association network. Compared with traditional dictionaries, the word association network has three characteristics: (1) it is cheap to acquire; (2) it is Internet-oriented and easy to extend; (3) word relations are established from the perspective of human cognition and therefore match human intuition. The paper describes in detail how the word association network is acquired and analyzes the data collected so far. It also compares the network with HowNet (《知网》), Tongyici Cilin (《同义词词林》), and word n-grams from microblog (Weibo) text to illustrate these characteristics.
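As a rough illustration of what a word association network could look like as data, the sketch below aggregates crowdsourced (trigger word, associated word) pairs into a weighted directed graph. The abstract does not specify the storage format or how edges are weighted, so the function names, the count-based weighting, and the sample responses are all hypothetical and serve only as an example.

```python
from collections import Counter, defaultdict


def build_association_network(responses):
    """Aggregate (trigger, associate) pairs into a weighted directed graph.

    Assumption: an edge weight is simply the number of contributors who
    gave that association; the paper may weight edges differently.
    """
    network = defaultdict(Counter)
    for trigger, associate in responses:
        # Skip empty answers and self-associations.
        if associate and associate != trigger:
            network[trigger][associate] += 1
    return network


def top_associates(network, trigger, k=3):
    """Return up to k most frequent associates of a trigger word."""
    if trigger not in network:
        return []
    return network[trigger].most_common(k)


# Invented sample responses, for illustration only.
responses = [
    ("医生", "医院"), ("医生", "护士"), ("医生", "医院"),
    ("医院", "病人"), ("医生", "病人"),
]
network = build_association_network(responses)
print(top_associates(network, "医生", k=1))
```

Storing counts rather than a plain set of neighbours keeps the strength of each association available, which is one plausible basis for comparing the network with co-occurrence statistics such as n-grams.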
Source
《中文信息学报》
CSCD
PKU Core Journals (北大核心)
2013, No. 3, pp. 100-106 (7 pages)
Journal of Chinese Information Processing
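Funding
Key Program of the National Natural Science Foundation of China (61133012)
National 863 Program Major Project (2011AA01A207)
National 863 Program Advanced Technology Research Project (2012AA011102)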
Keywords
crowdsourcing (众包)
semantic relevance dictionary (语义相关性词典)
word association network (词汇联想网络)