Abstract
Noun phrases have long been an important object of study in linguistics, both in China and abroad, and in recent years they have also drawn sustained attention in natural language processing. For English, noun compound semantic relation knowledge bases of considerable size have been built, but no Chinese resource of comparable or larger scale describing the semantic relations of noun phrases has been established so far. Drawing on the semantic classifications of noun phrases proposed by many scholars in China and abroad, this paper trial-annotates and analyzes instances of basic noun compounds from a large-scale real corpus, and establishes a semantic relation hierarchy for Chinese basic noun compounds together with a corresponding syntactic and semantic knowledge base. The knowledge base can serve as a basic data resource for research on the syntax and semantics of Chinese basic noun compounds. It currently contains 18,281 high-frequency basic noun compounds. Each compound is annotated with its semantic relation, its phrase structure, and whether it refers to an entity; the two nouns in each compound are additionally annotated with semantic categories based on the Semantic Knowledge-base of Contemporary Chinese (SKCC, 《现代汉语语义词典》) of Peking University. Based on this knowledge base, the paper also presents preliminary statistics and analysis of the syntax and semantics of basic noun compounds.
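The annotation layers listed in the abstract can be pictured as one record per compound. The following is only a minimal sketch of such a record, not the knowledge base's actual schema: the class name, field names, the helper `relation_counts`, and the example compound are all hypothetical, and the label strings are placeholders rather than labels from the paper's relation hierarchy or from SKCC.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class NounCompoundEntry:
    """One annotated basic noun compound (hypothetical field names)."""
    noun1: str                  # first noun of the compound
    noun2: str                  # second noun of the compound
    semantic_relation: str      # relation label from the relation hierarchy
    phrase_structure: str       # phrase structure label
    refers_to_entity: bool      # whether the compound refers to an entity
    noun1_semantic_class: str   # semantic category of noun1 (SKCC-based)
    noun2_semantic_class: str   # semantic category of noun2 (SKCC-based)

    @property
    def surface(self) -> str:
        # Chinese compounds are written without a separator.
        return self.noun1 + self.noun2


def relation_counts(entries):
    """Count how many entries carry each semantic relation label."""
    return Counter(e.semantic_relation for e in entries)


# Illustrative entry only: the compound and every label are placeholders,
# not data taken from the knowledge base.
example = NounCompoundEntry(
    noun1="木头",
    noun2="桌子",
    semantic_relation="<relation label>",
    phrase_structure="<structure label>",
    refers_to_entity=True,
    noun1_semantic_class="<semantic class of noun1>",
    noun2_semantic_class="<semantic class of noun2>",
)
```

A frequency table of relation labels over all entries, as returned by `relation_counts`, is one simple form the preliminary statistics mentioned in the abstract could take.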
Authors
LIU Pengyuan (刘鹏远)
LIU Yujie (刘玉洁)
College of Information Science, Beijing Language and Culture University, Beijing 100083, China
Source
Journal of Chinese Information Processing (《中文信息学报》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2019, Issue 4, pp. 20-28 (9 pages)
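Funding
Ministry of Education Humanities and Social Sciences Planning Project (18YJA740030)
Beijing Natural Science Foundation (4192057)
National Natural Science Foundation of China (61872402)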