Abstract
Chinese compound noun phrases are widely used, structurally distinctive, and internally complex in their semantics, which has long made them an important object of study in linguistic analysis and Chinese information processing. Language resources for compound noun phrases are extremely scarce in China. Existing knowledge bases cover only noun-noun compounds, no knowledge base has yet been built for compound noun phrases that contain verbs, and most existing resources are detached from context and carry no sentence-level information. To address this, we collect text from multiple domains, establish a new system of semantic relations, and annotate a sizable knowledge base of semantic relations for basic compound noun phrases that includes sentence information. The annotation focuses on the boundaries of basic compound noun phrases within sentences and on the semantic relations among the components inside each phrase; the corpus contains 27,007 sentences in total. We report a detailed quantitative analysis of the annotated knowledge base. Finally, using the annotated data, we run automatic boundary identification and semantic classification experiments on basic compound noun phrases with a baseline BERT+BiLSTM+CRF model. The experimental results reveal the challenges of this task, and we discuss possible directions for improvement.
Authors
ZHANG Wenmin (张文敏)
LI Huayong (李华勇)
SHAO Yanqiu (邵艳秋)
Information Science School, Beijing Language and Culture University, Beijing 100083, China
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal (北大核心)
2019, Issue 12, pp. 28-36 (9 pages)
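Funding
National Natural Science Foundation of China (61872402)
Ministry of Education Humanities and Social Sciences Planning Fund (17YJAZH068)
Beijing Language and Culture University institutional project, Fundamental Research Funds for the Central Universities (18ZDJ03)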
Keywords
Chinese basic compound noun phrases
semantic relation system
boundary identification