Abstract
This paper proposes a Chinese chunk-based dependency grammar (CCDG). Taking the predicate as the core and the chunk as the unit of analysis, CCDG identifies the chunks governed by each predicate both within and across sentences, establishing a syntactic analysis framework at the level of the sentence group. This approach enlarges the linguistic granularity of leaf nodes and introduces analysis methods and rules tailored to the semantic characteristics of Chinese. It handles logical-structure knowledge at the micro level well and lays a foundation for argument knowledge at the meso level and discourse knowledge at the macro level. The paper presents the concept, representation, analysis method, and characteristics of CCDG, and briefly describes the construction of the corresponding treebank. As of August 2020, the treebank contained 1.87 million characters (40,000 complex sentences and 100,000 clauses), of which 67% is news text and 32% is encyclopedia text.
Authors
钱青青
王诚文
王贵荣
饶高琦
荀恩东
QIAN Qingqing; WANG Chengwen; WANG Guirong; RAO Gaoqi; XUN Endong (School of Information Science, Beijing Language and Culture University, Beijing 100083, China; MOE Key Laboratory of Computational Linguistics, Peking University, Beijing 100871, China; Research Institute of International Chinese Language Education, Beijing Language and Culture University, Beijing 100083, China)
Source
《中文信息学报》
CSCD
PKU Core Journals
2022, Issue 8, pp. 20-28 (9 pages)
Journal of Chinese Information Processing
Funding
National Natural Science Foundation of China (62076038).
Keywords
chunk
dependency
dependency grammar
predicate