Abstract
Linear Unit Grammar: Integrating Speech and Writing (Sinclair & Mauranen, 2006) breaks free of traditional grammatical categories and goes beyond the upper and lower limits of the units studied in traditional grammar, namely the sentence and the single word. It focuses instead on chunks formed by the natural combination of words and on authentic discourse composed of such chunks. Working within the Linear Unit Grammar framework, the present study takes phonetic continuity as the criterion for the integrity of a chunk: pause information serves as the boundary marker for segmenting the texts of an academic spoken corpus into chunks. The recurrence and length of the resulting chunks are then analyzed, and the paper concludes with a discussion of methods for automatic chunk segmentation.
Source
《语料库语言学》
2017, Issue 2, pp. 86-96, 116 (12 pages in total)
Corpus Linguistics