Abstract
Fifty years of research in Natural Language Processing (NLP) have produced two important insights and three major achievements. The insights are: (1) for syntactic analysis, phrase-structure rules with single-symbol labels are insufficient; (2) the distribution of phrase-structure rules in real text is severely skewed. In other words, a finite set of phrase-structure rules cannot cover the grammatical phenomena found in a large corpus, which ran contrary to earlier expectations. The development of NLP technology has been shaped to a great extent by these two facts. In this light, the milestone achievements of the field are: (1) complex feature sets and unification grammars; (2) lexicalism in linguistic research; (3) corpus-based methods and statistical language models. The development and automatic acquisition of large-scale linguistic knowledge is the bottleneck of NLP technology; corpus construction and statistical learning theory have therefore become key issues in NLP research and application.
This paper is a brief discussion of the major findings and developments in the field of Natural Language Processing (NLP) over the past 50 years. First, corpus investigation has revealed the following two facts: (1) single-labeled PSG rules are not sufficient for natural language description, and (2) PSG rules have a skewed distribution in text corpora, i.e., no finite set of PSG rules seems able to cover the language phenomena found in a large corpus, which runs contrary to most linguists' expectations. The development of NLP technology has proceeded under the influence of these two facts, and there have been three major breakthroughs and milestones in the field: (1) complex feature sets and unification-based grammars, (2) lexicalism in linguistics research, and (3) statistical language modeling (SLM) and corpus-based approaches. The latest investigations reveal that the bottleneck of NLP technology is obtaining and developing large-scale linguistic knowledge; therefore, corpus construction and statistical learning theory have become key issues in NLP research and application.
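The statistical language modeling mentioned in point (3) can be illustrated with a minimal bigram model. The toy corpus, the add-one (Laplace) smoothing, and all function names below are illustrative assumptions for this sketch, not details taken from the paper:

```python
from collections import Counter

# Toy corpus; a real SLM is estimated from a large text corpus.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]

# Collect unigram and bigram counts, with sentence-boundary markers.
unigrams = Counter()
bigrams = Counter()
for sent in corpus:
    tokens = ["<s>"] + sent + ["</s>"]
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

V = len(unigrams)  # vocabulary size, including boundary markers

def bigram_prob(w_prev, w):
    """P(w | w_prev) with add-one (Laplace) smoothing."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + V)

def sentence_prob(sent):
    """Sentence probability as a product of smoothed bigram probabilities."""
    tokens = ["<s>"] + sent + ["</s>"]
    p = 1.0
    for a, b in zip(tokens, tokens[1:]):
        p *= bigram_prob(a, b)
    return p
```

Even on this toy scale the model assigns a grammatical word order a higher probability than a scrambled one, which is the property corpus-based approaches exploit.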
Source
《外语教学与研究》
CSSCI
Peking University Core Journals (北大核心)
2002, No. 3, pp. 180-187 (8 pages)
Foreign Language Teaching and Research