Abstract
Chinese information processing (CIP) has made great achievements over the past two decades, as is plain to see. An important task now facing the research community is to establish an overall strategic objective for CIP and to achieve substantive breakthroughs as soon as possible in the development directions that society urgently needs. To this end, certain ideas must first be clarified. For example, must CIP research necessarily advance on the basis of Chinese language understanding? For the urgently needed CIP tasks, which approach is most suitable? The paper first gives a brief review of the history of natural language processing (NLP) in China and abroad, and argues that the move from small-scale, restricted language processing to large-scale real-text processing is an irresistible historical trend. It then uses concrete examples to show what kinds of problems statistical language models (SLM) can solve, and why they have repeatedly outperformed competing approaches on tasks with comparable evaluations. On this basis it argues that comparable evaluation, with uniform test data and a uniform scoring method, is a powerful lever for advancing science and technology, and that we should take up this weapon.
Source
《语言文字应用》
CSSCI
PKU Core Journal (北大核心)
2002, No. 1, pp. 77-84 (8 pages)
Applied Linguistics
Keywords
Chinese information processing
statistical language model