Abstract
This paper describes the design and implementation of an automatic pinyin-tagging system for Song poetry, taking as its object of study a collection of poems by famous Song-dynasty poets totaling 1.6 million Chinese characters. The system's resources comprise a corpus, a knowledge base, and an information base. Polyphonic characters are tagged automatically using three strategies: a conditional probability strategy, a mutual information strategy, and a rule strategy. The distinguishing feature of the system is that it combines a modern statistical language model with the phonological characteristics of Song poetry itself. The experimental results are satisfactory.
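The abstract names the conditional probability and mutual information strategies without giving their formulas, so the following is only a minimal sketch of how such scores are commonly computed, not the authors' method. It assumes a single polyphonic character (行), one neighbouring character as context, numbered-tone pinyin labels, and a tiny invented set of observations; all names and data here are hypothetical.

```python
import math
from collections import Counter

# Invented toy observations for the polyphonic character 行:
# (neighbouring character, reading). Not taken from the paper's corpus.
SAMPLES = [
    ("人", "xing2"), ("人", "xing2"), ("人", "xing2"),
    ("雁", "hang2"), ("雁", "hang2"), ("雁", "xing2"),
    ("路", "xing2"),
]

pair_counts = Counter(SAMPLES)
neighbor_counts = Counter(n for n, _ in SAMPLES)
reading_counts = Counter(r for _, r in SAMPLES)
TOTAL = len(SAMPLES)


def conditional_probability(neighbor, reading):
    """Estimate P(reading | neighbor) by relative frequency."""
    if neighbor_counts[neighbor] == 0:
        return 0.0
    return pair_counts[(neighbor, reading)] / neighbor_counts[neighbor]


def mutual_information(neighbor, reading):
    """Pointwise mutual information: log2(P(n, r) / (P(n) * P(r)))."""
    joint = pair_counts[(neighbor, reading)] / TOTAL
    if joint == 0:
        return float("-inf")  # the pair was never observed together
    p_n = neighbor_counts[neighbor] / TOTAL
    p_r = reading_counts[reading] / TOTAL
    return math.log2(joint / (p_n * p_r))


def tag(neighbor):
    """Pick the reading with the highest conditional probability.

    For an unseen neighbour, fall back to the most frequent reading,
    a stand-in for a default rule (an assumption of this sketch).
    """
    if neighbor_counts[neighbor] == 0:
        return reading_counts.most_common(1)[0][0]
    return max(sorted(reading_counts),
               key=lambda r: conditional_probability(neighbor, r))
```

With these toy counts, `tag("雁")` selects `hang2` and `tag("人")` selects `xing2`, while an unseen neighbour falls back to the most frequent reading. A real system would draw its counts from an annotated corpus and, as the abstract indicates, add rules based on the phonology of the poems.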
Source
《中文信息学报》
CSCD
Peking University Core Journals
1998, Issue 2, pp. 44-53 (10 pages)
Journal of Chinese Information Processing
Keywords
Computational Linguistics
Ancient Literature Processing
Corpus
Automatic Pinyin-Tagging
Song Dynasty
Poetry