Abstract
Text normalization is the process of analyzing input text and generating information, such as pinyin pronunciation and rhythm, for the non-Chinese-character symbols it contains. This paper presents a hierarchical normalization method based on external rules: rule matching identifies these symbols and produces the correct information for each. The paper first introduces the concept of the analysis tree, then describes the steps for constructing rules, with weights controlling the order in which rules are matched, and finally reports experimental results. The results show that the method is easy to maintain and extend, and that accuracy in the open test reaches 99.76%.
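The idea of letting weights control rule matching order can be illustrated with a minimal sketch. It is not the paper's implementation: the rule names, regular expressions, weight values, and the convention that a higher weight is tried first are all assumptions made for illustration, and the analysis tree and rule hierarchy are not reproduced here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str            # hypothetical category label
    pattern: re.Pattern  # hypothetical pattern for a non-Chinese-character symbol
    weight: int          # assumed convention: higher weight is tried first


# Hypothetical rule set; in an external-rule design these would be loaded from a file.
RULES = [
    Rule("percent", re.compile(r"\d+(?:\.\d+)?%"), 30),
    Rule("year", re.compile(r"\d{4}(?=年)"), 20),
    Rule("number", re.compile(r"\d+(?:\.\d+)?"), 10),
]


def tag_symbols(text, rules=RULES):
    """Scan text left to right and tag each symbol with the first rule that matches."""
    ordered = sorted(rules, key=lambda r: r.weight, reverse=True)
    tagged = []
    pos = 0
    while pos < len(text):
        for rule in ordered:
            match = rule.pattern.match(text, pos)
            if match:
                tagged.append((match.group(), rule.name))
                pos = match.end()
                break
        else:
            # No rule matched here (for example an ordinary Chinese character).
            pos += 1
    return tagged


print(tag_symbols("2003年正确率达到99.76%"))
```

With these weights, the specific `percent` rule is tried before the generic `number` rule, so `99.76%` is recognized as one unit. If `number` were given the highest weight, it would consume `99.76` first and the percent sign would be left untagged, which is why the matching order matters.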
Source
《中文信息学报》
CSCD
PKU Core Journals (北大核心)
2003, No. 4, pp. 45-51 (7 pages)
Journal of Chinese Information Processing
Funding
Supported by the National Natural Science Foundation of China (69975018)