Abstract
This paper presents a statistical system for recognizing and translating Japanese personal names. Name recognition is cast as a sequence labeling problem, and a conditional random field (CRF) is used to train the recognition model. The training corpus is annotated in an S/E (Start/End) tagging style, with a seven-tag labeling scheme designed to mark names in running text; the feature templates are built from the context surrounding a name, the honorific and title suffixes that follow names, and a name dictionary. For translation, Japanese names are divided into two types, kana names and kanji names. Kanji names are translated into Chinese characters through a purpose-built dictionary of the kanji commonly used in Japanese names (the "Kanji name translation dictionary"), while kana names are handled by a translation model trained with the Moses machine translation system. Both the recognition model and the translation model achieve good results on their test sets.
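The split between kana names and kanji names described above can be sketched as a small routing step. This is a minimal illustration, not the paper's implementation: the function names, the four sample dictionary entries, and the choice to reject mixed-script names are all assumptions, and the kana branch takes a caller-supplied function standing in for the Moses-trained model.

```python
# Hypothetical sample entries only; the paper's actual dictionary is not reproduced here.
KANJI_TO_CHINESE = {"田": "田", "中": "中", "沢": "泽", "広": "广"}


def is_kana(ch):
    """True for characters in the Hiragana or Katakana Unicode blocks."""
    return "\u3040" <= ch <= "\u30ff"


def is_kanji(ch):
    """True for the basic CJK Unified Ideographs block and the iteration mark."""
    return "\u4e00" <= ch <= "\u9fff" or ch == "々"


def classify_name(name):
    """Return 'kana', 'kanji', or 'mixed' for a recognized name string."""
    if name and all(is_kana(c) for c in name):
        return "kana"
    if name and all(is_kanji(c) for c in name):
        return "kanji"
    return "mixed"


def translate_kanji_name(name, table=KANJI_TO_CHINESE):
    """Map each kanji through the dictionary; unknown characters pass through unchanged."""
    return "".join(table.get(c, c) for c in name)


def translate_name(name, kana_translator):
    """Route a name to the dictionary (kanji) or to a kana translator callable.

    kana_translator is a placeholder for a call into a trained translation
    model; how mixed-script names should be handled is an assumption here.
    """
    kind = classify_name(name)
    if kind == "kanji":
        return translate_kanji_name(name)
    if kind == "kana":
        return kana_translator(name)
    raise ValueError("mixed-script name not handled in this sketch: " + name)
```

For example, `translate_name("広沢", kana_translator=str)` takes the dictionary path, while an all-kana string such as `"さくら"` is passed to whatever callable is supplied for the kana model.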
Source
Intelligent Computer and Applications (《智能计算机与应用》)
2012, Issue 1, pp. 4-7 (4 pages)
Keywords
Name Recognition
Conditional Random Field
Kanji Name Translation Dictionary