Abstract
At present, most of the book and newspaper printing industry in China's ethnic minority regions uses Tibetan typesetting systems such as Bei Da Fang Zheng and Hua Guang. Because each of these systems uses its own encoding scheme, the limited Tibetan electronic resources that exist cannot be exchanged or shared. The fundamental solution is to convert these differing Tibetan encodings into one that conforms to the international standard. Taking the Hua Guang Windows Tibetan character encoding as an example, this paper first analyzes the composition of each Tibetan character, then uses a sub-table and grouping technique to construct, for each character, a code sequence that conforms to ISO/IEC 10646, and finally uses hashing to optimize the lookup algorithm, thereby converting non-standard Tibetan character codes into standard code sequences.
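The conversion described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the legacy code values (0xA1A1 and so on), the split into sub-tables by high byte, and the names `SUB_TABLES` and `convert` are all assumptions made for illustration, since the actual Hua Guang code table is not given here. Only the Unicode code points (U+0F40 KA, U+0F62 RA, U+0F90 subjoined KA, U+0F74 vowel sign U) are real.

```python
# Hypothetical sub-tables: legacy codes are grouped by their high byte, and
# each sub-table is a hash map (dict) from the low byte to the standard
# ISO/IEC 10646 (Unicode) code sequence for that precomposed glyph.
# The legacy values below are placeholders, not real Hua Guang codes.
SUB_TABLES = {
    0xA1: {
        0xA1: "\u0F40",          # single letter KA
        0xA2: "\u0F62\u0F90",    # stack: RA + subjoined KA
    },
    0xA2: {
        0xA1: "\u0F62\u0F90\u0F74",  # same stack + vowel sign U
    },
}


def convert(codes, unknown="\uFFFD"):
    """Map a sequence of 16-bit legacy codes to a Unicode string.

    Codes without an entry are replaced by U+FFFD so that gaps in the
    table remain visible in the output.
    """
    out = []
    for code in codes:
        high, low = code >> 8, code & 0xFF
        # Two hash lookups: pick the sub-table, then the code sequence.
        out.append(SUB_TABLES.get(high, {}).get(low, unknown))
    return "".join(out)
```

One legacy code may expand to several standard code points, which is why the table stores strings and not single characters. For example, `convert([0xA1A2])` yields the two-code-point sequence for the RA + KA stack under the placeholder table above.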
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
PKU Core (北大核心)
2009, No. 4, pp. 118-123 (6 pages)
Funding
Qinghai Province Key Science and Technology Research Project (2006-N-176)
Keywords
computer application
Chinese information processing
Tibetan
character encoding standard
code conversion
sub-table and grouping technique
encoding sort
query