Abstract
To address the problems that bidirectional text poses in multilingual information processing, this paper proposes IBidi, a self-describing bidirectional text processing algorithm oriented toward information processing. IBidi first preprocesses the character stream, mainly tagging digits and other special characters. It then analyzes the stream and inserts predefined tags that describe the properties of the characters, for use by information processing systems. Finally, a reordering algorithm produces the output for display. On a typical test set, IBidi reaches an accuracy of 96.7%, about 17 percentage points higher than the Unicode bidirectional algorithm, which reaches only 80%. On randomly sampled tests, the accuracy of IBidi is also about 5% higher than that of the Unicode bidirectional algorithm.
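The three stages named in the abstract (preprocess, tag, reorder) can be illustrated with a minimal Python sketch. This is not IBidi itself, whose tag set and rules are not given here, and it is not the Unicode bidirectional algorithm either. The function names, the tag names (LTR, RTL, NUM, NEUTRAL), the rule for resolving neutral characters, and the fixed left-to-right paragraph direction are all assumptions made for illustration. Only `unicodedata.bidirectional` is a real standard-library call.

```python
import unicodedata
from itertools import groupby

# Assumed mapping from Unicode bidi classes to four illustrative tags.
CLASSES = {"L": "LTR", "R": "RTL", "AL": "RTL", "EN": "NUM", "AN": "NUM"}


def preprocess(text):
    """Stage 1 (assumed): classify each character, flagging digits as NUM."""
    return [(ch, CLASSES.get(unicodedata.bidirectional(ch), "NEUTRAL")) for ch in text]


def analyze(tagged, base="LTR"):
    """Stage 2 (assumed): group characters into runs of (tag, text, direction)."""
    runs = [[tag, "".join(ch for ch, _ in grp)]
            for tag, grp in groupby(tagged, key=lambda pair: pair[1])]

    def direction(tag):
        # Digits keep their left-to-right order in this sketch.
        return "LTR" if tag == "NUM" else tag

    for i, run in enumerate(runs):
        if run[0] == "NEUTRAL":
            # A neutral run takes the direction shared by both neighbours,
            # otherwise the base direction.
            before = direction(runs[i - 1][0]) if i > 0 else base
            after = direction(runs[i + 1][0]) if i + 1 < len(runs) else base
            run.append(before if before == after else base)
        else:
            run.append(direction(run[0]))
    return [tuple(run) for run in runs]


def reorder(runs):
    """Stage 3 (assumed): display order for a left-to-right paragraph."""
    out = []
    for d, grp in groupby(runs, key=lambda run: run[2]):
        segment = "".join(text for _, text, _ in grp)
        # Only right-to-left segments are reversed for display.
        out.append(segment[::-1] if d == "RTL" else segment)
    return "".join(out)


def ibidi_sketch(text):
    """Hypothetical helper chaining the three stages."""
    return reorder(analyze(preprocess(text)))
```

Because the run tags from the second stage are kept alongside the text, a downstream system could inspect them instead of re-deriving character properties, which is the kind of self-description the abstract refers to.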
Source
《计算机应用》
CSCD
PKU Core (Peking University core journal list)
2007, Issue 6, pp. 1513-1517 (5 pages)
Journal of Computer Applications
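Funding
National Natural Science Foundation of China (60673041)
Jiangsu Province High-Tech Research Program (BG2005020)
Natural Science Foundation of Jiangsu Province (BK2003030)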