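This paper raises the problem of subset integrality in coded character sets. Part I gives two classes of coding examples. One class strictly follows the economy principle that "a character is assigned to only one code position", or equivalently "no character is assigned code positions more than once" ("one character, one code" for short). The other class assigns two or more code positions to a single character, sacrificing code positions in exchange for the integrality of subsets. Part II explains the concept of subset integrality and its significance. It shows that a meaningful subset is usually a reflection of some subsystem in reality. Many subsets of a multi-script coded character set are associated with a particular natural language system, so the integrality of such a subset is in turn associated with the systematicity of the corresponding written language. On this basis the paper proposes an integrality condition, compares the gains and losses of integrality, and shows that integrality is in a certain sense relative. Part III points out that in a number of character sets, including the DP and DIS versions of ISO 10646, the "one character, one code" principle has damaged the integrality of several subsets. There, most of the character subsets for the scripts of the countries and peoples of the Latin-script sphere other than English have been dismembered; the only exceptions are those whose alphabet is a subset of the English alphabet (2×26 = 52 elements). The situation in the Slavic and Arabic script spheres is similar. The paper points out that the Hanyu Pinyin letter subset is severely fragmented both in the Chinese national standard for Chinese character coding and in the international standard 10646. This paper is a sequel to the author's earlier paper "On the Order of Character Sets" [9]. As before, the discussion excludes the Chinese and Tibetan scripts.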
This paper puts forward the integrality problem of subsets in coded character sets. Part I illustrates two types of coding. The first strictly follows the economical principle of 'one code position for one character', or 'no character may be assigned more than one code position' ('one character, one position' for short). The other type assigns more than one code position to a character, thus sacrificing the economical principle to preserve the integrality of subsets. Part II explains the meaning and importance of the 'integrality' concept propounded by the author. It states that a purposeful subset is usually the reflection of a subsystem in reality. Many subsets in a multi-language coded character set are related to a certain natural linguistic system. The integrality of these subsets is therefore related to the systematicity of the writing, phonological, lexical, and semantic systems of the relevant language. Based on this idea, the paper proposes an integrality condition, compares the gains and losses of integrality, and shows that integrality is to some degree relative. Part III points out that the 'one character, one position' principle in ISO/DP 10646 has damaged the integrality of a great many subsets. Except for the English alphabet (comprising 2×26 = 52 letters) and its subsets, most of the character subsets of languages written in the Roman alphabet are fragmented. The same holds for the Slavic and Arabic alphabets. The paper also shows that the subset of the Chinese pinyin alphabet is seriously fragmented both in the Chinese national standard for Chinese character coding and in the international standard 10646. The present paper is a sequel to the author's earlier essay 'On the Order of Character Sets' [9].
Journal of Chinese Information Processing