Abstract
This paper investigates how much information the basic graphemes at different positions in the spatial structure of a Korean character contribute to classifying character structures. First, the concept of structure distance between characters and a method for computing it are proposed to describe the dissimilarity between structures. Second, a method for partitioning character structures into equivalence classes is studied, together with the probability distribution of the structures. Finally, the distribution of information within a character is characterized by computing the information gain of the basic graphemes at each position with respect to structure classification. Experiments on real Korean documents show that characters of the c1-v2, c1-v1-c3 and c1-v2-c3 structure types occur with markedly high probability, and that graphemes of types v1, v2 and c3 have the greatest influence on structure classification.
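The abstract does not give the formulas the paper uses, so the following is only a minimal sketch of the textbook definition of information gain, IG = H(class) - H(class | feature), applied to structure classification. The sample data, the variable names `has_c3` and `has_c1`, and the choice of "position occupied or not" as the feature are all hypothetical illustrations, not the authors' data or method.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Textbook information gain: H(labels) - H(labels | feature)."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy is the size-weighted entropy of each group.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Made-up toy sample (an assumption for illustration only):
# the structure type of four characters, and whether each one
# has a grapheme in position c3 or in position c1.
labels = ["c1-v2", "c1-v2", "c1-v1-c3", "c1-v2-c3"]
has_c3 = [False, False, True, True]
has_c1 = [True, True, True, True]

gain_c3 = information_gain(has_c3, labels)
gain_c1 = information_gain(has_c1, labels)
```

In this toy sample, position c1 is filled in every character and so yields no gain, while position c3 splits the sample and yields a positive gain. This mirrors, in miniature, the kind of per-position comparison the abstract describes.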
Source
《中文信息学报》
CSCD
Peking University Core Journals (北大核心)
2011, No. 5, pp. 114-119 (6 pages)
Journal of Chinese Information Processing
Funding
Supported by the National Natural Science Foundation of China (Grant No. 69362001)
Keywords
Korean character
equivalence class of character structures
structure distance
information gain