Funding: Supported by the National Natural Science Foundation of China (61065001)
Abstract: Based on an analysis of the distinctive shapes and writing styles of Uyghur characters, we design a framework for a prototype character recognition system and carry out systematic theoretical and experimental research on its modules. In preprocessing, we use linear and nonlinear normalization based on the dot density method. Because some Uyghur characters are very similar to one another, both structural and statistical features are extracted. For clustering analysis, we adopt a dynamic clustering algorithm based on the minimum spanning tree (MST), and we use k-nearest neighbor matching as the classifier. Tests of the prototype system show that the recognition rates for the four character types (independent, suffix, intermediate, and initial) are 74.67%, 70.42%, 63.33%, and 72.02%, respectively; when the top five candidates are considered, the recognition rates are 94.34%, 94.19%, 93.15%, and 95.86%, respectively. The ideas and methods used in this paper are to some extent general and may be useful for recognizing the characters of other languages in the Altaic language family.
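The five-candidate figures count a character as recognized when its true label appears among the five best-ranked classes. A minimal sketch of how a k-nearest neighbor matcher might produce such a ranked candidate list is given below. The abstract does not state the distance metric, the value of k, or the ranking rule, so the Euclidean distance, the vote-based ranking, and the names `knn_candidates`, `prototypes`, and `n_candidates` are all assumptions made for illustration, not the paper's implementation.

```python
import math
from collections import defaultdict


def knn_candidates(query, prototypes, k=5, n_candidates=5):
    """Rank class labels for `query` by votes among its k nearest prototypes.

    `prototypes` is a list of (feature_vector, label) pairs; this data layout
    is hypothetical. Returns up to `n_candidates` labels, best first.
    """
    # Euclidean distance (an assumed metric) from the query to every prototype.
    scored = sorted((math.dist(query, vec), label) for vec, label in prototypes)

    votes = defaultdict(int)
    best_dist = {}
    for dist, label in scored[:k]:
        votes[label] += 1
        # `scored` is sorted, so the first distance seen per label is its smallest.
        best_dist.setdefault(label, dist)

    # More votes rank first; ties go to the label with the nearest prototype.
    ranked = sorted(votes, key=lambda lab: (-votes[lab], best_dist[lab]))
    return ranked[:n_candidates]


# Toy 2-D feature vectors standing in for real character features.
prototypes = [
    ((0.0, 0.0), "a"), ((0.1, 0.0), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.0), "b"),
    ((5.0, 5.0), "c"),
]
print(knn_candidates((0.2, 0.1), prototypes, k=3))
```

Under this reading, the first-choice rate corresponds to checking only the first element of the returned list, and the five-candidate rate to checking whether the true label appears anywhere in it.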