Abstract
Understanding Kazakh text is generally divided into the following steps: source text input, word segmentation and tagging of word attributes, grammatical and syntactic analysis, semantic, pragmatic and context analysis, generation of the target representation, and sentence-group and discourse understanding. Sentence analysis links discourse understanding above with lexical analysis below, and so plays a connecting role. Because the accuracy of Kazakh syntactic analysis affects subsequent machine translation research, this paper builds on Kazakh lexical analysis techniques and the structural characteristics of modern Kazakh syntax, and first introduces three rule-based parsing algorithms: the Earley algorithm, the GLR algorithm, and the chart parsing (chart analysis) algorithm. Experimental comparison shows that, for simple Kazakh sentences, the chart parsing algorithm offers the best overall combination of fast running time and small memory footprint. Because the conventional chart parsing algorithm produces many redundant edges, which lowers parsing accuracy, an improved chart algorithm with an optimized rule base is introduced. Experimental results show that the improved algorithm raises accuracy by 4.19% and shortens running time by a factor of 20.
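The chart parsing approach named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the grammar, the lexicon, the transliterated three-word sentence, and the names `chart_parse` and `recognizes` are all assumptions made for illustration. It is a generic bottom-up chart parser that records every edge it builds, so the chart size gives a rough view of how many edges a given rule set produces.

```python
from collections import namedtuple

# An edge spans words[start:end]; `dot` marks how much of rhs has been matched.
Edge = namedtuple("Edge", "start end lhs rhs dot")

# Hypothetical toy rule base with subject-object-verb order (illustrative only).
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("VP", ("NP", "V")),
    ("NP", ("PRON",)),
    ("NP", ("N",)),
]

# Hypothetical toy lexicon: transliterated word -> part-of-speech tags.
LEXICON = {"men": ["PRON"], "kitap": ["N"], "oqydym": ["V"]}


def is_complete(edge):
    return edge.dot == len(edge.rhs)


def chart_parse(words, grammar, lexicon):
    """Bottom-up chart parsing; returns the set of all edges built."""
    chart = set()
    # Seed the agenda with one complete lexical edge per (word, tag) pair.
    agenda = [
        Edge(i, i + 1, tag, (word,), 1)
        for i, word in enumerate(words)
        for tag in lexicon.get(word, ())
    ]
    while agenda:
        edge = agenda.pop()
        if edge in chart:
            continue
        chart.add(edge)
        if is_complete(edge):
            # Bottom-up prediction: start each rule whose first symbol was just found.
            for lhs, rhs in grammar:
                if rhs[0] == edge.lhs:
                    agenda.append(Edge(edge.start, edge.start, lhs, rhs, 0))
            # Fundamental rule: extend active edges that were waiting for this category.
            for other in chart:
                if (not is_complete(other) and other.end == edge.start
                        and other.rhs[other.dot] == edge.lhs):
                    agenda.append(Edge(other.start, edge.end,
                                       other.lhs, other.rhs, other.dot + 1))
        else:
            # Fundamental rule from the active side: consume complete edges already present.
            for other in chart:
                if (is_complete(other) and other.start == edge.end
                        and other.lhs == edge.rhs[edge.dot]):
                    agenda.append(Edge(edge.start, other.end,
                                       edge.lhs, edge.rhs, edge.dot + 1))
    return chart


def recognizes(chart, n, start_symbol="S"):
    """True if a complete start-symbol edge spans the whole input."""
    return any(e.lhs == start_symbol and e.start == 0 and e.end == n
               and is_complete(e) for e in chart)


if __name__ == "__main__":
    words = "men kitap oqydym".split()
    chart = chart_parse(words, GRAMMAR, LEXICON)
    print(recognizes(chart, len(words)), len(chart))
```

Adding an unnecessary rule such as `("NP", ("NP", "NP"))` to `GRAMMAR` makes the same sentence produce a larger chart. That is the kind of redundancy a pruned rule base would avoid, although the paper's specific optimization is not described in this record.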
Source
Computer Technology and Development (《计算机技术与发展》)
2015, No. 9, pp. 43-47 (5 pages)
Funding
National Natural Science Foundation of China (61063025, 61363062)
Keywords
Kazakh
syntactic analysis
chart analysis algorithm
rule base
syntax tree