Abstract
Kazakh syntactic parsing currently has low accuracy, and there is little research on neural-network-based parsing for Kazakh. To address this, the paper studies Kazakh phrase-structure parsing with a shift-reduce method in which the stack stores sentence spans rather than partial tree structures, so the syntax trees do not need to be binarized during parsing. A bidirectional LSTM extracts span features, capturing each span's information in the context of the whole sentence. A multilayer perceptron is then used to train the parsing model, and dynamic programming is used at decoding time to select the best parse. The resulting Kazakh phrase-structure parsing accuracy reaches 76.92%. The results improve the accuracy of Kazakh parsing and lay a foundation for subsequent work on Kazakh machine translation and semantic analysis.
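The idea of keeping sentence spans on the stack, rather than partial trees, can be illustrated with a toy transition system. The sketch below is only an assumption about how such a system might look: the action names `shift` and `combine` and the function `run_transitions` are hypothetical and are not taken from the paper, and no scoring model (BiLSTM, multilayer perceptron) or dynamic-programming decoder is included.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open interval [start, end) over word positions


def run_transitions(n_words: int, actions: List[str]) -> List[Span]:
    """Apply shift/combine actions and return the top-of-stack span after each step."""
    stack: List[Span] = []
    seen: List[Span] = []
    next_word = 0
    for action in actions:
        if action == "shift":
            # Push the single-word span for the next unread word.
            if next_word >= n_words:
                raise ValueError("no words left to shift")
            stack.append((next_word, next_word + 1))
            next_word += 1
        elif action == "combine":
            # Merge the two topmost adjacent spans into one larger span.
            if len(stack) < 2:
                raise ValueError("combine needs two spans on the stack")
            right = stack.pop()
            left = stack.pop()
            stack.append((left[0], right[1]))
        else:
            raise ValueError(f"unknown action: {action}")
        seen.append(stack[-1])
    if next_word != n_words or len(stack) != 1:
        raise ValueError("action sequence does not cover the sentence")
    return seen


# A three-word sentence parsed as ((w0 w1) w2).
spans = run_transitions(3, ["shift", "shift", "combine", "shift", "combine"])
print(spans)
```

Because each stack element is just a pair of boundaries, a parser of this kind could leave an intermediate span unlabeled instead of forcing it to be a constituent, which is one way n-ary constituents can be represented without binarizing the tree. Whether the paper does exactly this is not stated in the abstract.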
Authors
Chai Wei
Gulila Altenbek
(College of Information Science & Engineering, Xinjiang University, Urumqi 830046, China; Xinjiang Laboratory of Multi-language Information Technology, Urumqi 830046, China; The Base of Kazakh & Kirghiz Language of National Language Resource Monitoring & Research Center on Minority Language, Urumqi 830046, China)
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journals (北大核心)
2020, Issue 3, pp. 731-733, 753 (4 pages)
Funding
Supported by the National Natural Science Foundation of China (61363062).