Abstract
Laotian is an alphabetic language written without spaces between words, so word segmentation is a prerequisite for natural language processing. Existing Laotian segmentation algorithms mainly apply rules to split text into syllables first and then segment words from the syllable output, which lets syllable errors propagate into the word segmentation. This paper proposes an end-to-end Laotian word segmentation method based on neural networks. Following the idea of multi-task joint learning, it combines syllable segmentation and word segmentation in a single model built on a bidirectional long short-term memory network (BiLSTM). Experiments show that the end-to-end model reaches a precision of 89.02%, an improvement over previous word segmentation models.
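One way to picture joint syllable and word segmentation is as per-character tagging, where a single tag sequence encodes both kinds of boundary. The sketch below only decodes such a sequence. The three-tag scheme (`W`, `S`, `I`) and the function name `decode_joint_tags` are assumptions made for illustration and are not taken from the paper, and Latin placeholder characters stand in for Laotian script.

```python
from typing import List

# Hypothetical joint tag set (an assumption, not the paper's scheme):
#   "W" = character starts a new word (and therefore a new syllable)
#   "S" = character starts a new syllable inside the current word
#   "I" = character continues the current syllable
VALID_TAGS = {"W", "S", "I"}


def decode_joint_tags(chars: str, tags: List[str]) -> List[List[str]]:
    """Turn per-character tags into a list of words, each a list of syllables."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")

    words: List[List[str]] = []
    for ch, tag in zip(chars, tags):
        if tag not in VALID_TAGS:
            raise ValueError(f"unknown tag: {tag!r}")
        if tag == "W" or not words:
            # Open a new word; the first character always opens one.
            words.append([ch])
        elif tag == "S":
            # Open a new syllable inside the current word.
            words[-1].append(ch)
        else:
            # "I": extend the current syllable.
            words[-1][-1] += ch
    return words


# Placeholder input: two words, the first with two syllables.
example = decode_joint_tags("abcdefg", ["W", "I", "S", "I", "I", "W", "I"])
```

In a setup like this, a sequence model such as a BiLSTM would predict the tags, and because one tag sequence carries both boundary types, no separate rule-based syllable pass feeds errors into the word segmentation.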
Authors
HAO Yongbin (郝永彬)
ZHOU Lanjiang (周兰江)
LIU Chang (刘畅)
Affiliations: School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650504, China; School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 611756, China
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journals (北大核心)
2021, Issue 9, pp. 75-81 (7 pages)
Funding
National Natural Science Foundation of China (61662040, 61562049).
Keywords
Laotian word segmentation
syllable segmentation
multi-task learning
end-to-end model