3 articles found
1. TASSM_BS: A Tibetan Automatic Sentence Segmentation Method Based on Bi-LSTM and Self-Attention
Authors: 才让叁智, 多拉, 格桑多吉, 洛桑嘎登, 仁增多杰. Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2023, No. 5, pp. 44-52 (9 pages)
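Automatic sentence segmentation has important applications in natural language processing and is an essential preliminary step for tasks such as machine translation, syntactic parsing, and semantic analysis. The approaches currently used for Tibetan, namely dictionary-based segmentation and segmentation that combines dictionaries with statistical models, perform poorly because they are affected by data sparsity and by the fact that sentence-final words can belong to multiple word classes. To address this, this paper proposes a Tibetan automatic sentence segmentation method based on Bi-LSTM and Self-Attention. In comparative experiments, the method achieves a macro precision, macro recall, and macro F1 of 97.7%, 98.06%, and 97.88%, respectively, outperforming all compared methods. The experiments also show that the model performs better when sequences are brought to a fixed length by truncating or padding at the front than at the back, and that Skip-gram-based syllable representations outperform CBOW-based and randomly initialized syllable representations.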
Keywords: Tibetan sentences; sentence segmentation; TSRM_BS model
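The abstract's comparison of front-end versus back-end truncation and padding can be made concrete with a small helper. This is a sketch under stated assumptions, not the authors' code: the name `fix_length`, the `<pad>` symbol, and the reading of "front-end" as both padding and truncating at the start of a sequence are hypothetical. The `pre`/`post` naming mirrors the `padding`/`truncating` options of Keras `pad_sequences`.

```python
from typing import List, Sequence

PAD = "<pad>"  # hypothetical padding symbol, not taken from the paper


def fix_length(seq: Sequence[str], maxlen: int, side: str = "pre", pad: str = PAD) -> List[str]:
    """Truncate or pad `seq` to exactly `maxlen` items (hypothetical helper).

    side="pre":  cut or pad at the front, so the end of the sequence is kept.
    side="post": cut or pad at the back, so the start of the sequence is kept.
    """
    if side not in ("pre", "post"):
        raise ValueError("side must be 'pre' or 'post'")
    items = list(seq)
    if len(items) > maxlen:
        # Too long: drop the surplus items from the chosen side.
        return items[len(items) - maxlen:] if side == "pre" else items[:maxlen]
    # Too short (or exact): add padding on the chosen side.
    padding = [pad] * (maxlen - len(items))
    return padding + items if side == "pre" else items + padding


if __name__ == "__main__":
    sylls = ["s1", "s2", "s3", "s4", "s5"]  # placeholder syllables
    print(fix_length(sylls, 3, "pre"))       # ['s3', 's4', 's5']
    print(fix_length(sylls, 3, "post"))      # ['s1', 's2', 's3']
    print(fix_length(sylls[:2], 4, "pre"))   # ['<pad>', '<pad>', 's1', 's2']
```

With `side="pre"` the tail of every syllable sequence survives truncation unchanged; with `side="post"` the head does. Which variant the paper's "front-end" setting corresponds to in every detail is an assumption here.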
2. ZHUMO AND HER CRANES: Primitive Ecological Culture in ‘King Gesar’
Author: dolha. China's Tibet, 1999, No. 4, p. 26 (1 page)
Keywords: Primitive Ecological Culture in King Gesar; ZHUMO AND HER CRANES
3. A Tibetan Sentence Boundary Disambiguation Model Considering the Components on Information on Both Sides of Shad
Authors: Fenfang Li, Hui Lv, Yiming Gao, dolha, Yan Li, Qingguo Zhou. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2023, No. 6, pp. 1085-1100 (16 pages)
Sentence Boundary Disambiguation (SBD) is a preprocessing step in natural language processing. Segmenting text into sentences is essential for Deep Learning (DL) and for pretraining language models. Tibetan punctuation marks can be ambiguous as to where a sentence begins and ends, so these ambiguous marks must be disambiguated and sentence structure must be encoded correctly in language models. This study proposes a component-level Tibetan SBD approach based on DL models. Working at the component level, the models can reduce the error amplification caused by word segmentation and part-of-speech tagging. Whereas most SBD methods consider only the text to the left of a punctuation mark, this study considers the text on both sides. Using 465,669 Tibetan sentences, a Bidirectional Long Short-Term Memory (Bi-LSTM) model is applied to SBD. Experimental results show that the Bi-LSTM model reaches an F1-score of 96%, the best of the six models compared. To verify the models' generalization, experiments are also performed on low-resource languages such as Turkish and Romanian and on high-resource languages such as English and German.
Keywords: Sentence Boundary Disambiguation (SBD); punctuation marks; ambiguity; Bidirectional Long Short-Term Memory (Bi-LSTM) model
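The contrast between left-only context and context on both sides of a shad can be illustrated by how training samples are cut from raw text. The sketch below is hypothetical and not the authors' pipeline: it treats each Unicode character as a stand-in for the paper's "components" (an assumption), uses a window of `k` characters, and pads windows that run past the text edge with `_`. The resulting windows are the kind of input a sequence classifier such as a Bi-LSTM could label as boundary or non-boundary.

```python
from typing import List, Tuple

SHAD = "\u0f0d"  # TIBETAN MARK SHAD, U+0F0D
PAD = "_"        # hypothetical filler for windows that run past the text edge


def shad_windows(text: str, k: int, both_sides: bool = True) -> List[Tuple[int, str]]:
    """Return (position, window) for every shad in `text` (hypothetical helper).

    Each window holds the k characters to the left of the shad, the shad
    itself, and, if both_sides is True, the k characters to its right.
    Windows are padded with PAD at the text boundaries. Using characters
    as "components" is an assumption made for illustration.
    """
    samples = []
    for i, ch in enumerate(text):
        if ch != SHAD:
            continue
        # Left context, left-padded to exactly k characters.
        left = text[max(0, i - k):i].rjust(k, PAD)
        window = left + ch
        if both_sides:
            # Right context, right-padded to exactly k characters.
            window += text[i + 1:i + 1 + k].ljust(k, PAD)
        samples.append((i, window))
    return samples


if __name__ == "__main__":
    text = "abc" + SHAD + "de" + SHAD  # ASCII placeholders around two shads
    print(shad_windows(text, 2))                    # both sides
    print(shad_windows(text, 2, both_sides=False))  # left side only
```

Setting `both_sides=False` reproduces the left-only setting that the abstract attributes to most earlier SBD methods; the default adds the right-hand context the study argues for.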