A Tibetan Sentence Boundary Disambiguation Model Considering the Components on Information on Both Sides of Shad (cited by: 1)
Authors: Fenfang Li, Hui Lv, Yiming Gao, Dolha, Yan Li, Qingguo Zhou. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2023, No. 6, pp. 1085-1100 (16 pages)
Sentence Boundary Disambiguation (SBD) is a preprocessing step for natural language processing. Segmenting text into sentences is essential for Deep Learning (DL) and for pretraining language models. Tibetan punctuation marks can be ambiguous about where sentences begin and end. Hence, the ambiguous punctuation marks must be distinguished, and the sentence structure must be correctly encoded in language models. This study proposes a component-level Tibetan SBD approach based on DL models. The models can reduce the error amplification caused by word segmentation and part-of-speech tagging. Although most SBD methods consider only the text on the left side of punctuation marks, this study considers the text on both sides. A corpus of 465,669 Tibetan sentences is used, and a Bidirectional Long Short-Term Memory (Bi-LSTM) model performs SBD. The experimental results show that the Bi-LSTM model reached an F1-score of 96%, the highest among the six models compared. Experiments are also performed on low-resource languages such as Turkish and Romanian, and on high-resource languages such as English and German, to verify the models' generalization.
Keywords: Sentence Boundary Disambiguation (SBD); punctuation marks; ambiguity; Bidirectional Long Short-Term Memory (Bi-LSTM) model
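
To illustrate what "considering the text on both sides" of a shad could look like in practice, here is a minimal Python sketch that collects a left and a right context window around every shad (U+0F0D) in a string. It is not the authors' implementation: the function name `shad_contexts`, the `window` parameter, and the choice of tsheg-delimited syllables as the context unit are all assumptions made for illustration, since the abstract does not specify the exact component representation. A classifier such as a Bi-LSTM would then take each pair of windows as input and decide whether that shad ends a sentence.

```python
SHAD = "\u0f0d"   # Tibetan mark shad, the candidate sentence boundary
TSHEG = "\u0f0b"  # Tibetan intersyllabic mark tsheg, used here to split units


def _units(segment):
    """Split a text segment into non-empty, whitespace-stripped units.

    Assumption: units are tsheg-delimited syllables; other shads inside
    the segment are treated as unit separators as well.
    """
    parts = segment.replace(SHAD, TSHEG).split(TSHEG)
    return [p.strip() for p in parts if p.strip()]


def shad_contexts(text, window=3):
    """Return (index, left_units, right_units) for each shad in `text`.

    `left_units` holds up to `window` units before the shad and
    `right_units` up to `window` units after it. Windows may cross
    neighbouring shads; a real system might stop at them instead.
    """
    contexts = []
    for i, ch in enumerate(text):
        if ch != SHAD:
            continue
        left = _units(text[:i])[-window:]
        right = _units(text[i + 1:])[:window]
        contexts.append((i, left, right))
    return contexts
```

Each returned tuple pairs one candidate boundary with its two-sided context, which is the input shape a boundary classifier would need; labelling and model training are out of scope for this sketch.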