Journal Articles
1 article found
Pretrained Models and Evaluation Data for the Khmer Language
Authors: Shengyi Jiang, Sihui Fu, Nankai Lin, Yingwen Fu. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2022, No. 4, pp. 709-718 (10 pages).
Trained on a large corpus, pretrained models (PTMs) can capture different levels of concepts in context and hence generate universal language representations, which greatly benefit downstream natural language processing (NLP) tasks. In recent years, PTMs have been widely used in most NLP applications, especially for high-resource languages such as English and Chinese. However, scarce resources have discouraged the progress of PTMs for low-resource languages. Transformer-based PTMs for the Khmer language are presented in this work for the first time. We evaluate our models on two downstream tasks: part-of-speech tagging and news categorization. The dataset for the latter task is self-constructed. Experiments demonstrate the effectiveness of the Khmer models. In addition, we find that the current Khmer word segmentation technology does not aid performance improvement. We aim to release our models and datasets to the community in hopes of facilitating the future development of Khmer NLP applications.
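The news-categorization evaluation the abstract describes reduces to standard sequence-classification fine-tuning of a pretrained transformer. Below is a minimal sketch of that setup using the Hugging Face transformers API; the checkpoint name "khmer-ptm-base" and the label count are hypothetical, since the paper's released model identifiers are not given in this record.

```python
# Minimal sketch: using a transformer-based PTM for Khmer news categorization.
# Assumptions (not from the source): checkpoint name and number of categories.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "khmer-ptm-base"  # hypothetical checkpoint name
NUM_LABELS = 5                 # assumed number of news categories

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_LABELS
)

# Khmer is written without spaces between words; the abstract reports that an
# explicit word-segmentation step did not improve performance, so raw text is
# passed straight to the subword tokenizer here.
text = "..."  # a Khmer news article
inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
predicted_category = logits.argmax(dim=-1).item()
```

Feeding unsegmented text directly to the subword tokenizer mirrors the paper's finding that current Khmer word segmentation does not aid downstream performance.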
Keywords: pretrained models; Khmer language; word segmentation; part-of-speech (POS) tagging; news categorization