Abstract
A review of basic research on Filipino (morphological analysis, syntactic parsing, and semantic analysis) and of applied work (machine translation, spell checking, and sentiment analysis) shows that Filipino remains a low-resource language with relatively scarce language resources. Existing research on Filipino natural language processing (NLP) is broad but not in-depth, and it still lags well behind NLP research on languages such as English and Chinese. Comparatively, the construction of English-Filipino parallel corpora and the corresponding machine translation have made considerable progress, while other areas have advanced more slowly. Overall, the main future research directions for Filipino NLP are building cross-lingual parallel corpora with cross-lingual processing techniques, advancing the application of deep learning to Filipino NLP, and exploring how well rule-based, graph-model-based, and structure-based methods apply to automatic summarization of Filipino text.
Authors
李珊珊
蒋盛益
符斯慧
LI Shanshan; JIANG Shengyi; FU Sihui (Guangzhou Key Laboratory of Multilingual Intelligent Processing, Guangdong University of Foreign Studies, Guangzhou 510006, China; School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510006, China)
Source
《湖南工业大学学报》
2020, Issue 3, pp. 23-32, F0002 (11 pages in total)
Journal of Hunan University of Technology
Funding
Supported by the National Natural Science Foundation of China (61572145).
Keywords
Filipino
agglutinative language
low-resource language
natural language processing
part-of-speech tagging