Abstract
Compared with Western languages, Hindi is a low-resource language of South Asia. Owing to the lack of corpora, annotation standards, and computational models, Hindi natural language processing has received little attention, and state-of-the-art methods developed for widely studied languages do not transfer well to it. Based on a literature survey and bibliometric analysis, this paper reviews research progress in Hindi natural language processing across basic resource construction, part-of-speech tagging, named entity recognition, syntactic analysis, word sense disambiguation, information retrieval, machine translation, sentiment analysis, and automatic summarization. It concludes by identifying the problems and challenges the field may face and by outlining future development trends.
Authors
WANG Lianxi (王连喜)
LIN Nankai (林楠铠)
JIANG Shengyi (蒋盛益)
DENG Zhiyan (邓致妍)
WANG Lianxi; LIN Nankai; JIANG Shengyi; DENG Zhiyan (Guangzhou Key Laboratory of Multilingual Intelligent Processing, Guangzhou, Guangdong 510006, China; School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, Guangdong 510006, China; Faculty of Asian and African Studies, Guangdong University of Foreign Studies, Guangzhou, Guangdong 510006, China)
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journals (北大核心)
2023, No. 5, pp. 53-69 (17 pages)
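Funding
Guangdong Provincial Science and Technology Plan Project (2019A101002108)
Guangzhou Science and Technology Plan Project (202002030227)
Key Field Project for Universities of Guangdong Province (2019KZDZX1016)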
Keywords
Hindi
natural language processing
resource-scarce language