Survey of Unsupervised Sentence Alignment
Abstract Unsupervised sentence alignment is an important and challenging problem in natural language processing. The task is to identify corresponding sentences across different languages, providing a foundation for applications such as cross-lingual information retrieval and machine translation. This survey summarizes the current state of research on unsupervised sentence alignment from three perspectives: methods, challenges, and applications. In terms of methods, approaches include those based on multilingual embeddings, clustering, and self-supervised or generative models. Unsupervised sentence alignment nevertheless faces challenges such as diversity, language differences, and domain adaptation. The ambiguity and variability of languages complicate sentence alignment, particularly for low-resource languages. Despite these challenges, unsupervised sentence alignment has important applications in cross-lingual information retrieval, machine translation, and multilingual information aggregation. By aligning sentences without supervision, information in different languages can be integrated, improving retrieval effectiveness. Research in this area also continues to drive technical innovation, creating opportunities for more accurate and robust unsupervised sentence alignment.
Authors GU Shiwei (谷仕威); LIU Jing (刘静); LI Bingchun (李丙春); XIONG Deyi (熊德意) (College of Intelligence and Computing, Tianjin University, Tianjin 300350, China; School of Computer Science and Technology, Kashi University, Kashgar, Xinjiang 844000, China)
Source Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2024, No. 1, pp. 60-67 (8 pages)
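Funding Key Program of the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01D43); Yunnan Provincial Key Research and Development Program (202203AA080004); Research Based on a Chinese-Urdu Parallel Corpus (KS2022084).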
Keywords Unsupervised sentence alignment; Natural language processing; Machine translation; Self-supervised; Low-resource