Construction of a Chinese-Uyghur Comparable Corpus for the Alignment of Bilingual Technical Terms (用于双语科技术语对齐的汉维文可比语料库构建)

Cited by: 2
Abstract: To support the extraction of Chinese-Uyghur bilingual scientific and technical terms, this paper proposes a design for a Chinese-Uyghur comparable corpus in the news and science-and-technology domain and tests it experimentally. The approach relies on a relatively mature Chinese-Uyghur machine translation system. Chinese and Uyghur texts collected from the web are first pre-processed with the machine translation system and then mapped into a vector space, where the LSI (latent semantic indexing) algorithm is used to compute the correlation between the vectors. The resulting vectors are indexed, and the similarity between each source text and its candidate texts is computed in turn. Two experimental schemes are designed and compared, and the selected comparable texts are evaluated and screened to build the final Chinese-Uyghur comparable corpus.
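The pipeline described in the abstract (vectorize, apply LSI, index, then score a source text against candidates) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name a toolkit, so the sketch uses a plain NumPy truncated SVD, raw term counts, and cosine similarity. The toy token lists, the function names, `k=2`, and the `0.5` threshold are all hypothetical. The source text is assumed to have already been machine-translated into the candidates' language and tokenized.

```python
import numpy as np

def build_term_doc_matrix(docs):
    """Build a term-by-document count matrix from tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc:
            A[index[tok], j] += 1.0
    return A, index

def lsi_fit(A, k):
    """Truncated SVD: keep the top-k latent dimensions of A = U S V^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    # Row j is document j in the latent space (equals U_k^T applied to column j of A).
    doc_vecs = (s[:k, None] * Vt[:k]).T
    return U_k, doc_vecs

def lsi_project(tokens, index, U_k):
    """Fold a new tokenized text into the latent space; unknown tokens are ignored."""
    q = np.zeros(U_k.shape[0])
    for tok in tokens:
        if tok in index:
            q[index[tok]] += 1.0
    return U_k.T @ q

def rank_candidates(query_vec, doc_vecs):
    """Return (candidate index, cosine similarity) pairs, most similar first."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    sims = (doc_vecs @ query_vec) / np.where(norms == 0, 1.0, norms)
    order = np.argsort(-sims)
    return [(int(i), float(sims[i])) for i in order]

# Hypothetical candidate texts (already tokenized).
candidates = [
    ["satellite", "launch", "orbit", "rocket"],
    ["rocket", "launch", "engine", "orbit"],
    ["gene", "protein", "cell", "sequencing"],
    ["cell", "gene", "therapy", "protein"],
]
# Hypothetical source text after machine translation and tokenization.
source = ["satellite", "orbit", "rocket", "launch", "probe"]

A, index = build_term_doc_matrix(candidates)
U_k, doc_vecs = lsi_fit(A, k=2)
ranked = rank_candidates(lsi_project(source, index, U_k), doc_vecs)

# Keep only candidates above an illustrative similarity threshold.
threshold = 0.5
selected = [(i, sim) for i, sim in ranked if sim >= threshold]
print(selected)
```

In this toy data the two space-related candidates score close to 1 and the two biology-related ones close to 0, so only the former pass the threshold. A real corpus would need term weighting, a larger `k`, and a threshold tuned on evaluated document pairs.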
Source: Journal of Xinjiang University (Natural Science Edition) (《新疆大学学报(自然科学版)》), indexed in CAS and the Peking University Core Journals list (北大核心), 2017, No. 3, pp. 316-321 (6 pages)
Funding: National Natural Science Foundation of China (61463048, 61462083, 61331011); National Key Basic Research Program of China (973 Program) (2014cb340506)
Keywords: comparable corpora; Chinese-Uyghur comparable corpus construction; bilingual term extraction; LSI