
Distributed representation of Chinese and Thai words based on cross-lingual corpus (Cited by: 2)
Abstract: Word representation is a fundamental research topic in natural language processing. Distributed representations of monolingual words have already been applied with good results to a number of natural language processing (NLP) problems, but there has been little research, at home or abroad, on distributed representations of cross-lingual words. To address this problem, we exploit the similar distributions of nouns and verbs in the two languages and, through weakly supervised learning extension and related methods, embed Thai translation equivalents, same-category words, and hypernyms into a Chinese corpus, so that the distributions of Thai words in a Chinese-Thai cross-lingual environment can be learned. In our experiments, the learned cross-lingual word representations are applied to bilingual text similarity computation and to text classification on a mixed Chinese-Thai corpus, and both tasks achieve good results.
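The abstract only describes the method at a high level. Below is a minimal sketch of the corpus-mixing idea it outlines, assuming gensim's Word2Vec (version 4.0 or later) as a stand-in for the paper's neural probabilistic language model. The toy corpus, the Chinese-Thai lexicon zh2th, the substitution probability sub_prob, and all helper functions are illustrative assumptions, not the authors' actual data or procedure.

# Sketch: build a mixed Chinese-Thai corpus by substituting Thai counterparts
# (translations / same-category words / hypernyms) into Chinese sentences,
# train embeddings on the mixed corpus so both languages share one vector
# space, then compare bilingual texts by cosine similarity. Illustrative only.
from gensim.models import Word2Vec   # assumes gensim >= 4.0
import numpy as np

# toy segmented Chinese corpus (one token list per sentence)
zh_corpus = [
    ["我", "喜欢", "猫"],
    ["他", "喜欢", "狗"],
    ["我", "养", "猫"],
]

# toy Chinese -> Thai lexicon (placeholder entries, not the paper's lexicon)
zh2th = {"猫": "แมว", "狗": "หมา", "喜欢": "ชอบ"}

def mix_corpus(corpus, lexicon, sub_prob=0.5, seed=0):
    # Randomly replace Chinese tokens with Thai counterparts; the paper uses
    # weakly supervised expansion rather than uniform random substitution.
    rng = np.random.default_rng(seed)
    mixed = []
    for sent in corpus:
        mixed.append([lexicon[w] if w in lexicon and rng.random() < sub_prob else w
                      for w in sent])
    return corpus + mixed   # keep the original sentences as well

model = Word2Vec(mix_corpus(zh_corpus, zh2th),
                 vector_size=50, window=2, min_count=1, sg=1, epochs=200)

def doc_vec(tokens, model):
    # Average the vectors of in-vocabulary tokens (zero vector if none).
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.wv.vector_size)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

# bilingual text similarity: a Chinese sentence vs. a partly Thai one
print(cosine(doc_vec(["我", "喜欢", "猫"], model),
             doc_vec(["我", "ชอบ", "แมว"], model)))

In the same spirit, the averaged document vectors could feed a standard classifier (for example, logistic regression) for the mixed Chinese-Thai text classification task reported in the abstract.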
Source: Computer Engineering & Science (《计算机工程与科学》, CSCD, Peking University Core Journal), 2015, No. 12, pp. 2358-2365 (8 pages)
Funding: National Natural Science Foundation of China (Grant No. 61363044)
Keywords: weakly supervised learning extension; cross-lingual corpus; cross-lingual word distributed representation; neural probabilistic language model
