Word-order reformation is very useful in information processing and is worth annotating in corpora. In this paper, we analyze the syntactic functions afforded by word-order switch in Mandarin Chinese and present a feasible annotation approach based on word-order information. The experimental results show a very significant difference in the frequency information of word distribution after annotation, so annotation helps obtain accurate frequency information. Meanwhile, word-order switch information can also offer meaningful pragmatic information to improve the quality of machine translation.
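To make the frequency claim concrete, here is a minimal Python sketch under an assumed annotation scheme; the paper's actual tag set is not given here. Each token carries a hypothetical order tag, `S` for a canonical position or `W` for a switched position such as a fronted object. Counting bare words merges both uses of 饭, while counting (word, tag) pairs keeps them apart, which is presumably the kind of change in frequency information the abstract reports.

```python
from collections import Counter

# Hypothetical annotation: (word, order_tag), with "S" = canonical position and
# "W" = switched position. This tag set is an assumption, not the paper's scheme.
annotated = [
    [("我", "S"), ("吃", "S"), ("饭", "S")],               # 我吃饭: canonical SVO
    [("饭", "W"), ("我", "S"), ("吃", "S"), ("了", "S")],  # 饭我吃了: object fronted
]


def frequencies(sentences, use_order_tags):
    """Count tokens either as bare words or as (word, order tag) pairs."""
    counts = Counter()
    for sent in sentences:
        for word, tag in sent:
            counts[(word, tag) if use_order_tags else word] += 1
    return counts


plain = frequencies(annotated, use_order_tags=False)
tagged = frequencies(annotated, use_order_tags=True)
print(plain["饭"])                            # both positions merged into one count
print(tagged[("饭", "S")], tagged[("饭", "W")])  # counts kept apart by position
```

The total number of tokens is unchanged; only how the counts are distributed over types differs.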
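Part of speech (POS) is a basic element of natural language processing, and word order carries the semantic and grammatical information being conveyed; both are key information in natural language. How to combine the two effectively in word embedding models is a current research focus. The proposed Structured word2vec on POS combines word-order and POS information: it not only makes the model aware of the positional order of words, but also uses POS association information to establish the inherent syntactic relations between words within the context window. Structured word2vec on POS embeds words directionally according to their positional order and jointly optimizes the word vectors and the POS-related weighting matrices. Experiments on word analogy and word similarity tasks demonstrate the effectiveness of the proposed method.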
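The abstract names the ingredients of Structured word2vec on POS (position-ordered, directional embedding plus POS-related weights optimized jointly with the word vectors) without giving formulas. The Python sketch below is one hypothetical way these ingredients could fit together, not the authors' formulation. It assumes a structured skip-gram with one output table per relative offset, a single scalar weight per (center POS, context POS, offset) triple that scales the skip-gram score, and negative-sampling SGD. The class name `StructuredPOSSkipGram`, the toy corpus, the tag names, and all hyperparameters are illustrative.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class StructuredPOSSkipGram:
    """Hypothetical sketch, not the paper's model.

    * one input vector per word;
    * one output table per relative offset p, so the model knows on which
      side of the center word a context word sits and how far away it is;
    * one scalar weight per (center POS, context POS, offset), trained jointly
      with the vectors by negative sampling.
    """

    def __init__(self, vocab, tags, dim=8, window=2, seed=0):
        self.rng = random.Random(seed)
        self.vocab = sorted(vocab)
        self.offsets = [p for p in range(-window, window + 1) if p != 0]
        self.w_in = {w: [self.rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]
                     for w in self.vocab}
        self.w_out = {p: {w: [0.0] * dim for w in self.vocab} for p in self.offsets}
        self.pos_weight = {(a, b, p): 1.0
                           for a in tags for b in tags for p in self.offsets}

    def score(self, center, c_tag, ctx, x_tag, p):
        """Model probability that ctx (tag x_tag) occurs at offset p of center."""
        alpha = self.pos_weight[(c_tag, x_tag, p)]
        return sigmoid(alpha * dot(self.w_in[center], self.w_out[p][ctx]))

    def train_sentence(self, tagged, negatives=3, lr=0.1):
        """One SGD pass over a list of (word, tag) pairs."""
        for i, (center, c_tag) in enumerate(tagged):
            v_in = self.w_in[center]
            for p in self.offsets:
                j = i + p
                if not 0 <= j < len(tagged):
                    continue
                ctx, x_tag = tagged[j]
                key = (c_tag, x_tag, p)
                draws = (self.rng.choice(self.vocab) for _ in range(negatives))
                samples = [(ctx, 1.0)] + [(w, 0.0) for w in draws if w != ctx]
                grad_in = [0.0] * len(v_in)
                for word, label in samples:
                    v_out = self.w_out[p][word]
                    alpha = self.pos_weight[key]
                    raw = dot(v_in, v_out)
                    g = lr * (label - sigmoid(alpha * raw))  # ascent on log-likelihood
                    for k in range(len(v_in)):
                        grad_in[k] += g * alpha * v_out[k]
                        v_out[k] += g * alpha * v_in[k]
                    self.pos_weight[key] = alpha + g * raw  # joint POS-weight update
                for k in range(len(v_in)):
                    v_in[k] += grad_in[k]


# Toy tagged corpus; tag names loosely follow Penn Chinese Treebank labels.
corpus = [
    [("我", "PN"), ("喜欢", "VV"), ("苹果", "NN")],
    [("他", "PN"), ("讨厌", "VV"), ("香蕉", "NN")],
]
vocab = {w for sent in corpus for w, _ in sent}
tags = {t for sent in corpus for _, t in sent}
model = StructuredPOSSkipGram(vocab, tags)

before = model.score("喜欢", "VV", "苹果", "NN", 1)
for _ in range(300):
    for sent in corpus:
        model.train_sentence(sent)
after = model.score("喜欢", "VV", "苹果", "NN", 1)  # seen: 苹果 right after 喜欢
left = model.score("喜欢", "VV", "苹果", "NN", -1)  # never seen to the left
print(f"{before:.3f} -> {after:.3f}; left offset: {left:.3f}")
```

Because each offset has its own output table, the trained model should score 苹果 higher at offset +1, where it was observed, than at offset -1; a plain skip-gram with one shared output table ignores position and cannot make that distinction. A practical version would also add word2vec's frequency-based negative sampling and subsampling.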
Funding: Supported by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry (Z1534014); the Initial Research Foundation for High-level Talents of Huaqiao University (13SKBS219)