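Paraphrase identification is an important sentence-level semantic understanding task. This paper proposes a lightweight single-layer recurrent neural network model based on memory units, combined with semantic role labeling knowledge, for English paraphrase identification. Using a single-layer recurrent network mitigates the vanishing and exploding gradient problems that are aggravated by deep networks, which makes the model easy to train; external memory units and semantic role knowledge help store semantic relations at different levels between the two sentences. The model achieves an F-score of 84.3% on the test set of the English benchmark Microsoft Research Paraphrase Corpus. Experiments show that semantic role labeling knowledge does help paraphrase identification, and that the lightweight model achieves results close to those of comparable multi-layer network models.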
A Discriminative Latent Model (DLM) is proposed for extracting Multiword Expressions (MWEs) from Chinese text, with the aim of improving the performance of Machine Translation (MT) systems such as Template Based MT (TBMT). For MT systems to become more practical, they need MWE processing capability. As a step towards this goal, we propose DLM, which is developed for sequence labeling tasks that include hidden structures, to extract MWEs for MT systems. DLM combines the advantages of existing discriminative models and can learn hidden structures in sequence labeling tasks. In our evaluations, DLM achieves precision of up to 90.73% for some types of MWEs, which is higher than state-of-the-art discriminative models. These results demonstrate that it is feasible to automatically identify many Chinese MWEs using our DLM tool. With the MWE processing model, the BLEU score of the MT system also increased by up to 0.3 in a closed test.
Funding: Supported by the Liaoning Province Doctor Startup Fund under Grant No. 20101021, the Fund of the State Ethnic Affairs Commission under Grant No. 10DL08, and the Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine.