Retelling extraction is an important branch of Natural Language Processing (NLP), and high-quality retelling resources help improve the performance of machine translation. However, traditional methods based on bilingual parallel corpora often ignore the document background when acquiring and applying retellings. To solve this problem, we introduce topic model information into the translation model and propose a topic-based statistical machine translation method to improve translation performance. In this method, Probabilistic Latent Semantic Analysis (PLSA) is used to obtain the co-occurrence relationship between words and documents through hybrid matrix decomposition. We then design a decoder to simplify the decoding process. Experiments show that the proposed method effectively improves translation accuracy.
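As a rough illustration of how PLSA relates words and documents through latent topics, the sketch below fits the standard PLSA model with EM on a small count matrix. It is a textbook formulation under stated assumptions, not the decoder or training procedure of the paper above; the function name `plsa`, its parameters, and the toy corpus are hypothetical.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a document-word count matrix (list of lists).

    Returns (p_z_d, p_w_z): P(topic | document) and P(word | topic).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        if total == 0:  # fall back to uniform if a row received no mass
            return [1.0 / len(row)] * len(row)
        return [x / total for x in row]

    # Random positive initialisation; each row is a probability distribution.
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z | d, w) is proportional to P(z|d) * P(w|z).
                post = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                norm = sum(post)
                if norm == 0:
                    continue
                for z in range(n_topics):
                    resp = n * post[z] / norm
                    new_z_d[d][z] += resp  # expected count of topic z in doc d
                    new_w_z[z][w] += resp  # expected count of word w in topic z
        # M-step: renormalise the expected counts.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z


# Toy corpus (assumed): two documents use words 0-1, two use words 2-3.
toy_counts = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
doc_topics, topic_words = plsa(toy_counts, n_topics=2, n_iter=30)
```

In a topic-based translation setting, distributions such as `doc_topics` could be used to weight translation or retelling candidates by how well they match the topics of the source document; how the paper actually does this is not stated in the abstract.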
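Existing short-text keyword extraction methods based on topical translation models all use LDA (Latent Dirichlet Allocation) as the topic discovery method. However, LDA discovers topics poorly on short texts with sparse features, which leaves current topical translation models deficient. This paper adopts the DMM (Dirichlet Multinomial Mixture) model for topic discovery and, combining it with statistical machine translation, proposes TTM_DMM (Topical Translation Model based on Dirichlet Multinomial Mixture), a topical translation model for short-text keyword extraction. The model uses DMM to discover the topic information of short texts and learns the translation probabilities between words and keywords under topic constraints, thereby improving short-text keyword extraction. Experiments on a real-world dataset show that TTM_DMM outperforms existing short-text keyword extraction methods on Precision, Recall, and F-measure.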
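One way to picture topic-constrained translation probabilities is to score each candidate keyword by summing, over topics and document words, the product of the topic weight, the word frequency, and a topic-specific word-to-keyword translation probability. The sketch below assumes this scoring rule and assumes the probability tables are already learned; the abstract does not give the model's actual formula, and `rank_keyphrases`, its arguments, and the toy tables are all hypothetical.

```python
from collections import defaultdict


def rank_keyphrases(doc_words, p_topic_given_doc, p_key_given_word_topic, top_n=3):
    """Rank keywords by sum_z P(z|d) * sum_w P(w|d) * P(k|w,z).

    Assumed scoring rule for illustration; not taken from the paper.
    """
    word_counts = defaultdict(int)
    for word in doc_words:
        word_counts[word] += 1
    total = len(doc_words)

    scores = defaultdict(float)
    for topic, p_topic in p_topic_given_doc.items():
        table = p_key_given_word_topic.get(topic, {})
        for word, count in word_counts.items():
            # P(w|d) is estimated as the relative frequency of the word.
            for key, p_key in table.get(word, {}).items():
                scores[key] += p_topic * (count / total) * p_key
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Toy tables (assumed values): topic -> word -> keyword -> probability.
translation = {
    "tech": {"apple": {"iphone": 0.8, "fruit": 0.2},
             "phone": {"iphone": 0.6, "android": 0.4}},
    "food": {"apple": {"fruit": 0.9, "iphone": 0.1}},
}
ranked = rank_keyphrases(["apple", "phone", "screen"],
                         {"tech": 0.9, "food": 0.1}, translation)
```

With these toy numbers the ambiguous word "apple" contributes mostly to "iphone" because the document leans toward the "tech" topic, which is the kind of disambiguation a topic constraint is meant to provide.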
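In question retrieval models based on statistical machine translation, the relevance ranking mechanism relies mainly on the translation probabilities between terms. However, existing models do not control the noise of the translation model well, which leaves current question retrieval models deficient. This paper proposes a question retrieval model based on a topical translation model and shows theoretically that the model uses topic information to constrain translation reasonably, which controls the noise of the translation model and thereby improves question retrieval results. Experimental results show that the proposed model significantly outperforms state-of-the-art question retrieval models on MAP (Mean Average Precision), MRR (Mean Reciprocal Rank), and p@1 (precision at position one).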
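To further improve keyword extraction for short texts, a topical translation model, TTKE (topical translation for keyphrase extraction), is proposed. Combining the strengths of topic models and statistical machine translation models, it uses long texts to assist topic discovery in short texts, learns topic-specific alignment probabilities between words and keywords, and extracts keywords for a given short text. Experiments on a real-world dataset show that the model effectively improves short-text keyword extraction.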
Funding: Supported by the National Social Science Fund of China (Youth Program): “A Study of Acceptability of Chinese Government Public Signs in the New Era and the Countermeasures of the English Translation” (No. 13CYY010); the Subject Construction and Management Project of Zhejiang Gongshang University: “Research on the Organic Integration Path of Constructing Ideological and Political Training and Design of Mixed Teaching Platform during Epidemic Period” (No. XKJS2020007); and the Ministry of Education Industry-University Cooperative Education Program: “Research on the Construction of Cross-border Logistics Marketing Bilingual Course Integration” (No. 202102494002).