Vietnamese keyphrase generation method based on multi-feature fusion
Abstract: Vietnamese is a low-resource language, and high-quality news data annotated with keyphrases is scarce. Existing methods generate Vietnamese news keyphrases with low accuracy when training samples are insufficient. To address this, a multi-feature fusion model for Vietnamese keyphrase generation is proposed to improve the relevance of the generated keyphrases to the Vietnamese news documents. First, named-entity, part-of-speech, and word-position features of the Vietnamese news text are concatenated with the word vectors, so that each input vector carries semantic information along more dimensions. Second, a bidirectional attention mechanism captures the dependencies between the document context and the news headline, strengthening the headline's role in guiding keyphrase generation. Finally, a copy mechanism is combined with generation to produce the Vietnamese keyphrases, improving their semantic relevance. Experiments on a Vietnamese news keyphrase dataset constructed for this work show that the multi-feature fusion model generates high-quality keyphrases even with limited Vietnamese training data: compared with TG-Net, it improves the F1@10 and R@50 scores by 13.2% and 17.1%, respectively.
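The abstract states only that entity, part-of-speech, and position features are concatenated with the word vectors. The sketch below shows that concatenation step under loudly stated assumptions: the tag inventories (`ENTITY_TAGS`, `POS_TAGS`), the one-hot encoding (the actual model may use learned feature embeddings instead), the relative-position scalar, and the helper names `one_hot` and `fuse_features` are all hypothetical and are not taken from the paper.

```python
from typing import List

# Hypothetical tag inventories; the abstract does not list the model's actual tag sets.
ENTITY_TAGS = ["O", "PER", "LOC", "ORG"]
POS_TAGS = ["N", "V", "A", "OTHER"]


def one_hot(tag: str, inventory: List[str]) -> List[float]:
    """Encode a tag as a one-hot vector over a fixed tag inventory."""
    if tag not in inventory:
        raise ValueError(f"unknown tag: {tag}")
    return [1.0 if t == tag else 0.0 for t in inventory]


def fuse_features(word_vec: List[float], entity: str, pos: str,
                  position: int, doc_len: int) -> List[float]:
    """Concatenate a word vector with entity, POS, and relative-position features."""
    # Relative position in [0, 1]: 0.0 for the first token, 1.0 for the last.
    rel_pos = position / max(doc_len - 1, 1)
    return word_vec + one_hot(entity, ENTITY_TAGS) + one_hot(pos, POS_TAGS) + [rel_pos]


# One fused input vector per token of a toy, already word-segmented sentence.
tokens = [("Hà_Nội", "LOC", "N"), ("tăng_trưởng", "O", "V")]
word_vecs = [[0.1, 0.2], [0.3, 0.4]]  # toy 2-d word vectors
fused = [fuse_features(v, ent, pos, i, len(tokens))
         for i, (v, (_, ent, pos)) in enumerate(zip(word_vecs, tokens))]
```

Each fused vector here has 2 + 4 + 4 + 1 = 11 dimensions. Concatenation keeps the original word vector intact and appends the extra signals, so the encoder can learn how much weight to give each feature.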
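The abstract does not give the exact formulation of the bidirectional attention between the document context and the headline. The following is a minimal sketch that assumes a BiDAF-style scheme: a similarity matrix between context and title tokens, context-to-title attention, and title-to-context attention. The plain dot-product similarity, the output layout `[c; c2t; c * c2t; c * t2c]`, and all function names are assumptions chosen for illustration, not the authors' implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights: List[float], vectors: Matrix) -> List[float]:
    """Attention-weighted sum of row vectors."""
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(len(vectors[0]))]


def bidirectional_attention(context: Matrix, title: Matrix) -> Matrix:
    """Return one title-aware vector per context token (BiDAF-style sketch)."""
    # S[i][j]: similarity of context token i and title token j (plain dot product here).
    sim = [[dot(c, t) for t in title] for c in context]
    # Context-to-title: each context token attends over the title tokens.
    c2t = [weighted_sum(softmax(row), title) for row in sim]
    # Title-to-context: weight context tokens by their best match in the title.
    t2c = weighted_sum(softmax([max(row) for row in sim]), context)
    # Per token: [c; c2t; c * c2t; c * t2c] (element-wise products).
    return [c + a + [x * y for x, y in zip(c, a)] + [x * y for x, y in zip(c, t2c)]
            for c, a in zip(context, c2t)]


context = [[1.0, 0.0], [0.0, 1.0]]  # two toy context token vectors
title = [[1.0, 0.0]]                # one toy title token vector
out = bidirectional_attention(context, title)
```

Context tokens that resemble the title receive larger title-to-context weights, which is one way the headline can steer the encoder toward content worth turning into keyphrases.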
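A copy mechanism lets the decoder emit source words, such as rare Vietnamese entity names, that may be missing from the output vocabulary. The sketch below assumes the common pointer-generator mixture P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (sum of attention on the source positions holding w). The abstract does not say which copy variant the model uses, and the function name `copy_distribution` is hypothetical.

```python
from typing import Dict, List


def copy_distribution(p_gen: float, vocab_dist: Dict[str, float],
                      attention: List[float], source_tokens: List[str]) -> Dict[str, float]:
    """Mix the generation and copy distributions for one decoding step."""
    # Generation part, scaled by the probability of generating from the vocabulary.
    final = {w: p_gen * p for w, p in vocab_dist.items()}
    # Copy part: each source position passes its attention weight to the token it holds.
    for weight, token in zip(attention, source_tokens):
        # Source tokens outside the vocabulary gain probability only through copying.
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


dist = copy_distribution(
    p_gen=0.6,
    vocab_dist={"kinh_tế": 0.5, "<unk>": 0.5},
    attention=[0.7, 0.3],
    source_tokens=["Hà_Nội", "kinh_tế"],
)
```

Here "Hà_Nội" is absent from the toy vocabulary but still gets probability 0.4 * 0.7 = 0.28 by copying, and "kinh_tế" receives 0.6 * 0.5 + 0.4 * 0.3 = 0.42 from both sources.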
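The reported metrics F1@10 and R@50 score the top-k ranked predictions against the gold keyphrases. The sketch below computes precision, recall, and F1 at k, assuming exact string matching and counting each correct keyphrase once. Evaluations differ on details such as normalization before matching and whether precision divides by k or by the number of predictions actually returned (this sketch uses the latter), so it is not necessarily the paper's exact protocol. The function name `metrics_at_k` is hypothetical.

```python
from typing import List, Tuple


def metrics_at_k(predicted: List[str], gold: List[str], k: int) -> Tuple[float, float, float]:
    """Return (precision@k, recall@k, F1@k) for one document's ranked predictions."""
    top_k = predicted[:k]
    gold_set = set(gold)
    # Count each distinct correct keyphrase in the top k once (exact string match).
    hits = len(set(top_k) & gold_set)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1


preds = ["kinh tế", "Hà Nội", "lạm phát", "xuất khẩu"]  # toy ranked predictions
gold = ["kinh tế", "lạm phát", "ngân hàng"]             # toy gold keyphrases
p2, r2, f2 = metrics_at_k(preds, gold, k=2)
_, r50, _ = metrics_at_k(preds, gold, k=50)
```

With k = 2, one of the two predictions is correct, so precision is 1/2, recall is 1/3, and F1 is 0.4. With k = 50, all four predictions are considered and recall rises to 2/3. A large k such as 50 therefore mainly measures how many gold keyphrases appear anywhere in the ranked list.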
Authors: CHEN Rui-qing, GAO Sheng-xiang, YU Zheng-tao, ZHANG Ying-chen, ZHANG Lei, YANG Jian (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, Yunnan, China)
Source: Journal of Yunnan University (Natural Sciences Edition), 2022, Issue 1, pp. 23-33 (11 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
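Funding: National Natural Science Foundation of China (61972186); National Key Research and Development Program of China (2018YFC0830105); Yunnan Provincial Major Science and Technology Special Project (202002AD080001-5).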
Keywords: multi-feature; Vietnamese; keyphrase generation; bidirectional attention mechanism; word vector