Sentence embedding optimization based on manifold learning (基于流形学习的句向量优化)
Abstract Sentence embedding is one of the core technologies of natural language processing, and it affects the quality and performance of natural language processing systems. However, existing methods cannot efficiently infer the global semantic relationships between sentences, so measuring the semantic similarity of sentences in Euclidean space remains problematic. To address this problem, a sentence embedding optimization method based on manifold learning is proposed, starting from the local geometric structure of sentences. The method uses Locally Linear Embedding (LLE) to perform two rounds of weighted local linear combination over each sentence and its semantically similar sentences. This preserves the local geometric information between sentences and helps infer the global geometric information, so that the semantic similarity of sentences in Euclidean space is closer to actual human semantics. Experimental results on seven semantic textual similarity tasks show that the proposed method improves the average Spearman's Rank Correlation Coefficient (SRCC) by 1.21 percentage points over the contrastive learning-based method SimCSE (Simple Contrastive learning of Sentence Embeddings). In addition, when the proposed method is applied to mainstream pre-trained models, the average SRCC of the optimized models improves by 3.32 to 7.70 percentage points over the original pre-trained models.
Authors WU Mingyue (吴明月); ZHOU Dong (周栋); ZHAO Wenyu (赵文玉); QU Wei (屈薇) (School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan Hunan 411201, China; Hunan Key Laboratory for Service Computing and Novel Software Technology (Hunan University of Science and Technology), Xiangtan Hunan 411201, China)
Source Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journals, 2023, No. 10, pp. 3062-3069 (8 pages)
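Funding National Natural Science Foundation of China (61876062); Natural Science Foundation of Hunan Province (2022JJ30020); Scientific Research Project of the Education Department of Hunan Province (21A0319).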
Keywords manifold learning; pre-trained model; contrastive learning; sentence embedding; natural language processing; Locally Linear Embedding (LLE)
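
The abstract describes performing two rounds of weighted local linear combination over each sentence and its semantically similar sentences. The following is only a minimal sketch of that idea under stated assumptions, not the authors' implementation: the abstract does not say how neighbors are selected or how the weights are regularized, so cosine-similarity neighbors, the neighbor count `k`, the regularization constant `reg`, and the function names `lle_weights`, `refine_once`, and `refine_twice` are all hypothetical choices made for illustration. The weights are the standard LLE reconstruction weights, which sum to 1 and minimize the error of rebuilding a point from its neighbors.

```python
import numpy as np


def lle_weights(x, neighbors, reg=1e-3):
    """Standard LLE reconstruction weights for one point.

    Returns w (summing to 1) that minimizes ||x - sum_j w[j] * neighbors[j]||^2.
    `reg` is an assumed regularization constant, not taken from the paper.
    """
    diffs = neighbors - x                  # (k, d) offsets from x to each neighbor
    gram = diffs @ diffs.T                 # (k, k) local Gram matrix
    trace = np.trace(gram)
    # Regularize so the system stays solvable when neighbors are collinear or k > d.
    gram = gram + reg * (trace if trace > 0 else 1.0) * np.eye(len(neighbors))
    w = np.linalg.solve(gram, np.ones(len(neighbors)))
    return w / w.sum()


def refine_once(emb, k=5):
    """Replace each embedding by the LLE-weighted combination of its k neighbors.

    Neighbors are chosen by cosine similarity (an assumption); requires k < len(emb).
    """
    emb = np.asarray(emb, dtype=float)
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)         # a sentence is not its own neighbor
    out = np.empty_like(emb)
    for i in range(len(emb)):
        idx = np.argpartition(-sim[i], k)[:k]   # indices of the k most similar rows
        w = lle_weights(emb[i], emb[idx])
        out[i] = w @ emb[idx]
    return out


def refine_twice(emb, k=5):
    """Apply the weighted local linear combination two times in sequence."""
    return refine_once(refine_once(emb, k), k)
```

In this sketch, `emb` would be an `(n, d)` array of sentence embeddings from any encoder, and `refine_twice(emb, k=5)` returns an array of the same shape. How the refined vectors feed into the reported SRCC evaluation is not specified in the abstract and is left out here.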