Abstract
To address the shortcomings of randomly generated negative samples in translation models, and to generate high-quality negative samples that improve training, this paper proposes TransE-KCB, a knowledge representation learning model with improved negative sampling. The model applies the K-Means++ clustering algorithm to group entities into clusters of similar entities. Five entities are randomly selected from the corresponding cluster, their similarity to the entity being replaced is computed, and the highest-ranked entity is used as its replacement. On this basis, a Bloom filter is introduced to filter out "false negatives". Experimental results show that, compared with TransE and other models, TransE-KCB has better expressive ability and substantially improved knowledge representation ability.
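The sampling procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: entity embeddings are plain Python lists indexed by entity id, similarity is cosine similarity (the abstract does not name a measure), clustering is K-Means++ seeding followed by standard Lloyd iterations, and the Bloom filter holds known true triples. All names (kmeans_pp, BloomFilter, corrupt_tail) are hypothetical, as is the fallback order when the top-ranked candidate is flagged: next-ranked candidate first, then uniform corruption as in plain TransE.

```python
import hashlib
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity (assumed measure; the abstract does not specify one)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans_pp(points, k, rng, iters=20):
    """K-Means with K-Means++ seeding; returns a cluster label per point."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        # Next seed is drawn with probability proportional to D(x)^2.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(list(p))
                break
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


class BloomFilter:
    """Bloom filter: may report false positives, never false negatives."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, item):
        digest = hashlib.sha256(repr(item).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        # Double hashing derives k bit positions from two base hashes.
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def corrupt_tail(triple, emb, labels, known, rng, n_candidates=5, tries=100):
    """Replace the tail of (h, r, t) with a similar entity from t's cluster."""
    h, r, t = triple
    same = [e for e in range(len(emb)) if labels[e] == labels[t] and e != t]
    cands = rng.sample(same, min(n_candidates, len(same)))
    # Most similar candidate first; skip triples the filter marks as known-true.
    for e in sorted(cands, key=lambda e: cosine(emb[e], emb[t]), reverse=True):
        if (h, r, e) not in known:
            return (h, r, e)
    # Hypothetical fallback: uniform corruption as in plain TransE.
    for _ in range(tries):
        e = rng.randrange(len(emb))
        if e != t and (h, r, e) not in known:
            return (h, r, e)
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    emb = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
    labels = kmeans_pp(emb, 2, rng)
    known = BloomFilter()
    known.add((3, 0, 1))  # a true training triple (head 3, relation 0, tail 1)
    print(corrupt_tail((3, 0, 1), emb, labels, known, rng))
```

A Bloom filter suits the false-negative check because it never reports a stored triple as absent: every known true triple is always rejected as a negative. Its only error mode is a false positive, which at worst discards a valid negative and moves on to the next candidate.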
Authors
Xu Jincheng; Ge Yunsheng
(School of Information Science and Engineering, Guilin University of Technology, Guilin 541006, Guangxi, China)
Source
Computer Applications and Software
Peking University Core Journal
2024, No. 8, pp. 345-350 (6 pages)
Keywords
Negative sample
Translation model
Triplet classification
Knowledge representation