Abstract
Knowledge graph embedding models map entities and relations to low-dimensional vectors that capture the semantic associations between them, and are an important approach to knowledge graph completion. Traditional models construct negative triples by random negative sampling, which tends to produce low-quality negative samples and weakens the feature learning ability of the embedding model. Similarity-based negative sampling methods cluster the entity points to improve the quality of the negative samples. For sparse points in the knowledge graph, however, the number of points in a cluster cannot be controlled, which degrades model performance. Building on a study of similarity-based negative sampling and the sparse-point problem, this work uses the density-based clustering algorithm DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to replace the head and tail entities of samples within clusters, and adaptively optimizes the DBSCAN neighborhood radius to find suitable cluster centers and reduce the number of outliers. Outliers that fall outside the clusters are oversampled to construct similar points for them, which addresses negative sampling for sparse points. Finally, this negative sampling method is combined with TransE to obtain the hybrid negative sampling model TransE-DNS. The results show that TransE-DNS achieves better results on link prediction and triple classification tasks.
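The sampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`adaptive_eps`, `dbscan`, `corrupt_entity`), the radius heuristic (mean distance to the k-th nearest neighbor), the Gaussian jitter used to oversample outliers, and the toy 2-D embeddings are all assumptions made for illustration. Only the Python standard library is used.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def adaptive_eps(points, k=2):
    """Assumed heuristic: mean distance to the k-th nearest neighbor."""
    kth = []
    for i, p in enumerate(points):
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        kth.append(d[k - 1])
    return sum(kth) / len(kth)

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, -1 marks an outlier."""
    n = len(points)
    # Each neighborhood includes the point itself (distance 0).
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional outlier, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # core point: keep expanding
    return labels

def corrupt_entity(idx, points, labels, rng, jitter=0.05):
    """Return a replacement embedding for the head or tail entity idx."""
    if labels[idx] != -1:
        peers = [j for j, lab in enumerate(labels)
                 if lab == labels[idx] and j != idx]
        if peers:
            # Clustered entity: swap in a similar entity from the same cluster.
            return points[rng.choice(peers)]
    # Outlier: oversample a synthetic similar point (assumed Gaussian jitter).
    return [x + rng.gauss(0.0, jitter) for x in points[idx]]

rng = random.Random(0)
# Toy entity embeddings: two dense groups and one sparse point.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [1.0, 1.0], [1.1, 1.0], [1.0, 1.1],
          [5.0, 5.0]]
eps = adaptive_eps(points, k=2)
labels = dbscan(points, eps, min_pts=3)
in_cluster_neg = corrupt_entity(0, points, labels, rng)
outlier_neg = corrupt_entity(6, points, labels, rng)
print(labels, in_cluster_neg, outlier_neg)
```

In a full pipeline, the replacement embedding would stand in for the head or tail of a positive triple, and the resulting negative triple would be scored against the positive one in a TransE-style margin loss. That training step is omitted here.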
Authors
XI Chao-liang (奚超亮)
LENG Yong-lin (冷泳林)
School of Information Science and Technology, Bohai University, Jinzhou 121000, China
Source
Computer Technology and Development (《计算机技术与发展》)
2023, No. 9, pp. 168-174, 181 (8 pages)
Funding
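Liaoning Provincial Education Science Research Project (LJ2020016)
Project of the National Security Research Institute of Bohai University (XK202134-39)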
Keywords
translation model
knowledge graph
triple classification
link prediction
DBSCAN (density-based spatial clustering of applications with noise) clustering
negative sampling