Abstract
Truth discovery has become an active research topic for resolving the conflicts that are widespread among multi-source data. Existing truth discovery algorithms are usually based on the following principle: a source that consistently provides true information is more trustworthy, and a piece of information supported by trustworthy sources is more likely to be true. Although these algorithms achieve good results in most scenarios, most of them ignore the relationships between entity attributes. This paper proposes a new model that uses graph embedding to capture the relationships between entity attributes while performing truth discovery. The model builds four heterogeneous networks (source-source, source-attribute value, entity attribute-entity attribute, and entity attribute-entity attribute value) to model the relationships among the data. These networks are then embedded into a low-dimensional space so that reliable sources and reliable attribute values lie close to each other and the relationships between entity attributes are reflected in the attribute values, which enables truth discovery inference. Experiments on two real-world datasets show that the proposed algorithm outperforms existing truth discovery algorithms.
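The general principle stated in the abstract (trustworthy sources support true values, and values backed by trustworthy sources are more likely true) can be illustrated with a minimal sketch. This is a generic iterative weighted-voting baseline, not the graph-embedding model proposed in the paper. The function name `discover_truths`, the smoothed trust update, and the toy claims are all assumptions made for illustration.

```python
from collections import defaultdict

def discover_truths(claims, iterations=10):
    """Generic iterative truth discovery (illustrative baseline only).

    claims: list of (source, item, value) triples, where item is an
    (entity, attribute) pair. Returns (truths, trust).
    """
    sources = {s for s, _, _ in claims}
    trust = {s: 1.0 for s in sources}  # assumption: all sources start equal
    truths = {}
    for _ in range(iterations):
        # Step 1: score each candidate value by the summed trust of its supporters.
        scores = defaultdict(lambda: defaultdict(float))
        for s, item, value in claims:
            scores[item][value] += trust[s]
        # Pick the best-supported value; ties go to the alphabetically first value.
        truths = {item: max(sorted(vals), key=vals.get)
                  for item, vals in scores.items()}
        # Step 2: re-estimate trust as the smoothed fraction of a source's
        # claims that agree with the current truths (hypothetical update rule).
        hits, total = defaultdict(int), defaultdict(int)
        for s, item, value in claims:
            total[s] += 1
            hits[s] += int(truths[item] == value)
        trust = {s: (hits[s] + 1) / (total[s] + 2) for s in sources}
    return truths, trust

# Toy data (invented for this example): three sources, three items.
claims = [
    ("A", ("e1", "city"), "Shanghai"), ("B", ("e1", "city"), "Shanghai"),
    ("C", ("e1", "city"), "Suzhou"),
    ("A", ("e2", "city"), "Beijing"), ("B", ("e2", "city"), "Beijing"),
    ("C", ("e2", "city"), "Tianjin"),
    ("A", ("e3", "city"), "Paris"), ("C", ("e3", "city"), "Lyon"),
]
truths, trust = discover_truths(claims)
```

On the third item the two candidate values start with one vote each. Once source A has earned more trust than source C on the other two items, the tie resolves in favor of A's value. A baseline of this kind treats every item independently, which is the limitation the abstract points to when it says existing methods ignore relationships between entity attributes.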
Authors
LÜ Hang (吕航)
Xiu Susie Fang
SI Suxin (司苏新)
WANG Kang (王康)
School of Computer Science and Technology, Donghua University, Shanghai 201620, China
Source
Intelligent Computer and Applications (《智能计算机与应用》)
2022, No. 10, pp. 9-14 (6 pages)
Keywords
graph embedding
truth discovery
entity attribute relations
heterogeneous network