Abstract
Mining outliers, that is, objects that differ from the normal data objects in a network, is an important task in data mining. Outlier detection in bi-typed heterogeneous information networks has so far received relatively little attention, and methods designed for homogeneous networks are difficult to apply to bi-typed heterogeneous networks. To address this, we propose RKBOutlier, a Rank-Kmeans Based Outlier detection method for heterogeneous information networks. Two types of objects, together with the semantic information carried by the links between them, are extracted from the heterogeneous information network. The objects to be examined are treated as attribute objects, and the objects of the other type are treated as target objects. The target objects are clustered, and the distribution of each attribute object across the resulting clusters is examined; attribute objects with an abnormal distribution are reported as outliers. Ranking is combined with clustering to significantly improve clustering accuracy. Experimental results show that RKBOutlier effectively detects outliers in bi-typed heterogeneous information networks.
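The pipeline outlined in the abstract (cluster the target objects, then inspect how each attribute object is distributed over the clusters) can be illustrated with a minimal sketch. This is not the authors' RKBOutlier implementation: the ranking step is omitted, plain k-means with a farthest-first initialization stands in for the Rank-Kmeans procedure, and the normalized-entropy score is an assumed stand-in for whatever "abnormal distribution" criterion the paper uses. The link-weight matrix `W` and all function names are hypothetical.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from the centroids chosen so far (assumes k <= len(points))."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_distribution(links, labels, k):
    """Share of one attribute object's link weight that falls in each cluster."""
    totals = [0.0] * k
    for weight, lab in zip(links, labels):
        totals[lab] += weight
    total = sum(totals)
    return [t / total for t in totals] if total > 0 else [1.0 / k] * k


def entropy_score(dist):
    """Normalized entropy in [0, 1]; 1 means spread evenly over all clusters.
    Using entropy as the outlier score is an assumption of this sketch."""
    if len(dist) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in dist if p > 0)
    return h / math.log(len(dist))


def outlier_scores(W, k):
    """W[i][j] = link weight between attribute object i and target object j."""
    target_vectors = [list(col) for col in zip(*W)]  # one vector per target object
    labels = kmeans(target_vectors, k)
    return [entropy_score(cluster_distribution(row, labels, k)) for row in W]


if __name__ == "__main__":
    # Hypothetical toy network: 5 attribute objects (rows), 4 target objects (columns).
    W = [
        [5, 4, 0, 0],
        [4, 5, 0, 0],
        [0, 0, 5, 4],
        [0, 0, 4, 5],
        [2, 2, 2, 2],  # linked evenly to both groups of target objects
    ]
    for i, score in enumerate(outlier_scores(W, k=2)):
        print(f"attribute object {i}: score = {score:.3f}")
```

In this toy network the last attribute object spreads its links evenly over both clusters of target objects and therefore receives the highest score, while the other four concentrate on a single cluster. A real application would need a threshold or a top-n rule on the scores, which the abstract does not specify.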
Source
Acta Electronica Sinica (《电子学报》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
2018, No. 2, pp. 281-288 (8 pages)
Funding
National Natural Science Foundation of China (No.60903098)
Graduate Innovation Fund of Jilin University (No.2016183, No.2016184)
Keywords
outlier detection
ranking
clustering
target object
attribute object