Abstract
Detecting anomalous vertices in a heterogeneous information network from network structure alone can distort the results or make them hard to interpret. To address this problem, a constrained anomaly detection algorithm for attributed heterogeneous information networks (CADAHIN) is proposed. Information-rich interaction data are modeled as an attributed heterogeneous information network, attributed meta paths specify the attributes and subspaces the user is interested in, and the outlierness of each vertex is evaluated from both network structure and attribute content, which yields a constrained anomaly detection framework. Experiments were conducted on the real-world Arxiv dataset, with attributed meta paths specifying constraints on authors, papers, and paper titles and abstracts. For several queries, the algorithm output a top-k list of vertices ranked from highest to lowest outlierness, together with the set of anomalous vertices in the constraint domain. The results show that, compared with baseline algorithms that consider only network structure or only attribute content, average precision improves by more than 12.95%.
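The abstract does not give the scoring functions, so the following is only a minimal sketch of the general idea: restrict the network to a user-chosen attribute subspace, then score each vertex by how dissimilar it is to the other candidates in both structure and attribute content. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy `PAPERS` data, the helper names `build_profiles` and `rank_outliers`, the use of cosine similarity, and the mixing weight `alpha`.

```python
from math import sqrt

# Hypothetical toy network: authors write papers, and each paper carries
# a bag of title terms as its attribute content.
PAPERS = {
    "p1": {"authors": ["ann", "bob"], "title": ["graph", "anomaly"]},
    "p2": {"authors": ["ann", "bob", "cat"], "title": ["graph", "mining"]},
    "p3": {"authors": ["bob", "cat"], "title": ["graph", "anomaly"]},
    "p4": {"authors": ["dan"], "title": ["protein", "folding"]},
    "p5": {"authors": ["dan", "ann"], "title": ["graph", "protein"]},
}


def cosine(u, v):
    """Cosine similarity of two sparse count vectors stored as dicts."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def build_profiles(papers, allowed_terms):
    """Walk Author-Paper links, keeping only papers whose title touches
    the user-chosen term subspace (the constraint)."""
    structure, content = {}, {}
    for pid, paper in papers.items():
        terms = [t for t in paper["title"] if t in allowed_terms]
        if not terms:
            continue  # paper lies outside the constraint domain
        for author in paper["authors"]:
            # Structural profile: which in-domain papers the author wrote.
            structure.setdefault(author, {})[pid] = 1
            # Content profile: counts of in-domain title terms.
            bag = content.setdefault(author, {})
            for t in terms:
                bag[t] = bag.get(t, 0) + 1
    return structure, content


def rank_outliers(papers, allowed_terms, alpha=0.5):
    """Return (author, outlierness) pairs, highest outlierness first.

    Outlierness is assumed to be 1 minus the mean blended similarity
    to every other candidate author; alpha weights structure vs. content.
    """
    structure, content = build_profiles(papers, allowed_terms)
    scores = {}
    for a in structure:
        sims = [
            alpha * cosine(structure[a], structure[b])
            + (1 - alpha) * cosine(content[a], content[b])
            for b in structure
            if b != a
        ]
        scores[a] = 1.0 - sum(sims) / len(sims) if sims else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    all_terms = {"graph", "anomaly", "mining", "protein", "folding"}
    for name, score in rank_outliers(PAPERS, all_terms):
        print(f"{name}: {score:.3f}")
```

On this toy data the author `dan`, who shares few papers and few title terms with the others, ranks first. Passing a narrower term set such as `{"protein", "folding"}` restricts the candidates to authors with papers in that subspace, which loosely mirrors how a constraint narrows the query domain.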
Authors
Zhang Rui (张蕊)
Zhang Guifa (张桂发)
Guo Jiming (郭记眀)
Jiang Hongbo (蒋洪波)
Affiliations
School of Computer Science and Technology; Hubei Key Laboratory of Transportation Internet of Things; Hubei Key Laboratory of Inland Shipping Technology, Wuhan University of Technology, Wuhan 430070, China
School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China
Source
Journal of Huazhong University of Science and Technology (Natural Science Edition) (《华中科技大学学报(自然科学版)》)
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2017, No. 12, pp. 26-31 (6 pages)
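Funding
National Natural Science Foundation of China (61572219, 61502192, 61671216, 61471408, 51479157, 51679182)
Fundamental Research Funds for the Central Universities (WUT: 2016Ⅲ028)
Fund of the Hubei Key Laboratory of Inland Shipping Technology (NHHY2015005)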
Keywords
anomaly detection
heterogeneous information network
rich attribute
meta path
similarity