Abstract
High-frequency data are prone to anomalies and tend to be unordered, so this paper studies a co-occurrence clustering algorithm for high-frequency data based on local outlier detection. First, local outliers obtained through variable mesh (grid) partitioning are used to mine the high-frequency data objects in a high-frequency data set. Second, abnormal high-frequency data objects are removed: the local outlier factor values of the objects are sorted in descending order to identify the objects with the larger outlier factor values, which improves the execution efficiency of high-frequency data co-occurrence clustering. Then the co-occurrence similarity of the obtained high-frequency data objects is calculated to build a co-occurrence similarity matrix. Finally, the clusters with the maximum similarity in this matrix are merged to complete the co-occurrence clustering. Simulation results show that the algorithm accurately detects the number of outliers in high-frequency data sets and that the co-occurrence clustering is both efficient and accurate.
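The pipeline outlined in the abstract (score objects by local outlier factor, rank the scores in descending order, drop the top-ranked objects, then merge the most similar clusters using a similarity matrix) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it uses a plain k-nearest-neighbour LOF instead of the paper's variable-grid partitioning, it assumes Jaccard overlap as the co-occurrence similarity, it assumes average linkage when merging, and every function name and parameter (`lof_scores`, `drop_outliers`, `cooccurrence_clusters`, `k`, `n_outliers`, `n_clusters`) is hypothetical.

```python
import math


def lof_scores(points, k=2):
    """Plain k-NN local outlier factor (assumption: no variable-grid pruning)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Exactly k nearest neighbours per point, excluding the point itself.
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist[j], dist[i][j]) for j in knn[i]]
        return len(reach) / max(sum(reach), 1e-12)

    dens = [lrd(i) for i in range(n)]
    return [sum(dens[j] for j in knn[i]) / (k * dens[i]) for i in range(n)]


def drop_outliers(points, k=2, n_outliers=1):
    """Rank LOF values in descending order and flag the top n_outliers."""
    scores = lof_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_outliers])
    kept = [i for i in range(len(points)) if i not in flagged]
    return kept, sorted(flagged)


def jaccard(a, b):
    """Assumed co-occurrence similarity: overlap of two context sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cooccurrence_clusters(contexts, n_clusters):
    """contexts maps object id -> set of context ids (e.g. time windows)."""
    ids = list(contexts)
    # Pairwise similarity matrix, stored as a dict keyed by id pairs.
    sim = {(a, b): jaccard(contexts[a], contexts[b]) for a in ids for b in ids}
    clusters = [[i] for i in ids]

    def link(c1, c2):
        # Average-linkage similarity between two clusters (assumption).
        return sum(sim[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = ((x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters)))
        x, y = max(pairs, key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[x] += clusters.pop(y)  # y > x, so index x stays valid
    return clusters


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(drop_outliers(pts, k=2, n_outliers=1))
    ctx = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {7, 8}, "d": {8, 9}}
    print(cooccurrence_clusters(ctx, n_clusters=2))
```

In this toy input the isolated point `(10, 10)` receives by far the largest LOF value and is flagged, while objects sharing context ids end up in the same cluster. Scoring every point against every other point costs quadratic time, which is the overhead a grid-based partition such as the one named in the abstract would aim to reduce.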
Authors
周志洪
马进
夏正敏
陈秀真
ZHOU Zhi-hong; MA Jin; XIA Zheng-min; CHEN Xiu-zhen (Institute of Network Security Technology, Shanghai Jiao Tong University, Shanghai 200240, China; Key Laboratory of Information Security Integrated Management Technology, Shanghai 200240, China)
Source
《计算机仿真》 (Computer Simulation)
Peking University Core Journal (北大核心)
2021, Issue 3, pp. 482-486 (5 pages)
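Funding
Shanghai Industrial Foundation Strengthening Special Project: Intelligent Connected Vehicle Information Security R&D and Public Service Platform (GYQJ-2018-3-03).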
Keywords
local outlier
high-frequency data
co-occurrence similarity
variable mesh generation