Abstract
To improve the accuracy and efficiency of outlier detection, an outlier detection algorithm based on correlation subspaces is proposed. First, a sparsity matrix is derived from the local density distribution of the data, and the sparsity features are amplified with a Gaussian similarity kernel function. Then, the sparsity similarity factor of the data in each attribute dimension is calculated to determine the subspace vector and the correlation subspace; the outlier factor of each data object is obtained by combining data sparsity with dimension weights, and the objects with the largest outlier factors are selected as outliers. Finally, a synthetic data set and UCI data sets are used to verify the accuracy and effectiveness of the algorithm.
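The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so every concrete choice here is an assumption made for illustration and not the authors' definition: sparsity is taken as the mean distance to the k nearest neighbours along each attribute, the Gaussian kernel width is a multiple of the median sparsity, a plain threshold on the amplified sparsity stands in for the sparsity similarity factor when building the subspace vector, and dimension weights come from the spread of amplified sparsity. The names `knn_sparsity`, `outlier_factors`, `top_outliers`, `scale` and `threshold` are hypothetical.

```python
import math
from statistics import median, pstdev


def knn_sparsity(values, k):
    """Mean distance from each value to its k nearest neighbours on one attribute."""
    out = []
    for i, v in enumerate(values):
        dists = sorted(abs(v - u) for j, u in enumerate(values) if j != i)
        out.append(sum(dists[:k]) / k)
    return out


def outlier_factors(X, k=3, scale=3.0, threshold=0.5):
    """Return (scores, subspace_vectors) for the rows of X (list of equal-length lists)."""
    n, d = len(X), len(X[0])
    A = []  # amplified sparsity, one list per attribute dimension
    for j in range(d):
        s = knn_sparsity([row[j] for row in X], k)
        sigma = scale * median(s)  # assumed kernel width
        if sigma == 0.0:
            sigma = 1.0  # guard against a degenerate (constant) attribute
        # Gaussian kernel: typical sparsity maps near 0, large sparsity near 1.
        A.append([1.0 - math.exp(-(v * v) / (2.0 * sigma * sigma)) for v in s])
    # Assumed dimension weights: normalised spread of amplified sparsity.
    spread = [pstdev(col) for col in A]
    total = sum(spread) or 1.0
    w = [sp / total for sp in spread]
    # Subspace vector: True where a dimension is relevant for an object
    # (simple threshold, a stand-in for the similarity-factor step).
    V = [[A[j][i] > threshold for j in range(d)] for i in range(n)]
    # Outlier factor: weighted amplified sparsity over the relevant dimensions only.
    scores = [sum(w[j] * A[j][i] for j in range(d) if V[i][j]) for i in range(n)]
    return scores, V


def top_outliers(scores, m):
    """Indices of the m objects with the largest outlier factors."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:m]


# Twenty evenly spread points plus one object that is outlying only in attribute 0.
X = [[i * 0.1, (i * 7 % 20) * 0.1, (i * 3 % 20) * 0.1] for i in range(20)]
X.append([10.0, 0.95, 0.95])
scores, V = outlier_factors(X)
print(top_outliers(scores, 1))  # prints [20]
```

In this toy data the planted object is flagged only in attribute 0, so its subspace vector is `[True, False, False]`, which illustrates why scoring in a per-object subspace can expose outliers that full-space distances would dilute.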
Authors
ZHAO Xiang-bing (赵向兵)
ZHANG Tian-gang (张天刚)
School of Computer and Network Engineering, Shanxi Datong University, Datong, Shanxi 037009, China
Source
Computing Technology and Automation (《计算技术与自动化》)
2022, No. 1, pp. 82-86 (5 pages)
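Funding
Shanxi Province Education Science "13th Five-Year Plan" Project (GH-18044)
Shanxi Datong University Scientific Research Fund Project (2017K11)
Shanxi Datong University Teaching Reform and Innovation Project (XJG2020211)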
Keywords
data mining
outlier data
sparsity
Gaussian kernel function
similarity factor
correlation subspace
simulation experiment
algorithm analysis