Clustering Sequence Outlier Data Mining Algorithm Based on Machine Learning

Abstract: Clustering sequence outlier data exhibits temporal dependency, which makes outliers hard to detect accurately and degrades data mining results. To address this problem, a clustering sequence outlier data mining algorithm based on machine learning is proposed. Machine learning methods are used to cluster the clustering sequence outlier data and to calculate an outlier index for each outlier. The data are then aggregated through machine learning and the outlier data are assigned. The feature sequences of the data samples are traversed, the applicability of each feature interval is calculated, and the relationship between the features and the target variable is analyzed. The data classification mining problem is transformed into a linearly separable problem to avoid overfitting. Finally, a data mining process is designed that records the timestamp at which each data point occurs and carries out the mining on that basis. Experimental results show that the outlier proportion found by the algorithm deviates from the actual proportion by 1% only on the PSLG dataset and matches it on all other datasets. The mined data range is consistent with the calibrated range, indicating an accurate mining effect.
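
The abstract does not state how the outlier index is computed, so the following is only an illustrative sketch of the general idea (cluster the sequence, score each point against its cluster, report the timestamps of high-scoring points). Everything in it is an assumption rather than the authors' method: the function names `kmeans_1d`, `outlier_indices`, and `mine_outliers`, the use of plain 1-D k-means, the index definition (distance to the assigned centroid divided by the median such distance in the same cluster), the threshold of 5.0, and the sample data.

```python
from statistics import median


def kmeans_1d(values, k, iters=100):
    """Plain 1-D k-means with a deterministic, quantile-based initialisation."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumed initialisation: pick k evenly spaced order statistics.
    centroids = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    labels = []
    for _ in range(iters):
        # Assign every value to its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def outlier_indices(values, k=2, eps=1e-9):
    """Hypothetical outlier index: distance to the assigned centroid,
    scaled by the median distance inside the same cluster."""
    centroids, labels = kmeans_1d(values, k)
    dists = [abs(v - centroids[lab]) for v, lab in zip(values, labels)]
    scale = {}
    for j in set(labels):
        scale[j] = median(d for d, lab in zip(dists, labels) if lab == j) + eps
    return [d / scale[lab] for d, lab in zip(dists, labels)]


def mine_outliers(timestamps, values, k=2, threshold=5.0):
    """Return the timestamps whose outlier index exceeds the threshold."""
    scores = outlier_indices(values, k)
    return [t for t, s in zip(timestamps, scores) if s > threshold]


# Made-up sequence: two regimes (around 1 and around 5) and one spike at t = 7.
timestamps = list(range(20))
values = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 20.0, 1.1, 0.9,
          5.0, 5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9, 5.0]

flagged = mine_outliers(timestamps, values)
print(flagged)  # -> [7]
```

Scaling by the within-cluster median rather than the mean keeps a single extreme point from inflating its own reference scale. The centroid itself is still a mean and is pulled toward the spike, which is one reason a sketch like this should not be read as a substitute for the published algorithm.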

Authors: WANG Caixia (王彩霞); TAO Jian (陶健); SHU Sheng (舒升) (Anhui Business College of Vocational Technology, Wuhu 241002, China)

Affiliation: Anhui Business College of Vocational Technology (安徽商贸职业技术学院)

Source: Journal of Tonghua Normal University (《通化师范学院学报》), 2024, No. 8, pp. 28-34 (7 pages)

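Funding: Anhui Provincial Natural Science Key Research Project (2022AH052741); Anhui Provincial Quality Engineering Project (2023cxtd149); Anhui Vocational and Adult Education Society 2023 Education and Teaching Research Planning Project (AZCJ2023068).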

Keywords: machine learning; clustering sequence; outliers; data mining

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
