Clustering Sequence Outlier Data Mining Algorithm Based on Machine Learning
(Original title: 基于机器学习的聚类序列离群点数据挖掘算法)
Abstract: Clustering sequence outlier data are temporally dependent, which makes outliers difficult to detect accurately and degrades data mining results. To address this problem, a clustering sequence outlier data mining algorithm based on machine learning is proposed. Machine learning methods are used to cluster the clustering sequence outlier data and to compute an outlier index for each outlier. The data are then aggregated through machine learning and the outlier data are assigned. The feature sequence of the data samples is traversed, the applicability of each feature interval is calculated, and the relationship between the features and the target variable is analyzed. The data classification mining problem is transformed into a linearly separable problem to avoid overfitting. Finally, a data mining process is designed that mines the data according to the recorded timestamp of each data point. Experimental results show that the proportion of outliers found by the algorithm differs from the actual proportion by 1% on the PSLG dataset only and matches it on all other datasets. The mined data range is consistent with the calibrated range, indicating accurate mining.
Authors: WANG Caixia (王彩霞); TAO Jian (陶健); SHU Sheng (舒升) (Anhui Business College of Vocational Technology, Wuhu 241002, China)
Source: Journal of Tonghua Normal University (《通化师范学院学报》), 2024, No. 8, pp. 28-34 (7 pages)
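Funding: Anhui Provincial Key Natural Science Research Project (2022AH052741); Anhui Provincial Quality Engineering Project (2023cxtd149); 2023 Education and Teaching Research Planning Project of the Anhui Vocational and Adult Education Society (AZCJ2023068).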
Keywords: machine learning; cluster sequence; outliers; data mining
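
The abstract outlines a pipeline (cluster the sequence, score each point with an outlier index, report flagged points by timestamp) but gives no formulas. The following is only a minimal illustrative sketch of that general idea, not the authors' algorithm. It assumes one-dimensional values, a plain k-means step, and a hypothetical outlier index defined as a point's distance to its cluster centroid divided by the median such distance within the cluster. The names `kmeans_1d`, `find_outliers`, the threshold of 3.0, and the sample `records` are all assumptions made for illustration.

```python
from statistics import mean, median


def kmeans_1d(values, k, iters=20):
    """Plain 1-D k-means with deterministic, evenly spaced initial centroids."""
    ordered = sorted(values)
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    def assign(cs):
        # Label each value with the index of its nearest centroid.
        return [min(range(k), key=lambda c: abs(v - cs[c])) for v in values]

    for _ in range(iters):
        labels = assign(centroids)
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = mean(members)
    return centroids, assign(centroids)


def find_outliers(records, k=2, threshold=3.0, eps=1e-9):
    """Return (timestamp, value, index) for points whose index exceeds threshold.

    The index is an assumed definition: distance to the assigned centroid
    divided by the median distance inside that cluster.
    """
    values = [v for _, v in records]
    centroids, labels = kmeans_1d(values, k)
    dists = [abs(v - centroids[lab]) for v, lab in zip(values, labels)]
    scale = {
        c: median([d for d, lab in zip(dists, labels) if lab == c]) + eps
        for c in set(labels)
    }
    flagged = []
    for (ts, v), d, lab in zip(records, dists, labels):
        index = d / scale[lab]
        if index > threshold:
            flagged.append((ts, v, round(index, 2)))
    return flagged


# Hypothetical (timestamp, value) sequence: two regimes near 10 and 50,
# with one stray reading at timestamp 5.
records = [
    (0, 10.0), (1, 10.2), (2, 50.0), (3, 9.8), (4, 50.2), (5, 25.0),
    (6, 49.8), (7, 10.1), (8, 50.1), (9, 9.9), (10, 49.9),
]
flagged = find_outliers(records)
print(flagged)
```

Keeping the timestamp alongside each value means flagged points can be traced back to their position in the sequence, which loosely mirrors the timestamp-recording step mentioned in the abstract. A median-based scale is used here because a mean-based one would be inflated by the very points being scored.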