Abstract
Multi-source heterogeneous data can come from different domains, in different formats, and from sources of varying quality, which makes it difficult to process. To address the difficulty of mining such data accurately, this paper proposes a multi-source heterogeneous data mining algorithm based on decision tree classification. A decision tree is first constructed to partition the data attributes, and the initial tree is then pruned to obtain the attribute set of the multi-source heterogeneous data. Data factors are extracted from this set to produce a coarse mining result. A deep learning algorithm is then used to mine the multi-source heterogeneous data that remains in the rest of the data, and a secondary mining pass is performed on the original multi-source heterogeneous dataset. Finally, the coarse and fine mining results are integrated to complete the mining of the multi-source heterogeneous data. Experimental results show that the proposed algorithm achieves a high F1 value, a low generalization error, and strong data mining performance.
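The abstract reports performance in terms of the F1 value. As a reminder of what that metric measures, the following is a minimal sketch of the standard binary F1 computation (the harmonic mean of precision and recall). The function name `f1_score`, the `positive` parameter, and the sample labels are illustrative assumptions and are not taken from the paper, which may use a multi-class or averaged variant.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the `positive` label.

    Illustrative helper only; names and label encoding are assumptions.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 marks a record that should be mined, 0 one that should not.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
print(f1_score(y_true, y_pred))  # precision = recall = 3/4, so F1 = 0.75
```

A high F1 value therefore requires both few false positives and few false negatives, which is why it is commonly preferred to plain accuracy when the target records are a minority of the data.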
Authors
刘诗瑾
杨知玲
LIU Shi-jin; YANG Zhi-ling (Zhujiang College of South China Agricultural University, Guangzhou, Guangdong 510600, China; Wuhan University, Wuhan, Hubei 430072, China)
Source
Computer Simulation (《计算机仿真》)
2024, No. 8, pp. 513-516, 534 (5 pages in total)
Funding
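Undergraduate University Teaching Quality and Teaching Reform Project of the Department of Education of Guangdong Province (粤教高函【2023】4号-1084).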
Keywords
Decision tree (决策树)
Data classification (数据分类)
Multi-source heterogeneous data (多源异构数据)
Data mining (数据挖掘)
Deep learning algorithm (深度学习算法)