Abstract
Large-scale data contain a great deal of redundant information, and existing algorithms do not represent the data clearly enough. As a result, the information in the data cannot be fully mined and processing efficiency is low. To address this, an automatic data mining algorithm based on a hierarchical frequent pattern tree is proposed. The mining task is defined on the hierarchical frequent pattern tree. An automatic data connection matrix is then built using the idea of candidate-set pruning, and the data are mined automatically with a minimum-support pruning queue, which completes the design of the algorithm. In the experiments, operation and maintenance data from EMUs (electric multiple units) were used as test samples, and datasets of different sizes were mined. The results show that the proposed algorithm fully represents the data within the specified time. For 2 million records, it mines 100,000 and 80,000 more records, respectively, than the two traditional algorithms used for comparison, which improves work efficiency.
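The abstract names the building blocks (a frequent pattern tree and minimum-support pruning) without giving their details. For orientation only, here is a minimal sketch of the classic FP-tree / FP-growth procedure with minimum-support pruning. It is an assumption-laden stand-in, not the authors' hierarchical variant: the hierarchy, the connection matrix and the pruning queue are not modelled, and the names `FPNode`, `build_fp_tree` and `fp_growth` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch of standard FP-growth; NOT the paper's hierarchical variant.


class FPNode:
    """One node of a frequent pattern (FP) tree."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree from (itemset, count) pairs.

    Items whose total support is below min_support are pruned first.
    Returns (root, header), where header maps item -> list of its nodes.
    """
    # First pass: count the support of each item.
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    # Minimum-support pruning: keep only frequent items.
    frequent = {i: s for i, s in support.items() if s >= min_support}

    root = FPNode(None, None)
    header = defaultdict(list)
    for items, count in transactions:
        # Insert frequent items in descending support order (ties by name).
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return root, header


def fp_growth(transactions, min_support, suffix=()):
    """Yield (frequent itemset, support) pairs by recursive FP-growth."""
    _, header = build_fp_tree(transactions, min_support)
    for item, nodes in header.items():
        itemset = tuple(sorted(suffix + (item,)))
        yield itemset, sum(n.count for n in nodes)
        # Conditional pattern base: the prefix paths that lead to `item`.
        base = []
        for node in nodes:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_support, suffix + (item,))
```

Usage: pass a list of `(items, count)` pairs, for example `dict(fp_growth([(["a", "b"], 1), (["a"], 1)], 2))`, which yields `{("a",): 2}`. Raising `min_support` prunes more items before each tree is built. This pruning is what keeps the tree, and the recursion over conditional trees, small.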
Authors
WANG Jinglan (王景兰); FANG Xiao (方晓)
Department of Information Engineering, Bozhou Vocational and Technical College, Bozhou 236800, Anhui, China
Source
Journal of Shanghai Dianji University (上海电机学院学报), 2022, Issue 4, pp. 239-242, 248 (5 pages)
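Funding
Anhui Province Vocational Education Innovation and Development Pilot Zone Project (WJ-ZYPX-003)
Anhui Provincial Quality Engineering Project (2020jxtd173)
2020 Anhui Provincial Humanities Research Project for Higher Education Institutions (SK2020A0778)
2020 Bozhou Vocational and Technical College Humanities Research Project (BYK2029)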
Keywords
hierarchical frequent pattern tree
automatic data mining
relevant rules
data source
connection matrix