A Parallel High-Utility Itemset Mining Algorithm Based on Hadoop (Cited by: 1)


Abstract: High-utility itemset mining (HUIM) considers not only the quantity of items but also their profit, and is an essential task in data mining. However, most HUIM algorithms are developed for a single machine, which is inefficient for big data because a single machine has limited memory and processing capacity. To solve this problem, this paper proposes a parallel efficient high-utility itemset mining (P-EFIM) algorithm based on the Hadoop platform. In P-EFIM, the transaction-weighted utilization (TWU) values of the items are calculated and ordered with the MapReduce framework. The ordered items are then renumbered, and the low-utility items are pruned to improve the dataset utility. In the Map phase, P-EFIM divides the task into multiple independent subtasks and uses the proposed S-style distribution strategy to distribute the subtasks evenly across all nodes, ensuring load balancing. In the Reduce phase, P-EFIM uses the EFIM algorithm to mine each subtask dataset, which improves performance. Experiments on eight datasets show that the runtime performance of P-EFIM is significantly better than that of PHUI-Growth, another HUIM algorithm based on the Hadoop framework.
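The preprocessing and distribution steps described in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' Hadoop implementation. It assumes each transaction is a mapping from item to that item's utility in the transaction, that items are ordered by ascending TWU (the usual EFIM convention; the abstract only says "ordered"), and that "S-style" means a serpentine assignment that walks the nodes forward and then backward. The function names `compute_twu`, `prune_and_renumber`, and `s_style_assign` are hypothetical.

```python
from collections import defaultdict


def compute_twu(transactions):
    """TWU of an item = sum of the utilities of all transactions containing it."""
    twu = defaultdict(int)
    for t in transactions:
        tu = sum(t.values())  # transaction utility
        for item in t:
            twu[item] += tu
    return dict(twu)


def prune_and_renumber(transactions, min_util):
    """Drop items whose TWU is below min_util and renumber the rest.

    Assumption: items are ranked by ascending TWU, ties broken by name.
    """
    twu = compute_twu(transactions)
    kept = sorted((i for i in twu if twu[i] >= min_util),
                  key=lambda i: (twu[i], i))
    new_id = {item: n for n, item in enumerate(kept, start=1)}
    pruned = [{new_id[i]: u for i, u in t.items() if i in new_id}
              for t in transactions]
    # Transactions left empty after pruning carry no information.
    return new_id, [t for t in pruned if t]


def s_style_assign(item_ids, num_nodes):
    """Assumed serpentine assignment: nodes 0,1,2,2,1,0,0,1,2,... for 3 nodes."""
    groups = [[] for _ in range(num_nodes)]
    for pos, item in enumerate(item_ids):
        rnd, offset = divmod(pos, num_nodes)
        node = offset if rnd % 2 == 0 else num_nodes - 1 - offset
        groups[node].append(item)
    return groups


if __name__ == "__main__":
    # Illustrative data only; utilities are quantity times unit profit.
    transactions = [
        {"a": 5, "b": 10, "c": 1},
        {"a": 10, "c": 6, "e": 3},
        {"b": 4, "d": 2},
        {"d": 1},
    ]
    ids, pruned = prune_and_renumber(transactions, min_util=10)
    print(ids)
    print(pruned)
    print(s_style_assign(sorted(ids.values()), num_nodes=3))
```

Under these assumptions, the serpentine walk pairs the subtasks at the start of the order with those at the end on the same nodes, which tends to even out the work when subtask cost grows or shrinks along the item order. Each group would then be mined independently, for example by running EFIM on the transactions projected onto that group's items.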

Authors: Zaihe Cheng, Wei Shen, Wei Fang, Jerry Chun-Wei Lin

Affiliations: School of Internet of Things; School of Artificial Intelligence and Computer Science; Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence; Department of Computer Science

Source: Complex System Modeling and Simulation (English), 2023, No. 1, pp. 47-58, 12 pages

Keywords: pattern mining; data mining; Hadoop; parallel; high-utility itemset mining; big data

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

