Abstract
Logistics big data has become a key production factor for ports; analyzing and using it can effectively control operating risk and promote the healthy, sustainable development of ports. This paper designs a fast DBSCAN density-based clustering algorithm on Hadoop and introduces entropy to optimize DBSCAN's core point selection. The data is partitioned into blocks on the HDFS distributed file system; the Map phase performs an initial clustering of each block, and the Reduce phase merges clusters by expanding from core points to form the final clustering result, which improves the efficiency of big data applications. Applying big data in this way enables comprehensive management of port enterprises and provides effective support for enterprise decision-making.
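The block-wise scheme described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' implementation: it assumes plain DBSCAN inside each block, assumes that two local clusters are merged when a core point of one lies within `eps` of a core point of the other, and omits the entropy-based core point selection because the abstract does not specify it. The function names (`local_dbscan`, `map_block`, `reduce_merge`, `cluster_in_blocks`) are hypothetical, and no Hadoop API is used; the "map" and "reduce" steps are ordinary Python functions.

```python
from math import dist  # Python 3.8+


def local_dbscan(points, eps, min_pts):
    """Plain DBSCAN on one block. Returns (labels, core_flags); label -1 is noise."""
    def region(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    labels = [None] * len(points)
    core = [False] * len(points)
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = region(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        core[i] = True
        labels[i] = cid
        seeds = [j for j in nbrs if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cid  # noise reached from a core point: border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cid
            nb = region(j)
            if len(nb) >= min_pts:
                core[j] = True
                seeds.extend(nb)
        cid += 1
    return labels, core


def map_block(block_id, points, eps, min_pts):
    """'Map' step: cluster one block and tag each point with (block_id, local label)."""
    labels, core = local_dbscan(points, eps, min_pts)
    return [(p, None if lab == -1 else (block_id, lab), c)
            for p, lab, c in zip(points, labels, core)]


def reduce_merge(records, eps):
    """'Reduce' step: union local clusters whose core points lie within eps (assumed rule)."""
    parent = {}

    def find(k):
        parent.setdefault(k, k)
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    cores = [(p, key) for p, key, is_core in records if is_core]
    for a in range(len(cores)):
        for b in range(a + 1, len(cores)):
            (p, kp), (q, kq) = cores[a], cores[b]
            if kp != kq and dist(p, q) <= eps:
                parent[find(kp)] = find(kq)

    global_ids = {}
    result = []
    for p, key, _ in records:
        if key is None:
            result.append((p, -1))
        else:
            result.append((p, global_ids.setdefault(find(key), len(global_ids))))
    return result


def cluster_in_blocks(blocks, eps, min_pts):
    """Run the map step on every block, then merge everything in one reduce step."""
    records = []
    for block_id, pts in enumerate(blocks):
        records.extend(map_block(block_id, pts, eps, min_pts))
    return reduce_merge(records, eps)
```

For example, `cluster_in_blocks([[(0, 0), (0, 1), (1, 0)], [(1, 1), (2, 1), (1, 2)]], eps=1.5, min_pts=3)` places all six points in one global cluster, because core points from the two blocks lie within `eps` of each other. One limitation of this simplified merge rule is that densities are computed per block, so a point near a block boundary may be classified as noise locally even though it would be a core or border point in the full data set.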
Authors
WANG Yanyan (王妍妍)
WANG Yanning (王艳宁)
LIU Jiaxin (刘佳新)
REN Jiadong (任家东)
Affiliations: School of Economics and Management, Yanshan University, Qinhuangdao, Hebei 066004, China; School of Science, Yanshan University, Qinhuangdao, Hebei 066004, China; School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China
Source
Journal of Yanshan University (《燕山大学学报》)
Indexed in: CAS, PKU Core (北大核心)
2023, No. 3, pp. 216-220, 228 (6 pages)
Funding
Supported by the Social Science Foundation of Hebei Province (HB18GL074).