Abstract
To improve the integration of multi-source heterogeneous data resources on a water affairs platform, reduce the likelihood of duplicate data resources, and save storage space, this paper introduces a fuzzy clustering algorithm and studies an integration method for the platform's multi-source heterogeneous data resources based on it. First, dirty data with quality problems are cleaned to improve data quality. Second, the multi-source heterogeneous data schemas are mapped to ontology relationships, which lays the foundation for more efficient subsequent integration. On this basis, the fuzzy clustering algorithm is used to calculate the integration information entropy, and the platform's multi-source heterogeneous data resources are integrated using the initial integration cluster centers as the reference. Experimental results show that, after the proposed method was applied, the data redundancy ratio on the platform was consistently lower than that of the control group and never exceeded 5%, indicating a low likelihood of duplication among the multi-source heterogeneous data resources.
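The abstract does not give the paper's formulas, so the clustering step can only be illustrated with a generic sketch. The Python below implements standard fuzzy c-means and the Shannon entropy of each record's membership vector. It is not the author's method: the farthest-first choice of initial cluster centers, the use of membership entropy as a stand-in for the "integration information entropy", and all function and parameter names (`init_centers`, `fuzzy_c_means`, `membership_entropy`, `m`, `tol`) are assumptions made for illustration.

```python
import math


def distance(p, q):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def init_centers(points, c):
    """Assumed initialization: farthest-first traversal from the first point."""
    centers = [list(points[0])]
    while len(centers) < c:
        far = max(points, key=lambda p: min(distance(p, ctr) for ctr in centers))
        centers.append(list(far))
    return centers


def memberships(points, centers, m):
    """Fuzzy membership of every point in every cluster (each row sums to 1)."""
    rows = []
    for p in points:
        d = [distance(p, ctr) for ctr in centers]
        if min(d) == 0.0:
            # The point coincides with a center: give it full membership there.
            hits = [1.0 if x == 0.0 else 0.0 for x in d]
            rows.append([h / sum(hits) for h in hits])
        else:
            inv = [x ** (-2.0 / (m - 1.0)) for x in d]
            rows.append([v / sum(inv) for v in inv])
    return rows


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6):
    """Standard fuzzy c-means; assumes c does not exceed the number of distinct points."""
    centers = init_centers(points, c)
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(c):
            # Each center is the mean of all points, weighted by membership ** m.
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append(
                [sum(wi * p[k] for wi, p in zip(w, points)) / total for k in range(dim)]
            )
        shift = max(distance(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)


def membership_entropy(row):
    """Shannon entropy of one membership vector; low values mean a clear assignment."""
    return -sum(v * math.log(v) for v in row if v > 0.0)


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors standing in for cleaned platform records.
    records = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
               (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, u = fuzzy_c_means(records, c=2)
    for rec, row in zip(records, u):
        print(rec, row.index(max(row)), round(membership_entropy(row), 4))
```

In a sketch like this, records that fall in the same cluster with low membership entropy would be natural candidates for merging or duplicate checks, while high-entropy records are ambiguous and may need separate handling. How the paper actually uses the entropy values is not stated in the abstract.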
Author
吕文斌
LV Wenbin (Shanghai Big Data Center, Shanghai 200050, China)
Source
《计算机应用文摘》
2024, No. 9, pp. 109-111 (3 pages)
Chinese Journal of Computer Application
Keywords
fuzzy clustering algorithm
water affairs platform
multi-source heterogeneous data