Abstract
Traditional data mining methods require a high-speed communication network, lengthen system response time, and pose a threat to data security. Set in a distributed environment, this paper proposes an improved clustering-based mining method built on the idea of entropy in order to mine multi-level network data. Clustering parameters for the multi-level network data are set first, and a new cluster count is computed and used as the upper bound kmax of the cluster search range; a suitable validity index, the Silhouette index, is then selected and combined with cluster centers set according to max-min distance theory to obtain the optimal number of clusters. Entropy theory and dynamic programming are combined to form the improved clustering method: entropy theory determines the weights of the data attributes and yields the weight relationship between multi-level data objects and their neighboring data, with Euclidean distance taken as the measure of data similarity; dynamic programming is then used to compute the k largest data objects, which determine the cluster centers for multi-level data mining. Experiments show that the improved method can effectively mine valuable information from multi-level network data.
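The abstract does not give the paper's formulas, so the following is only a minimal sketch of two ingredients it names: attribute weights derived from entropy, and initial cluster centers chosen by the max-min distance rule under a weighted Euclidean distance. It assumes the standard entropy weight method, non-negative attribute values, at least two data objects, and the first object as the starting center. The function names `entropy_weights`, `weighted_distance`, and `max_min_centers` are hypothetical, and the dynamic programming step and the Silhouette-based choice of the cluster count are not covered.

```python
import math


def entropy_weights(rows):
    """Attribute weights via the standard entropy weight method.

    Assumption: values are non-negative and len(rows) >= 2.
    """
    n, m = len(rows), len(rows[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in rows]
        total = sum(column)
        if total == 0:
            divergences.append(0.0)  # an all-zero attribute carries no information
            continue
        entropy = 0.0
        for value in column:
            p = value / total
            if p > 0:
                entropy -= p * math.log(p)
        entropy /= math.log(n)  # scale entropy into [0, 1]
        # Clamp to guard against tiny negative values from rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    total_divergence = sum(divergences)
    if total_divergence == 0:
        return [1.0 / m] * m  # every attribute is uninformative: equal weights
    return [d / total_divergence for d in divergences]


def weighted_distance(a, b, weights):
    """Euclidean distance with each squared difference scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def max_min_centers(rows, k, weights):
    """Pick k initial centers by the max-min distance rule.

    Assumption: the first object seeds the center list.
    """
    centers = [rows[0]]
    while len(centers) < k:
        # Choose the object whose nearest existing center is farthest away.
        farthest = max(
            rows,
            key=lambda r: min(weighted_distance(r, c, weights) for c in centers),
        )
        centers.append(farthest)
    return centers


data = [[1, 10], [1, 20], [1, 30], [1, 40]]
weights = entropy_weights(data)
centers = max_min_centers(data, 2, weights)
```

In this toy data the first attribute is constant, so its entropy is maximal and it receives almost no weight, while the second attribute dominates the distance. The second center chosen is therefore the object farthest from the first along that attribute.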
Source
Bulletin of Science and Technology (《科技通报》)
2018, Issue 5, pp. 208-211 (4 pages)
Funding
2017 Xi'an Social Science Planning Fund Project (Project No. 17Z61)
Keywords
distributed network
data mining
multilevel data
valuable information