Abstract
Unstructured networks are characterized by large data volumes, widely scattered anomalies, and high complexity. Traditional methods measure characteristic factors by Euclidean distance and treat data with large specificity factors as discrepant data; they are easily affected by the network environment, which makes the mining results unreliable. This paper therefore proposes a new real-time mining method for distributed discrepant data in unstructured networks. Unstructured network data are collected in real time through the HISTORY system. Information entropy is used to measure continuous random variables: the entropy of several feature elements of the network data is computed for each time period, and the data are classified according to these entropy values. Independent component analysis separates the normal data signal from the discrepant data signal. Association analysis of the distributed discrepant data is carried out in two steps, forming frequent itemsets and forming strong association rules. Support and confidence are computed from the strong association rules, and discrepant data are mined in real time by comparing the similarity of the strong association rules. Simulation experiments show that the proposed method mines discrepant data effectively in real time and yields more accurate results than other methods.
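As an illustration of two quantities named in the abstract, the following minimal Python sketch computes a per-time-window entropy and the support and confidence of one association rule. It is not the paper's implementation: it assumes discrete feature values (the abstract speaks of continuous variables, which would first need binning or a density estimate), and the window names, addresses, item labels, and function names are all hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def support_confidence(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Transactions containing the antecedent, and those containing both sides.
    n_ante = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / len(transactions)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Hypothetical feature values (source addresses) observed in two time windows.
windows = {
    "t0": ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.2"],
    "t1": ["10.0.0.9"] * 4,
}
entropies = {name: shannon_entropy(v) for name, v in windows.items()}

# Hypothetical item sets derived from the windows, one per observation period.
transactions = [
    {"low_entropy", "high_volume"},
    {"low_entropy", "high_volume"},
    {"low_entropy"},
    {"high_volume"},
]
support, confidence = support_confidence(
    transactions, {"low_entropy"}, {"high_volume"}
)
```

In this toy data the window whose values are concentrated on a single address has lower entropy than the evenly split one, which is the kind of contrast an entropy-based classification step could use. A rule would count as "strong" only if its support and confidence exceed thresholds chosen by the analyst; the abstract does not state those thresholds.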
Author
周鹏
ZHOU Peng (International Education College, Huanghuai University, Zhumadian, Henan 463000, China; Behavioral and Security Depth Learning Big Data Henan Engineering Laboratory, Huanghuai University, Zhumadian, Henan 463000, China)
Source
《计算机仿真》
Peking University Core Journal (北大核心)
2018, No. 9, pp. 333-337 (5 pages)
Computer Simulation
Funding
Fund project of the Science and Technology Development Center of the Ministry of Education (2017B00011)
Keywords
Unstructured network
Distributed
Discrepant data
Real-time mining