Abstract
Traditional duplicate data elimination systems suffer from uncontrollable duplicate-data throughput and long similarity-judgment completion times. To address these problems, a massive duplicate data elimination system based on association rules is designed. To speed up the search for similar data and ensure reliable elimination, the overall system framework is designed first. Following this framework, the hardware is divided into four main modules: duplicate data detection, total throughput improvement, elimination reliability assurance, and system security. A combination of the TCP/IP, NetBEUI, and IPX/SPX protocols replaces the NetBEUI transmission protocol used by traditional systems as the core operating basis. Association rules over basic data sequences are established to optimize the duplicate-data detection coding. This keeps the elimination process for massive duplicate data efficient and stable and completes the system design. Comparative experiments against a traditional system show that the proposed system keeps duplicate-data throughput within a controllable range. They also show that it shortens the similarity-judgment completion time to about half that of the traditional system.
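The abstract does not say how the data-sequence association rules or the detection coding are defined. The following is a minimal sketch of one plausible reading, not the author's method. It assumes fixed-size chunking, SHA-256 chunk fingerprints, and rules of the form "chunk A is immediately followed by chunk B", scored by the standard support and confidence measures. The function names `chunk_fingerprints`, `deduplicate`, and `sequence_rules` and the thresholds are hypothetical.

```python
import hashlib
from collections import Counter


def chunk_fingerprints(data, chunk_size=4):
    """Split data into fixed-size chunks (hypothetical scheme) and
    return one SHA-256 hex fingerprint per chunk."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def deduplicate(fingerprints):
    """Return (unique fingerprints in first-seen order, count of duplicate chunks)."""
    seen = set()
    unique = []
    for fp in fingerprints:
        if fp not in seen:  # first occurrence: this chunk would be stored
            seen.add(fp)
            unique.append(fp)
    return unique, len(fingerprints) - len(unique)


def sequence_rules(fingerprints, min_support=0.2, min_confidence=0.6):
    """Mine assumed rules A -> B, meaning chunk B immediately follows chunk A.

    support    = count(A, B) / number of adjacent pairs
    confidence = count(A, B) / count(A as the first element of a pair)
    """
    pairs = Counter(zip(fingerprints, fingerprints[1:]))
    antecedents = Counter(fingerprints[:-1])
    total_pairs = max(len(fingerprints) - 1, 1)
    rules = {}
    for (a, b), count in pairs.items():
        support = count / total_pairs
        confidence = count / antecedents[a]
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = (support, confidence)
    return rules
```

In this sketch, only chunks with a new fingerprint would be stored. A high-confidence rule A -> B could suggest which fingerprint to compare next after seeing A, which narrows the similarity search. Both uses are illustrative assumptions. For `b"ABCDEFGHABCDEFGHABCDWXYZ"` split into 4-byte chunks, three of the six chunks are duplicates. The rule EFGH -> ABCD has confidence 1.0.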
Author
LIAN Yanping (连雁平), Wuyi University, Wuyishan 354300, China
Source
Modern Electronics Technique (《现代电子技术》), Peking University core journal
2018, No. 23, pp. 27-31 (5 pages)
Funding
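Natural Science Foundation of Fujian Province (2017J01406)
Education and Scientific Research Project for Young and Middle-aged Teachers of Fujian Province (JA15527)
Research Start-up Project for High-level Introduced Talents of Wuyi University (YJ201607)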
Keywords
association rule
data elimination
system framework
redundancy correction
communication interface
data sequence
detection coding
elimination process