
Duplicate Data Elimination of Network Single-Channel Based on Minimum Hash
Abstract: Eliminating duplicate data is an indispensable step in keeping a network running efficiently, but the process is susceptible to interference from signal strength, network equipment, router performance, and similar factors. To address this, a duplicate data elimination algorithm for network single channels based on minimum hashing is proposed. First, the hash functions of the hashing algorithm are used to cluster the single-channel data. Next, a supervised discriminant projection algorithm reduces the dimensionality of the clustered data. Finally, algebraic signatures are used to pre-estimate the data so that the computation cost between data items is kept to a minimum, and a minimum hash tree is constructed to generate check values. While the deduplication tags are updated, a two-layer elimination mechanism completely removes the duplicate data from the single channel. Experimental results show that the algorithm has a short execution time and low computation and storage overhead.
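The abstract does not specify the signature field, the hash tree layout, or how the two elimination layers interact, and it does not cover the clustering or projection steps in enough detail to reproduce them. The following is therefore only a minimal Python sketch of the signature, check-value, and two-layer idea under stated assumptions, not the authors' implementation. It assumes an algebraic signature over GF(2^8) (primitive polynomial 0x11D, alpha = 2) as a cheap first-layer filter, SHA-256 as the second-layer confirmation, and a plain binary SHA-256 hash tree (Merkle tree) whose root serves as the check value. All function names (`gf_mul`, `algebraic_signature`, `deduplicate`, `merkle_root`) are hypothetical.

```python
import hashlib

# ASSUMPTION: GF(2^8) with primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D).
# The paper's actual field and parameters are not given in the abstract.
def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements: carry-less multiply reduced mod 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def algebraic_signature(data: bytes, alpha: int = 2) -> int:
    """sig_alpha(D) = XOR-sum over i of d_i * alpha^i, computed in GF(2^8)."""
    sig, power = 0, 1
    for byte in data:
        sig ^= gf_mul(byte, power)
        power = gf_mul(power, alpha)
    return sig


def deduplicate(blocks: list[bytes]) -> list[bytes]:
    """Hypothetical two-layer elimination.

    Layer 1: the cheap 8-bit algebraic signature assigns each block to a bucket.
    Layer 2: a SHA-256 digest is compared only against digests in that bucket,
    and a block is dropped only when the digest matches.
    """
    buckets: dict[int, set[bytes]] = {}  # signature -> digests of kept blocks
    kept: list[bytes] = []
    for block in blocks:
        bucket = buckets.setdefault(algebraic_signature(block), set())
        digest = hashlib.sha256(block).digest()
        if digest in bucket:
            continue  # same signature and same digest: treat as a duplicate
        bucket.add(digest)
        kept.append(block)
    return kept


def merkle_root(blocks: list[bytes]) -> bytes:
    """Binary SHA-256 hash tree; an odd node at any level is paired with itself."""
    if not blocks:
        return hashlib.sha256(b"").digest()
    level = [hashlib.sha256(b).digest() for b in blocks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


if __name__ == "__main__":
    packets = [b"alpha", b"beta", b"alpha", b"gamma", b"beta"]
    unique = deduplicate(packets)
    print(unique)  # [b'alpha', b'beta', b'gamma']
    print(merkle_root(unique).hex())  # check value over the deduplicated blocks
```

Because the signature is only 8 bits, different blocks often share a bucket, so the sketch never drops a block on a signature match alone. The strong digest makes the final decision, and the tree root lets a receiver verify the deduplicated set with a single value.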
Authors: WU Jianfei; ZHOU Luming; LIU Xiaoqiang (Cancer Hospital Affiliated of Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430079, China; College of Applied Engineering, Henan University of Science and Technology, Sanmenxia 472000, China)
Source: Journal of Jilin University (Information Science Edition), 2023, No. 2, pp. 367-373 (7 pages). Indexed in CAS.
Funding: Key Scientific Research Project of the Education Department of Henan Province (22B413007).
Keywords: hash function; original cluster center; nearest-neighbor local graph; constrained objective function; algebraic signature; hash tree; network channel