大样本序列重叠碰撞统计检验方法

Statistical Tests on Overlapping Collisions with Large Samples

下载PDF

导出

摘要随机数发生器(RNG)产生的随机数质量直接关系到密码系统的安全性,而随机数统计假设检验是常用的随机数质量评测方法,在密码应用实践中发挥着重要作用.近年来,信息技术的发展极大提升了大数据处理能力,RNG的随机数统计检验也面临着更大数据量、更强检验能力和更快检验速度的迫切需求.本文针对现有检验方法在大数据量(GB级)检验适用性方面的问题,对大样本下的长模板重叠碰撞类检验进行了深入研究.首先,我们构造了碰撞对数统计量,并在此基础上通过理论推导给出了一种适用各种计数方式的新型碰撞统计模型.继而,我们得到了n比特长序列m比特模板的重叠碰撞对数的理论期望值,并基于该期望值通过统计分析,提出了两种序列重叠碰撞检验方法:(1)我们基于重叠碰撞数期望可作为碰撞出现概率上界的事实,利用极小非平凡碰撞期望拒绝较长碰撞出现这一小概率事件,给出了重叠碰撞拒绝检验;(2)基于大数定律,并通过多个样本构造样本方差,我们给出了无需已知理论方差的多样本t分布检验.针对现有碰撞搜索方法在重叠统计方式下的资源占用高、搜索效率低等问题,我们提出了一种资源受限下基于CUDA的快速搜索方法,即通过前缀分类、后缀排序等优化技术,能够降低内存等硬件资源1/2b倍(其中b为前缀比特长度),同时也提高算法并行性,再利用CUDA并行技术,实现了重叠碰撞搜索在通用计算机上的实际可行性.我们利用提出的优化算法进行具体实验,在我们的实验平台上4200 s左右可完成10GB数据的64比特以上的重叠碰撞搜索与碰撞信息存储.同时通过简单的数据构造实验展示了我们所提出的新方法的优势,即能够发掘现有NIST检验方法无法发现的长模板非随机因素.最后我们利用1000 GB随机数据给出了1GB随机数据的长模板统计特征,也验证了本文所提出的检验方法的有效性.本文提出的重叠碰撞检验方法结合快速搜索工具,能够满足实际大数据量随机数测评需求,更好地保障密码系统安全性. The quality of the random number generated by the random number generator is directly related to the security of the cryptographic system,and the statistical hypothesis test for random-ness is the commonly used method for the quality evaluation of random number,which plays an important role in the practice of cryptography.In recent years,the development of information technology has greatly improved the ability of big data processing.Accordingly,the statistical test for randomness is also facing the urgent needs of larger sample,stronger test ability and faster evaluation speed.Considering the applicability of existing test methods in large(GB-level)sample evaluation,this paper makes an in-depth study on tests for long template overlapping collision in large samples.First,we construct the statistic on the number of collision pairs.Based on this statistic,a new collision statistical model suitable for all counting methods is proposed through theoretical derivation.Then we derive the expected value of the overlapping collision statistic with the parameters of m-bit template of n-bit sequence.Based on this conclusion,we propose two statistical tests on the overlapping collisions with some theoretical analysis:(1)Based on the fact that the expected number of overlapping collisions can be used as the upper bound of the probability of collision occurrence,we use the nontrivial collision expectation that is small enough to reject the small probability event of a long collision occurrence,and introduce the overlapping collision rejection test;(2)Based on the law of large samples and constructing the sample variance from multiple samples,we provide the approximate t-distribution test with multiple samples that does not require a known theoretical variance.Considering the problems of high resource consumption and low search efficiency of existing collision search methods under the overlapping situation,we propose a fast search method based on CUDA with resource constraints,which can reduce hardware resources such as memory by 1/2 times through optimization skills such as prefix classification and suffix sorting.These skills also improve the parallelism of the algorithm,and then we use the CUDA parallel technology to realize the practical feasibility of overlapping collision search on general computers.We take advantage of the proposed optimization algorithm to carry out specific experiments,which can complete the overlapping collision search and collision information storage of more than 64-bit templates of 10GB data in about 4200 seconds.At the same time,through simple data construction experiments,the advantages of our proposed new test are demonstrated,that is,it can discover the non-random factors with long templates that cannot be detected by the NIST statistical tests for randomness.Finally,we use 1oooGB of random data to show the statistical distribution of long overlapping collision templates for 1GB of random data,and also verify the effectiveness of the theoretical models and testing methods proposed in this paper.The overlapping collision tests proposed in this paper,combined with the fast search tool,can meet the actual evaluation needs of large samples,and better ensure the security of cryptographic systems.

作者陈东昱范丽敏陈华王舰付一方 CHEN Dong-Yu;FAN Li-Min;CHEN Hua;ANG Jian;FU Yi-Fang(Trusted Compuing and Information Assurance Laboratory,Institute of Sofrware,Chinese Academy of Sciences,Beijing100190;Unirversity of Chinese Academy of Sciences,Bejing 100049)

机构地区中国科学院软件研究所可信计算与信息保障实验室中国科学院大学

出处《计算机学报》 EI CAS CSCD 北大核心 2023年第8期1636-1649,共14页 Chinese Journal of Computers

基金国家重点研究计划项目基金(2020YFA0309704)资助。

关键词随机性统计检验大样本重叠碰撞并行搜索分类排序 statistical test for randomness large sample overlapping collision parallel search classify and sort

分类号 TP309 [自动化与计算机技术—计算机系统结构]