Abstract
With the growing adoption of big data and cloud computing applications, the memory system plays an important role in data processing, yet existing memory system analysis methods can no longer meet current needs. This paper therefore proposes a clustering method for big data memory access traces based on similarity calculation. First, the data items are arranged so that items with high similarity lie close to one another on the plane, and the similarity between each memory data item and the memory data items in its neighborhood is calculated. Next, a similarity matrix is designed so that ants map selectively according to it and are re-associated with the data items; the distance between data items is computed with cosine similarity, and the similarity matrix is sorted from high to low similarity to form a similarity sequence matrix. Finally, to identify and track memory traces accurately within a short time, stratified sampling is combined with ant colony clustering: the memory data in the data items are partitioned and the initial points are selected, which reduces the effect of random initial-point selection during stratification. Memory access traces were collected from 10 randomly selected applications and analyzed; the experimental results verify the effectiveness of the proposed method, which provides strong theoretical support for future work on application memory access behavior and memory access trace clustering.
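As a rough illustration of the similarity step described in the abstract, the following Python sketch computes pairwise cosine similarity and then orders each item's neighbors from most to least similar. The abstract does not say how a memory data item is represented, so the feature vectors, the function names, and the per-row ordering used for the "similarity sequence matrix" are all assumptions made for this example and are not the authors' implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def similarity_matrix(items):
    """Full pairwise cosine similarity matrix for a list of vectors."""
    n = len(items)
    return [[cosine_similarity(items[i], items[j]) for j in range(n)]
            for i in range(n)]


def similarity_sequence_matrix(sim):
    """For each item, the indices of the other items, most similar first.

    This is one possible reading of the 'similarity sequence matrix';
    the abstract does not define its exact layout.
    """
    seq = []
    for i, row in enumerate(sim):
        others = [j for j in range(len(row)) if j != i]
        others.sort(key=lambda j: row[j], reverse=True)
        seq.append(others)
    return seq


# Hypothetical feature vectors standing in for memory data items.
traces = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
]

sim = similarity_matrix(traces)
order = similarity_sequence_matrix(sim)
```

In an ant colony clustering scheme, an ordering like `order` could be used to bias which neighbors an ant considers when picking up or dropping an item; how the paper actually uses it is not stated in the abstract.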
Authors
李明倩
王苗
刘芳
LI Ming-qian; WANG Miao; LIU Fang (City College, Wuhan University of Science and Technology, Wuhan, Hubei 430083, China; Wuhan University, Wuhan, Hubei 430072, China)
Source
Computer Simulation (《计算机仿真》)
Peking University Core Journal (北大核心)
2023, Issue 3, pp. 485-489 (5 pages)
Funding
2020 Hubei Provincial Department of Education College Students' Innovation and Entrepreneurship Training Program (S202013235008).
Keywords
Similarity (相似度)
Similarity matrix (相似度矩阵)
Data items (数据项)
Stratified sampling (分层抽样)
Ant colony clustering (蚁群聚类)