摘要
图计算在机器学习、数据挖掘、网络安全等领域都有着重要应用。而图数据结构不规则且规模巨大,导致访存成为图应用运行时的瓶颈。由于图数据的顶点度数服从幂律分布,许多研究通过重排序图数据使高度数顶点连续储存在相邻位置,从而提升图数据被访问时的时间与空间局部性。然而重排序会破坏原始图数据中存在的群落结构,导致目前基于顶点度数信息的重排序算法在高结构性图数据集上无法产生性能提升。本文针对上述问题提出了一种新的保护图数据结构性的重排序算法,通过对自然图数据集中存在的群落结构进行研究,结合处理器访存结构特性,将图数据集合理划分成不同区域后进行重排序,保护其群落存储顺序以提高重排序后访存时的局部性。本文在通用处理器平台上,对6个不同结构性图数据集和3种图计算应用共18个测试点进行了验证,实验结果表明,该重排序方法相对于原始图数据集实现了平均18.86%的性能提升,相对于目前最优的基于顶点度数信息的轻量级重排序算法实现了平均11.3%的性能提升。
Graph analytics power a range of applications in areas as diverse as machine learning,data mining and network security.However,due to the irregular memory access patterns and the large scale of graph data,memory accessing has become the bottleneck of graph applications.The vertex degree of graph follows a power-law distribution,and many recent researches use this property to store high degree vertexes in adjacent position of memory by graph reordering to increase spatial and temporal locality of memory access.However,graph reordering usually destroys the community structure in original graph,causing the performance degradation in structure graph datasets.To overcoming limitations of existing reordering techniques,the area graph reordering is proposed,which is a novel lightweight reordering technique that divides graph into areas before reordering to protect the original community structure of graph and increase the spatial and temporal locality.The proposed reordering algorithm is evaluated through testing six different real graph datasets and three graph applications on an Intel server.The results show that the proposed reordering technique yields average speed-up of 18.86%and 11.3%,compared with the baseline of no reordering and the state-of-the-art reordering technique based on vertex degree,seperately.
作者
李策
章隆兵
LI Ce;ZHANG Longbing(State Key Laboratory of Computer Architecture,Institute of Computing Technology,Chinese Academy of Sciences,Beijing 100190;School of Computer Engineering,University of Chinese Academy of Sciences,Beijing 100190)
出处
《高技术通讯》
CAS
2022年第9期903-913,共11页
Chinese High Technology Letters
基金
中国科学院战略性先导科技专项(C类)课题(XDC05020100)资助项目。
关键词
图数据重排序
顶点度数
访存局部性
幂律分布
群落结构
graph reordering
vertex degree
memory access locality
power-law distribution
community structure