Unstructured and irregular graph data causes strong randomness and poor locality of data accesses in graph processing.This paper optimizes the depth-branch-resorting algorithm(DBR),and proposes a branch-alternation-re...Unstructured and irregular graph data causes strong randomness and poor locality of data accesses in graph processing.This paper optimizes the depth-branch-resorting algorithm(DBR),and proposes a branch-alternation-resorting algorithm(BAR).In order to make the algorithm run in parallel and improve the efficiency of algorithm operation,the BAR algorithm is mapped onto the reconfigurable array processor(APR-16)to achieve vertex reordering,effectively improving the locality of graph data.This paper validates the BAR algorithm on the GraphBIG framework,by utilizing the reordered dataset with BAR on breadth-first search(BFS),single source shortest paht(SSSP)and betweenness centrality(BC)algorithms for traversal.The results show that compared with DBR and Corder algorithms,BAR can reduce execution time by up to 33.00%,and 51.00%seperatively.In terms of data movement,the BAR algorithm has a maximum reduction of 39.00%compared with the DBR algorithm and 29.66%compared with Corder algorithm.In terms of computational complexity,the BAR algorithm has a maximum reduction of 32.56%compared with DBR algorithm and53.05%compared with Corder algorithm.展开更多
In the case of massive data,matrix operations are very computationally intensive,and the memory limitation in standalone mode leads to the system inefficiencies.At the same time,it is difficult for matrix operations t...In the case of massive data,matrix operations are very computationally intensive,and the memory limitation in standalone mode leads to the system inefficiencies.At the same time,it is difficult for matrix operations to achieve flexible switching between different requirements when implemented in hardware.To address this problem,this paper proposes a matrix operation accelerator based on reconfigurable arrays in the context of the application of recommender systems(RS).Based on the reconfigurable array processor(APR-16)with reconfiguration,a parallelized design of matrix operations on processing element(PE)array is realized with flexibility.The experimental results show that,compared with the proposed central processing unit(CPU)and graphics processing unit(GPU)hybrid implementation matrix multiplication framework,the energy efficiency ratio of the accelerator proposed in this paper is improved by about 35×.Compared with blocked alternating least squares(BALS),its the energy efficiency ratio has been accelerated by about 1×,and the switching of matrix factorization(MF)schemes suitable for different sparsity can be realized.展开更多
文摘目的:主观评价和客观评估不同成像参数下CBCT的图像质量,分析图像质量的主观评价和客观评价间的关系。方法:分别采用6台不同品牌CBCT扫描仪〔3D Accuitomo(Morita)、i-CAT(Kavo)、5G(NewTom)、Smart3D(北京朗视)、DCT Pro(Vatech)、VGi(NewTom)〕,在各个品牌的典型曝光条件下(电压和电流强度不同)扫描空间分辨率模体和高仿真头模,7位医师对拍摄的CBCT图像进行主观评价打分,比较不同CBCT扫描仪的空间分辨率和对常见口腔解剖结构的可见性。客观评价指标采用各仪器所获的图像空间分辩率(LP/mm)。结果:7位医师的组内一致性和组间一致性均无显著性差异。主观评价New Tom 5G为2分,i-CAT为5分,其余4个品牌匀为4分,客观评价i-CAT的LP/mm为1.8,Smart3D为2.0,其余4个品牌为1.0~1.7。在相同管电流条件下,不同管电压的图像主观质量有显著性差异。在相同管电压条件下,不同管电流的图像主观质量有显著性差异。结论:图像质量的主客观评价具有一定的一致性,不同品牌之间的客观评价差异可能与电压、电流强度不同有关。
基金the National Key R&D Program of China(No.2022ZD0119001)the National Natural Science Foundation of China(No.61834005)+3 种基金the Shaanxi Province Key R&D Plan(No.2022GY-027)the Key Scientific Research Project of Shaanxi Department of Education(No.22JY060)the Education Research Project of XUPT(No.JGA202108)the Graduate Student Innovation Fund of Xi'an University of Posts and Telecommunications(No.CXJJZL2022011)。
文摘Unstructured and irregular graph data causes strong randomness and poor locality of data accesses in graph processing.This paper optimizes the depth-branch-resorting algorithm(DBR),and proposes a branch-alternation-resorting algorithm(BAR).In order to make the algorithm run in parallel and improve the efficiency of algorithm operation,the BAR algorithm is mapped onto the reconfigurable array processor(APR-16)to achieve vertex reordering,effectively improving the locality of graph data.This paper validates the BAR algorithm on the GraphBIG framework,by utilizing the reordered dataset with BAR on breadth-first search(BFS),single source shortest paht(SSSP)and betweenness centrality(BC)algorithms for traversal.The results show that compared with DBR and Corder algorithms,BAR can reduce execution time by up to 33.00%,and 51.00%seperatively.In terms of data movement,the BAR algorithm has a maximum reduction of 39.00%compared with the DBR algorithm and 29.66%compared with Corder algorithm.In terms of computational complexity,the BAR algorithm has a maximum reduction of 32.56%compared with DBR algorithm and53.05%compared with Corder algorithm.
基金the National Key R&D Program of China(No.2022ZD0119001)the National Natural Science Foundation of China(No.61834005)+3 种基金the Shaanxi Province Key R&D Plan(No.2022GY-027)the Key Scientific Research Project of Shaanxi Department of Education(No.22JY060)the Education Research Project of Xi'an University of Posts and Telecommunications(No.JGA202108)the Graduate Student Innovation Fund of Xi’an University of Posts and Telecommunications(No.CXJJYL2022035).
文摘In the case of massive data,matrix operations are very computationally intensive,and the memory limitation in standalone mode leads to the system inefficiencies.At the same time,it is difficult for matrix operations to achieve flexible switching between different requirements when implemented in hardware.To address this problem,this paper proposes a matrix operation accelerator based on reconfigurable arrays in the context of the application of recommender systems(RS).Based on the reconfigurable array processor(APR-16)with reconfiguration,a parallelized design of matrix operations on processing element(PE)array is realized with flexibility.The experimental results show that,compared with the proposed central processing unit(CPU)and graphics processing unit(GPU)hybrid implementation matrix multiplication framework,the energy efficiency ratio of the accelerator proposed in this paper is improved by about 35×.Compared with blocked alternating least squares(BALS),its the energy efficiency ratio has been accelerated by about 1×,and the switching of matrix factorization(MF)schemes suitable for different sparsity can be realized.