In this paper, a parallel Surface Extraction from Binary Volumes with Higher-Order Smoothness (SEBVHOS) algorithm is proposed to accelerate the SEBVHOS execution. The original SEBVHOS algorithm is parallelized first, ...In this paper, a parallel Surface Extraction from Binary Volumes with Higher-Order Smoothness (SEBVHOS) algorithm is proposed to accelerate the SEBVHOS execution. The original SEBVHOS algorithm is parallelized first, and then several performance optimization techniques which are loop optimization, cache optimization, false sharing optimization, synchronization overhead op-timization, and thread affinity optimization, are used to improve the implementation's performance on multi-core systems. The performance of the parallel SEBVHOS algorithm is analyzed on a dual-core system. The experimental results show that the parallel SEBVHOS algorithm achieves an average of 1.86x speedup. More importantly, our method does not come with additional aliasing artifacts, com-paring to the original SEBVHOS algorithm.展开更多
针对国防科技大学自主研发的异构多核数字信号处理(digital signal processing, DSP)芯片的特征以及卷积算法自身特点,提出了一种面向多核DSP架构的高性能多核并行卷积实现方案。针对1×1卷积提出了特征图级多核并行方案;针对卷积...针对国防科技大学自主研发的异构多核数字信号处理(digital signal processing, DSP)芯片的特征以及卷积算法自身特点,提出了一种面向多核DSP架构的高性能多核并行卷积实现方案。针对1×1卷积提出了特征图级多核并行方案;针对卷积核大于1的卷积提出了窗口级多核并行优化设计,同时提出了逐元素向量化计算的核内并行优化实现。实验结果表明,所提并行优化方法实现单核计算效率最高能达到64.95%,在带宽受限情况下,多核并行扩展效率可达到48.36%~88.52%,在典型网络ResNet50上的执行性能与E5-2640 CPU相比,获得了5.39倍性能加速。展开更多
Due to the huge size of patterns to be searched,multiple pattern searching remains a challenge to several newly-arising applications like network intrusion detection.In this paper,we present an attempt to design effic...Due to the huge size of patterns to be searched,multiple pattern searching remains a challenge to several newly-arising applications like network intrusion detection.In this paper,we present an attempt to design efficient multiple pattern searching algorithms on multi-core architectures.We observe an important feature which indicates that the multiple pattern matching time mainly depends on the number and minimal length of patterns.The multi-core algorithm proposed in this paper leverages this feature to decompose pattern set so that the parallel execution time is minimized.We formulate the problem as an optimal decomposition and scheduling of a pattern set,then propose a heuristic algorithm,which takes advantage of dynamic programming and greedy algorithmic techniques,to solve the optimization problem.Experimental results suggest that our decomposition approach can increase the searching speed by more than 200% on a 4-core AMD Barcelona system.展开更多
基因表达式编程(Gene Expression Programming,GEP)是一种计算量大且通用性强的新型进化算法,其传统计算形式不能充分利用目前主流的多核处理器。为提高算法效率,提出了基于通用多核处理器平台的并行基因表达式编程算法(Parallel Gene E...基因表达式编程(Gene Expression Programming,GEP)是一种计算量大且通用性强的新型进化算法,其传统计算形式不能充分利用目前主流的多核处理器。为提高算法效率,提出了基于通用多核处理器平台的并行基因表达式编程算法(Parallel Gene Expression Programming Based on General Multi-core Processor,PGEP-MP)。主要工作包括:(1)分析通用多核处理器平台下并行基因表达式编程算法的机理;(2)利用MPI和OpenMP混合编程模型设计基于通用多核处理器平台的基因表达式编程算法的粗粒度与细粒度相结合的并行模型;(3)提出改进PGEP-MP算法效率的进化策略;(4)通过对函数挖掘和分类的实验证明,PGEP-MP算法提高了函数挖掘和分类的效率,在并行双核处理器数为4的情况下,PGEP-MP的平均并行加速比分别是传统GEP算法的4.22倍和4.06倍。展开更多
基金Supported by the National Natural Science Foundation of China(No.61071173)
文摘In this paper, a parallel Surface Extraction from Binary Volumes with Higher-Order Smoothness (SEBVHOS) algorithm is proposed to accelerate the SEBVHOS execution. The original SEBVHOS algorithm is parallelized first, and then several performance optimization techniques which are loop optimization, cache optimization, false sharing optimization, synchronization overhead op-timization, and thread affinity optimization, are used to improve the implementation's performance on multi-core systems. The performance of the parallel SEBVHOS algorithm is analyzed on a dual-core system. The experimental results show that the parallel SEBVHOS algorithm achieves an average of 1.86x speedup. More importantly, our method does not come with additional aliasing artifacts, com-paring to the original SEBVHOS algorithm.
文摘针对国防科技大学自主研发的异构多核数字信号处理(digital signal processing, DSP)芯片的特征以及卷积算法自身特点,提出了一种面向多核DSP架构的高性能多核并行卷积实现方案。针对1×1卷积提出了特征图级多核并行方案;针对卷积核大于1的卷积提出了窗口级多核并行优化设计,同时提出了逐元素向量化计算的核内并行优化实现。实验结果表明,所提并行优化方法实现单核计算效率最高能达到64.95%,在带宽受限情况下,多核并行扩展效率可达到48.36%~88.52%,在典型网络ResNet50上的执行性能与E5-2640 CPU相比,获得了5.39倍性能加速。
基金supported by the National Natural Science Foundation of China under Grant Nos.60803030,60925009,60921002the National Basic Research 973 Program of China under Grant No.2011CB302502
文摘Due to the huge size of patterns to be searched,multiple pattern searching remains a challenge to several newly-arising applications like network intrusion detection.In this paper,we present an attempt to design efficient multiple pattern searching algorithms on multi-core architectures.We observe an important feature which indicates that the multiple pattern matching time mainly depends on the number and minimal length of patterns.The multi-core algorithm proposed in this paper leverages this feature to decompose pattern set so that the parallel execution time is minimized.We formulate the problem as an optimal decomposition and scheduling of a pattern set,then propose a heuristic algorithm,which takes advantage of dynamic programming and greedy algorithmic techniques,to solve the optimization problem.Experimental results suggest that our decomposition approach can increase the searching speed by more than 200% on a 4-core AMD Barcelona system.
文摘基因表达式编程(Gene Expression Programming,GEP)是一种计算量大且通用性强的新型进化算法,其传统计算形式不能充分利用目前主流的多核处理器。为提高算法效率,提出了基于通用多核处理器平台的并行基因表达式编程算法(Parallel Gene Expression Programming Based on General Multi-core Processor,PGEP-MP)。主要工作包括:(1)分析通用多核处理器平台下并行基因表达式编程算法的机理;(2)利用MPI和OpenMP混合编程模型设计基于通用多核处理器平台的基因表达式编程算法的粗粒度与细粒度相结合的并行模型;(3)提出改进PGEP-MP算法效率的进化策略;(4)通过对函数挖掘和分类的实验证明,PGEP-MP算法提高了函数挖掘和分类的效率,在并行双核处理器数为4的情况下,PGEP-MP的平均并行加速比分别是传统GEP算法的4.22倍和4.06倍。