Aiming for 100% throughput of multicast scheduling, existing schemes require unbounded speedup and an exponentially growing number of virtual queues as high-speed switches scale; this paper addresses the resulting low multicast throughput under no speedup or a fixed crosspoint buffer size. Inspired by the load-balanced two-stage Birkhoff-von Neumann architecture, which can provide 100% throughput for all kinds of unicast traffic, a novel three-stage architecture is proposed, consisting of a first stage for multicast fan-out splitting, a second stage for load balancing, and a last stage for switching (FSLBS). A dedicated multicast fan-out splitting to unicast (M2U) scheduling algorithm is developed for the first stage, while the last two stages are scheduled by periodic permutation matrices. Using the dedicated M2U algorithm together with periodic permutation scheduling, FSLBS achieves 100% throughput for integrated uni- and multicast traffic without speedup. The operation is theoretically validated using a fluid model.
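The abstract does not reproduce the M2U algorithm; the C++ fragment below is only a rough sketch of the two ideas it names: fan-out splitting turns one multicast cell into one unicast cell per destination before the load-balancing stage, and the remaining stages connect ports with a periodic (round-robin style) permutation. All types, names, and the connection pattern are assumptions for illustration, not the paper's actual scheduler.

```cpp
#include <cstdint>
#include <vector>

// A cell carries a payload identifier and a fan-out set of output ports.
struct Cell {
    uint64_t payload_id;
    std::vector<int> fanout;   // destination output ports
};

// M2U-style fan-out splitting (illustrative): each multicast cell is copied
// into one unicast cell per destination before entering the load balancer.
std::vector<Cell> SplitToUnicast(const Cell& mcast) {
    std::vector<Cell> unicast;
    unicast.reserve(mcast.fanout.size());
    for (int port : mcast.fanout) {
        unicast.push_back({mcast.payload_id, {port}});
    }
    return unicast;
}

// Periodic permutation for the switching stages: in time slot t, input i is
// connected to port (i + t) mod N, so every input visits every port once
// every N slots without any online matching computation.
int PeriodicMatch(int input, int timeslot, int num_ports) {
    return (input + timeslot) % num_ports;
}
```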
With the surge of big data applications and the worsening of the memory-wall problem, the memory system, rather than the computing unit, has become the commonly recognized major concern of computing. However, this "memory-centric" common understanding has a humble beginning. More than three decades ago, the memory-bounded speedup model was the first model to recognize memory as the bound of computing, and it provided a general bound on speedup together with a computing-memory trade-off formulation. The memory-bounded model was well received even then: it was immediately introduced in several advanced computer architecture and parallel computing textbooks in the 1990s as a must-know for scalable computing. These include Prof. Kai Hwang's book "Scalable Parallel Computing", in which he introduced the memory-bounded speedup model as Sun-Ni's Law, alongside Amdahl's Law and Gustafson's Law. Through the years, the impact of this model has grown far beyond parallel processing and into the fundamentals of computing. In this article, we revisit the memory-bounded speedup model and discuss its progress and impacts in depth, to make a unique contribution to this special issue, to stimulate new solutions for big data applications, and to promote data-centric thinking and rethinking. Funding: supported in part by the U.S. National Science Foundation under Grant Nos. CCF-2029014 and CCF-2008907.
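For context, the memory-bounded speedup (Sun-Ni's Law) is commonly stated as follows, where f is the sequential fraction of the work, p the number of processors, and G(p) the factor by which the workload grows when the available memory grows p-fold:

```latex
S_{\text{MB}}(p) \;=\; \frac{f + (1-f)\,G(p)}{\,f + (1-f)\,\dfrac{G(p)}{p}\,}
```

Setting G(p) = 1 (fixed problem size) recovers Amdahl's Law, while G(p) = p (workload scaling linearly with memory) recovers Gustafson's scaled speedup, which is why the three laws are usually presented side by side.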
The role that quantum entanglement plays in quantum computation speedup has been widely disputed. Some believe that a quantum speedup over classical computation is impossible if entanglement is absent, while others claim that the presence of entanglement is not a necessary condition for some quantum algorithms. This paper discusses the problem systematically: simulating quantum computation with classical resources is analyzed, and the entanglement present in known algorithms is reviewed. It is concluded that the presence of entanglement is a necessary but not sufficient condition for quantum computation speedup with pure states or pseudo-pure states; the mixed-state case remains open. Further work on quantum computation will benefit from the presented results. Funding: supported by the National Natural Science Foundation for Distinguished Young Scholars of China (Grant No. 60625204), the Key Project of the National Natural Science Foundation of China (Grant No. 60496324), the National 973 Fundamental Research and Development Program of China (Grant No. 2002CB312004), the National 863 High-Tech Project of China (Grant No. 2006AA01Z155), and the Knowledge Innovation Program of the Chinese Academy of Sciences and MADIS.
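A brief illustration of the classical-simulation argument referenced here: if an n-qubit pure state stays unentangled (a product state) throughout the computation, it can be represented with linearly many amplitudes, whereas a general entangled state may require exponentially many, so the unentangled computation admits an efficient classical simulation:

```latex
|\psi\rangle \;=\; \bigotimes_{i=1}^{n}\bigl(\alpha_i|0\rangle + \beta_i|1\rangle\bigr)
\quad\Longrightarrow\quad 2n \text{ amplitudes suffice, versus up to } 2^{n}
\text{ for a general } |\psi\rangle \in (\mathbb{C}^{2})^{\otimes n}.
```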
To address the low computational efficiency of two-dimensional fluid-thermal coupled simulation of oil-immersed transformers, a parallel computing method based on a hybrid finite element method is proposed. First, serial implementations of the dimensionless least-squares finite element method and the upwind finite element method are developed in C++ in Visual Studio 2019. Then, parallel computation of the fluid field is implemented on the graphics processing unit (GPU); for a single-zone, per-turn winding model, the parallel efficiency of different GPU cards under different mesh conditions is compared, and the analysis shows that the larger the data size and the more stream processors a GPU card has, the better the parallel performance. Next, parallel computation of the two-dimensional temperature field is implemented with the Intel Math Kernel Library (Intel MKL) combined with shared-memory parallel programming (open multi-processing, OpenMP), and the influence of different mesh sizes on parallel efficiency is compared. Finally, on this basis, a hybrid parallel computing method adapted to different simulation conditions is proposed and applied to the two-dimensional temperature-rise hot-spot analysis of a large oil-immersed transformer winding model. The results show that, compared with the serial program, the hybrid finite element parallel computing method achieves a speedup of 69.5, and experimental measurements further verify the accuracy of the parallel results. This work lays a foundation for fast computation of fluid-thermal coupled problems in large oil-immersed transformers.
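The authors' solver is not listed in the abstract; as a hedged sketch of the shared-memory portion it describes, the C++ fragment below uses OpenMP to parallelize the sparse matrix-vector product that dominates an iterative 2-D temperature-field solve. The CSR layout and function name are illustrative assumptions (the paper additionally relies on Intel MKL routines, which are not shown here).

```cpp
#include <omp.h>
#include <vector>

// Sparse matrix in CSR format, as typically assembled by a 2-D FEM code.
struct CsrMatrix {
    std::vector<int> row_ptr;     // size n + 1
    std::vector<int> col_idx;     // size nnz
    std::vector<double> values;   // size nnz
};

// y = A * x, with rows distributed across OpenMP threads.
// y must be pre-sized to the number of rows before the call.
void SpMV(const CsrMatrix& A, const std::vector<double>& x, std::vector<double>& y) {
    const int n = static_cast<int>(A.row_ptr.size()) - 1;
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
            sum += A.values[k] * x[A.col_idx[k]];
        }
        y[i] = sum;
    }
}
```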
With the rise of image data and the increased complexity of edge detection tasks, conventional artificial intelligence techniques have been severely challenged. To solve the even greater problems of the future, learning algorithms must maintain high speed and accuracy through economical means. Traditional edge detection approaches cannot detect edges in images in a timely manner due to memory and computation-time constraints. In this work, a novel parallelized ant colony optimization technique in the distributed framework provided by the Hadoop/MapReduce infrastructure is proposed to improve edge detection capabilities. Moreover, a filtering technique is applied to reduce the noisy background of images, yielding a significant improvement in edge detection accuracy. The implementation of the proposed algorithm is examined closely and demonstrated through experiments. Results reveal high classification accuracy and significant improvements in speedup, scaleup, and sizeup compared with the standard algorithms.
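No code is given in the abstract; the sketch below shows, under stated assumptions, the kind of ant-colony step each map task might run on its image partition: an ant walks toward pixels with high pheromone and high local gradient and deposits pheromone along its path, so pheromone accumulates near edges. All parameters, names, and the 4-neighbour move rule are hypothetical, and the MapReduce plumbing (emitting and aggregating pheromone updates in reducers) is omitted.

```cpp
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

// One image tile: grayscale intensities plus a pheromone field of the same
// layout. Pheromone should be initialized to a small positive constant.
struct Tile {
    int w = 0, h = 0;
    std::vector<double> intensity;
    std::vector<double> pheromone;

    double& tau(int x, int y) { return pheromone[y * w + x]; }
    // Heuristic: local gradient magnitude, larger near edges.
    double eta(int x, int y) const {
        auto at = [&](int i, int j) { return intensity[j * w + i]; };
        double gx = at(std::min(x + 1, w - 1), y) - at(std::max(x - 1, 0), y);
        double gy = at(x, std::min(y + 1, h - 1)) - at(x, std::max(y - 1, 0));
        return std::sqrt(gx * gx + gy * gy) + 1e-9;
    }
};

// Move one ant for `steps` steps, depositing pheromone along its path.
void RunAnt(Tile& t, int x, int y, int steps, double alpha, double beta,
            double deposit, std::mt19937& rng) {
    const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};
    for (int s = 0; s < steps; ++s) {
        double weight[4], total = 0.0;
        for (int d = 0; d < 4; ++d) {
            int nx = std::clamp(x + dx[d], 0, t.w - 1);
            int ny = std::clamp(y + dy[d], 0, t.h - 1);
            // Transition weight: pheromone^alpha * heuristic^beta.
            weight[d] = std::pow(t.tau(nx, ny), alpha) * std::pow(t.eta(nx, ny), beta);
            total += weight[d];
        }
        // Roulette-wheel selection of the next pixel.
        std::uniform_real_distribution<double> u(0.0, total);
        double r = u(rng), acc = 0.0;
        int chosen = 0;
        for (int d = 0; d < 4; ++d) { acc += weight[d]; if (r <= acc) { chosen = d; break; } }
        x = std::clamp(x + dx[chosen], 0, t.w - 1);
        y = std::clamp(y + dy[chosen], 0, t.h - 1);
        t.tau(x, y) += deposit;   // pheromone accumulates where ants walk
    }
}
```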
Row fixation is an MPI-based parallel algorithm that can be implemented on high-performance computer systems. It preserves the characteristics of the matrices because row computations are fixed on particular nodes. Therefore, locality of computation is exploited effectively and a good acceleration ratio (speedup) is obtained for large-scale parallel computations such as solving linear equations by Gaussian elimination, LU decomposition of matrices, and computing the m-th power of a matrix.
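As a minimal sketch of the row-fixation idea for one of the listed kernels, the hypothetical MPI fragment below performs forward elimination with each rank permanently owning a contiguous block of rows: the owner of the pivot row broadcasts it, and every rank updates only its own fixed rows. The even block distribution, the absence of pivoting, and all names are simplifying assumptions, not the authors' implementation.

```cpp
#include <mpi.h>
#include <vector>

// Forward elimination with rows "fixed" on ranks: rank r owns global rows
// [r*rows_per_rank, (r+1)*rows_per_rank). local_rows is row-major with n
// matrix columns plus the right-hand side in column n. Assumes MPI_Init has
// been called and that the n rows divide evenly across the ranks.
void RowFixedElimination(std::vector<double>& local_rows, int n,
                         int rows_per_rank, int rank) {
    std::vector<double> pivot(n + 1);
    for (int k = 0; k < n; ++k) {
        int owner = k / rows_per_rank;           // rank holding pivot row k
        if (rank == owner) {
            int lk = k - owner * rows_per_rank;  // local index of the pivot row
            for (int j = 0; j <= n; ++j) pivot[j] = local_rows[lk * (n + 1) + j];
        }
        MPI_Bcast(pivot.data(), n + 1, MPI_DOUBLE, owner, MPI_COMM_WORLD);

        // Each rank eliminates column k only in the rows it owns.
        for (int li = 0; li < rows_per_rank; ++li) {
            int gi = rank * rows_per_rank + li;  // global row index
            if (gi <= k) continue;               // rows at or above the pivot stay put
            double factor = local_rows[li * (n + 1) + k] / pivot[k];
            for (int j = k; j <= n; ++j)
                local_rows[li * (n + 1) + j] -= factor * pivot[j];
        }
    }
}
```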