Abstract: This article presents a method for dynamic, real-time communication between a serial port and peripheral equipment under Windows. The method uses multithreading and builds on the basic principles of communication and the Windows multitasking mechanism. It addresses the problem of real-time response in serial communication, reduces the data loss rate, and improves system reliability. The method is general and practical for serial communication.
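The pattern described in this abstract, a dedicated thread that services the port so that incoming data is not lost while the main thread is busy, can be sketched as follows. This is a minimal illustration, not the authors' implementation: `FakeSerialPort`, `reader`, and `receive_all` are hypothetical names, and the port is simulated in memory rather than opened through the Windows serial API.

```python
import queue
import threading
import time


class FakeSerialPort:
    """Hypothetical stand-in for a serial port; hands out preset byte chunks."""

    def __init__(self, chunks):
        self._chunks = list(chunks)
        self._lock = threading.Lock()

    def read(self):
        # Return the next pending chunk, or b"" when nothing is waiting.
        with self._lock:
            if self._chunks:
                return self._chunks.pop(0)
        return b""


def reader(port, inbox, stop):
    """Poll the port in a background thread and queue every chunk received."""
    while not stop.is_set():
        data = port.read()
        if data:
            inbox.put(data)
        else:
            time.sleep(0.001)  # nothing pending; yield the CPU briefly


def receive_all(chunks, expected):
    """Start a reader thread and collect `expected` chunks on the calling thread."""
    port = FakeSerialPort(chunks)
    inbox = queue.Queue()
    stop = threading.Event()
    worker = threading.Thread(target=reader, args=(port, inbox, stop), daemon=True)
    worker.start()
    received = [inbox.get(timeout=1.0) for _ in range(expected)]
    stop.set()
    worker.join()
    return received
```

The thread-safe queue decouples the rate at which data arrives from the rate at which the application consumes it, which is one common way to reduce data loss in this kind of design.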
Abstract: The convolutional neural network (CNN) is an essential model for achieving high accuracy in various machine learning applications, such as image recognition and natural language processing. One of the important issues for CNN acceleration with high energy efficiency and processing performance is efficient data reuse by exploiting the inherent data locality. In this paper, we propose a novel CGRA (Coarse Grained Reconfigurable Array) architecture with time-domain multithreading for exploiting input data locality. The multithreading on each processing element enables input data to be reused across multiple computation periods. This paper presents the accelerator design and a performance analysis of the proposed architecture. We examine the structure of the memory subsystems, as well as the architecture of the computing array, to supply the required data with minimal performance overhead. We explore efficient architecture design alternatives based on the characteristics of modern CNN configurations. The evaluation results show that the available bandwidth of the external memory can be utilized efficiently when the output plane is wider (in earlier layers of many CNNs), while the input data locality can be utilized maximally when the number of output channels is larger (in later layers).
Funding: Supported by the National Natural Science Foundation of China (60503015).
Abstract: To overcome the ever-increasing susceptibility of processors to transient faults, various redundant multithreading (RMT) architectures have been proposed, and RMT is becoming one of the most effective approaches for detecting and recovering from transient faults. This paper surveys a wide range of RMT architectures, from the original AR-SMT (A-stream R-stream Simultaneous MultiThreading) to the more recent SD-SRT (Slack-Decode Simultaneous Redundant Threading), presents comparative analyses of them, and thereby illustrates their evolution and trends. Finally, some directions and suggestions are put forward for further RMT research and development.
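The core idea behind redundant multithreading, running the same computation twice and comparing the outputs, can be illustrated with a small software analogy. This sketch assumes nothing about the surveyed hardware designs: `checksum`, `run_redundantly`, and the injected bit flip are hypothetical and only mimic a transient fault in one of the two copies.

```python
import threading


def checksum(values, flip_bit_at=None):
    """Sum `values`; optionally flip one bit of one element to mimic a transient fault."""
    total = 0
    for i, v in enumerate(values):
        if i == flip_bit_at:
            v ^= 1 << 3  # injected single-bit upset (illustration only)
        total += v
    return total


def run_redundantly(values, fault_in_trailing=None):
    """Run the computation in a leading and a trailing thread and compare outputs."""
    results = {}

    def run(name, fault):
        results[name] = checksum(values, fault)

    leading = threading.Thread(target=run, args=("leading", None))
    trailing = threading.Thread(target=run, args=("trailing", fault_in_trailing))
    leading.start()
    trailing.start()
    leading.join()
    trailing.join()

    # A mismatch between the two copies signals that a fault occurred in one of them.
    fault_detected = results["leading"] != results["trailing"]
    return results["leading"], fault_detected
```

Comparison alone only detects the fault; recovery would need a further step, such as re-executing from a known good state, which this sketch leaves out.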
Funding: Supported by the National Natural Science Foundation of China (60103002).
Abstract: A transient fault detection mechanism is added to a simultaneous multithreading architecture. By exploiting both ILP (Instruction Level Parallelism) and TLP (Thread Level Parallelism), a Simultaneous Multithreading (SMT) fault-tolerant processor can be expected to achieve a better tradeoff between performance and hardware cost than traditional fault-tolerant processors. Detailed simulations of three SPEC95 benchmarks show that executing two redundant programs on the fault-tolerant microarchitecture takes only 40%-61% longer than running a single version of the program. The new instruction fetch algorithm improves performance by 0.4%-1% on most of the randomly chosen benchmarks.
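Abstract: In buildings with large floor areas, the efficiency of path planning algorithms for mobile robots still faces considerable challenges. For such working scenarios, a path planning algorithm combining Voronoi region segmentation and path optimization (VSO) is proposed to achieve fast path planning in large-scale indoor scenes. The algorithm uses a generalized Voronoi graph (GVG) to construct a topological graph from the map, on which an initial heuristic path can be obtained quickly. Constraining the sampling process to the region around the initial path reduces over-exploration of the workspace. On this basis, waypoints are selected to divide the sampling region into multiple subregions, and paths are then searched in the subregions in parallel to reduce the search space and increase search speed. Finally, the paths within the subregions are connected to form the resulting path, and an optimization algorithm is used to smooth the final path. Simulation experiments and robot experiments verify the practicality and effectiveness of the algorithm.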
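The step of searching waypoint-to-waypoint segments in parallel and then connecting them can be sketched as below. This is only an illustration under simplifying assumptions: it swaps in breadth-first search on a small occupancy grid for the sampling-based search in the abstract, it does not restrict each search to its own subregion, and `bfs` and `plan_via_waypoints` are hypothetical names.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def bfs(grid, start, goal):
    """Shortest 4-connected path on a grid of 0 (free) and 1 (obstacle) cells."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    frontier = deque([start])
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and nxt not in parent:
                parent[nxt] = cell
                frontier.append(nxt)
    return None  # goal unreachable


def plan_via_waypoints(grid, waypoints):
    """Search each waypoint-to-waypoint segment in parallel, then stitch the results."""
    pairs = list(zip(waypoints, waypoints[1:]))
    with ThreadPoolExecutor() as pool:
        segments = list(pool.map(lambda p: bfs(grid, p[0], p[1]), pairs))
    if any(seg is None for seg in segments):
        return None
    path = list(segments[0])
    for seg in segments[1:]:
        path += seg[1:]  # drop the duplicated junction waypoint
    return path
```

In CPython, threads will not speed up this CPU-bound search because of the global interpreter lock; the sketch shows the decomposition and stitching, and a process pool or a native implementation would be needed for real parallel speedup.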
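Abstract: To address the inaccurate definition of timing requirements caused by the complexity of timing requirements in aerospace embedded software (AES), a data flow timing model based on MARTE (modeling and analysis of real-time and embedded systems), called DFT-MARTE, is proposed. Three algorithms based on this model are designed: a processing-point buffer calculation algorithm, a timing deviation probability detection algorithm, and a timing sequence analysis algorithm. The processing-point buffer calculation algorithm dynamically updates the buffer space so that subsequent timing detection can execute normally. The timing deviation probability detection algorithm uses multithreaded concurrency to simulate timing characteristics and detect timing deviation problems in the requirements. The timing sequence analysis algorithm, based on gradient descent, fits timing sequences to guide users in optimizing requirements. Compared with traditional data flow models, the model is better suited to aerospace embedded software, facilitates subsequent development and maintenance, and has very high application value.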
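As a rough illustration of fitting a timing sequence with gradient descent, the sketch below fits a straight line to (time, value) samples by minimizing mean squared error. The linear model, the function name `fit_line`, and the learning rate and step count are assumptions made for this example; the abstract does not specify the model the authors fit.

```python
def fit_line(ts, ys, lr=0.01, steps=5000):
    """Fit y ~ a * t + b by batch gradient descent on mean squared error."""
    a, b = 0.0, 0.0
    n = len(ts)
    for _ in range(steps):
        # Gradients of (1/n) * sum((a*t + b - y)^2) with respect to a and b.
        grad_a = sum(2 * (a * t + b - y) * t for t, y in zip(ts, ys)) / n
        grad_b = sum(2 * (a * t + b - y) for t, y in zip(ts, ys)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b
```

For example, `fit_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])` converges toward a slope of 2 and an intercept of 1. A larger time scale would need a smaller learning rate or normalized inputs to stay stable.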
Funding: Supported by the National Key Research and Development Program of China (No. 2016YFB0201300).
Abstract: The Distributed Shared Memory (DSM) architecture is widely used in today’s computer design to mitigate the ever-widening processing-memory gap, and it inevitably exposes Non-Uniform Memory Access (NUMA) to shared-memory parallel applications. Failure to adapt to the NUMA effect can significantly degrade application performance, especially on today’s manycore platforms with tens to hundreds of cores. However, traditional approaches such as first-touch and memory policy fall short with respect to false page-sharing, fragmentation, or ease of use. In this paper, we propose a partitioned shared-memory approach that allows multithreaded applications to achieve full NUMA-awareness with only minor code changes, and we develop an accompanying NUMA-aware heap manager that eliminates false page-sharing and minimizes fragmentation. Experiments on a 256-core cc-NUMA computing node show that the proposed approach helps applications adapt to NUMA with only minor code changes and improves the performance of typical multithreaded scientific applications by up to 4.3-fold as more cores are used.
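Abstract: Excessive noise interference in the power monitoring system environment leads to low communication efficiency and quality. To address this, a multithreaded communication algorithm based on the transmission control protocol/internet protocol (TCP/IP) and association rules is proposed. Background noise is treated as an abrupt change in the power monitoring signal; the signal values that respond to the abrupt-change function are located, and a harmonic separation algorithm is used to remove the background noise. According to the transmission characteristics of the different threads, communication packages are built on the TCP/IP protocol, with handle, terminate, suspend, and execute functions set separately, so that the communication data of different threads are matched with different communication protocols. The test results show that, with multithreaded communication of the signals transmitted by source devices of the power monitoring system, the frequency of the communication signal varied most smoothly, and no zero-transmission events occurred within the 0-2000 s sampling interval. After the background noise was removed, the waveform was noticeably more stable than the original signal, with no excessively high or low amplitude changes. The proposed method yields stable communication signals with high efficiency, preserves the original signal well, and has strong denoising capability.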
Funding: This work was supported by the National Natural Science Foundation of China under Grant Nos. 61272142, 61103082, 61402492, 61170261, and 61103193, the National High Technology Research and Development 863 Program of China under Grant Nos. 2012AA01A301 and 2012AA010901, and the Program for New Century Excellent Talents in University of China.
Funding: Supported by the National High-Tech Research and Development (863) Program of China (No. 863-300-01-99) and the National Natural Science Foundation of China (No. 60173009).
Funding: Supported by the National High Technology Development 863 Program of China (Grant Nos. 2007AA01Z114, 2006AA010201), the National Natural Science Foundation of China (Grant Nos. 60703017, 60736012, 60325205, 60673146, 60603049), the National Grand Fundamental Research 973 Program of China (Grant Nos. 2005CB321601, 2005CB321603), and the Beijing Natural Science Foundation (Grant No. 4072024).