Present a kind of method which is used to communicate between serial serial port and peripheral equipment dynamicly and real-time using multithreading technique based on the basic principle of communication and multit...Present a kind of method which is used to communicate between serial serial port and peripheral equipment dynamicly and real-time using multithreading technique based on the basic principle of communication and multitasking mechanism in the circumstance of Windows. This method resolves the question of Real-time answering in the serial communication validly, reduces losing rate of data and improves reliability of system. This article presents a general method used in the serial communication which is practical.展开更多
Convolutional neural network (CNN) is an essential model to achieve high accuracy in various machine learning applications, such as image recognition and natural language processing. One of the important issues for CN...Convolutional neural network (CNN) is an essential model to achieve high accuracy in various machine learning applications, such as image recognition and natural language processing. One of the important issues for CNN acceleration with high energy efficiency and processing performance is efficient data reuse by exploiting the inherent data locality. In this paper, we propose a novel CGRA (Coarse Grained Reconfigurable Array) architecture with time-domain multithreading for exploiting input data locality. The multithreading on each processing element enables the input data reusing through multiple computation periods. This paper presents the accelerator design performance analysis of the proposed architecture. We examine the structure of memory subsystems, as well as the architecture of the computing array, to supply required data with minimal performance overhead. We explore efficient architecture design alternatives based on the characteristics of modern CNN configurations. The evaluation results show that the available bandwidth of the external memory can be utilized efficiently when the output plane is wider (in earlier layers of many CNNs) while the input data locality can be utilized maximally when the number of output channel is larger (in later layers).展开更多
Transient fault detection mechanism is added to simultaneous multithreading architecture. By exploiting both ILP (Instruction Level Parallelism) and TLP (Thread Level Parallelism), Simultaneous Multithreading (SMT) Fa...Transient fault detection mechanism is added to simultaneous multithreading architecture. By exploiting both ILP (Instruction Level Parallelism) and TLP (Thread Level Parallelism), Simultaneous Multithreading (SMT) Fault Tolerance Processor can be expected to achieve better tradeoff between performance and hardware cost than traditional Fault Tolerance Processors. Detailed simulations of 3 of SPEC95 benchmarks show that executing two redundant programs on the fault-tolerant microarchitecture takes only 40%–61%longer than running a single version of the program. The new instruction fetch algorithm enhances the performance by 0.4%~1%to most of the benchmarks we choose randomly.展开更多
To overcome the ever-increasing susceptibility to transient-fault in processors, various redundant multithreading (RMT) architectures have been proposed, which is becoming a most effective approach for detecting and...To overcome the ever-increasing susceptibility to transient-fault in processors, various redundant multithreading (RMT) architectures have been proposed, which is becoming a most effective approach for detecting and recovering from transient-fault. This paper surveys a wide range of RMT architectures-from the original AR-SMT(A-stream R-stream Simultaneous MultiThreading) to the most-recent SD-SRT (Slack-Decode Simultaneous Redundant Threading), presenting traverse analyses and comparisons among them, and hereby demonstrates its evolution and tendency. Finally, some directions and suggestions are put forward for the further RMT research and development.展开更多
In order to eliminate the energy waste caused by the traditional static hardware multithreaded processor used in real-time embedded system working in the low workload situation, the energy efficiency of the hardware m...In order to eliminate the energy waste caused by the traditional static hardware multithreaded processor used in real-time embedded system working in the low workload situation, the energy efficiency of the hardware multithread is discussed and a novel dynamic multithreaded architecture is proposed. The proposed architecture saves the energy wasted by removing idle threads without manipulation on the original architecture, fulfills a seamless switching mechanism which protects active threads and avoids pipeline stall during power mode switching. The report of an implemented dynamic multithreaded processor with 45 nm process from synthesis tool indicates that the area of dynamic multithreaded architecture is only 2.27% higher than the static one in achieving dynamic power dissipation, and consumes 1.3% more power in the same peak performance.展开更多
The superiority of hypothetical quantum computers is not due to faster calculations but due to different schemes of calculations running on special hardware. The core of quantum computing follows the way a state of a ...The superiority of hypothetical quantum computers is not due to faster calculations but due to different schemes of calculations running on special hardware. The core of quantum computing follows the way a state of a quantum system is defined when basic things interact with each other. In conventional approach it is implemented through tensor product of qubits. In the geometric algebra formalism simultaneous availability of all the results for non-measured observables is based on the definition of states as points on three-dimensional sphere.展开更多
We utilized Raspberry Pi 4B to develop a microbial monitoring system to simplify the microbial image-capturing process and facilitate the informatization of microbial observation results.The Raspberry Pi 4B firmware,d...We utilized Raspberry Pi 4B to develop a microbial monitoring system to simplify the microbial image-capturing process and facilitate the informatization of microbial observation results.The Raspberry Pi 4B firmware,developed under Python on the Linux platform,achieves sum verification of serial data,file upload based on TCP protocol,control of sequence light source and light valve,real-time self-test based on multithreading,and an experiment-oriented file management method.The system demonstrated improved code logic,scheduling,exception handling,and code readability.展开更多
在一些较大面积的建筑物内,移动机器人的路径规划算法的效率仍然面临着较大的挑战。针对这类工作场景,提出了一种结合维诺区域分割和路径优化的路径规划算法(Voronoi region segmentation and path optimization,VSO),实现在大规模室内...在一些较大面积的建筑物内,移动机器人的路径规划算法的效率仍然面临着较大的挑战。针对这类工作场景,提出了一种结合维诺区域分割和路径优化的路径规划算法(Voronoi region segmentation and path optimization,VSO),实现在大规模室内场景下的快速路径规划。该算法使用广义维诺图(generalized Voronoi graph,GVG)从地图中构建拓扑图,在拓扑图上可以快速获得初始启发式路径。通过将采样过程约束在初始路径周围的区域,减少了对工作空间的过度探索。在此基础上,选择路径点将采样区域划分为多个子区域,之后在子区域中并行搜索路径来减少搜索空间并提升搜索速度。最后将连接各个子区域内的路径作为结果路径,并使用优化算法来平滑最终路径。仿真实验与机器人实验验证了该算法的实用性与有效性。展开更多
电力监控系统环境中存在过多噪声因素干扰,导致通信效率和质量低。为此,提出一种基于传输控制协议/网际协议(transmission control protocol/internet protocol,TCP/IP)与关联规则的多线程通信算法。将背景噪声看作电力监控信号的突变现...电力监控系统环境中存在过多噪声因素干扰,导致通信效率和质量低。为此,提出一种基于传输控制协议/网际协议(transmission control protocol/internet protocol,TCP/IP)与关联规则的多线程通信算法。将背景噪声看作电力监控信号的突变现象,查找回应突变函数的信号值,利用谐波分离算法,去除背景噪声。根据不同线程的传输特点,采用TCP/IP协议建立通信程序包,分别设置句柄、终止、挂起以及执行函数,为不同线程的通信数据,匹配不同的通信协议。试验结果证明:对电力监控系统源设备的传输信号多线程通信时,通信信号波频变化最为平稳,在0~2000 s的采样区间内,未出现传输为0现象;对背景噪声去噪后,波形相比原始信号变化明显较为稳定,没有出现过高或过低的幅值变化。所提方法通信信号表达平稳、效率较高,对原始信号的保留效果较好,去噪能力很强。展开更多
Determinism is very useful to multithreaded programs in debugging, testing, etc. Many deterministic ap- proaches have been proposed, such as deterministic multithreading (DMT) and deterministic replay. However, thes...Determinism is very useful to multithreaded programs in debugging, testing, etc. Many deterministic ap- proaches have been proposed, such as deterministic multithreading (DMT) and deterministic replay. However, these sys- tems either are inefficient or target a single purpose, which is not flexible. In this paper, we propose an efficient and flexible deterministic framework for multithreaded programs. Our framework implements determinism in two steps: relaxed determinism and strong determinism. Relaxed determinism solves data races eificiently by using a proper weak memory consistency model. After that, we implement strong determinism by solving lock contentions deterministically. Since we can apply different approaches for these two steps independently, our framework provides a spectrum of deterministic choices, including nondeterministic system (fast), weak deterministic system (fast and conditionally deterministic), DMT system, and deternfinistic replay system. Our evaluation shows that the DMT configuration of this framework could even outperform a state-of-the-art DMT system.展开更多
Thread partition plays an important role in speculative multithreading (SpMT) for automatic parallelization of ir- regular programs. Using unified values of partition parameters to partition different applications l...Thread partition plays an important role in speculative multithreading (SpMT) for automatic parallelization of ir- regular programs. Using unified values of partition parameters to partition different applications leads to the fact that every ap- plication cannot own its optimal partition scheme. In this paper, five parameters affecting thread partition are extracted from heuristic rules. They are the dependence threshold (DT), lower limit of thread size (TSL), upper limit of thread size (TSU), lower limit of spawning distance (SDL), and upper limit of spawning distance (SDU). Their ranges are determined in accordance with heuristic rules, and their step-sizes are set empirically. Under the condition of setting speedup as an objective function, all com- binations of five threshold values form the solution space, and our aim is to search for the best combination to obtain the best thread granularity, thread dependence, and spawning distance, so that every application has its best partition scheme. The issue can be attributed to a single objective optimization problem. We use the artificial immune algorithm (AIA) to search for the optimal solution. On Prophet, which is a generic SpMT processor to evaluate the performance of multithreaded programs, Olden bench- marks are used to implement the process. Experiments show that we can obtain the optimal parameter values for every benchmark, and Olden benchmarks partitioned with the optimized parameter values deliver a performance improvement of 3.00% on a 4-core platform compared with a machine learning based approach, and 8.92% compared with a heuristics-based approach.展开更多
文摘Present a kind of method which is used to communicate between serial serial port and peripheral equipment dynamicly and real-time using multithreading technique based on the basic principle of communication and multitasking mechanism in the circumstance of Windows. This method resolves the question of Real-time answering in the serial communication validly, reduces losing rate of data and improves reliability of system. This article presents a general method used in the serial communication which is practical.
文摘Convolutional neural network (CNN) is an essential model to achieve high accuracy in various machine learning applications, such as image recognition and natural language processing. One of the important issues for CNN acceleration with high energy efficiency and processing performance is efficient data reuse by exploiting the inherent data locality. In this paper, we propose a novel CGRA (Coarse Grained Reconfigurable Array) architecture with time-domain multithreading for exploiting input data locality. The multithreading on each processing element enables the input data reusing through multiple computation periods. This paper presents the accelerator design performance analysis of the proposed architecture. We examine the structure of memory subsystems, as well as the architecture of the computing array, to supply required data with minimal performance overhead. We explore efficient architecture design alternatives based on the characteristics of modern CNN configurations. The evaluation results show that the available bandwidth of the external memory can be utilized efficiently when the output plane is wider (in earlier layers of many CNNs) while the input data locality can be utilized maximally when the number of output channel is larger (in later layers).
基金Supported by the National Natural Science Funda tion of China (60103002)
文摘Transient fault detection mechanism is added to simultaneous multithreading architecture. By exploiting both ILP (Instruction Level Parallelism) and TLP (Thread Level Parallelism), Simultaneous Multithreading (SMT) Fault Tolerance Processor can be expected to achieve better tradeoff between performance and hardware cost than traditional Fault Tolerance Processors. Detailed simulations of 3 of SPEC95 benchmarks show that executing two redundant programs on the fault-tolerant microarchitecture takes only 40%–61%longer than running a single version of the program. The new instruction fetch algorithm enhances the performance by 0.4%~1%to most of the benchmarks we choose randomly.
基金Supported by the National Natural Science Foun-dation of China (60503015)
文摘To overcome the ever-increasing susceptibility to transient-fault in processors, various redundant multithreading (RMT) architectures have been proposed, which is becoming a most effective approach for detecting and recovering from transient-fault. This paper surveys a wide range of RMT architectures-from the original AR-SMT(A-stream R-stream Simultaneous MultiThreading) to the most-recent SD-SRT (Slack-Decode Simultaneous Redundant Threading), presenting traverse analyses and comparisons among them, and hereby demonstrates its evolution and tendency. Finally, some directions and suggestions are put forward for the further RMT research and development.
基金supported partially by the National High Technical Research and Development Program of China (863 Program) under Grants No. 2011AA040101, No. 2008AA01Z134the National Natural Science Foundation of China under Grants No. 61003251, No. 61172049, No. 61173150+2 种基金the Doctoral Fund of Ministry of Education of China under Grant No. 20100006110015Beijing Municipal Natural Science Foundation under Grant No. Z111100054011078the 2012 Ladder Plan Project of Beijing Key Laboratory of Knowledge Engineering for Materials Science under Grant No. Z121101002812005
文摘In order to eliminate the energy waste caused by the traditional static hardware multithreaded processor used in real-time embedded system working in the low workload situation, the energy efficiency of the hardware multithread is discussed and a novel dynamic multithreaded architecture is proposed. The proposed architecture saves the energy wasted by removing idle threads without manipulation on the original architecture, fulfills a seamless switching mechanism which protects active threads and avoids pipeline stall during power mode switching. The report of an implemented dynamic multithreaded processor with 45 nm process from synthesis tool indicates that the area of dynamic multithreaded architecture is only 2.27% higher than the static one in achieving dynamic power dissipation, and consumes 1.3% more power in the same peak performance.
文摘The superiority of hypothetical quantum computers is not due to faster calculations but due to different schemes of calculations running on special hardware. The core of quantum computing follows the way a state of a quantum system is defined when basic things interact with each other. In conventional approach it is implemented through tensor product of qubits. In the geometric algebra formalism simultaneous availability of all the results for non-measured observables is based on the definition of states as points on three-dimensional sphere.
文摘We utilized Raspberry Pi 4B to develop a microbial monitoring system to simplify the microbial image-capturing process and facilitate the informatization of microbial observation results.The Raspberry Pi 4B firmware,developed under Python on the Linux platform,achieves sum verification of serial data,file upload based on TCP protocol,control of sequence light source and light valve,real-time self-test based on multithreading,and an experiment-oriented file management method.The system demonstrated improved code logic,scheduling,exception handling,and code readability.
文摘在一些较大面积的建筑物内,移动机器人的路径规划算法的效率仍然面临着较大的挑战。针对这类工作场景,提出了一种结合维诺区域分割和路径优化的路径规划算法(Voronoi region segmentation and path optimization,VSO),实现在大规模室内场景下的快速路径规划。该算法使用广义维诺图(generalized Voronoi graph,GVG)从地图中构建拓扑图,在拓扑图上可以快速获得初始启发式路径。通过将采样过程约束在初始路径周围的区域,减少了对工作空间的过度探索。在此基础上,选择路径点将采样区域划分为多个子区域,之后在子区域中并行搜索路径来减少搜索空间并提升搜索速度。最后将连接各个子区域内的路径作为结果路径,并使用优化算法来平滑最终路径。仿真实验与机器人实验验证了该算法的实用性与有效性。
文摘电力监控系统环境中存在过多噪声因素干扰,导致通信效率和质量低。为此,提出一种基于传输控制协议/网际协议(transmission control protocol/internet protocol,TCP/IP)与关联规则的多线程通信算法。将背景噪声看作电力监控信号的突变现象,查找回应突变函数的信号值,利用谐波分离算法,去除背景噪声。根据不同线程的传输特点,采用TCP/IP协议建立通信程序包,分别设置句柄、终止、挂起以及执行函数,为不同线程的通信数据,匹配不同的通信协议。试验结果证明:对电力监控系统源设备的传输信号多线程通信时,通信信号波频变化最为平稳,在0~2000 s的采样区间内,未出现传输为0现象;对背景噪声去噪后,波形相比原始信号变化明显较为稳定,没有出现过高或过低的幅值变化。所提方法通信信号表达平稳、效率较高,对原始信号的保留效果较好,去噪能力很强。
基金The work was supported by the National Natural Science Foundation of China under Grant Nos. 61272142, 61103082, 61402492, 61170261, 61103193, the National High Technology Research and Development 863 Program of China under Grant Nos. 2012AA01A301, 2012AA010901, and the Program for New Century Excellent Talents in University of China.
文摘Determinism is very useful to multithreaded programs in debugging, testing, etc. Many deterministic ap- proaches have been proposed, such as deterministic multithreading (DMT) and deterministic replay. However, these sys- tems either are inefficient or target a single purpose, which is not flexible. In this paper, we propose an efficient and flexible deterministic framework for multithreaded programs. Our framework implements determinism in two steps: relaxed determinism and strong determinism. Relaxed determinism solves data races eificiently by using a proper weak memory consistency model. After that, we implement strong determinism by solving lock contentions deterministically. Since we can apply different approaches for these two steps independently, our framework provides a spectrum of deterministic choices, including nondeterministic system (fast), weak deterministic system (fast and conditionally deterministic), DMT system, and deternfinistic replay system. Our evaluation shows that the DMT configuration of this framework could even outperform a state-of-the-art DMT system.
基金supported by the National Natural Science Foundation of China(No.61173040)the Doctoral Fund of Ministry of Education of China(No.2013021110012)
文摘Thread partition plays an important role in speculative multithreading (SpMT) for automatic parallelization of ir- regular programs. Using unified values of partition parameters to partition different applications leads to the fact that every ap- plication cannot own its optimal partition scheme. In this paper, five parameters affecting thread partition are extracted from heuristic rules. They are the dependence threshold (DT), lower limit of thread size (TSL), upper limit of thread size (TSU), lower limit of spawning distance (SDL), and upper limit of spawning distance (SDU). Their ranges are determined in accordance with heuristic rules, and their step-sizes are set empirically. Under the condition of setting speedup as an objective function, all com- binations of five threshold values form the solution space, and our aim is to search for the best combination to obtain the best thread granularity, thread dependence, and spawning distance, so that every application has its best partition scheme. The issue can be attributed to a single objective optimization problem. We use the artificial immune algorithm (AIA) to search for the optimal solution. On Prophet, which is a generic SpMT processor to evaluate the performance of multithreaded programs, Olden bench- marks are used to implement the process. Experiments show that we can obtain the optimal parameter values for every benchmark, and Olden benchmarks partitioned with the optimized parameter values deliver a performance improvement of 3.00% on a 4-core platform compared with a machine learning based approach, and 8.92% compared with a heuristics-based approach.