Convolutional neural networks (CNNs) are essential models for achieving high accuracy in various machine learning applications, such as image recognition and natural language processing. A key issue in accelerating CNNs with high energy efficiency and processing performance is efficient data reuse that exploits their inherent data locality. In this paper, we propose a novel CGRA (Coarse-Grained Reconfigurable Array) architecture with time-domain multithreading for exploiting input data locality. Multithreading on each processing element enables input data to be reused across multiple computation periods. This paper presents the accelerator design and a performance analysis of the proposed architecture. We examine the structure of the memory subsystems, as well as the architecture of the computing array, to supply the required data with minimal performance overhead. We explore efficient architectural design alternatives based on the characteristics of modern CNN configurations. The evaluation results show that the available external memory bandwidth can be utilized efficiently when the output plane is wider (in the earlier layers of many CNNs), while input data locality can be exploited maximally when the number of output channels is larger (in the later layers).
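The input-reuse effect described above can be illustrated in software. The sketch below is not the proposed CGRA design; it assumes a stride-1 convolution without padding, and `conv2d_count_loads` is a hypothetical helper that counts how many input elements are fetched when one input window is reused by all K output channels (modelled as K time-multiplexed threads on one processing element) versus refetched once per output channel.

```python
def conv2d_count_loads(inp, weights, share_inputs):
    """Direct stride-1 'valid' convolution that also counts input loads.

    inp:     nested list [C][H][W]
    weights: nested list [K][C][R][S], one filter per output channel
    share_inputs: True models K threads reusing a single fetched input
        window; False refetches the window for every output channel.
    Returns (output [K][H-R+1][W-S+1], number of input elements loaded).
    """
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    K, R, S = len(weights), len(weights[0][0]), len(weights[0][0][0])
    # Flatten each filter in (c, r, s) order to match fetch_window().
    flat_w = [[weights[k][c][r][s] for c in range(C) for r in range(R) for s in range(S)]
              for k in range(K)]

    def fetch_window(y, x):
        return [inp[c][y + r][x + s] for c in range(C) for r in range(R) for s in range(S)]

    out_h, out_w = H - R + 1, W - S + 1
    out = [[[0] * out_w for _ in range(out_h)] for _ in range(K)]
    loads = 0
    for y in range(out_h):
        for x in range(out_w):
            win = None
            for k in range(K):
                if win is None or not share_inputs:
                    win = fetch_window(y, x)   # one input-buffer fetch
                    loads += len(win)
                out[k][y][x] = sum(a * b for a, b in zip(win, flat_w[k]))
    return out, loads
```

Both modes produce identical outputs; with sharing, input traffic drops by a factor of K, which is consistent with the observation that input locality pays off most when the number of output channels is large.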
The superiority of hypothetical quantum computers is due not to faster calculations but to different computational schemes running on special hardware. At its core, quantum computing follows from the way the state of a quantum system is defined when elementary components interact with each other. In the conventional approach, this is implemented through the tensor product of qubits. In the geometric algebra formalism, the simultaneous availability of all results for non-measured observables is based on defining states as points on a three-dimensional sphere.
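For a single qubit, one standard way to see why states can be identified with points on a three-dimensional sphere is to write the two complex amplitudes in terms of four real numbers; normalization then places the state on the unit sphere S³ in ℝ⁴. In the geometric algebra of three-dimensional space, a normalized even-grade element (a scalar plus a bivector) carries the same four real coefficients under the same normalization. This is a general observation, and the paper's exact construction may differ in detail.

```latex
\begin{aligned}
|\psi\rangle &= \alpha\,|0\rangle + \beta\,|1\rangle, \qquad
\alpha = a_0 + i\,a_1,\quad \beta = a_2 + i\,a_3,\quad a_k \in \mathbb{R},\\
\langle\psi|\psi\rangle &= |\alpha|^2 + |\beta|^2 = a_0^2 + a_1^2 + a_2^2 + a_3^2 = 1
\;\Longleftrightarrow\; (a_0, a_1, a_2, a_3) \in S^3 \subset \mathbb{R}^4 .
\end{aligned}
```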
We used a Raspberry Pi 4B to develop a microbial monitoring system that simplifies the microbial image-capturing process and facilitates the informatization of microbial observation results. The Raspberry Pi 4B firmware, developed in Python on the Linux platform, implements checksum verification of serial data, file upload over TCP, control of the sequenced light source and the light valve, a real-time self-test based on multithreading, and an experiment-oriented file management method. The system demonstrated improved code logic, scheduling, exception handling, and code readability.
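The abstract mentions sum (checksum) verification of serial data but does not give the frame format. The sketch below assumes a hypothetical layout (a start byte `0xAA`, a one-byte length, the payload, then an 8-bit additive checksum over everything before it) purely to show how such a check works; `build_frame` and `verify_frame` are illustrative names.

```python
FRAME_HEADER = 0xAA  # hypothetical start-of-frame byte


def build_frame(payload: bytes) -> bytes:
    """Frame = header, length, payload, 8-bit additive checksum (hypothetical layout)."""
    if len(payload) > 255:
        raise ValueError("payload too long for a 1-byte length field")
    body = bytes([FRAME_HEADER, len(payload)]) + payload
    return body + bytes([sum(body) & 0xFF])


def verify_frame(frame: bytes):
    """Return the payload if the frame is well formed and the checksum matches, else None."""
    if len(frame) < 3 or frame[0] != FRAME_HEADER:
        return None
    length = frame[1]
    if len(frame) != length + 3:          # header + length + payload + checksum
        return None
    if sum(frame[:-1]) & 0xFF != frame[-1]:
        return None                       # corrupted in transit
    return frame[2:-1]
```

An additive checksum catches any single-byte corruption but not, for example, two errors that cancel out; a CRC would be a stronger (but costlier) alternative.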
Determinism is very useful for multithreaded programs in debugging, testing, and other tasks. Many deterministic approaches have been proposed, such as deterministic multithreading (DMT) and deterministic replay. However, these systems are either inefficient or target a single purpose, which makes them inflexible. In this paper, we propose an efficient and flexible deterministic framework for multithreaded programs. Our framework implements determinism in two steps: relaxed determinism and strong determinism. Relaxed determinism resolves data races efficiently by using an appropriate weak memory consistency model. Strong determinism is then implemented by resolving lock contention deterministically. Since different approaches can be applied to these two steps independently, our framework provides a spectrum of deterministic choices, including a nondeterministic system (fast), a weakly deterministic system (fast and conditionally deterministic), a DMT system, and a deterministic replay system. Our evaluation shows that the DMT configuration of this framework can even outperform a state-of-the-art DMT system.
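Resolving lock contention deterministically means that the order in which threads win a contended lock no longer depends on OS scheduling. The sketch below uses the simplest such scheme, a round-robin turn token, as an illustration; it is not the paper's algorithm, and `RoundRobinDeterministicLock` is a hypothetical class. It assumes every thread performs the same number of critical sections; real DMT systems typically use deterministic logical clocks instead, to avoid waiting on idle threads.

```python
import threading


class RoundRobinDeterministicLock:
    """Grants the lock strictly in turn order 0, 1, ..., n-1, 0, ...

    Because the acquisition order is fixed in advance, the interleaving of
    critical sections is identical on every run, whatever the scheduler does.
    """

    def __init__(self, num_threads):
        self.n = num_threads
        self.turn = 0
        self.cond = threading.Condition()

    def acquire(self, tid):
        with self.cond:
            while self.turn != tid:       # wait for our deterministic turn
                self.cond.wait()

    def release(self, tid):
        with self.cond:
            self.turn = (tid + 1) % self.n
            self.cond.notify_all()


def run(num_threads=3, rounds=4):
    lock = RoundRobinDeterministicLock(num_threads)
    log = []

    def worker(tid):
        for i in range(rounds):
            lock.acquire(tid)
            log.append((tid, i))          # critical section
            lock.release(tid)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in reversed(threads):           # start order deliberately scrambled
        t.start()
    for t in threads:
        t.join()
    return log
```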
Multithreading is the development trend of high-performance processors. The memory consistency model is essential to the correctness, performance, and complexity of a multithreaded processor. This paper proposes a chip multithreaded consistency model adapted to multithreaded processors. The restrictions that chip multithreaded consistency imposes on memory event ordering are presented and formalized. Using the critical-cycle idea developed by Wei-Wu Hu, we prove that the proposed chip multithreaded consistency model satisfies the correct-execution criterion of the sequential consistency model. Compared with sequential consistency, the chip multithreaded consistency model offers a way to achieve higher performance, and it ensures software compatibility: the execution result on a multithreaded processor is the same as the execution result on a uniprocessor. An implementation strategy for the chip multithreaded consistency model in the Godson-2 SMT processor is also proposed. The Godson-2 SMT processor correctly supports the chip multithreaded consistency model through an exception scheme based on a sequential memory access queue for each thread.
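As background for the correctness criterion, the sketch below shows what sequential consistency (SC) forbids in the classic store-buffering litmus test, by enumerating every interleaving that preserves each thread's program order. It models only SC, not the proposed chip multithreaded consistency model; the instruction encoding is a hypothetical convenience.

```python
from itertools import combinations

# Store-buffering litmus test; x = y = 0 initially.
T0 = [("store", "x", 1), ("load", "y", "r1")]
T1 = [("store", "y", 1), ("load", "x", "r2")]


def sc_interleavings(a, b):
    """All merges of a and b that keep each thread's program order."""
    n = len(a) + len(b)
    for pos in combinations(range(n), len(a)):
        ia, ib, order = iter(a), iter(b), []
        for i in range(n):
            order.append(next(ia) if i in pos else next(ib))
        yield order


def execute(order):
    """Run one interleaving against a single shared memory."""
    mem, regs = {"x": 0, "y": 0}, {}
    for op, loc, arg in order:
        if op == "store":
            mem[loc] = arg
        else:
            regs[arg] = mem[loc]
    return regs["r1"], regs["r2"]


sc_outcomes = {execute(o) for o in sc_interleavings(T0, T1)}
```

The outcome `(0, 0)` never appears under SC. A weaker model that lets a load bypass an earlier store to a different address could produce it, which is the kind of ordering a consistency model must either forbid or make invisible to software.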
The Distributed Shared Memory (DSM) architecture is widely used in today's computer designs to mitigate the ever-widening processor-memory gap, and it inevitably exposes Non-Uniform Memory Access (NUMA) to shared-memory parallel applications. Failure to adapt to the NUMA effect can significantly degrade application performance, especially on today's manycore platforms with tens to hundreds of cores. However, traditional approaches such as first-touch and memory policies fall short because of false page-sharing, fragmentation, or poor ease of use. In this paper, we propose a partitioned shared-memory approach that allows multithreaded applications to achieve full NUMA-awareness with only minor code changes, and we develop an accompanying NUMA-aware heap manager that eliminates false page-sharing and minimizes fragmentation. Experiments on a 256-core cc-NUMA computing node show that the proposed approach helps applications adapt to NUMA with only minor code changes and improves the performance of typical multithreaded scientific applications by up to 4.3 times as more cores are used.
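The abstract does not describe the heap manager's internals. The toy model below only illustrates the invariant behind "no false page-sharing": each NUMA node owns whole pages, and allocations for a node are carved only from that node's pages. `PartitionedHeap` is hypothetical, the 4096-byte page size is an assumption, and addresses are simulated integers; a real implementation would also have to bind each page to its node's memory (for example through libnuma), which is omitted here.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class PartitionedHeap:
    """Toy NUMA-aware heap: pages are owned by exactly one node."""

    def __init__(self, num_nodes):
        self.next_free_page = 0
        self.cursor = {n: None for n in range(num_nodes)}  # node -> (page, offset)
        self.page_owner = {}                               # page -> node

    def malloc(self, node, size):
        if size > PAGE_SIZE:
            raise ValueError("toy model: objects must fit in one page")
        cur = self.cursor[node]
        if cur is None or cur[1] + size > PAGE_SIZE:
            # Start a fresh page reserved for this node only.
            page = self.next_free_page
            self.next_free_page += 1
            self.page_owner[page] = node
            cur = (page, 0)
        page, offset = cur
        self.cursor[node] = (page, offset + size)          # bump allocation
        return page * PAGE_SIZE + offset
```

Even when two nodes allocate in an interleaved order, every returned address lies on a page owned by the requesting node.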
Developing a high-performance public key cryptosystem is crucial for numerous modern security applications. The Elliptic Curve Cryptosystem (ECC) has performance and resource-saving advantages compared with other types of asymmetric ciphers. However, a sequential implementation of ECC does not satisfy the performance requirements of current applications. Therefore, several factors should be considered to boost cryptosystem performance, including the coordinate system, the scalar multiplication algorithm, and the elliptic curve form. The tripling-oriented (3DIK) form is implemented in this work because of its minimal computational complexity compared with other elliptic curve forms. This experimental study explores the factors that play an important role in ECC performance to determine the combination that leads to high-speed ECC. The proposed cryptosystem uses a parallel software implementation to speed up ECC. To our knowledge, no previous study has presented a similar software implementation for 3DIK ECC. Supported by a parallel design, projective coordinates, and a fast scalar multiplication algorithm, the proposed 3DIK ECC improves the speed of the encryption process compared with its counterparts and with the usual sequential implementation. The highest performance for 3DIK ECC was achieved when it was implemented using the Non-Adjacent Form algorithm and homogeneous projective coordinates. Compared with costly hardware implementations, the proposed software implementation is cost-effective and can easily be adapted to other environments. In addition, the power consumption of the proposed ECC is analyzed and compared with other known cryptosystems. Thus, this study presents a detailed overview of the design and implementation of 3DIK ECC.
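The Non-Adjacent Form (NAF) speeds up scalar multiplication because, on average, only about one third of its signed digits are nonzero, compared with about half for plain binary, and point negation is cheap on elliptic curves. The sketch below computes the NAF and runs left-to-right double-and-add over it. The group operations are passed in as callables; the 3DIK point formulas are not reproduced here, and a toy additive group of integers modulo n stands in for the curve group in the usage example.

```python
def naf(k):
    """Non-Adjacent Form of k >= 0, least-significant digit first (digits in {-1, 0, 1})."""
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)   # 1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= d
        else:
            d = 0
        digits.append(d)
        k //= 2
    return digits


def scalar_mult(k, point, add, double, neg, identity):
    """Left-to-right double-and-add over the NAF digits of k."""
    result = identity
    for d in reversed(naf(k)):
        result = double(result)
        if d == 1:
            result = add(result, point)
        elif d == -1:
            result = add(result, neg(point))   # subtraction costs the same as addition
    return result
```

For example, `naf(7)` is `[-1, 0, 0, 1]`, i.e. 7 = 8 - 1, which needs one addition instead of the two required by binary 111.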
The general m-machine permutation flowshop problem with the total flow-time objective is known to be NP-hard for m ≥ 2. Branch-and-bound algorithms have been the only practical method for finding optimal solutions. In this paper, we present an improved sequential algorithm based on a strict alternation of Generation and Exploration execution modes as well as Depth-First/Best-First hybrid strategies. The experimental results show that the proposed scheme performs better than the algorithm in [1]. More importantly, our method can easily be extended and implemented with lightweight threads to reduce execution times, and good speedups can be obtained on shared-memory multicore systems.
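For readers unfamiliar with the baseline, the sketch below is a plain depth-first branch-and-bound for the permutation flowshop with total flow time. It does not implement the paper's Generation/Exploration alternation or its hybrid search strategy. The lower bound used here is a simple valid one: an unscheduled job j cannot finish before, for every machine i, the time machine i becomes free plus j's remaining processing from machine i onward.

```python
from itertools import permutations


def completion_times(seq, p):
    """p[j][i] = processing time of job j on machine i.
    Returns (completion time of the last job on each machine, total flow time)."""
    m = len(p[0])
    c = [0] * m
    total = 0
    for j in seq:
        for i in range(m):
            c[i] = max(c[i], c[i - 1] if i else 0) + p[j][i]
        total += c[-1]
    return c, total


def branch_and_bound(p):
    n, m = len(p), len(p[0])
    best = [float("inf"), None]

    def lower_bound(c, flow, remaining):
        lb = flow
        for j in remaining:
            tail, bound = 0, 0
            for i in reversed(range(m)):
                tail += p[j][i]
                bound = max(bound, c[i] + tail)
            lb += bound
        return lb

    def dfs(seq, c, flow, remaining):
        if not remaining:
            if flow < best[0]:
                best[0], best[1] = flow, list(seq)
            return
        if lower_bound(c, flow, remaining) >= best[0]:
            return  # prune: this subtree cannot beat the incumbent
        for j in sorted(remaining):
            nc = c[:]
            for i in range(m):
                nc[i] = max(nc[i], nc[i - 1] if i else 0) + p[j][i]
            seq.append(j)
            dfs(seq, nc, flow + nc[-1], remaining - {j})
            seq.pop()

    dfs([], [0] * m, 0, frozenset(range(n)))
    return best[0], best[1]


def brute_force(p):
    return min(completion_times(s, p)[1] for s in permutations(range(len(p))))
```

Independent subtrees of `dfs` are natural units of work for lightweight threads, which is the kind of parallelism the abstract refers to.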
Slack-Decode Simultaneously and Redundantly Threaded (SD-SRT) is proposed for detecting transient faults in processors. SD-SRT improves on the performance of the previously proposed SRT by completely eliminating redundant instruction fetches. First, the fetch stage is moved out of the Sphere of Replication (SoR), and a unified instruction fetch queue (IFQ) is shared by both the leading and trailing threads. Second, a scheme called slack-decode cooperates with the unified IFQ to coordinate the progress of the two threads. Simulations show that SD-SRT outperforms the original SRT by 15% in IPC and reduces I-cache accesses by 42%. Meanwhile, SD-SRT reduces the size and complexity of hardware structures such as the load value queue and the store buffer.
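The unified IFQ idea can be modelled in a few lines: each instruction is fetched once, the leading thread decodes it first, and the entry is freed only after the trailing thread has also consumed it. The sketch below is a hypothetical software model, not the SD-SRT hardware; the `slack` limit stands in for the slack-decode coordination, and the parameter values are arbitrary.

```python
from collections import deque


class UnifiedIFQ:
    """Toy instruction fetch queue shared by a leading and a trailing thread."""

    def __init__(self, program, slack, capacity):
        self.program = program
        self.slack, self.capacity = slack, capacity
        self.queue = deque()   # fetched entries not yet consumed by the trailing thread
        self.base = 0          # program index of queue[0] (always equals self.trail)
        self.fetched = 0       # next program index to fetch
        self.lead = 0          # next index the leading thread decodes
        self.trail = 0         # next index the trailing thread decodes
        self.fetch_count = 0   # I-cache accesses in this model

    def fetch(self):
        if self.fetched < len(self.program) and len(self.queue) < self.capacity:
            self.queue.append(self.program[self.fetched])
            self.fetched += 1
            self.fetch_count += 1

    def decode_leading(self):
        # Leading thread may run at most `slack` instructions ahead.
        if self.lead < self.fetched and self.lead - self.trail < self.slack:
            insn = self.queue[self.lead - self.base]
            self.lead += 1
            return insn
        return None

    def decode_trailing(self):
        # Trailing thread only replays what the leading thread has decoded.
        if self.trail < self.lead:
            insn = self.queue.popleft()   # entry no longer needed by either thread
            self.base += 1
            self.trail += 1
            return insn
        return None


def simulate(program, slack=4, capacity=8):
    q = UnifiedIFQ(program, slack, capacity)
    lead_out, trail_out = [], []
    while len(trail_out) < len(program):
        q.fetch()
        for out, step in ((lead_out, q.decode_leading), (trail_out, q.decode_trailing)):
            insn = step()
            if insn is not None:
                out.append(insn)
    return lead_out, trail_out, q.fetch_count
```

Both threads see the full instruction stream, yet `fetch_count` equals the program length rather than twice it, which is the source of the reduced I-cache traffic.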
This article presents a method for dynamic, real-time communication between a serial port and peripheral equipment using multithreading, based on the basic principles of serial communication and the multitasking mechanism of Windows. The method effectively solves the problem of real-time response in serial communication, reduces the data loss rate, and improves system reliability. It is a practical, general method for serial communication.
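The core of such a design is a dedicated reader thread that drains the port continuously and hands data to the rest of the program through a thread-safe queue, so that slow processing never causes bytes to be dropped. The sketch below is a platform-neutral analogue in Python, not the article's Windows implementation. It accepts any object with a `read(n)` method; the usage example uses `io.BytesIO` as a simulated port, and a real port object (for example from the third-party pyserial package) exposes a similar `read(size)` method.

```python
import io
import queue
import threading


def start_reader(port, out_q, chunk_size=16):
    """Spawn a thread that reads from `port` and forwards each chunk to out_q.

    An empty chunk marks the end of the simulated stream. Note: a real serial
    port opened with a timeout also returns b"" on timeout, so production code
    needs a different stop condition (e.g. a separate stop event).
    """
    def loop():
        while True:
            chunk = port.read(chunk_size)
            out_q.put(chunk)
            if not chunk:
                break

    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t


def collect(port):
    """Consume chunks on the calling thread until the reader signals the end."""
    q = queue.Queue()
    reader = start_reader(port, q)
    received = bytearray()
    while True:
        chunk = q.get()
        if not chunk:
            break
        received += chunk
    reader.join()
    return bytes(received)
```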
Based on Simultaneous Multithreading (SMT), we propose a fault-tolerant scheme called the Tri-modular Redundantly and Simultaneously Threaded processor with Recovery (TRSTR). TRSTR has the following features. First, we introduce an arbitrator context into the conventional SRT (Simultaneously and Redundantly Threaded) processor; it acts as an arbitrator when the results from the other two contexts disagree and otherwise acts as an ordinary thread, thus making full use of SMT's parallelism. Second, we add a reconfigurable feature to the sphere of replication in SRT, making it more flexible for changing demands and situations. Third, TRSTR has two working modes, Tri-Simultaneous with Voting (TSV) and Dual-Simultaneous with Arbitrator (DSA), which can be switched at will. Finally, in addition to transient-fault coverage, TRSTR has online self-checking and self-recovering abilities, so it can shield against some permanent faults and reconfigure itself without stopping critical jobs, improving its reliability and availability.
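The two working modes can be summarized as decision logic: TSV always votes among three redundant results, while DSA compares two results and consults the arbitrator context only on a mismatch. The sketch below models that logic in software; in TRSTR the comparison happens in hardware across SMT contexts, and `tsv_vote`, `dsa_check`, and `run_arbitrator` are hypothetical names.

```python
from collections import Counter


def tsv_vote(results):
    """Tri-Simultaneous with Voting: majority of three redundant results.

    Returns (value, index of the disagreeing context or None).
    Raises if no two results agree (uncorrectable in this model).
    """
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no majority: more than one context disagrees")
    faulty = next((i for i, r in enumerate(results) if r != value), None)
    return value, faulty


def dsa_check(primary, redundant, run_arbitrator):
    """Dual-Simultaneous with Arbitrator: the third context recomputes only on a mismatch,
    so in the common fault-free case it is free to run ordinary work."""
    if primary == redundant:
        return primary
    value, _ = tsv_vote([primary, redundant, run_arbitrator()])
    return value
```

Reporting the index of the disagreeing context is what would let a self-checking design notice that one context fails repeatedly (a likely permanent fault) and reconfigure around it.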
This paper presents a set of data-processing middleware for a high-powered neutral beam injection (NBI) control system. The middleware, based on TCP/IP and multithreading technologies, focuses mainly on data processing and transmission. It separates data processing and compression from data acquisition and storage, and it provides universal transmission interfaces for different software environments, such as WinCC, LabVIEW, and other measurement systems. Experimental data acquired on Windows, QNX, and Linux platforms are processed by the middleware and sent to the monitoring applications. There are three middleware deployment models: serial processing, parallel processing, and alternate serial processing. Using these models, the middleware solves real-time data-processing problems on heterogeneous acquisition hardware with different operating systems and data applications.
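Two of the three deployment models can be contrasted with a short sketch: serial processing handles acquired blocks one after another, while parallel processing spreads them over worker threads but still delivers results in acquisition order. The processing step (`process`, here zlib compression) is a hypothetical stand-in, and the "alternate serial" model is not modelled because the abstract does not define it.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def process(block: bytes) -> bytes:
    """Hypothetical per-block processing step: compress acquired data."""
    return zlib.compress(block)


def serial_processing(blocks):
    return [process(b) for b in blocks]


def parallel_processing(blocks, workers=4):
    # Executor.map yields results in input order, so downstream monitoring
    # applications see the same sequence as with serial processing.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, blocks))
```

Threads help here because zlib releases the interpreter lock while compressing; for pure-Python processing steps a process pool would be the more appropriate choice.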
Programs exhibit changing behavior at runtime in a simultaneous multithreading (SMT) environment. How reasonably shared resources are distributed among threads largely determines the throughput and fairness of SMT processors. Existing resource distribution methods either rely mainly on the front-end fetch policy or make distribution decisions based on limited information from the pipeline, which makes it difficult for them to efficiently capture the varying resource requirements of the threads. This work presents a spatially triggered dissipative resource distribution (SDRD) policy for SMT processors. Its two parts, a self-organization mechanism driven by real-time instructions-per-cycle (IPC) performance and the introduction of chaos to control the diversity of trial resource distributions, work together to provide sustained optimization of resource distribution as program behavior changes. Simulation results show that SDRD with fine-grained diversity control is more effective than SDRD with coarse-grained control, and that SDRD benefits substantially from its two well-coordinated parts, providing potential fairness gains as well as good throughput gains. The meanings and settings of important SDRD parameters are also discussed.
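The interplay of the two parts (keep a trial distribution only if measured IPC improves, and use a chaotic sequence to generate diverse trials) can be sketched as a hill climber whose step sizes come from the logistic map. This is a loose, hypothetical illustration, not the SDRD policy: the "spatial triggering" and dissipative aspects are not modelled, `measure_ipc` is a stand-in for hardware IPC counters, and only one resource shared by two threads is considered.

```python
def logistic(x, r=4.0):
    """Logistic map; chaotic on (0, 1) for r = 4."""
    return r * x * (1.0 - x)


def tune_share(measure_ipc, total, start, epochs=200, max_step=8, x0=0.37):
    """Tune how many of `total` entries thread 0 gets (thread 1 gets the rest).

    Each epoch proposes a trial share whose step size in [-max_step, max_step]
    comes from the chaotic sequence; the trial is kept only if the measured
    combined IPC improves (the self-organization part).
    """
    share, best = start, measure_ipc(start)
    x = x0
    for _ in range(epochs):
        x = logistic(x)
        step = round((2 * x - 1) * max_step)
        trial = min(max(share + step, 1), total - 1)   # each thread keeps >= 1 entry
        ipc = measure_ipc(trial)
        if ipc > best:
            share, best = trial, ipc
    return share, best
```

In a real processor the IPC landscape shifts as program phases change, which is why the search runs continuously rather than once.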
For remote control of a neutral beam injection (NBI) system, software called NBIcsw was developed to run on the control server. It meets the requirements for data transmission and operation control between the NBI measurement and control layer (MCL) and the remote monitoring layer (RML). NBIcsw runs on Linux and was developed in client/server (C/S) mode with multithreading technology. Its application shows that the software operates efficiently.
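A multithreaded client/server control program typically serves each connected client in its own thread. The sketch below shows that pattern with Python's standard `socketserver` module; the line-based "ACK" protocol, `CommandHandler`, `ControlServer`, and `send_commands` are hypothetical and do not describe NBIcsw's actual protocol.

```python
import socket
import socketserver
import threading


class CommandHandler(socketserver.StreamRequestHandler):
    """Serves one client connection; ThreadingMixIn runs each in its own thread.

    Hypothetical protocol: one text command per line, answered with "ACK <command>".
    """

    def handle(self):
        for line in self.rfile:
            cmd = line.decode("utf-8").strip()
            if cmd:
                self.wfile.write(f"ACK {cmd}\n".encode("utf-8"))


class ControlServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    daemon_threads = True        # client threads do not block interpreter exit
    allow_reuse_address = True


def send_commands(address, commands):
    """Client side: send each command and wait for its acknowledgement."""
    replies = []
    with socket.create_connection(address) as sock:
        reader = sock.makefile("r", encoding="utf-8")
        for c in commands:
            sock.sendall((c + "\n").encode("utf-8"))
            replies.append(reader.readline().strip())
        reader.close()
    return replies
```

Usage: create `ControlServer(("127.0.0.1", 0), CommandHandler)`, run its `serve_forever` method in a background thread, and call `shutdown()` followed by `server_close()` when done.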
To ensure the uniqueness and recognizability of data, and to make it easy to analyze and process the data from all subsystems of the neutral beam injector (NBI), all subsystems must share a unified system time. This paper presents timing synchronization software that draws on several technologies, such as shared memory, multithreading, and the TCP protocol. Shared memory lets the server store client information and the system time, and multithreading lets the server handle different clients in separate threads. The server runs on Linux, and the clients run on both Linux and Windows. With this design, all subsystems can be synchronized to within one second; this accuracy is sufficient for the NBI system and thus ensures the reliability of the data.
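The abstract does not say how a client corrects its clock from the server's time. One standard technique, swapped in here for illustration, is Cristian's algorithm: the client timestamps the request and the reply, assumes the network delay is symmetric, and estimates the offset with an error bound of half the round-trip time. `estimate_offset` is a hypothetical helper.

```python
def estimate_offset(t_send, t_server, t_recv):
    """Cristian-style estimate of (server clock - client clock).

    t_send, t_recv: client clock when the request left / the reply arrived.
    t_server: server timestamp carried in the reply.
    Assumes request and reply spend equal time in transit.
    Returns (offset, max_error) in the same units as the inputs.
    """
    rtt = t_recv - t_send
    if rtt < 0:
        raise ValueError("reply received before request was sent")
    offset = t_server + rtt / 2 - t_recv
    return offset, rtt / 2
```

For example, with timestamps in milliseconds, `estimate_offset(1000, 2050, 1004)` gives an offset of 1048 ms with at most 2 ms of error, far inside a one-second requirement.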
The recent development of telemetry systems is driven by rapid advances in computer and network technology. A systematic introduction is provided to digital video and image processing, network communication, and the computer implementation of these techniques.
The Tomasulo algorithm, a dynamic scheduling technique designed for the floating-point unit (FPU) to exploit instruction-level parallelism within a single thread only, is extended into the T Tomasulo algorithm to support multiple parallel contexts. FPUs can then exploit parallelism both within a single thread and among multiple threads, and are used more efficiently.
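A key requirement when one Tomasulo-style FPU serves several threads is that the register status table distinguishes threads, so that register F2 of thread 0 and F2 of thread 1 never create a false dependence. The sketch below models only that renaming step, keyed by (thread id, register); it is a hypothetical illustration and not the T Tomasulo design itself.

```python
class MultiThreadRegisterStatus:
    """Register status table for a Tomasulo-style FPU shared by several threads.

    producer maps (thread_id, register) to the reservation-station tag that
    will write it; absent entries mean the register value is ready.
    """

    def __init__(self):
        self.producer = {}
        self.next_tag = 0

    def issue(self, tid, dest, srcs):
        """Issue `dest <- op(srcs)` for thread tid.

        Returns (tag, operands); each operand is ("tag", t) if still pending
        or ("reg", name) if it can be read from the register file.
        """
        operands = []
        for s in srcs:
            t = self.producer.get((tid, s))
            operands.append(("tag", t) if t is not None else ("reg", s))
        tag = self.next_tag
        self.next_tag += 1
        self.producer[(tid, dest)] = tag   # later readers in this thread wait on tag
        return tag, operands

    def broadcast(self, tid, dest, tag):
        """Common-data-bus write-back: clear status if tag is still the latest producer."""
        if self.producer.get((tid, dest)) == tag:
            del self.producer[(tid, dest)]
```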
Scalability is one of the most important nonfunctional requirements of server applications, because it maintains effective performance under large, fluctuating, and sometimes unpredictable workloads. To achieve scalability, thread pool systems (TPSs) have been used extensively as a middleware service in server applications. The size of the thread pool is the most significant factor affecting overall server performance, and determining the optimal size dynamically at runtime is a challenging problem. The most widely used and simplest method to tackle this problem is to keep the thread pool size equal to the request rate, i.e., the frequency-oriented thread pool (FOTP). FOTPs are the most widely used TPSs in industry because of their implementation simplicity, negligible overhead, and applicability to any system. However, frequency-based schemes focus on only one aspect of load changes: fluctuations in the request rate. The request rate alone is an imperfect knob for scaling a thread pool. Thus, this paper presents a workload-profiling-based FOTP that uses request size (the service time of a request) in addition to the request rate as a knob to scale the thread pool at runtime, because we argue that the combination of both truly represents load fluctuation in server-side applications. We evaluated the proposed system against the state-of-the-art TPS of Oracle Corporation (using a client-server-based simulator) and found that our system performed better in terms of both response time and throughput.
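Why request size matters can be seen from Little's law: the average number of requests in service equals the arrival rate times the mean service time. The sketch below contrasts the frequency-oriented rule with a size based on both quantities. It is one plausible formula under that assumption, not the paper's exact policy; both function names and the `max_threads` cap are hypothetical.

```python
import math


def frequency_oriented_size(request_rate):
    """FOTP rule: pool size tracks the request rate only."""
    return max(1, math.ceil(request_rate))


def workload_profiled_size(request_rate, mean_service_time, max_threads=512):
    """Size from rate and request size via Little's law (L = lambda * W).

    request_rate: requests arriving per second
    mean_service_time: seconds a worker spends on one request
    The expected number of busy workers is rate * service time,
    clamped to [1, max_threads].
    """
    busy = request_rate * mean_service_time
    return min(max_threads, max(1, math.ceil(busy)))
```

At 100 requests per second, requests taking 0.25 s keep about 25 workers busy while requests taking 2 s keep about 200 busy, yet the frequency-oriented rule answers 100 in both cases. The two rules coincide only when the mean service time is one time unit.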
To counter the ever-increasing susceptibility of processors to transient faults, various redundant multithreading (RMT) architectures have been proposed, and RMT is becoming one of the most effective approaches for detecting and recovering from transient faults. This paper surveys a wide range of RMT architectures, from the original AR-SMT (A-stream R-stream Simultaneous MultiThreading) to the most recent SD-SRT (Slack-Decode Simultaneous Redundant Threading), presents cross-cutting analyses and comparisons among them, and thereby demonstrates their evolution and trends. Finally, some directions and suggestions for further RMT research and development are put forward.
Funding for the deterministic multithreading framework paper: This work was supported by the National Natural Science Foundation of China under Grant Nos. 61272142, 61103082, 61402492, 61170261, and 61103193, the National High Technology Research and Development 863 Program of China under Grant Nos. 2012AA01A301 and 2012AA010901, and the Program for New Century Excellent Talents in University of China.
Funding for the chip multithreaded consistency model paper: Supported by the National High Technology Development 863 Program of China (Grant Nos. 2007AA01Z114, 2006AA010201), the National Natural Science Foundation of China (Grant Nos. 60703017, 60736012, 60325205, 60673146, 60603049), the National Grand Fundamental Research 973 Program of China (Grant Nos. 2005CB321601, 2005CB321603), and the Beijing Natural Science Foundation (Grant No. 4072024).
Funding for the NUMA partitioned shared-memory paper: Supported by the National Key Research and Development Program of China (No. 2016YFB0201300).
Funding for the TRSTR paper: Supported by the 10th Five-Year National Defence Pre-Research Project (41316.1.2).
Funding: Supported by the National Natural Science Foundation of China (No. 10875146)
Abstract: This paper presents a set of data-processing middleware for the control system of a high-power neutral beam injection (NBI) system. The middleware is based on TCP/IP and multithreading technologies and focuses mainly on data processing and transmission. It separates data processing and compression from data acquisition and storage. It also provides universal transmission interfaces for different software environments, such as WinCC, LabVIEW and other measurement systems. The middleware processes experimental data acquired on Windows, QNX and Linux platforms and sends the data to the monitoring applications. It supports three deployment models: serial processing, parallel processing and alternate serial processing. With these models, the middleware solves real-time data-processing problems across heterogeneous acquisition hardware running different operating systems and data applications.
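A minimal Python sketch follows of two of the deployment models. It assumes the processing step is compression of acquired frames, with `zlib.compress` standing in for the middleware's real processing. All function names are hypothetical. Alternate serial processing is not modeled because the abstract does not define it.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def process(frame: bytes) -> bytes:
    """Stand-in for the middleware's processing step: compress one acquired frame."""
    return zlib.compress(frame)


def serial_model(frames):
    """Serial processing: frames are handled one after another on one thread."""
    return [process(f) for f in frames]


def parallel_model(frames, workers=4):
    """Parallel processing: frames are spread over a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, frames))  # map keeps input order
```

`ThreadPoolExecutor.map` returns results in input order. The parallel model therefore delivers frames to the monitoring side in the same order as the serial one.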
Funding: Supported by the Hi-Tech Research and Development Program (863) of China (No. 2006AA01Z431) and the Key Science and Technology Program of Zhejiang Province (Nos. 2007C11068 and 2007C11088), China
Abstract: In a simultaneous multithreading (SMT) environment, program behavior changes at runtime. How well shared resources are distributed among the threads largely determines the throughput and fairness of SMT processors. Existing resource distribution methods either rely mainly on the front-end fetch policy or base their decisions on limited information from the pipeline. As a result, they have difficulty capturing the varying resource requirements of the threads efficiently. This work presents a spatially triggered dissipative resource distribution (SDRD) policy for SMT processors. SDRD has two parts. The first is a self-organization mechanism driven by real-time instructions-per-cycle (IPC) performance. The second introduces chaos to control the diversity of trial resource distributions. Working together, the two parts continuously optimize resource distribution as program behavior changes. Simulation results show that SDRD is more effective with fine-grained diversity control than with coarse-grained control. They also show that SDRD benefits substantially from the coordination of its two parts, with potential fairness gains as well as good throughput gains. The meanings and settings of important SDRD parameters are also discussed.
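A toy hill-climbing loop in Python can sketch how the two SDRD parts interact. Everything here is an assumption for illustration and is not the paper's algorithm:

- a single shared resource is split between two threads;
- `measure_ipc` stands in for real-time IPC feedback;
- the logistic map supplies the chaotic perturbation;
- `diversity` is a hypothetical knob that bounds the trial step.

Trials that raise IPC are kept (self-organization), and the chaotic step keeps the trials diverse.

```python
def logistic(z, r=4.0):
    """Logistic map; with r = 4 the orbit on (0, 1) is chaotic."""
    return r * z * (1.0 - z)


def sdrd(total, measure_ipc, epochs=200, diversity=8, z=0.3141):
    """Toy IPC-driven split of `total` shared entries between two threads.

    total       -- shared entries (e.g. issue-queue slots), split as (x, total - x)
    measure_ipc -- callable returning combined IPC for split x (hypothetical feedback)
    diversity   -- largest chaotic trial step (hypothetical knob)
    """
    x = total // 2
    best = measure_ipc(x)
    for _ in range(epochs):
        z = logistic(z)
        step = round((2.0 * z - 1.0) * diversity)  # chaotic step in [-diversity, diversity]
        trial = min(max(x + step, 1), total - 1)    # each thread keeps at least one entry
        ipc = measure_ipc(trial)
        if ipc > best:                               # self-organization: keep improving trials
            x, best = trial, ipc
    return x, best
```

A smaller `diversity` gives finer-grained trials around the current split. This loosely mirrors the fine-grained diversity control that the abstract reports as more effective.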
Funding: Supported by the National Natural Science Foundation of China (No. 10875146)
Abstract: A software package, NBIcsw, is developed to run on the control server for remote control of a neutral beam injection (NBI) system. It meets the requirements of data transmission and operation control between the NBI measurement and control layer (MCL) and the remote monitoring layer (RML). NBIcsw runs on Linux and is developed in client/server (C/S) mode with multithreading technology. Application results show that the software runs efficiently.
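Python's standard `socketserver` module can sketch the thread-per-client C/S structure: its `ThreadingTCPServer` handles each connection in its own thread. The line-based command protocol, the `ACK` reply and the helper names are hypothetical stand-ins. The abstract does not describe the real NBIcsw protocol between the MCL and RML.

```python
import socket
import socketserver
import threading


class CommandHandler(socketserver.StreamRequestHandler):
    """Serves one remote client; ThreadingTCPServer runs each connection in its own thread."""

    def handle(self):
        for raw in self.rfile:  # one command per line until the client closes
            command = raw.decode().strip()
            # Hypothetical placeholder: a real server would forward this to the MCL.
            self.wfile.write(f"ACK {command}\n".encode())


def start_server(host="127.0.0.1", port=0):
    """Start a thread-per-client server; port 0 lets the OS pick a free port."""
    server = socketserver.ThreadingTCPServer((host, port), CommandHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_command(address, command):
    """Send one command line and return the server's one-line reply."""
    with socket.create_connection(address) as sock:
        sock.sendall(f"{command}\n".encode())
        sock.shutdown(socket.SHUT_WR)
        with sock.makefile("r") as reply:
            return reply.readline().strip()
```

To stop the server, call `server.shutdown()` and then `server.server_close()`.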
Funding: Supported by the National Natural Science Foundation of China (No. 11075183) and the Knowledge Innovation Program of the Chinese Academy of Sciences (study of the key technical and physical problems of neutral beam steady-state operation)
Abstract: All subsystems of the neutral beam injector (NBI) must share a unified system time. This ensures that data are unique and identifiable and makes the data from all subsystems easy to analyze and process. This paper presents timing synchronization software that draws on several technologies, including shared memory, multithreading and the TCP protocol. Shared memory lets the server store client information and the system time. Multithreading lets the server handle each client in a separate thread. The server runs on Linux, and the clients run on Linux or Windows. With this design, all subsystems are synchronized to within one second. This accuracy is sufficient for the NBI system and thus ensures the reliability of the data.
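The abstract does not say how clients compute their offset from the server time. Cristian's algorithm is one common technique and is used here as an assumption: the client timestamps its request and the reply, and assumes the network delay is symmetric. A minimal Python sketch with hypothetical function names:

```python
def estimate_offset(t_send, server_time, t_recv):
    """Cristian-style estimate of (server clock - client clock), in seconds.

    t_send and t_recv are client-clock timestamps taken when the request left
    and when the reply arrived; server_time is the server clock value in the
    reply. Assumes symmetric network delay, so the error is at most half the
    round trip.
    """
    round_trip = t_recv - t_send
    return server_time + round_trip / 2.0 - t_recv


def to_system_time(local_time, offset):
    """Convert a client timestamp to the unified system time."""
    return local_time + offset
```

The estimate's error is bounded by half the round-trip time. On a local network the round trip is typically far shorter than one second, which leaves a wide margin for the NBI accuracy target.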
Abstract: The recent development of telemetry systems is driven by rapid advances in computer and network technology. This paper systematically introduces digital video and image processing, network communication, and the computer implementation of these techniques.
Abstract: The Tomasulo algorithm is a dynamic scheduling technique for the floating-point unit (FPU) that exploits instruction-level parallelism within a single thread only. It is extended into the T Tomasulo algorithm to support multiple parallel contexts. FPUs can then exploit parallelism both within a single thread and across multiple threads, and are used more efficiently.
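A toy Python sketch of the multithreaded extension follows. It assumes, without confirmation from the abstract, that T Tomasulo keeps per-thread register status, so tags are looked up by (thread, register). The reservation stations and the common data bus (CDB) are shared by all threads. The class and method names are hypothetical, and only `add` and `mul` are modeled.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class RSEntry:
    """One shared reservation-station entry, tagged with its owning thread."""
    tag: int
    thread: int
    op: str
    vj: Optional[float] = None  # operand values once available
    vk: Optional[float] = None
    qj: Optional[int] = None    # tags of producers still pending
    qk: Optional[int] = None

    def ready(self) -> bool:
        return self.qj is None and self.qk is None


class MultiThreadTomasulo:
    """Toy FP issue/wakeup logic shared by several hardware threads.

    Register values and register status are keyed by (thread, register), so
    threads rename the same architectural register independently while
    sharing the reservation stations and the CDB.
    """

    OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

    def __init__(self):
        self.regs = {}      # (thread, reg) -> committed value
        self.status = {}    # (thread, reg) -> tag of the latest pending writer
        self.stations = []  # reservation stations shared by all threads
        self.dest = {}      # tag -> (thread, reg) destination
        self._tags = count(1)

    def _operand(self, thread, reg):
        key = (thread, reg)
        if key in self.status:
            return None, self.status[key]     # wait for the producer's tag
        return self.regs.get(key, 0.0), None  # value available now

    def issue(self, thread, op, rd, rs, rt):
        tag = next(self._tags)
        vj, qj = self._operand(thread, rs)
        vk, qk = self._operand(thread, rt)
        self.stations.append(RSEntry(tag, thread, op, vj, vk, qj, qk))
        self.status[(thread, rd)] = tag
        self.dest[tag] = (thread, rd)
        return tag

    def step(self):
        """Execute the oldest ready entry from any thread and broadcast it on the CDB."""
        ready = next((e for e in self.stations if e.ready()), None)
        if ready is None:
            return None
        self.stations.remove(ready)
        result = self.OPS[ready.op](ready.vj, ready.vk)
        key = self.dest.pop(ready.tag)
        if self.status.get(key) == ready.tag:  # only the latest writer updates the register
            del self.status[key]
            self.regs[key] = result
        for other in self.stations:            # CDB broadcast wakes waiting operands
            if other.qj == ready.tag:
                other.vj, other.qj = result, None
            if other.qk == ready.tag:
                other.vk, other.qk = result, None
        return ready.tag
```

Both threads can write an architectural register such as `f3` at the same time without their tags colliding, because register status is keyed by (thread, register).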
Abstract: Scalability is one of the most important nonfunctional requirements of server applications, because it maintains effective performance under large, fluctuating and sometimes unpredictable workloads. To achieve scalability, server applications make extensive use of the thread pool system (TPS) as a middleware service. The size of the thread pool is the most significant factor affecting overall server performance. Determining the optimal size dynamically at runtime is a challenging problem. The most widely used and simplest method keeps the thread pool size equal to the request rate; this is the frequency-oriented thread pool (FOTP). FOTPs are the most widely used TPSs in industry because they are simple to implement, have negligible overhead and can be used in any system. However, frequency-based schemes focus on only one aspect of load changes, namely fluctuations in the request rate. The request rate alone is an imperfect knob for scaling a thread pool. This paper therefore presents a workload-profiling-based FOTP that scales the thread pool at runtime using request size (the service time of a request) in addition to request rate. We argue that the combination of the two truly represents load fluctuation in server-side applications. We evaluated the proposed system against the state-of-the-art TPS of Oracle Corporation using a client-server-based simulator. Our system performed better in terms of both response time and throughput.
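Little's law offers one way to combine request rate and request size: the average number of busy threads is roughly the arrival rate times the mean service time. The sketch below uses that rule as an assumption; it is not the paper's published formula. `ProfilingPoolSizer`, its sliding window and its bounds are hypothetical, and `frequency_only` reproduces the plain FOTP rule for comparison.

```python
import math
from collections import deque


class ProfilingPoolSizer:
    """Hypothetical pool-size advisor combining request rate and request size.

    Suggested threads ~= rate x mean service time (Little's law), computed over
    a sliding window of recent service times and clamped to
    [min_threads, max_threads].
    """

    def __init__(self, window=100, min_threads=1, max_threads=256):
        self.service_times = deque(maxlen=window)  # recent service times, seconds
        self.min_threads = min_threads
        self.max_threads = max_threads

    def record(self, service_time_s):
        self.service_times.append(service_time_s)

    def frequency_only(self, rate_per_s):
        """Plain FOTP rule: pool size tracks the request rate."""
        return self._clamp(math.ceil(rate_per_s))

    def size(self, rate_per_s):
        if not self.service_times:
            return self.frequency_only(rate_per_s)  # no profile yet: fall back to FOTP
        mean_service = sum(self.service_times) / len(self.service_times)
        return self._clamp(math.ceil(rate_per_s * mean_service))

    def _clamp(self, n):
        return max(self.min_threads, min(self.max_threads, n))
```

At the same request rate, a shift toward longer requests raises the suggested pool size. A frequency-only rule cannot detect this change.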
Funding: Supported by the National Natural Science Foundation of China (60503015)
Abstract: Processors are increasingly susceptible to transient faults. To counter this, various redundant multithreading (RMT) architectures have been proposed, and RMT is becoming one of the most effective approaches to detecting and recovering from transient faults. This paper surveys a wide range of RMT architectures, from the original AR-SMT (A-stream R-stream Simultaneous MultiThreading) to the most recent SD-SRT (Slack-Decode Simultaneous Redundant Threading). It presents comparative analyses of these architectures and thereby shows how RMT has evolved and where it is heading. Finally, it puts forward some directions and suggestions for further RMT research and development.