Abstract: To overcome the ever-increasing susceptibility of processors to transient faults, various redundant multithreading (RMT) architectures have been proposed, and RMT is becoming one of the most effective approaches for detecting and recovering from transient faults. This paper surveys a wide range of RMT architectures, from the original AR-SMT (A-stream R-stream Simultaneous MultiThreading) to the most recent SD-SRT (Slack-Decode Simultaneous Redundant Threading), presenting comparative analyses of them and thereby showing how the approach has evolved and where it is heading. Finally, some directions and suggestions are put forward for further RMT research and development.
Abstract: A transient fault detection mechanism is added to a simultaneous multithreading architecture. By exploiting both ILP (Instruction Level Parallelism) and TLP (Thread Level Parallelism), a Simultaneous Multithreading (SMT) fault-tolerant processor can be expected to achieve a better tradeoff between performance and hardware cost than traditional fault-tolerant processors. Detailed simulations of three SPEC95 benchmarks show that executing two redundant programs on the fault-tolerant microarchitecture takes only 40% to 61% longer than running a single version of the program. The new instruction fetch algorithm improves performance by 0.4% to 1% on most of the randomly chosen benchmarks.
Abstract: With the development of satellite remote sensing technology, increasing demands are placed on the timeliness and stability of satellite weather service systems. The FY satellite rainfall estimate day knock off product algorithm has a long run time, about 20 minutes, which affects the timeliness of the generated rainfall estimate products. Developing parallel optimization algorithms based on the needs of satellite meteorological services, and validating them in practical applications, is a necessary way to improve the performance and availability of these services. To address this problem, we studied parallel algorithms based on an analysis of the precipitation estimation algorithm. First, we explain the steps of the rainfall estimate day knock off product algorithm; second, we analyze the computational load of its four main calculation modules; third, we design a multithreaded parallel algorithm and an MPI parallelization. Finally, both the multithreaded and the MPI parallelizations were implemented. Experimental results show that both greatly improve overall computational efficiency, and that the MPI parallelization has the higher operating efficiency. The performance of parallel processing is closely related to the architecture of the computer. From the perspective of service scheduling and product algorithms, the MPI parallelization approach is adopted to improve service quality.
Abstract: In order to distinguish and extract topic information from other interfering information on the BBC news website for studies in social computing, the BBC News Hunter is proposed in this paper. The whole system consists of six subsystems: UI, Control, Download, Analysis, Storage, and Log. Numerical experiments show that satisfactory results can be obtained from the BBC news website, with acceptable average accuracy and efficiency.
Abstract: Convolutional neural networks (CNNs) are essential models for achieving high accuracy in various machine learning applications, such as image recognition and natural language processing. One of the important issues in accelerating CNNs with high energy efficiency and processing performance is efficient data reuse that exploits the inherent data locality. In this paper, we propose a novel CGRA (Coarse Grained Reconfigurable Array) architecture with time-domain multithreading for exploiting input data locality. The multithreading on each processing element enables input data to be reused across multiple computation periods. This paper presents the accelerator design and a performance analysis of the proposed architecture. We examine the structure of the memory subsystems, as well as the architecture of the computing array, to supply the required data with minimal performance overhead. We explore efficient architecture design alternatives based on the characteristics of modern CNN configurations. The evaluation results show that the available bandwidth of the external memory can be utilized efficiently when the output plane is wider (in earlier layers of many CNNs), while the input data locality can be utilized maximally when the number of output channels is larger (in later layers).
Abstract: We used a Raspberry Pi 4B to develop a microbial monitoring system that simplifies the microbial image-capturing process and facilitates the informatization of microbial observation results. The Raspberry Pi 4B firmware, developed in Python on the Linux platform, provides checksum verification of serial data, file upload over TCP, control of the sequence light source and light valve, real-time self-test based on multithreading, and an experiment-oriented file management method. The system demonstrated improved code logic, scheduling, exception handling, and code readability.
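The sum verification of serial data mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not describe the frame format, so the layout below (payload followed by a single checksum byte holding the low 8 bits of the byte sum) and the names `checksum8`, `build_frame`, and `verify_frame` are assumptions made only for illustration.

```python
def checksum8(payload: bytes) -> int:
    """Return the low 8 bits of the sum of all payload bytes."""
    return sum(payload) & 0xFF


def build_frame(payload: bytes) -> bytes:
    """Append the one-byte checksum to the payload (hypothetical frame layout)."""
    return payload + bytes([checksum8(payload)])


def verify_frame(frame: bytes) -> bool:
    """Check that the trailing byte matches the checksum of the preceding bytes."""
    if len(frame) < 2:
        # Assumption: a valid frame carries at least one payload byte.
        return False
    payload, received = frame[:-1], frame[-1]
    return checksum8(payload) == received


if __name__ == "__main__":
    frame = build_frame(b"\x01\x02\x03")
    print(frame.hex(), verify_frame(frame))
```

A receiver would call `verify_frame` on each frame read from the serial port and discard or re-request frames that fail the check. An additive checksum catches many single-byte errors but not reordered bytes, so a CRC may be preferable where stronger detection is needed.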
Funding: The work was supported by the National Natural Science Foundation of China under Grant Nos. 61272142, 61103082, 61402492, 61170261, 61103193, the National High Technology Research and Development 863 Program of China under Grant Nos. 2012AA01A301, 2012AA010901, and the Program for New Century Excellent Talents in University of China.
Abstract: Determinism is very useful for multithreaded programs in debugging, testing, etc. Many deterministic approaches have been proposed, such as deterministic multithreading (DMT) and deterministic replay. However, these systems are either inefficient or target a single purpose, which makes them inflexible. In this paper, we propose an efficient and flexible deterministic framework for multithreaded programs. Our framework implements determinism in two steps: relaxed determinism and strong determinism. Relaxed determinism resolves data races efficiently by using a proper weak memory consistency model. After that, we implement strong determinism by resolving lock contentions deterministically. Since different approaches can be applied to these two steps independently, our framework provides a spectrum of deterministic choices, including a nondeterministic system (fast), a weak deterministic system (fast and conditionally deterministic), a DMT system, and a deterministic replay system. Our evaluation shows that the DMT configuration of this framework can even outperform a state-of-the-art DMT system.
Funding: Supported by the National High Technology Development 863 Program of China (Grant Nos. 2007AA01Z114, 2006AA010201), the National Natural Science Foundation of China (Grant Nos. 60703017, 60736012, 60325205, 60673146, 60603049), the National Grand Fundamental Research 973 Program of China (Grant Nos. 2005CB321601, 2005CB321603), and the Beijing Natural Science Foundation (Grant No. 4072024).
Abstract: Multithreading is the development trend of high-performance processors. The memory consistency model is essential to the correctness, performance and complexity of a multithreaded processor. A chip multithreaded consistency model adapted to multithreaded processors is proposed in this paper. The restrictions imposed on memory event ordering by chip multithreaded consistency are presented and formalized. Using the critical-cycle idea developed by Wei-Wu Hu, we prove that the proposed chip multithreaded consistency model satisfies the criterion for correct execution under the sequential consistency model. The chip multithreaded consistency model provides a way of achieving higher performance than the sequential consistency model and ensures software compatibility, in that the execution result on a multithreaded processor is the same as the execution result on a uniprocessor. An implementation strategy for the chip multithreaded consistency model in the Godson-2 SMT processor is also proposed. The Godson-2 SMT processor supports the chip multithreaded consistency model correctly through an exception scheme based on the sequential memory access queue of each thread.
Funding: Supported by the National Key Research and Development Program of China (No. 2016YFB0201300).
Abstract: The Distributed Shared Memory (DSM) architecture is widely used in today's computer design to mitigate the ever-widening processing-memory gap, and it inevitably exposes Non-Uniform Memory Access (NUMA) to shared-memory parallel applications. Failure to adapt to the NUMA effect can significantly degrade application performance, especially on today's manycore platforms with tens to hundreds of cores. However, traditional approaches such as first-touch and memory policy fall short in false page-sharing, fragmentation, or ease of use. In this paper, we propose a partitioned shared-memory approach that allows multithreaded applications to achieve full NUMA-awareness with only minor code changes, and we develop an accompanying NUMA-aware heap manager that eliminates false page-sharing and minimizes fragmentation. Experiments on a 256-core cc-NUMA computing node show that the proposed approach helps applications adapt to NUMA with only minor code changes and improves the performance of typical multithreaded scientific applications by up to 4.3-fold as more cores are used.
Abstract: Developing a high-performance public key cryptosystem is crucial for numerous modern security applications. The Elliptic Curve Cryptosystem (ECC) has performance and resource-saving advantages compared to other types of asymmetric ciphers. However, the sequential design implementation of ECC does not satisfy the performance requirements of current applications. Therefore, several factors should be considered to boost cryptosystem performance, including the coordinate system, the scalar multiplication algorithm, and the elliptic curve form. The tripling-oriented (3DIK) form is implemented in this work because of its minimal computational complexity compared to other elliptic curve forms. This experimental study explores the factors that play an important role in ECC performance in order to determine the best combination for developing a high-speed ECC. The proposed cryptosystem uses a parallel software implementation to speed up ECC performance. To our knowledge, previous studies have no similar software implementation for 3DIK ECC. Supported by a parallel design, projective coordinates, and a fast scalar multiplication algorithm, the proposed 3DIK ECC improved the speed of the encryption process compared with other counterparts and the usual sequential implementation. The highest performance level for 3DIK ECC was achieved when it was implemented using the Non-Adjacent Form algorithm and homogeneous projection. Compared to costly hardware implementations, the proposed software implementation is cost-effective and can be easily adapted to other environments. In addition, the power consumption of the proposed ECC is analyzed and compared with other known cryptosystems. Thus, the current study presents a detailed overview of the design and implementation of 3DIK ECC.
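The Non-Adjacent Form (NAF) scalar multiplication named in this abstract can be sketched as follows. This is not the paper's 3DIK implementation: the 3DIK point formulas are not given in the abstract, so the sketch takes the group operations as parameters and, purely as a stand-in, demonstrates them on plain integers under addition. The names `naf` and `scalar_mul_naf` are hypothetical.

```python
from typing import Callable, List, TypeVar

P = TypeVar("P")


def naf(k: int) -> List[int]:
    """Return the NAF digits of k >= 0, least significant first.

    Every digit is -1, 0 or 1, and no two adjacent digits are both non-zero.
    """
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)  # 1 if k % 4 == 1, -1 if k % 4 == 3
            k -= d
        else:
            d = 0
        digits.append(d)
        k //= 2
    return digits


def scalar_mul_naf(k: int, point: P, add: Callable[[P, P], P],
                   neg: Callable[[P], P], zero: P) -> P:
    """Compute k * point by scanning the NAF digits from the most significant."""
    result = zero
    for d in reversed(naf(k)):
        result = add(result, result)  # doubling step
        if d == 1:
            result = add(result, point)
        elif d == -1:
            result = add(result, neg(point))  # negation is cheap on elliptic curves
    return result


if __name__ == "__main__":
    # Stand-in group: integers under addition, so k * point is ordinary multiplication.
    print(naf(7))
    print(scalar_mul_naf(23, 5, lambda a, b: a + b, lambda a: -a, 0))
```

NAF reduces the number of non-zero digits compared with plain binary, so fewer point additions are needed per scalar multiplication. On a real curve, `add`, `neg`, and `zero` would be replaced by the curve's point addition, point negation, and point at infinity.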
Abstract: The general m-machine permutation flowshop problem with the total flow-time objective is known to be NP-hard for m ≥ 2. The only practical methods for finding optimal solutions have been branch-and-bound algorithms. In this paper, we present an improved sequential algorithm based on a strict alternation of Generation and Exploration execution modes as well as Depth-First/Best-First hybrid strategies. The experimental results show that the proposed scheme exhibits improved performance compared with the algorithm in [1]. More importantly, our method can be easily extended and implemented with lightweight threads to reduce execution times. Good speedups can be obtained on shared-memory multicore systems.
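To make the objective concrete, here is a small sketch of the total flow-time computation for a permutation flowshop, together with a plain depth-first branch-and-bound. It is not the paper's Generation/Exploration algorithm or its hybrid strategy: the bound used is simply the flow time of the partial sequence, which is valid but weak, and the names `total_flow_time` and `branch_and_bound` are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple


def total_flow_time(order: Sequence[int], proc: Sequence[Sequence[int]]) -> int:
    """Sum of job completion times on the last machine.

    proc[j][i] is the processing time of job j on machine i.
    """
    machines = len(proc[0])
    ready = [0] * machines  # completion time of the previous job on each machine
    total = 0
    for j in order:
        done = 0  # completion time of job j on the previous machine
        for i in range(machines):
            done = max(done, ready[i]) + proc[j][i]
            ready[i] = done
        total += ready[-1]
    return total


def branch_and_bound(proc: Sequence[Sequence[int]]) -> Tuple[List[int], int]:
    """Depth-first search over job prefixes, pruning with the partial flow time."""
    best_cost = float("inf")
    best_order: List[int] = []

    def explore(prefix: List[int], remaining: FrozenSet[int]) -> None:
        nonlocal best_cost, best_order
        cost = total_flow_time(prefix, proc)  # lower bound: extra jobs only add cost
        if cost >= best_cost:
            return  # prune this subtree
        if not remaining:
            best_cost, best_order = cost, list(prefix)
            return
        for j in sorted(remaining):
            explore(prefix + [j], remaining - {j})

    explore([], frozenset(range(len(proc))))
    return best_order, int(best_cost)


if __name__ == "__main__":
    times = [[3, 2], [1, 4], [2, 1]]  # 3 jobs, 2 machines (made-up data)
    print(branch_and_bound(times))
```

Because sibling subtrees only share the incumbent `best_cost`, they can be explored by separate threads, which is the kind of extension the abstract alludes to. A tighter lower bound would prune far more of the tree on larger instances.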
Abstract: This article presents a method for dynamic, real-time communication between a serial port and peripheral equipment using multithreading, based on the basic principles of communication and the multitasking mechanism of Windows. The method effectively solves the problem of real-time response in serial communication, reduces the data loss rate, and improves system reliability. The method is general and practical for serial communication.
Funding: Supported by the National Natural Science Foundation of China (No. 10875146).
Abstract: A set of data-processing middleware for a high-powered neutral beam injection (NBI) control system is presented in this paper. The middleware, based on TCP/IP and multithreading technologies, focuses mainly on data processing and transmission. It separates data processing and compression from data acquisition and storage. It provides universal transmission interfaces for different software environments, such as WinCC, LabView and other measurement systems. The experimental data acquired on Windows, QNX and Linux platforms are processed by the middleware and sent to the monitoring applications. There are three middleware deployment models: serial processing, parallel processing and alternate serial processing. Using these models, the middleware solves real-time data-processing problems for heterogeneous acquisition hardware with different operating systems and data applications.
Funding: Supported by the 10th Five-Year National Defence Pre-Research Project (41316.1.2).
Abstract: Based on Simultaneous Multithreading (SMT), we propose a fault-tolerant scheme called the Tri-modular Redundantly and Simultaneously Threaded processor with Recovery (TRSTR). TRSTR has the following features. First, we introduce an arbitrator context into the conventional SRT (Simultaneous and Redundantly Threaded) design; it acts as an arbitrator when results from the other two contexts disagree and otherwise acts as an ordinary thread, thus making full use of SMT's parallelism. Second, we add a reconfigurable feature to the sphere of replication in SRT, making it more flexible for changing demands and situations. Third, TRSTR has two working modes, Tri-Simultaneous with Voting (TSV) and Dual-Simultaneous with Arbitrator (DSA), and can switch between them at will. Finally, in addition to transient-fault coverage, TRSTR has on-line self-checking and self-recovering abilities, so that it can shield some permanent faults and reconfigure itself without stopping the crucial job, improving its reliability and availability.
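The two working modes described in this abstract can be mimicked in software with a short sketch. This only illustrates the voting and arbitration logic, not the hardware design: the function names `vote3` and `dual_with_arbitrator` are hypothetical, and treating the arbitrator as a callable that recomputes the result on demand is an assumption.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def vote3(a: T, b: T, c: T) -> Tuple[Optional[T], bool]:
    """TSV-style majority vote: return (value, True) if at least two results agree."""
    if a == b or a == c:
        return a, True
    if b == c:
        return b, True
    return None, False  # all three disagree, no majority


def dual_with_arbitrator(a: T, b: T, arbitrate: Callable[[], T]) -> T:
    """DSA-style check: compare two results, invoke the arbitrator only on mismatch."""
    if a == b:
        return a  # arbitrator context stays free for ordinary work
    value, ok = vote3(a, b, arbitrate())
    if not ok:
        raise RuntimeError("no two results agree; recovery is required")
    return value


if __name__ == "__main__":
    print(vote3(1, 1, 2))
    print(dual_with_arbitrator(5, 7, lambda: 7))
```

In the dual mode the third computation is paid for only when the first two disagree, which mirrors the abstract's point that the arbitrator context otherwise runs as an ordinary thread.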
Funding: Supported by the National Natural Science Foundation of China (No. 10875146).
Abstract: For the remote control of a neutral beam injection (NBI) system, software called NBIcsw was developed to run on the control server. It meets the requirements of data transmission and operation control between the NBI measurement and control layer (MCL) and the remote monitoring layer (RML). NBIcsw runs on a Linux system and is developed in client/server (C/S) mode with multithreading technology. Its use in practice shows that the software is efficient.
Funding: Supported by the National Natural Science Foundation of China (No. 11075183) and the Knowledge Innovation Program of the Chinese Academy of Sciences (study of the key technical and physical problems of neutral beam steady-state operation)
Abstract: To ensure that data are unique and identifiable, and to make the data of all subsystems of the neutral beam injector (NBI) easy to analyze and process, all subsystems must share a unified system time. This paper presents timing synchronization software that combines several technologies, including shared memory, multithreading and the TCP protocol. Shared memory lets the server store client information and the system time, and multithreading lets the server handle different clients in different threads. The server runs under the Linux operating system, and the clients run under both Linux and Windows. With this design, all subsystems can be synchronized to within one second, which is accurate enough for the NBI system and thus ensures the reliability of the data.
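A thread-per-client time server over TCP, as described above, might be sketched as follows. Everything specific here is an assumption made for illustration: the 8-byte wire format, the names `TimeHandler`, `TimeServer` and `query_offset`, and the half-round-trip correction are not taken from the paper, and the shared-memory bookkeeping it mentions is omitted.

```python
import socket
import socketserver
import struct
import threading
import time


class TimeHandler(socketserver.BaseRequestHandler):
    """Handles one client connection and replies with the server clock."""

    def handle(self) -> None:
        # Assumed wire format: one big-endian double, seconds since the epoch.
        self.request.sendall(struct.pack("!d", time.time()))


class TimeServer(socketserver.ThreadingTCPServer):
    """TCP server that spawns a separate thread for every client."""

    allow_reuse_address = True
    daemon_threads = True


def query_offset(host: str, port: int) -> float:
    """Return (server time - local time), using the round-trip midpoint."""
    t_send = time.time()
    with socket.create_connection((host, port), timeout=2.0) as sock:
        payload = b""
        while len(payload) < 8:
            chunk = sock.recv(8 - len(payload))
            if not chunk:
                raise ConnectionError("server closed the connection early")
            payload += chunk
    t_recv = time.time()
    (server_time,) = struct.unpack("!d", payload)
    return server_time - (t_send + t_recv) / 2.0


def demo() -> float:
    """Start a server on a free local port and query it once."""
    with TimeServer(("127.0.0.1", 0), TimeHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            host, port = server.server_address
            return query_offset(host, port)
        finally:
            server.shutdown()
```

A client would add the returned offset to its local clock. When server and client share a machine, as in `demo()`, the offset should be close to zero.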
Abstract: The recent development of telemetry systems is driven by rapid advances in computer and network technology. A systematic introduction is provided to digital video and image processing, network communication, and the realization of these techniques on computers.
Abstract: Scalability is one of the most important nonfunctional requirements of server applications, because it maintains effective performance under large, fluctuating, and sometimes unpredictable workloads. To achieve scalability, the thread pool system (TPS) has been used extensively as a middleware service in server applications. The size of the thread pool is the most significant factor affecting overall server performance, and determining its optimal size dynamically at runtime is a challenging problem. The simplest and most widely used method is to keep the thread pool size equal to the request rate, i.e., the frequency-oriented thread pool (FOTP). FOTPs are the most widely used TPSs in industry because of their implementation simplicity, negligible overhead, and applicability to any system. However, frequency-based schemes focus on only one aspect of load change, namely fluctuations in the request rate, and the request rate alone is an imperfect knob for scaling a thread pool. This paper therefore presents a workload-profiling-based FOTP that uses request size (the service time of a request) in addition to the request rate as a knob to scale the thread pool at runtime; we argue that the combination of both truly represents load fluctuation in server-side applications. We evaluated the proposed system against the state-of-the-art TPS of Oracle Corporation using a client-server-based simulator and concluded that our system outperformed it in both response time and throughput.
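The difference between the two sizing knobs can be shown with a short sketch. The abstract does not give the paper's actual sizing formula, so the rule in `profiled_size` (request rate multiplied by mean service time, following Little's law) is an assumption chosen for illustration, as are the function names and the `min_size`/`max_size` bounds.

```python
import math


def fotp_size(request_rate: float, min_size: int = 1, max_size: int = 512) -> int:
    """Frequency-oriented sizing: pool size tracks the request rate (req/s)."""
    return max(min_size, min(max_size, math.ceil(request_rate)))


def profiled_size(
    request_rate: float,
    mean_service_time: float,
    min_size: int = 1,
    max_size: int = 512,
) -> int:
    """Sizing that also accounts for request size (assumed rule, not the paper's).

    By Little's law, rate (req/s) times mean service time (s) is the
    average number of requests in service, which is used as the pool size.
    """
    in_service = request_rate * mean_service_time
    return max(min_size, min(max_size, math.ceil(in_service)))
```

At 200 requests per second, `fotp_size(200)` always returns 200 threads, whereas `profiled_size(200, 0.25)` returns 50 for short requests and `profiled_size(200, 2.0)` returns 400 for long ones, which is the kind of load difference that the request rate alone cannot capture.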
Funding: Supported by the National Natural Science Foundation of China (60503015)
Abstract: To overcome the ever-increasing susceptibility of processors to transient faults, various redundant multithreading (RMT) architectures have been proposed, and RMT is becoming one of the most effective approaches for detecting and recovering from transient faults. This paper surveys a wide range of RMT architectures, from the original AR-SMT (A-stream R-stream Simultaneous MultiThreading) to the most recent SD-SRT (Slack-Decode Simultaneous Redundant Threading), presenting comparative analyses among them and thereby showing how RMT has evolved and where it is heading. Finally, some directions and suggestions are put forward for further RMT research and development.
Funding: Supported by the National Natural Science Foundation of China (60103002)
Abstract: A transient fault detection mechanism is added to a simultaneous multithreading architecture. By exploiting both ILP (Instruction Level Parallelism) and TLP (Thread Level Parallelism), the Simultaneous Multithreading (SMT) Fault Tolerance Processor can be expected to achieve a better tradeoff between performance and hardware cost than traditional fault tolerance processors. Detailed simulations of three SPEC95 benchmarks show that executing two redundant programs on the fault-tolerant microarchitecture takes only 40% to 61% longer than running a single version of the program. The new instruction fetch algorithm improves performance by 0.4% to 1% on most of the randomly chosen benchmarks.
Abstract: With the development of satellite remote sensing technology, increasing demands are placed on the timeliness and stability of the satellite weather service system. The FY satellite rainfall estimate day knock off product algorithm has a long run time, about 20 minutes, which affects the timeliness of the estimated rainfall product. Developing parallel optimization algorithms that meet the needs of satellite meteorological services, and verifying their effectiveness in practical applications, is necessary to improve the performance and availability of these services. To address this problem, we studied parallel algorithms based on an analysis of the precipitation estimation algorithm. First, we explained the steps of the precipitation estimate day knock off product algorithm; second, we analyzed the computational load of its four main calculation modules; third, we designed a multithreaded parallel algorithm and an MPI parallelization; finally, we implemented both. Experimental results show that the multithreaded and MPI parallel algorithms greatly improve overall computational efficiency, and that the MPI parallelization mode is the more efficient of the two. The performance of parallel processing is closely related to the architecture of the computer. From the perspective of service scheduling and product algorithms, the MPI parallelization approach is adopted to improve service quality.
Abstract: To distinguish and extract topic information from other interfering information on the BBC news website for studies in social computing, this paper proposes the BBC News Hunter. The system consists of six subsystems: UI, Control, Download, Analysis, Storage and Log. Numerical experiments show that satisfactory results can be obtained from the BBC news website, with acceptable average accuracy and efficiency.