32 articles found
1. A Multithreaded CGRA for Convolutional Neural Network Processing (Cited by: 1)
Authors: Kota Ando, Shinya Takamaeda-Yamazaki, Masayuki Ikebe, Tetsuya Asai, Masato Motomura. Circuits and Systems, 2017, No. 6, pp. 149-170 (22 pages)
The convolutional neural network (CNN) is an essential model for achieving high accuracy in various machine learning applications, such as image recognition and natural language processing. One of the important issues for CNN acceleration with high energy efficiency and processing performance is efficient data reuse by exploiting the inherent data locality. In this paper, we propose a novel CGRA (Coarse Grained Reconfigurable Array) architecture with time-domain multithreading for exploiting input data locality. The multithreading on each processing element enables input data reuse across multiple computation periods. This paper presents the accelerator design and a performance analysis of the proposed architecture. We examine the structure of memory subsystems, as well as the architecture of the computing array, to supply required data with minimal performance overhead. We explore efficient architecture design alternatives based on the characteristics of modern CNN configurations. The evaluation results show that the available bandwidth of the external memory can be utilized efficiently when the output plane is wider (in earlier layers of many CNNs), while the input data locality can be utilized maximally when the number of output channels is larger (in later layers).
Keywords: CNN Convolutional Neural Network Deep Learning Multithreaded Architecture CGRA
2. Parallelizable Calculation of Observables Values on Analog Quantum Computer
Authors: Alexander Soiguine. Journal of Applied Mathematics and Physics, 2024, No. 7, pp. 2400-2406 (7 pages)
The superiority of hypothetical quantum computers is not due to faster calculations but due to different schemes of calculations running on special hardware. The core of quantum computing follows the way a state of a quantum system is defined when basic things interact with each other. In the conventional approach it is implemented through the tensor product of qubits. In the geometric algebra formalism, simultaneous availability of all the results for non-measured observables is based on the definition of states as points on a three-dimensional sphere.
Keywords: Geometric Algebra Wave Functions Entanglement Maxwell Equations Three-Dimensional Sphere States Observables Measurements GPU Multithreading OpenCL
3. Multi-Plate Microbial Monitoring Terminal Based on Raspberry Pi 4B
Authors: Qirong Luo, Xichang Cai, Tongyuan Liu. Journal of Electronic Research and Application, 2024, No. 3, pp. 28-33 (6 pages)
We utilized Raspberry Pi 4B to develop a microbial monitoring system to simplify the microbial image-capturing process and facilitate the informatization of microbial observation results. The Raspberry Pi 4B firmware, developed in Python on the Linux platform, achieves sum verification of serial data, file upload based on the TCP protocol, control of the sequence light source and light valve, real-time self-test based on multithreading, and an experiment-oriented file management method. The system demonstrated improved code logic, scheduling, exception handling, and code readability.
Keywords: Raspberry Pi 4B Object-Oriented Multithreading Serial port protocol and parsing TCP
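The "sum verification of serial data" mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not describe the frame layout, so everything here is an assumption made for illustration: a hypothetical frame of payload bytes followed by one checksum byte equal to the payload sum modulo 256, and the hypothetical helper names `checksum`, `build_frame` and `parse_frame`.

```python
def checksum(payload: bytes) -> int:
    """Return the low 8 bits of the sum of all payload bytes."""
    return sum(payload) & 0xFF


def build_frame(payload: bytes) -> bytes:
    """Append the one-byte checksum to the payload (hypothetical frame layout)."""
    return payload + bytes([checksum(payload)])


def parse_frame(frame: bytes) -> bytes:
    """Return the payload if the trailing checksum matches, else raise ValueError."""
    if len(frame) < 2:
        raise ValueError("frame too short")
    payload, received = frame[:-1], frame[-1]
    if checksum(payload) != received:
        raise ValueError("checksum mismatch")
    return payload
```

A receiver thread would typically call `parse_frame` on each complete frame read from the port and discard or re-request frames that raise `ValueError`.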
4. An Efficient and Flexible Deterministic Framework for Multithreaded Programs (Cited by: 1)
Authors: 卢凯, 周旭, 王小平, Tom Bergan, 陈沉. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2015, No. 1, pp. 42-56 (15 pages)
Determinism is very useful to multithreaded programs in debugging, testing, etc. Many deterministic approaches have been proposed, such as deterministic multithreading (DMT) and deterministic replay. However, these systems either are inefficient or target a single purpose, which is not flexible. In this paper, we propose an efficient and flexible deterministic framework for multithreaded programs. Our framework implements determinism in two steps: relaxed determinism and strong determinism. Relaxed determinism solves data races efficiently by using a proper weak memory consistency model. After that, we implement strong determinism by solving lock contentions deterministically. Since we can apply different approaches for these two steps independently, our framework provides a spectrum of deterministic choices, including nondeterministic system (fast), weak deterministic system (fast and conditionally deterministic), DMT system, and deterministic replay system. Our evaluation shows that the DMT configuration of this framework could even outperform a state-of-the-art DMT system.
Keywords: Determinism Multithreading Framework Flexible
5. Chip Multithreaded Consistency Model
Authors: 李祖松, 郇丹丹, 胡伟武, 唐志敏. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2008, No. 2, pp. 298-304, F0003 (8 pages)
Multithreading is the developing trend in high-performance processors. The memory consistency model is essential to the correctness, performance and complexity of a multithreaded processor. A chip multithreaded consistency model adapted to multithreaded processors is proposed in this paper. The restrictions imposed on memory event ordering by chip multithreaded consistency are presented and formalized. Using the idea of the critical cycle introduced by Wei-Wu Hu, we prove that the proposed chip multithreaded consistency model satisfies the criterion of correct execution of the sequential consistency model. The chip multithreaded consistency model provides a way of achieving higher performance than the sequential consistency model and ensures software compatibility, in that the execution result on a multithreaded processor is the same as the execution result on a uniprocessor. The implementation strategy of the chip multithreaded consistency model in the Godson-2 SMT processor is also proposed. The Godson-2 SMT processor supports the chip multithreaded consistency model correctly through an exception scheme based on the sequential memory access queue of each thread.
Keywords: computer architecture Godson-2 multithreading memory consistency model event ordering
6. PsmArena: Partitioned Shared Memory for NUMA-Awareness in Multithreaded Scientific Applications
Authors: Zhang Yang, Aiqing Zhang, Zeyao Mo. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2021, No. 3, pp. 287-295 (9 pages)
The Distributed Shared Memory (DSM) architecture is widely used in today's computer design to mitigate the ever-widening processing-memory gap, and it inevitably exhibits Non-Uniform Memory Access (NUMA) to shared-memory parallel applications. Failure to adapt to the NUMA effect can significantly degrade application performance, especially on today's manycore platforms with tens to hundreds of cores. However, traditional approaches such as first-touch and memory policy fall short in false page-sharing, fragmentation, or ease of use. In this paper, we propose a partitioned shared-memory approach that allows multithreaded applications to achieve full NUMA-awareness with only minor code changes, and we develop an accompanying NUMA-aware heap manager which eliminates false page-sharing and minimizes fragmentation. Experiments on a 256-core cc-NUMA computing node show that the proposed approach helps applications adapt to NUMA with only minor code changes and improves the performance of typical multithreaded scientific applications by up to 4.3-fold as more cores are used.
Keywords: partitioned shared memory Non-Uniform Memory Access (NUMA) heap manager multithread manycore
7. A Low-Cost and High-Performance Cryptosystem Using Tripling-Oriented Elliptic Curve
Authors: Mohammad Alkhatib, Wafa S. Aldalbahy. Intelligent Automation & Soft Computing (SCIE), 2023, No. 8, pp. 1807-1831 (25 pages)
Developing a high-performance public key cryptosystem is crucial for numerous modern security applications. The Elliptic Curve Cryptosystem (ECC) has performance and resource-saving advantages compared to other types of asymmetric ciphers. However, the sequential design implementation for ECC does not satisfy the current applications' performance requirements. Therefore, several factors should be considered to boost the cryptosystem performance, including the coordinate system, the scalar multiplication algorithm, and the elliptic curve form. The tripling-oriented (3DIK) form is implemented in this work due to its minimal computational complexity compared to other elliptic curve forms. This experimental study explores the factors playing an important role in ECC performance to determine the best combination that leads to developing high-speed ECC. The proposed cryptosystem uses parallel software implementation to speed up ECC performance. To our knowledge, previous studies have no similar software implementation for 3DIK ECC. Supported by parallel design, projective coordinates, and a fast scalar multiplication algorithm, the proposed 3DIK ECC improved the speed of the encryption process compared with other counterparts and the usual sequential implementation. The highest performance level for 3DIK ECC was achieved when it was implemented using the Non-Adjacent Form algorithm and homogeneous projection. Compared to the costly hardware implementations, the proposed software implementation is cost effective and can be easily adapted to other environments. In addition, the power consumption of the proposed ECC is analyzed and compared with other known cryptosystems. Thus, the current study presents a detailed overview of the design and implementation of 3DIK ECC.
Keywords: Security Cryptography elliptic curves software implementation Multithreading
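8. Research on Improving the Performance of OpenSSL-Based Software (Cited by: 1)
Authors: 张妍, 许云峰, 张焕生. Journal of Hebei University of Science and Technology (河北科技大学学报) (CAS), 2007, No. 2, pp. 157-161 (5 pages)
OpenSSL is an open-source software package for developing network security software. Using OpenSSL can shorten the software development cycle and improve the running efficiency and stability of the software. However, most software currently built on OpenSSL does not fully exploit the high efficiency and high stability that this open-source package offers, because its developers know little about the advanced usage of OpenSSL. This paper discusses advanced usage of OpenSSL from the perspective of improving software running efficiency and stability, and proposes two solutions for improving both.
Keywords: SSL OpenSSL Multithread thread safety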
9. Parallelization of a Branch and Bound Algorithm on Multicore Systems (Cited by: 1)
Authors: Chia-Shin Chung, James Flynn, Janche Sang. Journal of Software Engineering and Applications, 2012, No. 8, pp. 621-629 (9 pages)
The general m-machine permutation flowshop problem with the total flow-time objective is known to be NP-hard for m ≥ 2. The only practical method for finding optimal solutions has been branch-and-bound algorithms. In this paper, we present an improved sequential algorithm which is based on a strict alternation of Generation and Exploration execution modes as well as Depth-First/Best-First hybrid strategies. The experimental results show that the proposed scheme exhibits improved performance compared with the algorithm in [1]. More importantly, our method can be easily extended and implemented with lightweight threads to speed up the execution times. Good speedups can be obtained on shared-memory multicore systems.
Keywords: Parallel Branch and Bound Multithreaded Programming Multicore System Permutation Flowshop Software Reuse
10. Slack-Decode Simultaneously and Redundantly Threaded Architecture (Cited by: 3)
Authors: 杨华, 崔刚, 刘宏伟, 杨孝宗. Journal of Donghua University (English Edition) (EI, CAS), 2005, No. 3, pp. 1-6 (6 pages)
The Slack-Decode Simultaneously and Redundantly Threaded (SD-SRT) architecture is proposed for detecting transient faults in processors. SD-SRT boosts the performance of the previously proposed SRT by eliminating redundant instruction fetches. First, the fetch stage is moved out of the Sphere of Replication (SoR), and a unified instruction-fetch-queue (IFQ) is shared by both the leading and trailing threads. Second, a scheme called slack-decode cooperates with the unified IFQ to harmonize the progress of the two threads. The simulations show that SD-SRT outperforms the original SRT in terms of IPC by 15% and decreases I-cache accesses by 42%. Meanwhile, SD-SRT reduces the size and complexity of hardware structures such as the load-value-queue and store-buffer.
Keywords: transient fault redundant multithreading architecture
11. The Serial Communication Based on Multithreading Technique of Windows (Cited by: 2)
Authors: Chen Shu-zhen, Shi Bo. Wuhan University Journal of Natural Sciences (CAS), 2000, No. 3, p. 328 (1 page)
This paper presents a method for dynamic, real-time communication between a serial port and peripheral equipment using multithreading, based on the basic principles of communication and the multitasking mechanism of Windows. The method effectively solves the problem of real-time response in serial communication, reduces the data loss rate, and improves system reliability. It is a practical, general method for serial communication.
Keywords: multithreading serial communication real-time query
12. TRSTR: A Fault-Tolerant Microprocessor Architecture Based on SMT (Cited by: 1)
Authors: YANG Hua, CUI Gang, YANG Xiao-zong. Wuhan University Journal of Natural Sciences (CAS), 2005, No. 1, pp. 51-55 (5 pages)
Based on Simultaneous Multithreading (SMT), we propose a fault-tolerant scheme called Tri-modular Redundantly and Simultaneously Threaded processor with Recovery (TRSTR). TRSTR has the following features. First, we introduce an arbitrator context into the conventional SRT (Simultaneous and Redundantly Threaded) design, which acts as an arbitrator when results from the other two contexts disagree and otherwise acts as an ordinary thread, thus making full use of SMT's parallelism. Second, we add a reconfigurable feature to the sphere of replication in SRT, making it more flexible for changing demands and situations. Third, TRSTR has two working modes, Tri-Simultaneous with Voting (TSV) and Dual-Simultaneous with Arbitrator (DSA), which can be switched at will. Finally, in addition to transient-fault coverage, TRSTR has on-line self-checking and self-recovering abilities, so as to shield off some permanent faults and reconfigure itself without stopping the crucial job, improving its reliability and availability.
Keywords: fault-tolerant high-performance simultaneous multithreading architecture
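The two working modes named in this abstract can be pictured with a small software analogy. This is only a sketch under stated assumptions: the paper describes hardware contexts, whereas the functions below (`tsv_vote` and `dsa_check`, both hypothetical names) operate on plain Python values, and the tie-handling rule (return `None` when all three results differ) is an assumption, not something the abstract specifies.

```python
from collections import Counter
from typing import Callable, Hashable, Optional


def tsv_vote(a: Hashable, b: Hashable, c: Hashable) -> Optional[Hashable]:
    """Three-way majority vote: return the value at least two copies agree on.

    Returns None when all three results differ (assumed policy).
    """
    value, count = Counter([a, b, c]).most_common(1)[0]
    return value if count >= 2 else None


def dsa_check(a: Hashable, b: Hashable,
              arbitrate: Callable[[], Hashable]) -> Optional[Hashable]:
    """Dual execution with an arbitrator that runs only on disagreement.

    When the two results match, the arbitrator is never invoked, so it
    stays free for ordinary work; otherwise its result breaks the tie.
    """
    if a == b:
        return a
    return tsv_vote(a, b, arbitrate())
```

In this analogy, `tsv_vote` always pays for three executions, while `dsa_check` pays for the third only when the first two disagree, which mirrors the trade-off between the two modes.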
13. Data Processing Middleware in a High-Powered Neutral Beam Injection Control System (Cited by: 1)
Authors: 盛鹏, 胡纯栋, 宋士花, 刘智民, 赵远哲, 张小丹, 窦少彬. Plasma Science and Technology (SCIE, EI, CAS, CSCD), 2013, No. 6, pp. 593-598 (6 pages)
A set of data-processing middleware for a high-powered neutral beam injection (NBI) control system is presented in this paper. The middleware, based on TCP/IP and multi-threading technologies, focuses mainly on data processing and transmission. It separates the data processing and compression from data acquisition and storage. It provides universal transmitting interfaces for different software environments, such as WinCC, LabView and other measurement systems. The experimental data acquired on Windows, QNX and Linux platforms are processed by the middleware and sent to the monitoring applications. There are three middleware deployment models: serial processing, parallel processing and alternate serial processing. By using these models, the middleware solves real-time data-processing problems on heterogeneous acquisition hardware with different operating systems and data applications.
Keywords: neutral beam injection control system middleware multithreading
14. A spatially triggered dissipative resource distribution policy for SMT processors (Cited by: 1)
Authors: Hong-zhou CHEN, Xue-zeng PAN, Ling-di PING, Kui-jun LU, Xiao-ping CHEN. Journal of Zhejiang University-Science A (Applied Physics & Engineering) (SCIE, EI, CAS, CSCD), 2008, No. 8, pp. 1070-1082 (13 pages)
Programs take on changing behavior at runtime in a simultaneous multithreading (SMT) environment. How reasonably common resources are distributed among the threads significantly determines the throughput and fairness performance in SMT processors. Existing resource distribution methods either mainly rely on the front-end fetch policy, or make distribution decisions according to the limited information from the pipeline. It is difficult for them to efficiently capture the various resource requirements of the threads. This work presents a spatially triggered dissipative resource distribution (SDRD) policy for SMT processors. Its two parts, the self-organization mechanism that is driven by the real-time instructions per cycle (IPC) performance and the introduction of chaos that tries to control the diversity of trial resource distributions, work together to sustain resource distribution optimization for changing program behavior. Simulation results show that SDRD with fine-grained diversity control is more effective than SDRD with coarse-grained control. SDRD also benefits considerably from its two well-coordinated parts, providing potential fairness gains as well as good throughput gains. Meanings and settings of important SDRD parameters are also discussed.
Keywords: Simultaneous multithreading (SMT) Resource distribution Dynamic optimization Dissipative structures
15. Design of Control Server Application Software for Neutral Beam Injection System
Authors: 施齐林, 胡纯栋, 盛鹏, 宋士花. Plasma Science and Technology (SCIE, EI, CAS, CSCD), 2012, No. 4, pp. 343-346 (4 pages)
For the remote control of a neutral beam injection (NBI) system, a software package, NBIcsw, is developed to run on the control server. It meets the requirements of data transmission and operation control between the NBI measurement and control layer (MCL) and the remote monitoring layer (RML). NBIcsw runs on a Linux system and is developed with the client/server (C/S) model and multithreading technology. Its application shows that the software performs efficiently.
Keywords: NBI OPC socket multithreading C/S
16. Design of Timing Synchronization Software on EAST-NBI
Authors: 赵远哲, 胡纯栋, 盛鹏, 张小丹. Plasma Science and Technology (SCIE, EI, CAS, CSCD), 2013, No. 12, pp. 1237-1240 (4 pages)
To ensure the uniqueness and recognition of data and make it easy to analyze and process the data of all subsystems of the neutral beam injector (NBI), it is required that all subsystems have a unified system time. In this paper, timing synchronization software is presented that draws on several technologies, such as shared memory, multithreading, and the TCP protocol. Shared memory helps the server store client information and the system time, and multithreading lets the server handle different clients with different threads. The server runs on Linux, and the clients run on Linux and Windows. With this design, synchronization of all subsystems can be achieved in less than one second; this accuracy is sufficient for the NBI system, and the reliability of data is thus ensured.
Keywords: EAST NBI timing synchronization shared memory multithreading server/client
17. Key Technology in Telemetry System
Authors: Zheng Dong, Wu Zhi-bin, Chen Shu-zhen. Wuhan University Journal of Natural Sciences (EI, CAS), 1999, No. 4, pp. 454-458 (5 pages)
The recent development of telemetry systems is driven by the fast development of computer and network technology. A systematic introduction is provided to digital video and image processing, network communication, and the realization of those techniques on computers.
Keywords: telemetry Winsock multithread AVICAP
18. Improved Tomasulo algorithm
Authors: 崔光佐, 胡铭曾. Journal of Harbin Institute of Technology (New Series) (EI, CAS), 1999, No. 4, pp. 16-19 (4 pages)
The Tomasulo algorithm, a dynamic scheduling technique designed for the floating-point unit (FPU) to exploit instruction-level parallelism within a single thread only, is improved into the T Tomasulo algorithm to support multiple parallel contexts. FPUs can then exploit parallelism both within a single thread and among multiple threads, and can be used more efficiently.
Keywords: multithread superscalar architecture Tomasulo algorithm dynamic scheduling instruction level parallelism
19. A Perfect Knob to Scale Thread Pool on Runtime
Authors: Faisal Bahadur, Arif Iqbal Umar, Insaf Ullah, Fahad Algarni, Muhammad Asghar Khan, Samih M. Mostafa. Computers, Materials & Continua (SCIE, EI), 2022, No. 7, pp. 1483-1493 (11 pages)
Scalability is one of the most important nonfunctional requirements of server applications, because it maintains effective performance under large, fluctuating and sometimes unpredictable workloads. In order to achieve scalability, the thread pool system (TPS) has been used extensively as a middleware service in server applications. The size of the thread pool is the most significant factor affecting the overall performance of servers. Determining the optimal size of the thread pool dynamically at runtime is a challenging problem. The most widely used and simple method to tackle this problem is to keep the size of the thread pool equal to the request rate, i.e., the frequency-oriented thread pool (FOTP). FOTPs are the most widely used TPSs in the industry because of their implementation simplicity, negligible overhead and applicability to any system. However, the frequency-based schemes focus on only one aspect of changes in the load, namely the fluctuations in request rate. The request rate alone is an imperfect knob to scale the thread pool. Thus, this paper presents a workload-profiling-based FOTP that focuses on request size (service time of the request) besides the request rate as a knob to scale the thread pool at runtime, because we argue that the combination of both truly represents the load fluctuation in server-side applications. We evaluated the proposed system against the state-of-the-art TPS of Oracle Corporation (using a client-server-based simulator) and concluded that our system outperformed it in terms of both response time and throughput.
Keywords: scalability performance middleware workload profiling multithreading thread pool
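The abstract's argument that request rate and service time together describe the load can be made concrete with a small sizing sketch. The formula below is an assumption, not the paper's method: it applies Little's law (average concurrency equals arrival rate times mean service time), and the function name `pool_size` and its bounds `min_size` and `max_size` are hypothetical.

```python
import math


def pool_size(request_rate: float, mean_service_time: float,
              min_size: int = 1, max_size: int = 256) -> int:
    """Estimate a thread pool size from request rate and mean service time.

    request_rate is in requests per second, mean_service_time in seconds.
    The estimate is rate * service time (Little's law, an assumption here),
    rounded up and clamped to [min_size, max_size].
    """
    if request_rate < 0 or mean_service_time < 0:
        raise ValueError("rate and service time must be non-negative")
    needed = math.ceil(request_rate * mean_service_time)
    return max(min_size, min(max_size, needed))
```

For example, at 200 requests per second with a mean service time of 0.25 s this sketch suggests 50 threads, whereas a purely frequency-oriented rule that sets the pool size equal to the request rate would suggest 200.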
20. Redundant Multithreading Architecture Overview
Authors: YANG Hua, CUI Gang, LIU Hongwei, YANG Xiaozong. Wuhan University Journal of Natural Sciences (CAS), 2006, No. 6, pp. 1793-1796 (4 pages)
To overcome the ever-increasing susceptibility of processors to transient faults, various redundant multithreading (RMT) architectures have been proposed, and RMT is becoming one of the most effective approaches for detecting and recovering from transient faults. This paper surveys a wide range of RMT architectures, from the original AR-SMT (A-stream R-stream Simultaneous MultiThreading) to the most recent SD-SRT (Slack-Decode Simultaneous Redundant Threading), presents cross-cutting analyses and comparisons among them, and thereby demonstrates the evolution and trends of RMT. Finally, some directions and suggestions are put forward for further RMT research and development.
Keywords: redundant multithreading processor reliability