Graph processing has been widely used in many scenarios,from scientific computing to artificial intelligence.Graph processing exhibits irregular computational parallelism and random memory accesses,unlike traditional ...Graph processing has been widely used in many scenarios,from scientific computing to artificial intelligence.Graph processing exhibits irregular computational parallelism and random memory accesses,unlike traditional workloads.Therefore,running graph processing workloads on conventional architectures(e.g.,CPUs and GPUs)often shows a significantly low compute-memory ratio with few performance benefits,which can be,in many cases,even slower than a specialized single-thread graph algorithm.While domain-specific hardware designs are essential for graph processing,it is still challenging to transform the hardware capability to performance boost without coupled software codesigns.This article presents a graph processing ecosystem from hardware to software.We start by introducing a series of hardware accelerators as the foundation of this ecosystem.Subsequently,the codesigned parallel graph systems and their distributed techniques are presented to support graph applications.Finally,we introduce our efforts on novel graph applications and hardware architectures.Extensive results show that various graph applications can be efficiently accelerated in this graph processing ecosystem.展开更多
Hardware/software partitioning is an essential step in hardware/software co-design.For large size problems,it is difficult to consider both solution quality and time.This paper presents an efficient GPU-based parallel...Hardware/software partitioning is an essential step in hardware/software co-design.For large size problems,it is difficult to consider both solution quality and time.This paper presents an efficient GPU-based parallel tabu search algorithm(GPTS)for HW/SW partitioning.A single GPU kernel of compacting neighborhood is proposed to reduce the amount of GPU global memory accesses theoretically.A kernel fusion strategy is further proposed to reduce the amount of GPU global memory accesses of GPTS.To further minimize the transfer overhead of GPTS between CPU and GPU,an optimized transfer strategy for GPU-based tabu evaluation is proposed,which considers that all the candidates do not satisfy the given constraint.Experiments show that GPTS outperforms state-of-the-art work of tabu search and is competitive with other methods for HW/SW partitioning.The proposed parallelization is significant when considering the ordinary GPU platform.展开更多
Ethernet over SDH/SONET (EOS) is a hotspot in today's data transmission technology for it combines the merits of both Ethernet and SDH/SONET. However, implementing an EOS system on a chip is complex and needs full...Ethernet over SDH/SONET (EOS) is a hotspot in today's data transmission technology for it combines the merits of both Ethernet and SDH/SONET. However, implementing an EOS system on a chip is complex and needs full verifications. This paper introduces our design of Hardware/Software co-verification platform for EOS design. The hardware platform contains a microprocessor board and an FPGA (Field Programmable Gate Array)-based verification board, and the corresponding software includes test benches running in FPGAs, controlling programs for the microprocessor and a console program with GUI (Graphical User Interface) interface for configuration, management and supervision. The design is cost-effective and has been successfully employed to verify several IP (Intellectual Property) blocks of our EOS chip. Moreover, it is flexible and can be applied as a general-purpose verification platform.展开更多
In this paper, the storage capacity of communication among cores and processors is taken into account and a maximum D-value-first algorithm is proposed. By improving the hardware parallelism in the task execution proc...In this paper, the storage capacity of communication among cores and processors is taken into account and a maximum D-value-first algorithm is proposed. By improving the hardware parallelism in the task execution process, the maximum storage requirements for communication are minimized. Experimental results with various directed acyclic graph models showed that compared with the earliest-task-first algorithm, the storage requirements for communication were reduced by 22.46%, on average, while the average of makespan only increased by 0.82%,.展开更多
This paper presents an algorithm that combines the chaos optimization algorithm with the maximum entropy ( COA-ME) by using entropy model based on chaos algorithm,in which the maximum entropy is used as the second met...This paper presents an algorithm that combines the chaos optimization algorithm with the maximum entropy ( COA-ME) by using entropy model based on chaos algorithm,in which the maximum entropy is used as the second method of searching the excellent solution. The search direction is improved by chaos optimization algorithm and realizes the selective acceptance of wrong solution. The experimental result shows that the presented algorithm can be used in the partitioning of hardware/software of reconfigurable system. It effectively reduces the local extremum problem,and search speed as well as performance of partitioning is improved.展开更多
In order to improve the efficiency of embedded software running on processor core, this paper proposes a hard-ware/software co-optimization approach for embedded software from the system point of view. The proposed st...In order to improve the efficiency of embedded software running on processor core, this paper proposes a hard-ware/software co-optimization approach for embedded software from the system point of view. The proposed stepwise methods aim at exploiting the structure and the resources of the processor as much as possible for software algorithm optimization. To achieve low memory usage and low frequency need for the same performance, this co-optimization approach was used to optimize embedded software of MP3 decoder based on a 16-bit fixed-point DSP core. After the optimization, the results of decoding 128 kbps, 44.1 kHz stereo MP3 on DSP evaluation platform need 45.9 MIPS and 20.4 kbytes memory space. The optimization rate achieves 65.6% for memory and 49.6% for frequency respectively compared with the results by compiler using floating-point computation. The experimental result indicates the availability of the hardware/software co-optimization approach depending on the algorithm and architecture.展开更多
This paper deals with a new hardware/software embedded system design methodology based on design pattern approach by development of a new design tool called smartcell. Three main constraints of embedded systems design...This paper deals with a new hardware/software embedded system design methodology based on design pattern approach by development of a new design tool called smartcell. Three main constraints of embedded systems design process are investigated: the complexity, the partitioning between hardware and software aspects and the reusability. Two intermediate models are carried out in order to solve the complexity problem. The partitioning problem deals with the proposed hardware/software partitioning algorithm based on Ant Colony Optimisation. The reusability problem is resolved by synthesis of intellectual property blocks. Specification and integration of an intelligent controller on heterogeneous platform are considered to illustrate the proposed approach.展开更多
We present a simulation framework for wireless sensor networks developed to allow the design exploration and the complete microprocessor-instruction-level debug of network formation, data congestion, nodes interaction...We present a simulation framework for wireless sensor networks developed to allow the design exploration and the complete microprocessor-instruction-level debug of network formation, data congestion, nodes interaction, all in one simulation environment. A specifically innovative feature is the co-emulation of selected nodes at clock-cycle-accurate hardware processing level, allowing code debug and exact execution latency evaluation (considering both protocol stack and application), together with other nodes at abstract protocol level, meeting a designer’s needs of simulation speed, scalability and reliability. The simulator is centered on the Zigbee protocol and can be retargeted for different node micro-architectures.展开更多
The functions and characteristics of software radio are discussed. Using techniques and method of software radio, the concept and advantages of a new kind of radio fuze, software radio fuze, are analysed. Several kind...The functions and characteristics of software radio are discussed. Using techniques and method of software radio, the concept and advantages of a new kind of radio fuze, software radio fuze, are analysed. Several kinds of hardware platform structures of the software radio fuze are studied and the key techniques are analysed. The software radio fuze will become the most promising radio fuze techniques in 21st century.展开更多
With the introduction of software defined hardware by DARPA Electronics Resurgence Initiative,software definition will be the basic attribute of information system.Benefiting from boundary certainty and algorithm aggr...With the introduction of software defined hardware by DARPA Electronics Resurgence Initiative,software definition will be the basic attribute of information system.Benefiting from boundary certainty and algorithm aggregation of domain applications,domain-oriented computing architecture has become the technical direction that considers the high flexibility and efficiency of information system.Aiming at the characteristics of data-intensive computing in different scenarios such as Internet of Things(IoT),big data,artificial intelligence(AI),this paper presents a domain-oriented software defined computing architecture,discusses the hierarchical interconnection structure,hybrid granularity computing element and its computational kernel extraction method,finally proves the flexibility and high efficiency of this architecture by experimental comparison.展开更多
Software can be seen almost everywhere and is now defining the world.Software has transitioned from an affiliate of hardware,to a network service that is present in every corner of our social lives.
Nowadays, from home monitoring to large airport security, a lot of digital video surveillance systems have been used. Digital surveillance system usually requires streaming video processing abilities. As an advanced v...Nowadays, from home monitoring to large airport security, a lot of digital video surveillance systems have been used. Digital surveillance system usually requires streaming video processing abilities. As an advanced video coding method, H.264 is introduced to reduce the large video data dramatically (usually by 70X or more). However, computational overhead occurs when coding and decoding H.264 video. In this paper, a System-on-a-Chip (SoC) based hardware acceleration solution for video codec is proposed, which can also be used for other software applications. The characteristics of the video codec are analyzed by using the profiling tool. The Hadamard function, which is the bottleneck of H.264, is identified not only by execution time but also another two attributes, such as cycle per loop and loop round. The Co-processor approach is applied to accelerate the Hadamard function by transforming it to hardware. Performance improvement, resource costs and energy consumption are compared and analyzed. Experimental results indicate that 76.5% energy deduction and 8.09X speedup can be reached after balancing these three key factors.展开更多
随着电子计算机的日益普及,今天在美国由—ware这个构词成分所组成的词屡屡可见。据美国哈佛大学的教育学教授 Howard Gardner声称,在近期召开的一次有关人工智能的学术会议之后,—ware词突然变得风靡起来。一位长期从事人类记忆研究的...随着电子计算机的日益普及,今天在美国由—ware这个构词成分所组成的词屡屡可见。据美国哈佛大学的教育学教授 Howard Gardner声称,在近期召开的一次有关人工智能的学术会议之后,—ware词突然变得风靡起来。一位长期从事人类记忆研究的神经生物家甚至说,他觉得自已还不能适应这种情况,完全成了“a student of wetware among thecomputer hackers”(处在电子计算机专家中间一名学生)。从80年代初开始,wetware一词便被用来指 human brain。本来也可以用另外几个词:skullware,grayware(gray展开更多
为提高造纸机真空部的自动化控制水平,提升造纸企业的成本控制能力,基于PLC和真空度传感器等硬件对变频器功率实施控制,进而实现对水泵和真空泵的动态调整,使水流量和变频器功率维持在较低水平,降低造纸机干燥部运行能耗。为实现该系统...为提高造纸机真空部的自动化控制水平,提升造纸企业的成本控制能力,基于PLC和真空度传感器等硬件对变频器功率实施控制,进而实现对水泵和真空泵的动态调整,使水流量和变频器功率维持在较低水平,降低造纸机干燥部运行能耗。为实现该系统的工业应用,提出了纸机能耗控制系统的主程序流程,同时采用WinCC(Windows Control Center,视窗控制中心)组态软件为上位机设计人机交互界面。该操作界面简洁美观、交互友好,可帮助车间工作人员实时掌握纸机真空部运行数据,合理调整真空部变频器运行功率,实现低能耗生产。展开更多
基金supported by the National Key Research and Development Program of China under Grant No.2023YFB4502300.
文摘Graph processing has been widely used in many scenarios,from scientific computing to artificial intelligence.Graph processing exhibits irregular computational parallelism and random memory accesses,unlike traditional workloads.Therefore,running graph processing workloads on conventional architectures(e.g.,CPUs and GPUs)often shows a significantly low compute-memory ratio with few performance benefits,which can be,in many cases,even slower than a specialized single-thread graph algorithm.While domain-specific hardware designs are essential for graph processing,it is still challenging to transform the hardware capability to performance boost without coupled software codesigns.This article presents a graph processing ecosystem from hardware to software.We start by introducing a series of hardware accelerators as the foundation of this ecosystem.Subsequently,the codesigned parallel graph systems and their distributed techniques are presented to support graph applications.Finally,we introduce our efforts on novel graph applications and hardware architectures.Extensive results show that various graph applications can be efficiently accelerated in this graph processing ecosystem.
基金This paper was supported by the National Natural Science Foundation of China(Grant No.61472289)National Key Research and Development Project(2016YFC0106305).We also would like to thank the anonymous reviewers for their valuable and constructive comments.
文摘Hardware/software partitioning is an essential step in hardware/software co-design.For large size problems,it is difficult to consider both solution quality and time.This paper presents an efficient GPU-based parallel tabu search algorithm(GPTS)for HW/SW partitioning.A single GPU kernel of compacting neighborhood is proposed to reduce the amount of GPU global memory accesses theoretically.A kernel fusion strategy is further proposed to reduce the amount of GPU global memory accesses of GPTS.To further minimize the transfer overhead of GPTS between CPU and GPU,an optimized transfer strategy for GPU-based tabu evaluation is proposed,which considers that all the candidates do not satisfy the given constraint.Experiments show that GPTS outperforms state-of-the-art work of tabu search and is competitive with other methods for HW/SW partitioning.The proposed parallelization is significant when considering the ordinary GPU platform.
文摘Ethernet over SDH/SONET (EOS) is a hotspot in today's data transmission technology for it combines the merits of both Ethernet and SDH/SONET. However, implementing an EOS system on a chip is complex and needs full verifications. This paper introduces our design of Hardware/Software co-verification platform for EOS design. The hardware platform contains a microprocessor board and an FPGA (Field Programmable Gate Array)-based verification board, and the corresponding software includes test benches running in FPGAs, controlling programs for the microprocessor and a console program with GUI (Graphical User Interface) interface for configuration, management and supervision. The design is cost-effective and has been successfully employed to verify several IP (Intellectual Property) blocks of our EOS chip. Moreover, it is flexible and can be applied as a general-purpose verification platform.
基金Supported by the National Natural Science Foundation of China(No.61179045 and No.61350009)
文摘In this paper, the storage capacity of communication among cores and processors is taken into account and a maximum D-value-first algorithm is proposed. By improving the hardware parallelism in the task execution process, the maximum storage requirements for communication are minimized. Experimental results with various directed acyclic graph models showed that compared with the earliest-task-first algorithm, the storage requirements for communication were reduced by 22.46%, on average, while the average of makespan only increased by 0.82%,.
基金Sponsored by the Natural Science Foundation of Heilongjiang Province( Grant No B2007-07)Industrial Research Projects in Qiqihaer( Grant No GYGG-09009)
文摘This paper presents an algorithm that combines the chaos optimization algorithm with the maximum entropy ( COA-ME) by using entropy model based on chaos algorithm,in which the maximum entropy is used as the second method of searching the excellent solution. The search direction is improved by chaos optimization algorithm and realizes the selective acceptance of wrong solution. The experimental result shows that the presented algorithm can be used in the partitioning of hardware/software of reconfigurable system. It effectively reduces the local extremum problem,and search speed as well as performance of partitioning is improved.
基金Project supported by the Key-Tech Program of Zhejiang Province,China (No. 021101559), and the Fok Ying Tong Education Founda-tion (No. 94031), China
文摘In order to improve the efficiency of embedded software running on processor core, this paper proposes a hard-ware/software co-optimization approach for embedded software from the system point of view. The proposed stepwise methods aim at exploiting the structure and the resources of the processor as much as possible for software algorithm optimization. To achieve low memory usage and low frequency need for the same performance, this co-optimization approach was used to optimize embedded software of MP3 decoder based on a 16-bit fixed-point DSP core. After the optimization, the results of decoding 128 kbps, 44.1 kHz stereo MP3 on DSP evaluation platform need 45.9 MIPS and 20.4 kbytes memory space. The optimization rate achieves 65.6% for memory and 49.6% for frequency respectively compared with the results by compiler using floating-point computation. The experimental result indicates the availability of the hardware/software co-optimization approach depending on the algorithm and architecture.
文摘This paper deals with a new hardware/software embedded system design methodology based on design pattern approach by development of a new design tool called smartcell. Three main constraints of embedded systems design process are investigated: the complexity, the partitioning between hardware and software aspects and the reusability. Two intermediate models are carried out in order to solve the complexity problem. The partitioning problem deals with the proposed hardware/software partitioning algorithm based on Ant Colony Optimisation. The reusability problem is resolved by synthesis of intellectual property blocks. Specification and integration of an intelligent controller on heterogeneous platform are considered to illustrate the proposed approach.
文摘We present a simulation framework for wireless sensor networks developed to allow the design exploration and the complete microprocessor-instruction-level debug of network formation, data congestion, nodes interaction, all in one simulation environment. A specifically innovative feature is the co-emulation of selected nodes at clock-cycle-accurate hardware processing level, allowing code debug and exact execution latency evaluation (considering both protocol stack and application), together with other nodes at abstract protocol level, meeting a designer’s needs of simulation speed, scalability and reliability. The simulator is centered on the Zigbee protocol and can be retargeted for different node micro-architectures.
文摘The functions and characteristics of software radio are discussed. Using techniques and method of software radio, the concept and advantages of a new kind of radio fuze, software radio fuze, are analysed. Several kinds of hardware platform structures of the software radio fuze are studied and the key techniques are analysed. The software radio fuze will become the most promising radio fuze techniques in 21st century.
基金supported by National Science and Technology Major Project granted No.2016ZX01012101
文摘With the introduction of software defined hardware by DARPA Electronics Resurgence Initiative,software definition will be the basic attribute of information system.Benefiting from boundary certainty and algorithm aggregation of domain applications,domain-oriented computing architecture has become the technical direction that considers the high flexibility and efficiency of information system.Aiming at the characteristics of data-intensive computing in different scenarios such as Internet of Things(IoT),big data,artificial intelligence(AI),this paper presents a domain-oriented software defined computing architecture,discusses the hierarchical interconnection structure,hybrid granularity computing element and its computational kernel extraction method,finally proves the flexibility and high efficiency of this architecture by experimental comparison.
文摘Software can be seen almost everywhere and is now defining the world.Software has transitioned from an affiliate of hardware,to a network service that is present in every corner of our social lives.
文摘Nowadays, from home monitoring to large airport security, a lot of digital video surveillance systems have been used. Digital surveillance system usually requires streaming video processing abilities. As an advanced video coding method, H.264 is introduced to reduce the large video data dramatically (usually by 70X or more). However, computational overhead occurs when coding and decoding H.264 video. In this paper, a System-on-a-Chip (SoC) based hardware acceleration solution for video codec is proposed, which can also be used for other software applications. The characteristics of the video codec are analyzed by using the profiling tool. The Hadamard function, which is the bottleneck of H.264, is identified not only by execution time but also another two attributes, such as cycle per loop and loop round. The Co-processor approach is applied to accelerate the Hadamard function by transforming it to hardware. Performance improvement, resource costs and energy consumption are compared and analyzed. Experimental results indicate that 76.5% energy deduction and 8.09X speedup can be reached after balancing these three key factors.
文摘随着电子计算机的日益普及,今天在美国由—ware这个构词成分所组成的词屡屡可见。据美国哈佛大学的教育学教授 Howard Gardner声称,在近期召开的一次有关人工智能的学术会议之后,—ware词突然变得风靡起来。一位长期从事人类记忆研究的神经生物家甚至说,他觉得自已还不能适应这种情况,完全成了“a student of wetware among thecomputer hackers”(处在电子计算机专家中间一名学生)。从80年代初开始,wetware一词便被用来指 human brain。本来也可以用另外几个词:skullware,grayware(gray
文摘为提高造纸机真空部的自动化控制水平,提升造纸企业的成本控制能力,基于PLC和真空度传感器等硬件对变频器功率实施控制,进而实现对水泵和真空泵的动态调整,使水流量和变频器功率维持在较低水平,降低造纸机干燥部运行能耗。为实现该系统的工业应用,提出了纸机能耗控制系统的主程序流程,同时采用WinCC(Windows Control Center,视窗控制中心)组态软件为上位机设计人机交互界面。该操作界面简洁美观、交互友好,可帮助车间工作人员实时掌握纸机真空部运行数据,合理调整真空部变频器运行功率,实现低能耗生产。