With the advent of the IoT era, the amount of real-time data that is processed in data centers has increased explosively. As a result, stream mining, extracting useful knowledge from a huge amount of data in real time...With the advent of the IoT era, the amount of real-time data that is processed in data centers has increased explosively. As a result, stream mining, extracting useful knowledge from a huge amount of data in real time, is attracting more and more attention. It is said, however, that real- time stream processing will become more difficult in the near future, because the performance of processing applications continues to increase at a rate of 10% - 15% each year, while the amount of data to be processed is increasing exponentially. In this study, we focused on identifying a promising stream mining algorithm, specifically a Frequent Itemset Mining (FIsM) algorithm, then we improved its performance using an FPGA. FIsM algorithms are important and are basic data- mining techniques used to discover association rules from transactional databases. We improved on an approximate FIsM algorithm proposed recently so that it would fit onto hardware architecture efficiently. We then ran experiments on an FPGA. As a result, we have been able to achieve a speed 400% faster than the original algorithm implemented on a CPU. Moreover, our FPGA prototype showed a 20 times speed improvement compared to the CPU version.展开更多
Restricted Boltzmann Machines (RBMs) are an effective model for machine learning;however, they require a significant amount of processing time. In this study, we propose a highly parallel, highly flexible architecture...Restricted Boltzmann Machines (RBMs) are an effective model for machine learning;however, they require a significant amount of processing time. In this study, we propose a highly parallel, highly flexible architecture that combines small and completely parallel RBMs. This proposal addresses problems associated with calculation speed and exponential increases in circuit scale. We show that this architecture can optionally respond to the trade-offs between these two problems. Furthermore, our FPGA implementation performs at a 134 times processing speed up factor with respect to a conventional CPU.展开更多
Convolutional neural network (CNN) is an essential model to achieve high accuracy in various machine learning applications, such as image recognition and natural language processing. One of the important issues for CN...Convolutional neural network (CNN) is an essential model to achieve high accuracy in various machine learning applications, such as image recognition and natural language processing. One of the important issues for CNN acceleration with high energy efficiency and processing performance is efficient data reuse by exploiting the inherent data locality. In this paper, we propose a novel CGRA (Coarse Grained Reconfigurable Array) architecture with time-domain multithreading for exploiting input data locality. The multithreading on each processing element enables the input data reusing through multiple computation periods. This paper presents the accelerator design performance analysis of the proposed architecture. We examine the structure of memory subsystems, as well as the architecture of the computing array, to supply required data with minimal performance overhead. We explore efficient architecture design alternatives based on the characteristics of modern CNN configurations. The evaluation results show that the available bandwidth of the external memory can be utilized efficiently when the output plane is wider (in earlier layers of many CNNs) while the input data locality can be utilized maximally when the number of output channel is larger (in later layers).展开更多
Demands for low-energy microcontrollers have been increasing in recent years. Since most microcontrollers achieve user programmability by integrating nonvolatile (NV) memories such as flash memories for storing their ...Demands for low-energy microcontrollers have been increasing in recent years. Since most microcontrollers achieve user programmability by integrating nonvolatile (NV) memories such as flash memories for storing their programs, the large power consumption required in accessing an NV memory has become a major problem. This problem becomes critical when the power supply voltage of NV microcontrollers is decreased. We can solve this problem by introducing an instruction cache, thus reducing the access frequency of the NV memory. Unlike general-purpose microprocessors, microcontrollers used for real-time applications in embedded systems must accurately calculate program execution time prior to its execution. Therefore, we introduce a “transparent” instruction cache, which does not change the existing NV microcontroller’s cycle-level execution time, for reducing power and energy consumption, but not for improving the processing speed. We have conducted detailed microar chitecture design based on the architecture of a major industrial microcontroller, and we evaluated power and energy consumption for several benchmark programs. Our evaluation shows that the proposed instruction cache can successfully reduce energy consumption in a fairly wide range of practical NV microcontroller configurations.展开更多
In the era of Internet of Things, the battery life of edge devices must be extended for sensing connection to the Internet. We aim to reduce the power consumption of the microprocessor embedded in such devices by usin...In the era of Internet of Things, the battery life of edge devices must be extended for sensing connection to the Internet. We aim to reduce the power consumption of the microprocessor embedded in such devices by using a novel dynamically reconfigurable accelerator. Conventional microprocessors consume a large amount of power for memory access, in registers, and for the control of the processor itself rather than computation;this decreases the energy efficiency. Dynamically reconfigurable accelerators reduce such redundant power by computing in parallel on reconfigurable switches and processing element arrays (often consisting of an arithmetic logic unit (ALU) and registers). We propose a novel dynamically reconfigurable accelerator “DYNaSTA” composed of a dynamically reconfigurable data path and static ALU arrays. The static ALU arrays process instructions in parallel without registers and improve energy efficiency. The dynamically reconfigurable data path includes registers and many switches dynamically reconfigured to resolve operand dependencies between instructions mapped on the static ALU array, and forwards appropriate operands to the static ALU array. Therefore, the DYNaSTA accelerator has more flexibility while improving the energy efficiency compared with the conventional dynamically reconfigurable accelerators. We simulated the power consumption of the proposed DYNaSTA accelerator and measured the fabricated chip. As a result, the power consumption was reduced by 69% to 86%, and the energy efficiency improved 4.5 to 13 times compared to a general RISC microprocessor.展开更多
Atomic switches can be used in future nanodevices and to realize conceptually novel electronics in new types of computer architecture because of their simple structure, ease of operation, stability, and reliability. T...Atomic switches can be used in future nanodevices and to realize conceptually novel electronics in new types of computer architecture because of their simple structure, ease of operation, stability, and reliability. The atomic switch is a single solid-state switch with inherent learning abilities that exhibits various nonlinear behaviors with network devices. However, previous studies focused on experiments and nonvolatile memory applications, and studies on the application of the physical properties of the atomic switch in computing were nonexistent. Therefore, we present a simple behavioral model of a molecular gap-type atomic switch that can be included in a simulator. The model was described by three simple equations that reproduced the bistability using a double-well potential and was able to easily be transferred to a simulator using arbitrary numerical values and be integrated into HSPICE. Simulations using the experimental parameters of the proposed atomic switch agreed with the experimental results. This model will allow circuit designers to explore new architectures, contributing to the development of new computing methods.展开更多
文摘With the advent of the IoT era, the amount of real-time data that is processed in data centers has increased explosively. As a result, stream mining, extracting useful knowledge from a huge amount of data in real time, is attracting more and more attention. It is said, however, that real- time stream processing will become more difficult in the near future, because the performance of processing applications continues to increase at a rate of 10% - 15% each year, while the amount of data to be processed is increasing exponentially. In this study, we focused on identifying a promising stream mining algorithm, specifically a Frequent Itemset Mining (FIsM) algorithm, then we improved its performance using an FPGA. FIsM algorithms are important and are basic data- mining techniques used to discover association rules from transactional databases. We improved on an approximate FIsM algorithm proposed recently so that it would fit onto hardware architecture efficiently. We then ran experiments on an FPGA. As a result, we have been able to achieve a speed 400% faster than the original algorithm implemented on a CPU. Moreover, our FPGA prototype showed a 20 times speed improvement compared to the CPU version.
文摘Restricted Boltzmann Machines (RBMs) are an effective model for machine learning;however, they require a significant amount of processing time. In this study, we propose a highly parallel, highly flexible architecture that combines small and completely parallel RBMs. This proposal addresses problems associated with calculation speed and exponential increases in circuit scale. We show that this architecture can optionally respond to the trade-offs between these two problems. Furthermore, our FPGA implementation performs at a 134 times processing speed up factor with respect to a conventional CPU.
文摘Convolutional neural network (CNN) is an essential model to achieve high accuracy in various machine learning applications, such as image recognition and natural language processing. One of the important issues for CNN acceleration with high energy efficiency and processing performance is efficient data reuse by exploiting the inherent data locality. In this paper, we propose a novel CGRA (Coarse Grained Reconfigurable Array) architecture with time-domain multithreading for exploiting input data locality. The multithreading on each processing element enables the input data reusing through multiple computation periods. This paper presents the accelerator design performance analysis of the proposed architecture. We examine the structure of memory subsystems, as well as the architecture of the computing array, to supply required data with minimal performance overhead. We explore efficient architecture design alternatives based on the characteristics of modern CNN configurations. The evaluation results show that the available bandwidth of the external memory can be utilized efficiently when the output plane is wider (in earlier layers of many CNNs) while the input data locality can be utilized maximally when the number of output channel is larger (in later layers).
文摘Demands for low-energy microcontrollers have been increasing in recent years. Since most microcontrollers achieve user programmability by integrating nonvolatile (NV) memories such as flash memories for storing their programs, the large power consumption required in accessing an NV memory has become a major problem. This problem becomes critical when the power supply voltage of NV microcontrollers is decreased. We can solve this problem by introducing an instruction cache, thus reducing the access frequency of the NV memory. Unlike general-purpose microprocessors, microcontrollers used for real-time applications in embedded systems must accurately calculate program execution time prior to its execution. Therefore, we introduce a “transparent” instruction cache, which does not change the existing NV microcontroller’s cycle-level execution time, for reducing power and energy consumption, but not for improving the processing speed. We have conducted detailed microar chitecture design based on the architecture of a major industrial microcontroller, and we evaluated power and energy consumption for several benchmark programs. Our evaluation shows that the proposed instruction cache can successfully reduce energy consumption in a fairly wide range of practical NV microcontroller configurations.
文摘In the era of Internet of Things, the battery life of edge devices must be extended for sensing connection to the Internet. We aim to reduce the power consumption of the microprocessor embedded in such devices by using a novel dynamically reconfigurable accelerator. Conventional microprocessors consume a large amount of power for memory access, in registers, and for the control of the processor itself rather than computation;this decreases the energy efficiency. Dynamically reconfigurable accelerators reduce such redundant power by computing in parallel on reconfigurable switches and processing element arrays (often consisting of an arithmetic logic unit (ALU) and registers). We propose a novel dynamically reconfigurable accelerator “DYNaSTA” composed of a dynamically reconfigurable data path and static ALU arrays. The static ALU arrays process instructions in parallel without registers and improve energy efficiency. The dynamically reconfigurable data path includes registers and many switches dynamically reconfigured to resolve operand dependencies between instructions mapped on the static ALU array, and forwards appropriate operands to the static ALU array. Therefore, the DYNaSTA accelerator has more flexibility while improving the energy efficiency compared with the conventional dynamically reconfigurable accelerators. We simulated the power consumption of the proposed DYNaSTA accelerator and measured the fabricated chip. As a result, the power consumption was reduced by 69% to 86%, and the energy efficiency improved 4.5 to 13 times compared to a general RISC microprocessor.
文摘Atomic switches can be used in future nanodevices and to realize conceptually novel electronics in new types of computer architecture because of their simple structure, ease of operation, stability, and reliability. The atomic switch is a single solid-state switch with inherent learning abilities that exhibits various nonlinear behaviors with network devices. However, previous studies focused on experiments and nonvolatile memory applications, and studies on the application of the physical properties of the atomic switch in computing were nonexistent. Therefore, we present a simple behavioral model of a molecular gap-type atomic switch that can be included in a simulator. The model was described by three simple equations that reproduced the bistability using a double-well potential and was able to easily be transferred to a simulator using arbitrary numerical values and be integrated into HSPICE. Simulations using the experimental parameters of the proposed atomic switch agreed with the experimental results. This model will allow circuit designers to explore new architectures, contributing to the development of new computing methods.