In this paper, we present a comprehensive numerical simulation of a point wave absorber in deep water. Analyses are performed in both the frequency and time domains. The converter is a two-body floating-point absorber (FPA) with one degree of freedom in the heave direction. Its two parts are connected by a linear mass-spring-damper system. The commercial ANSYS-AQWA software used in this study performed well in the validation cases. The velocity potential is obtained by assuming incompressible and irrotational flow. We investigated the effects of wave characteristics (wave height and wave period) on energy conversion and device efficiency, as well as the effects of the device diameter, draft, geometry, and damping coefficient. To validate the model, we compared our numerical results with those from similar experiments. The results can help to maximize the converter's efficiency under specific conditions.
An application-specific integrated circuit (ASIC) design of a 1024-point floating-point fast Fourier transform (FFT) processor is presented. It satisfies the requirement for high-accuracy FFT results in related fields. Several novel design techniques for the floating-point adder and multiplier are introduced in detail to increase the speed of the system while also decreasing the power consumption. The hardware area is effectively reduced by an improved butterfly processor. Adopting a pipelined architecture substantially increases the performance of the design, and its regularity makes a very-large-scale integration (VLSI) implementation easy to realize. A validation result obtained on a field-programmable gate array (FPGA) is shown at the end. With the system clock set to 50 MHz, the FFT computation takes 204.8 μs.
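As a quick arithmetic check on the timing figure quoted above, the following minimal sketch converts the reported latency into clock cycles. The function name and dictionary keys are illustrative only. The comparison with N log2 N is an observation made here, not a statement from the abstract, which does not explain how the cycle count arises.

```python
import math


def fft_cycle_budget(n_points, clock_hz, latency_s):
    """Convert a reported FFT latency into clock-cycle figures (illustrative helper)."""
    cycles = latency_s * clock_hz
    return {
        # Total clock cycles spent on one transform.
        "total_cycles": round(cycles),
        # Average cycles spent per input point.
        "cycles_per_point": cycles / n_points,
        # N * log2(N), shown only for comparison (an assumption, not from the source).
        "n_log2_n": n_points * int(math.log2(n_points)),
    }


# Figures taken from the abstract: 1024 points, 50 MHz clock, 204.8 microseconds.
budget = fft_cycle_budget(1024, 50e6, 204.8e-6)
print(budget)
```

Under these figures the transform takes 10240 cycles, that is, 10 cycles per point, which happens to equal 1024 × log2(1024).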
In this work, a power-efficient FFT architecture based on a butterfly unit is presented. The butterfly unit is designed using floating-point fused arithmetic units: a two-term dot product unit and an add-subtract unit. These units operate on complex data values. A modified fused floating-point two-term dot product and an enhanced model for the Radix-4 FFT butterfly unit are proposed. The modified fused two-term dot product is designed using a Radix-16 Booth multiplier, which reduces both the switching activity and the required area compared to the Radix-8 Booth multiplier in the existing system. The proposed architecture is implemented for a Radix-4 decimation-in-time (DIT) FFT butterfly with the two floating-point fused arithmetic units. The proposed enhanced architecture is synthesized, implemented, placed, and routed on an FPGA device using the Xilinx ISE tool. The Radix-4 DIT fused floating-point FFT butterfly requires 50.17% less space and 12.16% less power than the existing methods, and the proposed enhanced model requires 49.82% less space on the FPGA device than the proposed design. Power consumption is further reduced by a reusability technique, which yields an 11.42% power reduction for the enhanced model relative to the proposed design.
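To illustrate how a Radix-4 DIT butterfly decomposes into the two kinds of operations named above, here is a minimal software sketch. It is not the authors' hardware design: `dot2`, `complex_mul`, and `radix4_dit_butterfly` are hypothetical names, and the "fused" dot product is an ordinary Python expression with no single-rounding behaviour.

```python
import cmath


def dot2(a, b, c, d):
    """Two-term dot product a*b + c*d (fused in hardware, plain arithmetic here)."""
    return a * b + c * d


def complex_mul(x, w):
    """Complex multiply expressed as two two-term dot products."""
    re = dot2(x.real, w.real, -x.imag, w.imag)
    im = dot2(x.real, w.imag, x.imag, w.real)
    return complex(re, im)


def radix4_dit_butterfly(x0, x1, x2, x3, w1, w2, w3):
    """Radix-4 decimation-in-time butterfly: twiddle multiplies, then add-subtract."""
    # DIT applies the twiddle factors to the inputs first.
    a, b, c, d = x0, complex_mul(x1, w1), complex_mul(x2, w2), complex_mul(x3, w3)
    # Add-subtract stage: each pair yields a sum and a difference.
    s0, d0 = a + c, a - c
    s1, d1 = b + d, b - d
    # Multiplying by -j is a swap of real and imaginary parts with one sign change.
    minus_j_d1 = complex(d1.imag, -d1.real)
    return s0 + s1, d0 + minus_j_d1, s0 - s1, d0 - minus_j_d1


def dft4(x):
    """Direct 4-point DFT used as a reference."""
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / 4) for n in range(4))
            for k in range(4)]


samples = [1 + 2j, 3 - 1j, -2 + 0.5j, 0.25 + 4j]
# With unit twiddles the butterfly reduces to a 4-point DFT.
outputs = radix4_dit_butterfly(*samples, 1 + 0j, 1 + 0j, 1 + 0j)
reference = dft4(samples)
```

With unit twiddle factors, `outputs` should agree with `reference` up to floating-point rounding, which gives a simple sanity check on the add-subtract wiring.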
A numerical model based on a boundary element method (BEM) is developed to predict the performance of two-body self-reacting floating-point absorber (SRFPA) wave energy systems that operate predominantly in heave. The key numerical issues in applying the BEM are systematically discussed. In particular, some improvements and simplifications in the numerical scheme are developed to evaluate the free-surface Green's function, which is a main source of difficulty in the BEM. For a locked SRFPA system, the present method is compared with an existing experiment and a Reynolds-averaged Navier-Stokes (RANS)-based method, and it is shown that the inviscid assumption leads to substantial over-prediction of the heave response. For the unlocked SRFPA model studied in this paper, the additional viscous damping, primarily induced by flow separation and vortex shedding, is modelled as a quadratic drag force proportional to the square of the body velocity. Including viscous drag in the present method significantly improves the prediction of the heave responses and the power absorption performance of the SRFPA system, giving results in excellent agreement with experimental data and RANS simulation results over a broad range of incident wave periods, except near resonance in larger wave heights. Wave overtopping and the re-entry impact of the out-of-water floating body are observed more frequently in larger waves, where these non-linear effects are the dominant damping sources and can significantly reduce the power output and the motion responses of the SRFPA system.
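The quadratic drag term described above can be sketched as follows, using the common form F = -(1/2) ρ C_d A |v| v so that the force always opposes the motion. The function name and the default values for water density, drag coefficient, and reference area are assumptions for illustration; the abstract does not give these parameters.

```python
def quadratic_drag(velocity, rho=1025.0, drag_coeff=1.0, area=1.0):
    """Viscous drag force in newtons, proportional to velocity squared.

    rho (kg/m^3), drag_coeff (dimensionless) and area (m^2) are placeholder
    values chosen for illustration, not taken from the source.
    """
    # abs(v) * v keeps the magnitude quadratic while preserving the sign of v,
    # and the leading minus sign makes the force oppose the motion.
    return -0.5 * rho * drag_coeff * area * abs(velocity) * velocity


# Doubling the speed quadruples the magnitude of the force.
force_slow = quadratic_drag(1.0)
force_fast = quadratic_drag(2.0)
force_reversed = quadratic_drag(-1.0)
```

Because the force scales with the square of the velocity, it is negligible in small motions and grows quickly in large ones, which is consistent with the over-prediction of heave response reported for the inviscid model.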
In this article, the least program behavior decomposition method (LPBD) is put forward from a program structure point of view. This method can be used extensively both in algorithms for automatic differentiation (AD) and in tool design. It does not require programs to be evenly separable, yet its cost in terms of operation count and memory is similar to that of methods using checkpointing. This article starts by summarizing the rules of adjointization and then presents the implementation of LPBD. Next, the separable program space is defined, based on the fundamental assumptions (FA) of automatic differentiation, and the differentiation cost functions are derived. Two constants of fundamental importance in AD, s and m, are also derived under FA. Under the assumption of even separability, the adjoint cost of simple and deep decomposition is then discussed quantitatively using checkpointing. Finally, the adjoint costs in terms of operation count and memory through the LPBD method are shown to be uniformly dependent on the depth of structure or decomposition.
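For readers unfamiliar with the checkpointing baseline mentioned above, the following minimal sketch contrasts a store-all adjoint sweep with a single-level checkpointed one on a toy chain of steps. It does not implement LPBD, whose details the abstract does not give. The step function, the function names, and the memory count are all assumptions chosen for illustration.

```python
import math


def step(x):
    """One forward step of a toy program (hypothetical example)."""
    return math.sin(x)


def step_derivative(x):
    """Derivative of one step with respect to its input."""
    return math.cos(x)


def adjoint_store_all(x0, n_steps):
    """Reverse sweep that stores every intermediate state."""
    states = [x0]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    bar = 1.0
    for k in reversed(range(n_steps)):
        bar *= step_derivative(states[k])
    return bar, len(states)


def adjoint_checkpointed(x0, n_steps, segment):
    """Reverse sweep that stores one state per segment and recomputes the rest."""
    checkpoints = {}
    x = x0
    for k in range(n_steps):
        if k % segment == 0:
            checkpoints[k] = x  # keep only the state at each segment start
        x = step(x)
    bar = 1.0
    peak_local = 0
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment, n_steps)
        # Recompute the states inside this segment from its checkpoint.
        local = [checkpoints[start]]
        for _ in range(end - start - 1):
            local.append(step(local[-1]))
        peak_local = max(peak_local, len(local))
        for state in reversed(local):
            bar *= step_derivative(state)
    # Stored values: all checkpoints plus the largest recomputed segment.
    return bar, len(checkpoints) + peak_local


grad_full, mem_full = adjoint_store_all(0.7, 100)
grad_ckpt, mem_ckpt = adjoint_checkpointed(0.7, 100, 10)
```

Both sweeps return the same derivative of the final state with respect to `x0`. The checkpointed version holds far fewer states at once, at the price of one extra forward pass through each segment, which is the operations-versus-memory trade-off that the cost analysis in the abstract quantifies.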
The leading zero anticipation (LZA) algorithm and its implementation are vital for the performance of a high-speed floating-point adder in today's state-of-the-art microprocessor design. Unfortunately, when a conventional LZA design predicts the "shift amount", the result can be off by one position. This paper presents a novel parallel error detection algorithm for a general-case LZA. The proposed approach enables parallel execution of the conventional LZA and its error detection, so that the error-indication signal can be generated earlier in the normalization stage, thus shortening the critical path and improving overall performance. The circuit implementation of this algorithm also shows advantages in area and power compared with previous work.
Funding: We would like to acknowledge the National Natural Science Foundation of China (Grants 51479114, 51761135012) for supporting this work.