Due to the interdependency of frame synchronization(FS)and channel estimation(CE),joint FS and CE(JFSCE)schemes are proposed to enhance their functionalities and therefore boost the overall performance of wireless com...Due to the interdependency of frame synchronization(FS)and channel estimation(CE),joint FS and CE(JFSCE)schemes are proposed to enhance their functionalities and therefore boost the overall performance of wireless communication systems.Although traditional JFSCE schemes alleviate the influence between FS and CE,they show deficiencies in dealing with hardware imperfection(HI)and deterministic line-of-sight(LOS)path.To tackle this challenge,we proposed a cascaded ELM-based JFSCE to alleviate the influence of HI in the scenario of the Rician fading channel.Specifically,the conventional JFSCE method is first employed to extract the initial features,and thus forms the non-Neural Network(NN)solutions for FS and CE,respectively.Then,the ELMbased networks,named FS-NET and CE-NET,are cascaded to capture the NN solutions of FS and CE.Simulation and analysis results show that,compared with the conventional JFSCE methods,the proposed cascaded ELM-based JFSCE significantly reduces the error probability of FS and the normalized mean square error(NMSE)of CE,even against the impacts of parameter variations.展开更多
Massive computational complexity and memory requirement of artificial intelligence models impede their deploy-ability on edge computing devices of the Internet of Things(IoT).While Power-of-Two(PoT)quantization is pro...Massive computational complexity and memory requirement of artificial intelligence models impede their deploy-ability on edge computing devices of the Internet of Things(IoT).While Power-of-Two(PoT)quantization is pro-posed to improve the efficiency for edge inference of Deep Neural Networks(DNNs),existing PoT schemes require a huge amount of bit-wise manipulation and have large memory overhead,and their efficiency is bounded by the bottleneck of computation latency and memory footprint.To tackle this challenge,we present an efficient inference approach on the basis of PoT quantization and model compression.An integer-only scalar PoT quantization(IOS-PoT)is designed jointly with a distribution loss regularizer,wherein the regularizer minimizes quantization errors and training disturbances.Additionally,two-stage model compression is developed to effectively reduce memory requirement,and alleviate bandwidth usage in communications of networked heterogenous learning systems.The product look-up table(P-LUT)inference scheme is leveraged to replace bit-shifting with only indexing and addition operations for achieving low-latency computation and implementing efficient edge accelerators.Finally,comprehensive experiments on Residual Networks(ResNets)and efficient architectures with Canadian Institute for Advanced Research(CIFAR),ImageNet,and Real-world Affective Faces Database(RAF-DB)datasets,indicate that our approach achieves 2×∼10×improvement in the reduction of both weight size and computation cost in comparison to state-of-the-art methods.A P-LUT accelerator prototype is implemented on the Xilinx KV260 Field Programmable Gate Array(FPGA)platform for accelerating convolution operations,with performance results showing that P-LUT reduces memory footprint by 1.45×,achieves more than 3×power efficiency and 2×resource efficiency,compared to the conventional bit-shifting scheme.展开更多
The Internet of Vehicles(IoV)will carry a large amount of security and privacy-related data,which makes the secure communication between the IoV terminals increasingly critical.This paper studies the joint beamforming...The Internet of Vehicles(IoV)will carry a large amount of security and privacy-related data,which makes the secure communication between the IoV terminals increasingly critical.This paper studies the joint beamforming for physical-layer security transmission in the coexistence of Vehicle-to-Infrastructure(V2I)and Vehicle-toVehicle(V2V)communication with Reconfigurable Intelligent Surface(RIS)assistance,taking into account hardware impairments.A communication model for physical-layer security transmission is established when the eavesdropping user is present and the base station antenna has hardware impairments assisted by RIS.Based on this model,we propose to maximize the V2I physical-layer security transmission rate.To solve the coupled non-convex optimization problem,an alternating optimization algorithm based on second-order cone programming and semidefinite relaxation is proposed to obtain the optimal V2I base station transmit precoding and RIS reflect phase shift matrix.Finally,simulation results are presented to verify the convergence and superiority of our proposed algorithm while analyzing the impact of system parameters on the V2I physical-layer security transmission rate.The simulation results further demonstrate that the proposed robust beamforming algorithm considering hardware impairments will achieve an average performance improvement of 0.7 dB over a non-robustly designed algorithm.Furthermore,increasing the number of RIS reflective units from 10 to 50 results in an almost 2 dB enhancement in secure transmission rate.展开更多
The SubBytes (S-box) transformation is the most crucial operation in the AES algorithm, significantly impacting the implementation performance of AES chips. To design a high-performance S-box, a segmented optimization...The SubBytes (S-box) transformation is the most crucial operation in the AES algorithm, significantly impacting the implementation performance of AES chips. To design a high-performance S-box, a segmented optimization implementation of the S-box is proposed based on the composite field inverse operation in this paper. This proposed S-box implementation is modeled using Verilog language and synthesized using Design Complier software under the premise of ensuring the correctness of the simulation result. The synthesis results show that, compared to several current S-box implementation schemes, the proposed implementation of the S-box significantly reduces the area overhead and critical path delay, then gets higher hardware efficiency. This provides strong support for realizing efficient and compact S-box ASIC designs.展开更多
随着片上系统(System on Chip,SoC)芯片规模与功能复杂度的膨胀,硬件加速器已成为大规模SoC的重要组成部分。为了缩短产品交付时间,有必要开发硬件加速器仿真模型,以在SoC设计初期支撑架构的探索与评估。在对硬件加速器的特点与建模需...随着片上系统(System on Chip,SoC)芯片规模与功能复杂度的膨胀,硬件加速器已成为大规模SoC的重要组成部分。为了缩短产品交付时间,有必要开发硬件加速器仿真模型,以在SoC设计初期支撑架构的探索与评估。在对硬件加速器的特点与建模需求进行分析的基础上,提出一种基于AXI验证IP(Verification IP,VIP)、SystemVerilog信箱和旗语的硬件加速器建模方法。该方法支持完备的总线协议特性,同时支持多个处理引擎的并行处理与乱序输出。以实际SoC项目中的通信基带加速器为例,对提出的建模方法进行介绍,并进行相应的系统级仿真与分析。所提出的建模方法可实现对硬件加速器总线行为的高效建模,能够有力支撑SoC验证以及系统架构评估,缩短项目的开发周期。展开更多
基金supported in part by the Sichuan Science and Technology Program(Grant No.2023YFG0316)the Industry-University Research Innovation Fund of China University(Grant No.2021ITA10016)+1 种基金the Key Scientific Research Fund of Xihua University(Grant No.Z1320929)the Special Funds of Industry Development of Sichuan Province(Grant No.zyf-2018-056).
文摘Due to the interdependency of frame synchronization(FS)and channel estimation(CE),joint FS and CE(JFSCE)schemes are proposed to enhance their functionalities and therefore boost the overall performance of wireless communication systems.Although traditional JFSCE schemes alleviate the influence between FS and CE,they show deficiencies in dealing with hardware imperfection(HI)and deterministic line-of-sight(LOS)path.To tackle this challenge,we proposed a cascaded ELM-based JFSCE to alleviate the influence of HI in the scenario of the Rician fading channel.Specifically,the conventional JFSCE method is first employed to extract the initial features,and thus forms the non-Neural Network(NN)solutions for FS and CE,respectively.Then,the ELMbased networks,named FS-NET and CE-NET,are cascaded to capture the NN solutions of FS and CE.Simulation and analysis results show that,compared with the conventional JFSCE methods,the proposed cascaded ELM-based JFSCE significantly reduces the error probability of FS and the normalized mean square error(NMSE)of CE,even against the impacts of parameter variations.
基金This work was supported by Open Fund Project of State Key Laboratory of Intelligent Vehicle Safety Technology by Grant with No.IVSTSKL-202311Key Projects of Science and Technology Research Programme of Chongqing Municipal Education Commission by Grant with No.KJZD-K202301505+1 种基金Cooperation Project between Chongqing Municipal Undergraduate Universities and Institutes Affiliated to the Chinese Academy of Sciences in 2021 by Grant with No.HZ2021015Chongqing Graduate Student Research Innovation Program by Grant with No.CYS240801.
文摘Massive computational complexity and memory requirement of artificial intelligence models impede their deploy-ability on edge computing devices of the Internet of Things(IoT).While Power-of-Two(PoT)quantization is pro-posed to improve the efficiency for edge inference of Deep Neural Networks(DNNs),existing PoT schemes require a huge amount of bit-wise manipulation and have large memory overhead,and their efficiency is bounded by the bottleneck of computation latency and memory footprint.To tackle this challenge,we present an efficient inference approach on the basis of PoT quantization and model compression.An integer-only scalar PoT quantization(IOS-PoT)is designed jointly with a distribution loss regularizer,wherein the regularizer minimizes quantization errors and training disturbances.Additionally,two-stage model compression is developed to effectively reduce memory requirement,and alleviate bandwidth usage in communications of networked heterogenous learning systems.The product look-up table(P-LUT)inference scheme is leveraged to replace bit-shifting with only indexing and addition operations for achieving low-latency computation and implementing efficient edge accelerators.Finally,comprehensive experiments on Residual Networks(ResNets)and efficient architectures with Canadian Institute for Advanced Research(CIFAR),ImageNet,and Real-world Affective Faces Database(RAF-DB)datasets,indicate that our approach achieves 2×∼10×improvement in the reduction of both weight size and computation cost in comparison to state-of-the-art methods.A P-LUT accelerator prototype is implemented on the Xilinx KV260 Field Programmable Gate Array(FPGA)platform for accelerating convolution operations,with performance results showing that P-LUT reduces memory footprint by 1.45×,achieves more than 3×power efficiency and 2×resource efficiency,compared to the conventional bit-shifting scheme.
基金the Key Research and Development Plan of Jiangsu Province,grant number BE2020084-2the National Key Research and Development Program of China,grant number 2020YFB1600104.
文摘The Internet of Vehicles(IoV)will carry a large amount of security and privacy-related data,which makes the secure communication between the IoV terminals increasingly critical.This paper studies the joint beamforming for physical-layer security transmission in the coexistence of Vehicle-to-Infrastructure(V2I)and Vehicle-toVehicle(V2V)communication with Reconfigurable Intelligent Surface(RIS)assistance,taking into account hardware impairments.A communication model for physical-layer security transmission is established when the eavesdropping user is present and the base station antenna has hardware impairments assisted by RIS.Based on this model,we propose to maximize the V2I physical-layer security transmission rate.To solve the coupled non-convex optimization problem,an alternating optimization algorithm based on second-order cone programming and semidefinite relaxation is proposed to obtain the optimal V2I base station transmit precoding and RIS reflect phase shift matrix.Finally,simulation results are presented to verify the convergence and superiority of our proposed algorithm while analyzing the impact of system parameters on the V2I physical-layer security transmission rate.The simulation results further demonstrate that the proposed robust beamforming algorithm considering hardware impairments will achieve an average performance improvement of 0.7 dB over a non-robustly designed algorithm.Furthermore,increasing the number of RIS reflective units from 10 to 50 results in an almost 2 dB enhancement in secure transmission rate.
文摘The SubBytes (S-box) transformation is the most crucial operation in the AES algorithm, significantly impacting the implementation performance of AES chips. To design a high-performance S-box, a segmented optimization implementation of the S-box is proposed based on the composite field inverse operation in this paper. This proposed S-box implementation is modeled using Verilog language and synthesized using Design Complier software under the premise of ensuring the correctness of the simulation result. The synthesis results show that, compared to several current S-box implementation schemes, the proposed implementation of the S-box significantly reduces the area overhead and critical path delay, then gets higher hardware efficiency. This provides strong support for realizing efficient and compact S-box ASIC designs.
文摘随着片上系统(System on Chip,SoC)芯片规模与功能复杂度的膨胀,硬件加速器已成为大规模SoC的重要组成部分。为了缩短产品交付时间,有必要开发硬件加速器仿真模型,以在SoC设计初期支撑架构的探索与评估。在对硬件加速器的特点与建模需求进行分析的基础上,提出一种基于AXI验证IP(Verification IP,VIP)、SystemVerilog信箱和旗语的硬件加速器建模方法。该方法支持完备的总线协议特性,同时支持多个处理引擎的并行处理与乱序输出。以实际SoC项目中的通信基带加速器为例,对提出的建模方法进行介绍,并进行相应的系统级仿真与分析。所提出的建模方法可实现对硬件加速器总线行为的高效建模,能够有力支撑SoC验证以及系统架构评估,缩短项目的开发周期。