16 articles found
PEPFL: A framework for a practical and efficient privacy-preserving federated learning
1
Authors: Yange Chen, Baocang Wang, Hang Jiang, Pu Duan, Yuan Ping, Zhiyong Hong. Digital Communications and Networks (SCIE, CSCD), 2024, Issue 2, pp. 355-368 (14 pages)
As an emerging joint learning model, federated learning is a promising way to combine model parameters of different users for training and inference without collecting users' original data. However, a practical and efficient solution has not been established in previous work due to the absence of efficient matrix computation and cryptography schemes in the privacy-preserving federated learning model, especially in partially homomorphic cryptosystems. In this paper, we propose a Practical and Efficient Privacy-preserving Federated Learning (PEPFL) framework. First, we present a lifted distributed ElGamal cryptosystem for federated learning, which can solve the multi-key problem in federated learning. Secondly, we develop a Practical Partially Single Instruction Multiple Data (PSIMD) parallelism scheme that can encode a plaintext matrix into a single plaintext for encryption, improving the encryption efficiency and reducing the communication cost in partially homomorphic cryptosystems. In addition, based on the Convolutional Neural Network (CNN) and the designed cryptosystem, a novel privacy-preserving federated learning framework is designed by using Momentum Gradient Descent (MGD). Finally, we evaluate the security and performance of PEPFL. The experimental results demonstrate that the scheme is practicable, effective, and secure with low communication and computation costs.
Keywords: Federated learning; Partially single instruction multiple data; Momentum gradient descent; ElGamal; Multi-key; Homomorphic encryption
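The abstract above mentions a lifted distributed ElGamal cryptosystem for the multi-key setting. As a rough illustration of the general idea only, the following is a minimal sketch of textbook lifted (exponential) ElGamal with a joint public key formed from each user's key share. It is not the PEPFL construction itself; the toy group parameters, all function names, and the brute-force discrete-log step are assumptions made for this example and are far too small for real use.

```python
import random

# Toy parameters (assumption): P = 2*Q + 1 is a safe prime, G has order Q.
P = 2039
Q = 1019
G = 4


def keygen(rng):
    """Return one user's (secret share, public share)."""
    x = rng.randrange(1, Q)
    return x, pow(G, x, P)


def joint_public_key(public_shares):
    """Multiply the public shares; the matching secret is the sum of secrets."""
    h = 1
    for share in public_shares:
        h = h * share % P
    return h


def encrypt(h, m, rng):
    """Lifted ElGamal: the message sits in the exponent as G**m."""
    r = rng.randrange(1, Q)
    return pow(G, r, P), pow(G, m, P) * pow(h, r, P) % P


def add(c1, c2):
    """Component-wise product of ciphertexts adds the plaintexts."""
    return c1[0] * c2[0] % P, c1[1] * c2[1] % P


def partial_decrypt(x_i, c):
    """Each user contributes c[0]**x_i without revealing x_i."""
    return pow(c[0], x_i, P)


def combine(c, partials, max_m):
    """Strip the mask, then recover m by brute-force search (m must be small)."""
    mask = 1
    for d in partials:
        mask = mask * d % P
    g_m = c[1] * pow(mask, -1, P) % P  # modular inverse needs Python 3.8+
    acc = 1
    for m in range(max_m + 1):
        if acc == g_m:
            return m
        acc = acc * G % P
    raise ValueError("plaintext outside search range")


def demo_sum(a, b, seed=0):
    """Two users jointly decrypt the encrypted sum a + b."""
    rng = random.Random(seed)
    (x1, h1), (x2, h2) = keygen(rng), keygen(rng)
    h = joint_public_key([h1, h2])
    c = add(encrypt(h, a, rng), encrypt(h, b, rng))
    partials = [partial_decrypt(x1, c), partial_decrypt(x2, c)]
    return combine(c, partials, max_m=a + b)
```

Because decryption ends with a discrete-log search, this style of scheme only suits small aggregated values, which is one reason packing several values into one plaintext can matter for efficiency.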
COMPENSATION FOR THE MUTUAL COUPLING EFFECT FOR THE ESPRIT ALGORITHM IN SINGLE SNAPSHOT ARRAY PROCESSING
2
Authors: Lian Xiaohua, Zhou Jianjiang, Li Hailin, Cai Wenqi. Journal of Electronics (China), 2007, Issue 5, pp. 662-667 (6 pages)
An effective method is introduced to compensate for the effects of mutual coupling in the Estimation of Signal Parameter via Rotational Invariance Techniques (ESPRIT) direction-finding algorithm when applied to single snapshot array processing. Changing the covariance matrix into a Toeplitz matrix can achieve high resolution in Direction of Arrival (DOA) estimation. How mutual coupling affects the array antennas is discussed, and a new definition of mutual impedance is used to characterize the mutual coupling effects between the array elements. Based on the new mutual impedance matrix, a practical method is presented to eliminate the effects of mutual coupling for ESPRIT in single snapshot data processing. The simulation results show that this new method not only properly reduces the effects of mutual coupling, but also maintains steady performance even for weak signals.
Keywords: Mutual coupling; Mutual impedance; Estimation of Signal Parameter via Rotational Invariance Techniques (ESPRIT); Single snapshot data
A Sensitivity Study of Single Column Model
3
Authors: 董敏, 许秦. Advances in Atmospheric Sciences (SCIE, CAS, CSCD), 1996, Issue 3, pp. 313-324 (12 pages)
A single column model (SCM) is constructed by extracting the physical subroutines from the NCAR Community Climate Model version 1 (CCM1). Simulated data are generated by CCM1 and used to validate the SCM and to study the sensitivity of the SCM to errors in its input data. It is found that the SCM temperature predictions are moderately sensitive to errors in the input horizontal temperature flux convergence and moisture flux convergence. Two types of error are considered in this study: random errors due to insufficient data resolution, and errors due to insufficient data area coverage. While the first type of error can be reduced by filtering and/or increasing the data resolution, it is shown that the second type of error can be reduced by enlarging the data area coverage and using a suitable method to compute the input flux convergence terms.
Keywords: Single column model; Input data errors; Sensitivity study
Sorting Data Elements by SOCD Using Centralized Diamond Architecture
4
Authors: Masumeh Damrudi, Kamal Jadidy Aval. Computer Technology and Application, 2011, Issue 5, pp. 374-377 (4 pages)
Several parallel sorting techniques on different architectures have been studied for many years. Given the need for faster systems in today's world, parallelism can be used to accelerate applications. Nowadays, parallel operations are used to solve computing problems such as sorting and searching at a reasonable speed. Sorting is one of the most important operations in computing, and among the criteria researchers try to optimize, the foremost is speedup. In this paper, the authors present a sort with O(log n) time complexity on PRAM EREW (Parallel Random Access Machine Exclusive Read Exclusive Write). The algorithm is designed to balance the tradeoff between the number of processor elements in the architecture and the execution time. Simulation of the algorithm confirms its theoretical analysis. The results of this research can be utilized in developing faster embedded systems. The Sorting on Centralized Diamond (SOCD) algorithm is presented on the novel Centralized Diamond architecture, which takes advantage of the Single Instruction Multiple Data (SIMD) architecture. This architecture and the sort on it are intuitive and optimal.
Keywords: Parallel sorting; Diamond architecture; Single instruction multiple data (SIMD); Parallel random access machine exclusive read exclusive write (PRAM EREW); Sorting on centralized diamond (SOCD)
A TSE based design for MMSE and QRD of MIMO systems based on ASIP
5
Authors: 冯雪林, SHI Jinglin, CHEN Yang, FU Yanlu, ZHANG Qineng, XIAO Feng. High Technology Letters (EI, CAS), 2023, Issue 2, pp. 166-173 (8 pages)
A Taylor series expansion (TSE) based design for minimum mean-square error (MMSE) and QR decomposition (QRD) of multi-input and multi-output (MIMO) systems is proposed based on an application specific instruction set processor (ASIP), which uses the TSE algorithm instead of resource-consuming reciprocal and reciprocal square root (RSR) operations. The aim is to give a high-performance implementation of both MMSE and QRD on one programmable platform. Furthermore, the instruction set architecture (ISA) and the allocation of data paths in a single instruction multiple data-very long instruction word (SIMD-VLIW) architecture are provided, offering more data parallelism and instruction parallelism for matrices of different dimensions and for different operation types. Meanwhile, multiple levels of numerical precision can be achieved with flexible table size and expansion order in the TSE ISA. The ASIP has been implemented in a 28 nm CMOS process, and its frequency reaches 800 MHz. Experimental results show that the proposed design provides perfect numerical precision within the fixed bit-width of the ASIP, a matrix processing rate that exceeds the requirements of 5G systems, and rate-area efficiency comparable with ASIC implementations.
Keywords: Multi-input and multi-output (MIMO); Minimum mean-square error (MMSE); QR decomposition (QRD); Taylor series expansion (TSE); Application specific instruction set processor (ASIP); Instruction set architecture (ISA); Single instruction multiple data (SIMD); Very long instruction word (VLIW)
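The abstract above replaces reciprocal operations with a Taylor series expansion whose precision depends on table size and expansion order. The following is a minimal floating-point sketch of that general technique, assuming the input has already been normalized to the interval [1, 2); the function names, the midpoint table layout, and the default order are assumptions for illustration, not the paper's ISA or fixed-point design.

```python
def build_table(size):
    """Store 1/x0 for the midpoint x0 of each of `size` equal segments of [1, 2)."""
    return [1.0 / (1.0 + (k + 0.5) / size) for k in range(size)]


def reciprocal_tse(x, table, order=2):
    """Approximate 1/x for x in [1, 2) without a division at run time.

    Uses 1/x = (1/x0) * 1/(1 + d) with d = (x - x0)/x0, and truncates the
    geometric series 1/(1 + d) = 1 - d + d**2 - ... after `order` terms.
    """
    if not 1.0 <= x < 2.0:
        raise ValueError("x must be normalized to [1, 2)")
    size = len(table)
    k = int((x - 1.0) * size)          # segment index selects the table entry
    x0 = 1.0 + (k + 0.5) / size        # segment midpoint
    inv_x0 = table[k]
    d = (x - x0) * inv_x0              # small relative offset from the midpoint
    term = 1.0
    total = 1.0
    for _ in range(order):
        term *= -d
        total += term
    return inv_x0 * total
```

A larger table shrinks `d`, and a higher order removes more of the truncation error, so the two parameters trade memory against multiply-add work, which matches the tradeoff the abstract describes in general terms.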
Research on the On-line Monitoring System of Battery Power
6
Authors: Leilei XIE, Youyu CHEN, Long XIN. International Journal of Technology Management, 2015, Issue 2, pp. 31-33 (3 pages)
By comparing various acquisition technologies and the theory behind existing schemes, the paper analyzes and designs the data acquisition system of a power battery testing platform. It gives a design scheme built around PC software and the CAN bus, which allows the acquisition units to meet the full synchronization requirements. The integrated circuit improves the linearity and stability of total voltage and current acquisition and raises the system sampling rate, so that the corresponding targets are met. Finally, experimental verification confirms that the technical indicators are achieved.
Keywords: single data acquisition; CAN; AH and WH metering; power battery
Combining Task Scheduling in Power Adaptive Dynamic Reconfigurable System (cited by: 2)
7
Authors: Hui Dong, Le-Tian Huang, Jun-Shi Wang, Terrence Mak. Journal of Electronic Science and Technology (CAS), 2012, Issue 4, pp. 296-301 (6 pages)
Powering electronic equipment from ambient energy sources is an active research topic. To match power supply and demand in real time as the environment varies, a reconfiguration technique is adopted. In this paper, a dynamic power consumption model that uses a lookup table as its unit is proposed. Then, we establish a system-level task scheduling model according to the task type. Based on a single instruction multiple data (SIMD) architecture that contains a processing system and a control system with a Nios II processor, a practical dynamic reconfigurable system is built. The approach is evaluated on a hardware platform. The test results show that the system can automatically adjust its power consumption when the external energy input changes. The utilization of the system dynamic power ranges from 80.05% to 91.75% during the first task assignment. Over the entire processing cycle, the total energy efficiency is 97.67%.
Keywords: Nios II; Power adaptive; Reconfiguration; Single instruction multiple data (SIMD); Task scheduling model
ALGORITHMS AND ARCHITECTURE IMPLEMENTATIONS OF MIMO OFDM BASEBAND RECEIVER BASED ON THE SIMD DSP CORE (cited by: 1)
8
Authors: Hao Xuefei, Chen Jie, Zhao Danfeng, Zhou Chaoxian. Journal of Electronics (China), 2006, Issue 5, pp. 763-768 (6 pages)
This letter presents a programmable single-chip architecture for a Multi-Input and Multi-Output (MIMO) OFDM baseband receiver. The architecture comprises a Single Instruction Multiple Data (SIMD) DSP core and three coprocessors that are used for synchronization, FFT and channel decoding. In this MIMO OFDM system, the Zero Correlation Zone (ZCZ) code is used as the synchronization word preamble of the packet in the physical layer in order to avoid interference from other transmitting antennas. Furthermore, a simple channel estimation algorithm is proposed which is appropriate for SIMD DSP computation.
Keywords: Multi-Input and Multi-Output (MIMO) OFDM; Baseband receiver; Zero Correlation Zone (ZCZ) code; Single Instruction Multiple Data (SIMD) DSP
A parallel memory architecture for video coding
9
Authors: Jian-ying PENG, Xiao-lang YAN, De-xian LI, Li-zhong CHEN. Journal of Zhejiang University-Science A (Applied Physics & Engineering) (SCIE, EI, CAS, CSCD), 2008, Issue 12, pp. 1644-1655 (12 pages)
To efficiently exploit the performance of single instruction multiple data (SIMD) architectures for video coding, a parallel memory architecture with power-of-two memory modules is proposed. It employs two novel skewing schemes to provide conflict-free access to adjacent elements (8-bit and 16-bit data types) or to elements at power-of-two intervals in both horizontal and vertical directions, which was not possible in previous parallel memory architectures. Area consumption and delay estimates are given for 4, 8 and 16 memory modules. Under a 0.18-μm CMOS technology, the synthesis results show that the proposed system can achieve a 230 MHz clock frequency with 16 memory modules at the cost of 19k gates when read and write latencies are 3 and 2 clock cycles, respectively. We implement the proposed parallel memory architecture on a video signal processor (VSP). The results show that the VSP enhanced with the proposed architecture achieves a 1.28× speedup for H.264 real-time decoding.
Keywords: Single instruction multiple data (SIMD); Video coding; Parallel memory; Skewing scheme
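The abstract above relies on skewing schemes that map two-dimensional addresses to memory modules so that parallel accesses do not collide. As a point of reference only, here is a minimal sketch of the classic linear skew `(x + skew * y) mod N`; it is not either of the paper's two novel schemes, and every name in it is an assumption made for this example.

```python
def module_index(x, y, n_modules, skew=1):
    """Classic linear skew: element (x, y) lives in module (x + skew*y) mod N."""
    return (x + skew * y) % n_modules


def is_conflict_free(addresses, n_modules, skew=1):
    """True when every address in one parallel access hits a distinct module."""
    modules = [module_index(x, y, n_modules, skew) for x, y in addresses]
    return len(set(modules)) == len(modules)


def row_access(x0, y0, length):
    """Addresses of `length` horizontally adjacent elements."""
    return [(x0 + k, y0) for k in range(length)]


def column_access(x0, y0, length):
    """Addresses of `length` vertically adjacent elements."""
    return [(x0, y0 + k) for k in range(length)]
```

With `skew=0` (plain interleaving) a column access sends every element to the same module, while `skew=1` spreads both rows and columns across all modules; handling strided access and mixed data widths is where more elaborate schemes become necessary.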
Efficient SIMD optimization for media processors
10
Authors: Jian-peng ZHOU, Ce SHI. Journal of Zhejiang University-Science A (Applied Physics & Engineering) (SCIE, EI, CAS, CSCD), 2008, Issue 4, pp. 524-530 (7 pages)
Single instruction multiple data (SIMD) instructions are often implemented in modern media processors. Although SIMD instructions are useful in multimedia applications, most compilers do not have good support for them. This paper focuses on SIMD instruction generation for media processors. We present an efficient code optimization approach that is integrated into a retargetable C compiler. SIMD instructions are generated by finding and combining the same operations in programs. Experimental results for the UltraSPARC VIS instruction set show that a speedup factor of up to 2.639 is obtained.
Keywords: Retargetable compiler; Single instruction multiple data (SIMD) instruction; LCC
Hardware-Software Co-implementation of H.264 Decoder in SoC
11
Authors: 杨宇红, 张文军, 熊恋学, 饶振宁. Journal of Shanghai Jiaotong University (Science) (EI), 2006, Issue 3, pp. 335-339 (5 pages)
With the increasing demand for flexible and efficient implementation of image and video processing algorithms, a good tradeoff is needed between hardware and software design methods. This paper uses the HW-SW codesign method to implement the H.264 decoder in an SoC with an ARM core, a multimedia processor and a deblocking filter coprocessor. Owing to the parallel processing features of the multimedia processor, the clock cycles of the decoding process can be dramatically reduced, and the dedicated hardware deblocking filter coprocessor considerably improves efficiency. With a maximum clock frequency of 150 MHz, the whole system achieves real-time processing speed and flexibility.
Keywords: HW-SW co-implementation; Single instruction multiple data (SIMD); Multimedia processor; H.264 decoder; Coprocessor
HXPY: A High-Performance Data Processing Package for Financial Time-Series Data
12
Authors: 郭家栋, 彭靖姝, 苑航, 倪明选. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2023, Issue 1, pp. 3-24 (22 pages)
A tremendous amount of data is generated by global financial markets every day, and such time-series data needs to be analyzed in real time to explore its potential value. In recent years, we have witnessed the successful adoption of machine learning models on financial data, where the importance of accuracy and timeliness demands highly effective computing frameworks. However, traditional financial time-series data processing frameworks have shown performance degradation and adaptation issues, such as the outlier handling with stock suspension in Pandas and TA-Lib. In this paper, we propose HXPY, a high-performance data processing package with a C++/Python interface for financial time-series data. HXPY supports miscellaneous acceleration techniques such as the streaming algorithm, the vectorization instruction set, and memory optimization, together with various functions such as time window functions, group operations, down-sampling operations, cross-section operations, row-wise or column-wise operations, shape transformations, and alignment functions. The results of benchmark and incremental analysis demonstrate the superior performance of HXPY compared with its counterparts. From MiBs to GiBs of data, HXPY significantly outperforms other in-memory dataframe computing rivals, in some cases by up to hundreds of times.
Keywords: Dataframe; Time-series data; SIMD (single instruction multiple data); CUDA (Compute Unified Device Architecture)
Evaluating RISC-V Vector Instruction Set Architecture Extension with Computer Vision Workloads
13
Authors: 李若时, 彭平, 邵志远, 金海, 郑然. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2023, Issue 4, pp. 807-820 (14 pages)
Computer vision (CV) algorithms are now used extensively in a myriad of applications. As multimedia data are generally well-formatted and regular, it is beneficial to leverage the massive parallel processing power of the underlying platform to improve the performance of CV algorithms. Single Instruction Multiple Data (SIMD) instructions, capable of conducting the same operation on multiple data items in a single instruction, are extensively employed to improve the efficiency of CV algorithms. In this paper, we evaluate the power and effectiveness of the RISC-V vector extension (RV-V) on typical CV algorithms, such as Gray Scale, Mean Filter, and Edge Detection. Our examinations show that, compared with the baseline OpenCV implementation using scalar instructions, the equivalent implementations using RV-V (version 0.8) can reduce the instruction count of the same CV algorithm by up to 24x when processing the same input images. However, the actual performance improvement measured in cycle counts depends strongly on the specific implementation of the underlying RV-V co-processor. In our evaluation, using the vector co-processor (with eight execution lanes) of the Xuantie C906, the vector versions of the CV algorithms on average exhibit up to 2.98x performance speedups compared with their scalar counterparts.
Keywords: RISC-V vector extension; Single instruction multiple data (SIMD); Computer vision; OpenCV
Novel algorithm for complex bit reversal: employing vector permutation and branch reduction methods
14
Authors: Feng YU, Ze-ke WANG, Rui-feng GE. Journal of Zhejiang University-Science A (Applied Physics & Engineering) (SCIE, EI, CAS, CSCD), 2009, Issue 10, pp. 1492-1499 (8 pages)
We present novel vector permutation and branch reduction methods to minimize the number of execution cycles for bit reversal algorithms. The new methods are applied to a single instruction multiple data (SIMD) parallel implementation of the complex data floating-point fast Fourier transform (FFT). The number of operational clock cycles can be reduced by an average factor of 3.5 by using our vector permutation methods and by 1.1 by using our branch reduction methods, compared with conventional implementations. Experiments on the MPC7448 (a well-known SIMD reduced instruction set computing processor) demonstrate that our optimal bit-reversal algorithm consistently takes fewer than two cycles per element in complex array operations.
Keywords: Bit reversal; Vector permutation; Branch reduction; Single instruction multiple data (SIMD); Fast Fourier transform (FFT)
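For readers unfamiliar with the operation the abstract above optimizes, the following is a minimal scalar reference sketch of the bit-reversal permutation used to reorder FFT data. It is the conventional baseline, not the paper's vector permutation or branch reduction method, and the function names are assumptions for this example.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reverse_permute(data):
    """Return a copy of `data` in bit-reversed order; length must be a power of two."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    out = list(data)
    for i in range(n):
        j = bit_reverse(i, bits)
        if i < j:                      # swap each pair exactly once
            out[i], out[j] = out[j], out[i]
    return out
```

The `if i < j` test is the kind of data-dependent branch, and the scattered swaps the kind of irregular memory access, that make this loop awkward for SIMD hardware and motivate specialized methods.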
Bypass-Enabled Thread Compaction for Divergent Control Flow in Graphics Processing Units
15
Authors: LI Bingchao, WEI Jizeng, GUO Wei, SUN Jizhou. Journal of Shanghai Jiaotong University (Science) (EI), 2021, Issue 2, pp. 245-256 (12 pages)
Graphics processing units (GPUs) employ single instruction multiple data (SIMD) hardware to run threads in parallel and allow each thread to maintain an arbitrary control flow. Threads running concurrently within a warp may jump to different paths after conditional branches. Such divergent control flow leaves some lanes idle and hence reduces the SIMD utilization of GPUs. To alleviate the waste of SIMD lanes, threads from multiple warps can be collected together and compacted into idle lanes to improve SIMD lane utilization. However, this mechanism induces extra barrier synchronizations, since warps have to be stalled to wait for other warps before compaction, so that in some cases no warps are scheduled. In this paper, we propose an approach to reduce the overhead of barrier synchronizations induced by compactions. In our approach, a compaction is bypassed by warps whose threads all jump to the same path after branches. Moreover, warps waiting for a compaction can also bypass it when no warps are ready for issuing. In addition, a compaction is canceled if it cannot reduce the number of idle lanes. The experimental results demonstrate that our approach provides an average improvement of 21% over the baseline GPU for applications with massive divergent branches, while recovering the performance loss induced by compactions by 13% on average for applications with many non-divergent control flows.
Keywords: Graphics processing unit (GPU); Single instruction multiple data (SIMD); Thread; Warps; Bypass
A New Implementation of the Post-Stage Tasks of Motion Estimation Using SIMD Architecture
16
Authors: 张武健, 邱晓海, 周润德, 陈弘毅. Tsinghua Science and Technology (SCIE, EI, CAS), 2001, Issue 4, pp. 355-360, 373 (7 pages)
Usually a single MPEG2 video encoder chip realizes the multiple post-stage tasks of motion estimation, such as motion vector refinement and prediction error generation, using multiple hardware modules. This paper proposes a new architecture that uses only a single module to implement the post-stage tasks of motion estimation and that has a single instruction stream over multiple data streams (SIMD). The new architecture is simpler and more regular, and is capable of providing sufficient computational power and of adapting to the encoding flexibility required by the MPEG2 standard. Therefore, it is a more suitable architecture for a system on a chip. NEL Corporation (NTT Electronics, Japan) has integrated a circuit based on this architecture into a single MPEG2 MP@ML encoder chip, which uses the multiresolution telescopic search motion estimation algorithm. Using 0.25 μm CMOS four-metal-layer technology, this circuit has 15.4 M gates with an area of about 29 mm². The operating clock frequency is 81 MHz.
Keywords: MPEG2; Motion estimation; Single instruction stream over multiple data streams; System on a chip