A Sensitivity Study of Single Column Model
1
Authors: 董敏, 许秦. Advances in Atmospheric Sciences (SCIE, CAS, CSCD), 1996, No. 3, pp. 313-324 (12 pages)
A single column model (SCM) is constructed by extracting the physical subroutines from the NCAR Community Climate Model version 1 (CCM1). Simulated data are generated by CCM1 and used to validate the SCM and to study the sensitivity of the SCM to errors in its input data. It is found that the SCM temperature predictions are moderately sensitive to errors in the input horizontal temperature flux convergence and moisture flux convergence. Two types of error are considered in this study: random errors due to insufficient data resolution, and errors due to insufficient data area coverage. While the first type of error can be reduced by filtering and/or increasing the data resolution, it is shown that the second type of error can be reduced by enlarging the data area coverage and using a suitable method to compute the input flux convergence terms.
Keywords: single column model; input data errors; sensitivity study
A TSE based design for MMSE and QRD of MIMO systems based on ASIP
2
Authors: 冯雪林, SHI Jinglin, CHEN Yang, FU Yanlu, ZHANG Qineng, XIAO Feng. High Technology Letters (EI, CAS), 2023, No. 2, pp. 166-173 (8 pages)
A Taylor series expansion (TSE) based design for minimum mean-square error (MMSE) and QR decomposition (QRD) of multi-input and multi-output (MIMO) systems is proposed based on an application specific instruction set processor (ASIP), which uses the TSE algorithm instead of resource-consuming reciprocal and reciprocal square root (RSR) operations. The aim is to give a high-performance implementation of both MMSE and QRD on one programmable platform. Furthermore, the instruction set architecture (ISA) and the allocation of data paths in a single instruction multiple data, very long instruction word (SIMD-VLIW) architecture are provided, offering more data parallelism and instruction parallelism for different matrix dimensions and operation types. Meanwhile, multiple levels of numerical precision can be achieved with a flexible table size and expansion order in the TSE ISA. The ASIP has been implemented in a 28 nm CMOS process and reaches a frequency of 800 MHz. Experimental results show that the proposed design provides perfect numerical precision within the fixed bit-width of the ASIP, a matrix processing rate that exceeds the requirements of 5G systems, and rate-area efficiency comparable with ASIC implementations.
Keywords: multi-input and multi-output (MIMO); minimum mean-square error (MMSE); QR decomposition (QRD); Taylor series expansion (TSE); application specific instruction set processor (ASIP); instruction set architecture (ISA); single instruction multiple data (SIMD); very long instruction word (VLIW)
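The abstract above describes replacing reciprocal operations with a table lookup plus a Taylor series, with table size and expansion order as tunable parameters. The following is a minimal floating-point sketch of that general idea, not the paper's fixed-point ASIP design: the function names, the midpoint table layout, and the normalization of the input to [1, 2) are all assumptions made for illustration.

```python
def build_reciprocal_table(table_bits):
    """Store 1/x0 at the midpoint x0 of each of 2**table_bits segments of [1, 2).

    The table layout is an assumption for this sketch.
    """
    size = 1 << table_bits
    return [1.0 / (1.0 + (i + 0.5) / size) for i in range(size)]


def tse_reciprocal(x, table, order):
    """Approximate 1/x for x in [1, 2) with a table lookup and a Taylor series.

    With r0 = 1/x0 and e = (x0 - x) * r0, the identity
    1/x = r0 / (1 - e) = r0 * (1 + e + e**2 + ...) needs only multiplies and adds.
    """
    if not 1.0 <= x < 2.0:
        raise ValueError("x must be normalized to [1, 2)")
    size = len(table)
    index = int((x - 1.0) * size)      # which table segment x falls into
    x0 = 1.0 + (index + 0.5) / size    # midpoint of that segment
    r0 = table[index]                  # looked-up value of 1/x0
    e = (x0 - x) * r0
    # Horner evaluation of 1 + e + ... + e**order.
    series = 1.0
    for _ in range(order):
        series = 1.0 + e * series
    return r0 * series
```

In this sketch the relative error is |e|**(order + 1), and |e| is below 1/(2 * table size), so a larger table or a higher expansion order both tighten the approximation. That is the kind of trade-off the abstract attributes to table size and expansion order.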
Combining Task Scheduling in Power Adaptive Dynamic Reconfigurable System (cited by: 2)
3
Authors: Hui Dong, Le-Tian Huang, Jun-Shi Wang, Terrence Mak. Journal of Electronic Science and Technology (CAS), 2012, No. 4, pp. 296-301 (6 pages)
Powering electronic equipment from ambient energy sources is an active research topic. To match power supply and demand in real time as the environment varies, a reconfigurable technique is adopted. In this paper, a dynamic power consumption model that uses a lookup table as its unit is proposed. Then, we establish a system-level task scheduling model according to the task type. Based on a single instruction multiple data (SIMD) architecture, which contains a processing system and a control system with a Nios II processor, a practical dynamic reconfigurable system is built. The approach is evaluated on a hardware platform. The test results show that the system can automatically adjust its power consumption when the external energy input changes. The utilization of the system dynamic power ranges from 80.05% to 91.75% during the first task assignment. Over the entire processing cycle, the total energy efficiency is 97.67%.
Keywords: Nios II; power adaptive reconfiguration; single instruction multiple data (SIMD); task scheduling model
HXPY: A High-Performance Data Processing Package for Financial Time-Series Data
4
Authors: 郭家栋, 彭靖姝, 苑航, 倪明选. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2023, No. 1, pp. 3-24 (22 pages)
A tremendous amount of data is generated by global financial markets every day, and such time-series data needs to be analyzed in real time to explore its potential value. In recent years, we have witnessed the successful adoption of machine learning models on financial data, where the importance of accuracy and timeliness demands highly effective computing frameworks. However, traditional financial time-series data processing frameworks have shown performance degradation and adaptation issues, such as the handling of outliers caused by stock suspension in Pandas and TA-Lib. In this paper, we propose HXPY, a high-performance data processing package with a C++/Python interface for financial time-series data. HXPY supports miscellaneous acceleration techniques such as the streaming algorithm, the vectorization instruction set, and memory optimization, together with various functions such as time window functions, group operations, down-sampling operations, cross-section operations, row-wise or column-wise operations, shape transformations, and alignment functions. The results of benchmark and incremental analysis demonstrate the superior performance of HXPY compared with its counterparts. On data sizes from MiBs to GiBs, HXPY significantly outperforms other in-memory dataframe computing rivals, in some cases by up to hundreds of times.
Keywords: dataframe; time-series data; SIMD (single instruction multiple data); CUDA (Compute Unified Device Architecture)
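The abstract above lists a streaming algorithm among the techniques used for time window functions. As a generic illustration of what "streaming" means for a rolling window, and not HXPY's actual API or implementation, the sketch below maintains a rolling mean in constant time per new value. The class name `RollingMean` and its `update` method are hypothetical.

```python
from collections import deque


class RollingMean:
    """Rolling mean over the last `window` values, updated in O(1) per value.

    Hypothetical helper for illustration; it is not part of HXPY.
    """

    def __init__(self, window):
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.values = deque()
        self.total = 0.0

    def update(self, value):
        """Add one value, evict the oldest if the window is full, return the mean."""
        self.values.append(value)
        self.total += value
        if len(self.values) > self.window:
            self.total -= self.values.popleft()
        return self.total / len(self.values)
```

Feeding the values 1, 2, 3, 4, 5 through `RollingMean(3)` returns 1.0, 1.5, 2.0, 3.0, 4.0. Each update touches only the incoming and outgoing value, so the window is never rescanned.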
PEPFL:A framework for a practical and efficient privacy-preserving federated learning
5
Authors: Yange Chen, Baocang Wang, Hang Jiang, Pu Duan, Yuan Ping, Zhiyong Hong. Digital Communications and Networks (SCIE), 2024, No. 2, pp. 355-368 (14 pages)
As an emerging joint learning model, federated learning is a promising way to combine model parameters of different users for training and inference without collecting users' original data. However, a practical and efficient solution has not been established in previous work due to the absence of efficient matrix computation and cryptography schemes in the privacy-preserving federated learning model, especially in partially homomorphic cryptosystems. In this paper, we propose a Practical and Efficient Privacy-preserving Federated Learning (PEPFL) framework. First, we present a lifted distributed ElGamal cryptosystem for federated learning, which can solve the multi-key problem in federated learning. Second, we develop a Practical Partially Single Instruction Multiple Data (PSIMD) parallelism scheme that can encode a plaintext matrix into a single plaintext for encryption, improving the encryption efficiency and reducing the communication cost in partially homomorphic cryptosystems. In addition, based on the Convolutional Neural Network (CNN) and the designed cryptosystem, a novel privacy-preserving federated learning framework is designed using Momentum Gradient Descent (MGD). Finally, we evaluate the security and performance of PEPFL. The experimental results demonstrate that the scheme is practicable, effective, and secure, with low communication and computation costs.
Keywords: federated learning; partially single instruction multiple data; momentum gradient descent; ElGamal; multi-key; homomorphic encryption
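The abstract above mentions encoding a plaintext matrix into a single plaintext so that one encryption covers many values. A common way to do this in additively homomorphic settings is slot packing, sketched below on plain integers only. This is an assumption about the general technique, not the paper's PSIMD encoding, and it involves no encryption. The functions `pack` and `unpack` and the `slot_bits` parameter are hypothetical.

```python
def pack(values, slot_bits):
    """Pack non-negative integers into one integer, one value per fixed-width slot."""
    packed = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << slot_bits):
            raise ValueError("value does not fit in a slot")
        packed |= v << (i * slot_bits)
    return packed


def unpack(packed, count, slot_bits):
    """Recover `count` slot values from a packed integer."""
    mask = (1 << slot_bits) - 1
    return [(packed >> (i * slot_bits)) & mask for i in range(count)]


# A 2x2 matrix flattened row by row; adding two packed integers adds every
# slot at once, provided no slot overflows into its neighbour.
a = pack([1, 2, 3, 4], slot_bits=16)
b = pack([10, 20, 30, 40], slot_bits=16)
elementwise_sum = unpack(a + b, count=4, slot_bits=16)
```

In an additively homomorphic cryptosystem the same single addition would be carried out on ciphertexts, which is where the saving in encryptions and communication would come from. The slot width has to leave headroom for the largest sum that can occur.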
Evaluating RISC-V Vector Instruction Set Architecture Extension with Computer Vision Workloads
6
Authors: 李若时, 彭平, 邵志远, 金海, 郑然. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2023, No. 4, pp. 807-820 (14 pages)
Computer vision (CV) algorithms are now used extensively in a myriad of applications. As multimedia data are generally well-formatted and regular, it is beneficial to leverage the massive parallel processing power of the underlying platform to improve the performance of CV algorithms. Single Instruction Multiple Data (SIMD) instructions, capable of conducting the same operation on multiple data items in a single instruction, are extensively employed to improve the efficiency of CV algorithms. In this paper, we evaluate the power and effectiveness of the RISC-V vector extension (RV-V) on typical CV algorithms, such as Gray Scale, Mean Filter, and Edge Detection. Our examinations show that, compared with the baseline OpenCV implementation using scalar instructions, the equivalent implementations using RV-V (version 0.8) can reduce the instruction count of the same CV algorithm by up to 24x when processing the same input images. However, the actual performance improvement measured in cycle counts depends strongly on the specific implementation of the underlying RV-V co-processor. In our evaluation, using the vector co-processor (with eight execution lanes) of the Xuantie C906, the vector versions of the CV algorithms exhibit average speedups of up to 2.98x over their scalar counterparts.
Keywords: RISC-V vector extension; single instruction multiple data (SIMD); computer vision; OpenCV
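The abstract above reports that vectorizing a kernel such as Gray Scale reduces the instruction count. The sketch below models only the loop structure behind that effect, processing pixels in chunks of `lanes` (often called strip-mining). It is pure Python, so nothing actually runs in parallel, and it uses neither RVV intrinsics nor OpenCV. The function names and the integer weights 77, 150, 29 (a common fixed-point approximation of the usual luma weights, summing to 256) are assumptions.

```python
def grayscale_scalar(pixels):
    """Convert (r, g, b) pixels one at a time; returns (gray values, loop steps)."""
    out = []
    for r, g, b in pixels:
        out.append((77 * r + 150 * g + 29 * b) >> 8)
    return out, len(pixels)


def grayscale_strip_mined(pixels, lanes):
    """Convert pixels in chunks of `lanes`; each chunk models one vector step."""
    out = []
    steps = 0
    for start in range(0, len(pixels), lanes):
        chunk = pixels[start:start + lanes]  # the last chunk may be shorter
        # On real SIMD hardware this whole chunk would be one vector operation.
        out.extend((77 * r + 150 * g + 29 * b) >> 8 for r, g, b in chunk)
        steps += 1
    return out, steps
```

For five pixels and `lanes=4`, the scalar version takes five loop steps and the strip-mined version takes two, with identical output. As the abstract notes, a drop in instruction count does not translate directly into the same speedup in cycles.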
Bypass-Enabled Thread Compaction for Divergent Control Flow in Graphics Processing Units
7
Authors: 李炳超, 魏继增, 郭炜, 孙济洲. Journal of Shanghai Jiaotong University (Science) (EI), 2021, No. 2, pp. 245-256 (12 pages)
Graphics processing units (GPUs) employ single instruction multiple data (SIMD) hardware to run threads in parallel and allow each thread to maintain an arbitrary control flow. Threads running concurrently within a warp may jump to different paths after conditional branches. Such divergent control flow makes some lanes idle and hence reduces the SIMD utilization of GPUs. To alleviate the waste of SIMD lanes, threads from multiple warps can be collected together to improve SIMD lane utilization by compacting threads into idle lanes. However, this mechanism induces extra barrier synchronizations, since warps have to be stalled to wait for other warps for compaction, so that in some cases no warps are scheduled. In this paper, we propose an approach to reduce the overhead of the barrier synchronizations induced by compactions. In our approach, a compaction is bypassed by warps whose threads all jump to the same path after branches. Moreover, warps waiting for a compaction can also bypass this compaction when no warps are ready for issuing. In addition, a compaction is canceled if it cannot reduce the number of idle lanes. The experimental results demonstrate that our approach provides an average improvement of 21% over the baseline GPU for applications with heavily divergent branches, while recovering the performance loss induced by compactions by 13% on average for applications with many non-divergent control flows.
Keywords: graphics processing unit (GPU); single instruction multiple data (SIMD); thread; warps; bypass
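The abstract above describes how divergent branches idle SIMD lanes and how compacting threads across warps, with uniform warps bypassing the compaction, recovers utilization. The toy model below illustrates that accounting for a single branch. It is a simplification under stated assumptions and not the paper's mechanism: it ignores lane-position constraints, barrier stalls, and scheduling, and all function names are hypothetical. Each warp is a list of booleans giving the branch outcome of each thread.

```python
import math


def issue_slots_without_compaction(warps):
    """Each warp issues once per distinct branch outcome among its threads."""
    return sum(len(set(warp)) for warp in warps)


def issue_slots_with_compaction(warps, width):
    """Uniform warps bypass compaction; threads of divergent warps are regrouped by path."""
    slots = 0
    taken = 0
    not_taken = 0
    for warp in warps:
        if len(set(warp)) == 1:
            slots += 1                       # all threads agree: bypass
        else:
            taken += sum(warp)               # threads on the taken path
            not_taken += len(warp) - sum(warp)
    # Regrouped threads fill whole warps on each path.
    slots += math.ceil(taken / width) + math.ceil(not_taken / width)
    return slots


def simd_utilization(warps, width, slots):
    """Fraction of lane slots doing useful work across all issued warps."""
    active = sum(len(warp) for warp in warps)
    return active / (slots * width)


warps = [
    [True, True, False, False],   # divergent
    [True, False, False, True],   # divergent
    [True, True, True, True],     # uniform, bypasses compaction
]
baseline = issue_slots_without_compaction(warps)
compacted = issue_slots_with_compaction(warps, width=4)
```

With four lanes, the baseline needs 5 issue slots for 12 threads (utilization 0.6), while the compacted schedule needs 3 (utilization 1.0). The uniform warp gains nothing from compaction, which is why letting it bypass the synchronization is attractive.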
A New Implementation of the Post-Stage Tasks of Motion Estimation Using SIMD Architecture
8
Authors: 张武健, 邱晓海, 周润德, 陈弘毅. Tsinghua Science and Technology (SCIE, EI, CAS), 2001, No. 4, pp. 355-360, 373 (7 pages)
Usually a single MPEG2 video encoder chip realizes the multiple post-stage tasks of motion estimation, such as motion vector refinement and prediction error generation, using multiple hardware modules. This paper proposes a new architecture that uses only a single module to implement the post-stage tasks of motion estimation, organized as a single instruction stream over multiple data streams (SIMD). The new architecture is simpler and more regular, and is capable of providing sufficient computational power and of adapting to the encoding flexibility required by the MPEG2 standard. It is therefore a more suitable architecture for a system on a chip. NEL Corporation (NTT Electronics, Japan) has integrated a circuit based on this architecture into a single MPEG2 MP@ML encoder chip, which uses the multiresolution telescopic search motion estimation algorithm. Using 0.25 μm CMOS, four-metal-layer technology, this circuit has 15.4 M gates with an area of about 29 mm². The operating clock frequency is 81 MHz.
Keywords: MPEG2; motion estimation; single instruction stream over multiple data streams; system on a chip