411 articles found
Performance Prediction Based on Statistics of Sparse Matrix-Vector Multiplication on GPUs (cited by: 1)
1
Authors: Ruixing Wang, Tongxiang Gu, Ming Li. Journal of Computer and Communications, 2017, No. 6, pp. 65-83 (19 pages)
Sparse matrix-vector multiplication (SpMV) is one of the most essential operations in linear algebra, and predicting its performance on GPUs has received increasing attention in recent years. In 2012, Guo and Wang put forward a new idea for predicting the performance of SpMV on GPUs. However, they did not fully consider the matrix structure, so the execution time predicted by their model tends to be inaccurate for general sparse matrices. To address this problem, we propose two new, similar models that take the structure of the matrices into account and make the performance prediction more accurate. In addition, we use the new models to predict the execution time of SpMV for the CSR-V, CSR-S, ELL and JAD sparse matrix storage formats on the CUDA platform. Our experimental results show that the prediction accuracy of our models is 1.69 times better on average than that of Guo and Wang's model for most general matrices.
Keywords: sparse matrix-vector multiplication; performance prediction; GPU; normal distribution; uniform distribution
A quantum algorithm for Toeplitz matrix-vector multiplication
2
Authors: 高尚, 杨宇光. Chinese Physics B (SCIE, EI, CAS, CSCD), 2023, No. 10, pp. 248-253 (6 pages)
Toeplitz matrix-vector multiplication is widely used in various fields, including optimal control, systolic finite field multipliers, and multidimensional convolution. In this paper, we first present a non-asymptotic quantum algorithm for Toeplitz matrix-vector multiplication with time complexity O(κ polylog n), where κ and 2n are the condition number and the dimension of the circulant matrix extended from the Toeplitz matrix, respectively. For the case with an unknown generating function, we also give a corresponding non-asymptotic quantum version that eliminates the dependency on the L_1-norm ρ of the displacement of the structured matrices. Because they make good use of the special properties of Toeplitz matrices, the proposed quantum algorithms are sufficiently accurate and efficient compared with existing quantum algorithms under certain circumstances.
Keywords: quantum algorithm; Toeplitz matrix-vector multiplication; circulant matrix
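The circulant extension mentioned in this abstract can be illustrated classically. The following is a minimal sketch, not the quantum algorithm of the paper: it embeds an n x n Toeplitz matrix into a 2n x 2n circulant matrix and checks that the circulant product reproduces the Toeplitz product. The function names and the sample data are hypothetical; a practical classical implementation would evaluate the circular convolution with an FFT rather than the quadratic loop used here.

```python
def toeplitz_matvec_direct(col, row, x):
    """Multiply the n x n Toeplitz matrix T by x, where T[i][j] = t_{i-j}.

    col[k] holds t_k (first column) and row[k] holds t_{-k} (first row),
    so col[0] == row[0].
    """
    n = len(x)
    return [
        sum((col[i - j] if i >= j else row[j - i]) * x[j] for j in range(n))
        for i in range(n)
    ]


def toeplitz_matvec_circulant(col, row, x):
    """Same product, computed through a 2n x 2n circulant extension."""
    n = len(x)
    m = 2 * n
    # First column of the circulant: t_0..t_{n-1}, one zero, t_{-(n-1)}..t_{-1}.
    c = list(col) + [0] + [row[k] for k in range(n - 1, 0, -1)]
    # Zero-pad the input vector to length 2n.
    xp = list(x) + [0] * n
    # A circulant matrix-vector product is a circular convolution.
    y = [sum(c[(i - j) % m] * xp[j] for j in range(m)) for i in range(m)]
    # The first n entries equal the Toeplitz product.
    return y[:n]


# Hypothetical 3 x 3 example: T = [[1, 4, 5], [2, 1, 4], [3, 2, 1]].
col = [1, 2, 3]
row = [1, 4, 5]
x = [1, 0, -1]
y_direct = toeplitz_matvec_direct(col, row, x)
y_circ = toeplitz_matvec_circulant(col, row, x)
```

The embedding doubles the dimension, which is why the abstract quotes 2n as the dimension of the circulant matrix.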
Cache performance optimization of irregular sparse matrix multiplication on modern multi-core CPU and GPU
3
Authors: 刘力 (Liu Li), Yang Guangwen. High Technology Letters (EI, CAS), 2013, No. 4, pp. 339-345 (7 pages)
This paper focuses on how to optimize the cache performance of sparse matrix-matrix multiplication (SpGEMM). It classifies the cache misses into two categories: one is caused by the irregular distribution pattern of the multiplier matrix, and the other is caused by the multiplicand. The paper puts forward an optimization method for each category. The first, a hash-based method, effectively removes cache misses of the 1st category and improves performance by a factor of 6 on an Intel 8-core CPU in the best cases. For cache misses of the 2nd category, the paper proposes a new cache replacement algorithm, which achieves a much higher cache hit rate than other algorithms based on historical knowledge and is applicable on CELL and GPU. To further verify the effectiveness of these methods, the algorithm is implemented on a GPU, where performance scales with the size of on-chip storage.
Keywords: sparse matrix multiplication; cache miss; scalability; multi-core CPU; GPU
Multiple Endmember Hyperspectral Sparse Unmixing Based on Improved OMP Algorithm (cited by: 1)
4
Authors: Chunhui Zhao, Haifeng Zhu, Shiling Cui, Bin Qi. Journal of Harbin Institute of Technology (New Series) (EI, CAS), 2015, No. 5, pp. 97-104 (8 pages)
In the conventional linear spectral mixture analysis model, a class is represented by a single endmember. However, the intra-class spectral variability is usually very large, which makes it difficult to represent a class with one endmember and leads to incorrect unmixing results. Some proposed algorithms help to overcome endmember variability, but they have shortcomings such as intensive computation and unsatisfactory unmixing results. Recently, sparse regression has been applied to unmixing, assuming that each mixed pixel can be expressed as a linear combination of only a few spectra in a spectral library. It is essentially the same as multiple endmember spectral unmixing. OMP (orthogonal matching pursuit), a sparse reconstruction algorithm, has the advantages of simple structure and high efficiency. However, it does not take into account the abundance non-negativity and abundance sum-to-one constraints (ANC and ASC), leading to undesirable unmixing results. To solve these issues, this paper presents an improved OMP algorithm (fully constrained OMP, FOMP) for multiple endmember hyperspectral sparse unmixing. The proposed algorithm overcomes the shortcomings of OMP and also addresses endmember variability. The ANC and ASC constraints are first added to the OMP algorithm; the endmember set is then refined by the relative increase in root-mean-square error (RMSE) to avoid over-fitting; finally, pixels are unmixed with their optimal endmember set. Experiments on simulated and real hyperspectral data show that the FOMP unmixing results are satisfactory, that the abundance RMSE is much lower than that of OMP and simple spectral mixture analysis (sSMA), and that the algorithm has strong anti-noise performance. This indicates that multiple endmember spectral mixture analysis is the more reasonable approach.
Keywords: hyperspectral image; sparse representation; multiple endmember spectral unmixing; OMP; ANC and ASC
Polar Coded Iterative Multiuser Detection for Sparse Code Multiple Access System (cited by: 1)
5
Authors: Hang Mu, Youhua Tang, Li Li, Zheng Ma, Pingzhi Fan, Weiqiang Xu. China Communications (SCIE, CSCD), 2018, No. 11, pp. 51-61 (11 pages)
A polar-coded sparse code multiple access (SCMA) system is conceived in this paper. A simple but new iterative multiuser detection framework is proposed, which consists of a message passing algorithm (MPA) based multiuser detector and a soft-input soft-output (SISO) successive cancellation (SC) polar decoder. In particular, the SISO polar decoding process is realized by a specifically designed soft re-encoder, which is concatenated to the original SC decoder. This soft re-encoder is capable of reconstructing the soft information of the entire polar codeword based on previously detected log-likelihood ratios (LLRs) of information bits. Benefiting from the soft re-encoding algorithm, the resultant iterative detection strategy is able to obtain a salient coding gain. Our simulation results demonstrate that the proposed polar-coded SCMA achieves a significant improvement in error performance in additive white Gaussian noise (AWGN) channels, where the conventional SISO belief propagation (BP) polar decoder aided SCMA, the turbo-coded SCMA and the low-density parity-check (LDPC) coded SCMA are employed as benchmarks.
Keywords: iterative multiuser receiver; polar code; sparse code multiple access (SCMA)
Design Framework of Unsourced Multiple Access for 6G Massive IoT
6
Authors: Chunlin Yan, Siying Lyu, Sen Wang, Yuhong Huang, Xiaodong Xu. China Communications (SCIE, CSCD), 2024, No. 1, pp. 1-12 (12 pages)
In this paper, ambient IoT is used as a typical use case of massive connections for sixth generation (6G) mobile communications, from which we derive the performance requirements to facilitate the evaluation of technical solutions. A rather complete design of unsourced multiple access is proposed, in which two key parts, a compressed sensing module for active user detection and a sparse interleaver-division multiple access (SIDMA) module, are simulated side by side on the same platform at balanced signal-to-noise ratio (SNR) operating points. With a proper combination of compressed sensing matrix, convolutional encoder, and receiver algorithms, the simulated performance results appear superior to the state-of-the-art benchmark, yet with relatively less complicated processing.
Keywords: channel coding; compressed sensing; massive Internet-of-Things (IoT); sparse interleaver-division multiple access (SIDMA); the sixth generation (6G) mobile communications; unsourced multiple access
A NEW SUFFICIENT CONDITION FOR SPARSE RECOVERY WITH MULTIPLE ORTHOGONAL LEAST SQUARES
7
Authors: Haifeng Li, Jing Zhang. Acta Mathematica Scientia (SCIE, CSCD), 2022, No. 3, pp. 941-956 (16 pages)
Multiple orthogonal least squares (MOLS), a greedy algorithm used for the recovery of sparse signals, has recently attracted quite a bit of attention. In this paper, we consider the number of iterations required for the MOLS algorithm to recover a K-sparse signal x ∈ R^n. We show that MOLS provides stable reconstruction of all K-sparse signals x from y = Ax + w in ⌈6K/M⌉ iterations when the matrix A satisfies the restricted isometry property (RIP) with isometry constant δ_{7K} ≤ 0.094. Compared with the existing results, our sufficient condition is not related to the sparsity level K.
Keywords: sparse signal recovery; multiple orthogonal least squares (MOLS); sufficient condition; restricted isometry property (RIP)
Sparse Code Multiple Access: Towards Massive Connectivity and Low Latency 5G Communications (cited by: 3)
8
Authors: Lei Wang, Xiuqiang Xu, Yiqun Wu, Shuangshuang Xing, Yan Chen. 《电信网技术》, 2015, No. 5, pp. 6-15 (10 pages)
Sparse code multiple access (SCMA) is a novel non-orthogonal multiple access technology considered a key component in 5G air interface design. In SCMA, the incoming bits are directly mapped to multi-dimensional constellation vectors known as SCMA codewords, which are then mapped onto blocks of physical resource elements in a sparse manner. The number of codewords that can be non-orthogonally multiplexed in each SCMA block is much larger than the number of resource elements therein, so the system is overloaded and can support a larger number of users. The joint optimization of multi-dimensional modulation and low-density spreading in SCMA codebook design enables the SCMA receiver to recover the coded bits with high reliability and low complexity. The flexibility in design and the robustness in performance further make SCMA a promising technology for meeting 5G communication demands such as massive connectivity and low-latency transmission.
Keywords: SCMA; telecommunications technology; multiple access; coding
Modified Iterative Method for Recovery of Sparse Multiple Measurement Problems
9
Authors: Sina Mortazavi, Reza Hosseini. Journal of Electrical Engineering, 2018, No. 2, pp. 124-128 (5 pages)
We consider the problem of constructing one sparse signal from a few measurements. This problem has been extensively addressed in the literature, providing many sub-optimal methods that assure convergence to a locally optimal solution under specific conditions. There are a few measurements associated with every signal, where the size of each measurement vector is less than the sparse signal's size. All of the sparse signals have the same unknown support. We generalize an existing algorithm for the recovery of one sparse signal from a single measurement to this problem and analyze its performance through simulations. We also compare the construction performance with other existing algorithms. Finally, the proposed method also shows advantages over the OMP (Orthogonal Matching Pursuit) algorithm in terms of computational complexity.
Keywords: sparse signal recovery; iterative methods; multiple measurements
Sparse channel estimation for MIMO-OFDM systems using distributed compressed sensing (cited by: 1)
10
Authors: 刘翼, 梅文博, 杜慧茜, 汪宏宇. Journal of Beijing Institute of Technology (EI, CAS), 2016, No. 4, pp. 540-546 (7 pages)
A sparse channel estimation method is proposed for doubly selective channels in multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) systems. Based on the basis expansion model (BEM) of the channel, the joint sparsity of MIMO-OFDM channels is described. The sparse characteristics enable us to cast the channel estimation as a distributed compressed sensing (DCS) problem. Then, a low-complexity DCS-based estimation scheme is designed. Compared with conventional compressed channel estimators based on compressed sensing (CS) theory, the DCS-based method has improved efficiency because it reconstructs the MIMO channels jointly rather than addressing them separately. Furthermore, the group-sparse structure of each single channel is also described. To make effective use of this additional structure of the sparsity pattern, the DCS algorithm is modified. The modified algorithm can further enhance the estimation performance. Simulation results demonstrate the superiority of our method over fast fading channels in MIMO-OFDM systems.
Keywords: multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM); distributed compressed sensing; doubly selective channel; group-sparse; basis expansion model
Semi-Supervised Dimensionality Reduction of Hyperspectral Image Based on Sparse Multi-Manifold Learning
11
Authors: Hong Huang, Fulin Luo, Zezhong Ma, Hailiang Feng. Journal of Computer and Communications, 2015, No. 11, pp. 33-39 (7 pages)
In this paper, we propose a new semi-supervised multi-manifold learning method, called semi-supervised sparse multi-manifold embedding (S3MME), for dimensionality reduction of hyperspectral image data. S3MME exploits both the labeled and unlabeled data to adaptively find neighbors of each sample from the same manifold by using an optimization program based on sparse representation, and naturally gives relative importance to the labeled ones through a graph-based methodology. It then tries to extract discriminative features on each manifold such that the data points in the same manifold become closer. The effectiveness of the proposed multi-manifold learning algorithm is demonstrated and compared through experiments on real hyperspectral images.
Keywords: hyperspectral image classification; dimensionality reduction; multiple manifolds structure; sparse representation; semi-supervised learning
Nonlinear industrial process fault diagnosis with latent label consistency and sparse Gaussian feature learning
12
Authors: LI Xian-ling, ZHANG Jian-feng, ZHAO Chun-hui, DING Jin-liang, SUN You-xian. Journal of Central South University (SCIE, EI, CAS, CSCD), 2022, No. 12, pp. 3956-3973 (18 pages)
With the increasing complexity of industrial processes, high-dimensional industrial data exhibit strong nonlinearity, which poses considerable challenges for the fault diagnosis of industrial processes. To efficiently extract deep, meaningful features that are crucial for fault diagnosis, a sparse Gaussian feature extractor (SGFE) is designed to learn a nonlinear mapping that projects the raw data into a feature space with the dimension of the fault labels. The feature space is described by the one-hot encoding of the fault category label as an orthogonal basis. In this way, the deep sparse Gaussian features related to fault categories can be gradually learned from the raw data by SGFE. In the feature space, the sparse Gaussian (SG) loss function is designed to constrain the distribution of features to multiple sparse multivariate Gaussian distributions. The sparse Gaussian features are linearly separable in the feature space, which helps to improve the accuracy of the downstream fault classification task. The feasibility and practical utility of the proposed SGFE are verified on the MNIST handwritten digits benchmark and the Tennessee-Eastman (TE) benchmark process, respectively.
Keywords: nonlinear fault diagnosis; multiple multivariate Gaussian distributions; sparse Gaussian feature learning; Gaussian feature extractor
Low complexity MIMO sonar imaging using a virtual sparse linear array
13
Authors: Xionghou Liu, Chao Sun, Yixin Yang, Jie Zhuo, Yina Han. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2016, No. 2, pp. 370-378 (9 pages)
A multiple-input multiple-output (MIMO) sonar can synthesize a large-aperture virtual uniform linear array (ULA) from a small number of physical elements. However, the large aperture is obtained at the cost of a great number of matched filters and a heavy computational load. To reduce the computational load, a MIMO sonar imaging method using a virtual sparse linear array (SLA) is proposed, which consists of offline and online processing. In the offline processing, the virtual ULA of the MIMO sonar is thinned to a virtual SLA by the simulated annealing algorithm, and matched filters corresponding to inactive virtual elements are removed. In the online processing, outputs of matched filters corresponding to active elements are collected for further multibeam processing; hence, the number of matched filters in the echo processing procedure is effectively reduced. Numerical simulations show that the proposed method can reduce the computational load effectively while obtaining imaging performance similar to that of the traditional method.
Keywords: multiple-input multiple-output (MIMO) sonar; simulated annealing; sonar imaging; sparse arrays
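The relationship between physical elements, virtual elements and matched filters described in this abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method of the paper: it uses the common far-field convention that each transmitter/receiver pair contributes one virtual element at the sum of the two positions, the element counts and spacing are hypothetical, and the thinning mask is hand-picked rather than produced by simulated annealing.

```python
def virtual_positions(tx, rx):
    """Virtual element positions, one per transmitter/receiver pair."""
    return sorted(t + r for t in tx for r in rx)


d = 1                      # element spacing in arbitrary units (assumed)
num_tx, num_rx = 2, 4      # hypothetical physical array sizes

# Transmitters spaced num_rx * d apart and receivers spaced d apart
# give a filled virtual ULA with num_tx * num_rx elements.
tx = [i * num_rx * d for i in range(num_tx)]
rx = [j * d for j in range(num_rx)]
virtual = virtual_positions(tx, rx)

# Thinning to a sparse linear array: 1 keeps a virtual element, 0 drops it.
# This mask is a made-up example; the paper selects it by simulated annealing.
active_mask = [1, 1, 0, 1, 0, 0, 1, 1]
active = [p for p, keep in zip(virtual, active_mask) if keep]

# One matched filter output is needed per active virtual element.
matched_filters_full = len(virtual)
matched_filters_thinned = len(active)
```

In this toy configuration, thinning reduces the number of matched filters from eight to five while the outermost virtual elements, and therefore the aperture, are retained.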
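Fault Location Method for Distribution Networks Based on the MCS-SBL Algorithm (cited by: 1)
14
Authors: 周群, 刘梓琳, 冷敏瑞, 印月, 何川. 《电力系统及其自动化学报》 (CSCD, PKU Core), 2024, No. 3, pp. 30-38 (9 pages)
Distribution networks have complex topologies, and traditional methods often require information from a large number of measurement points and struggle to locate faults quickly and effectively. This paper proposes a fault location method based on information from a small number of measurement points. First, an underdetermined fault node voltage equation is established using the equivalence principle. Second, the equation is solved with a Bayesian compressed sensing algorithm for the multiple measurement vector model, and the fault region is determined from the positions of the nonzero elements of the reconstructed sparse current matrix, thereby locating the fault. Finally, simulation experiments are carried out on the IEEE 33-bus distribution system. The results show that the proposed method can locate faults effectively using only the pre-fault and post-fault positive-sequence voltage components at a small number of measurement points, computes quickly, is largely unaffected by fault type and transition resistance, applies to both single-fault and multiple-fault scenarios, and has strong noise immunity.
Keywords: distribution network; fault location; multiple measurement vector model; sparse current; compressed sensing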
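Weighted Stacked Ensemble Multi-Label Classification Based on Sparse Regularization
15
Authors: 肖建芳, 刘缅芳. 《计算机应用与软件》 (PKU Core), 2024, No. 5, pp. 286-297 (12 pages)
To fully exploit the correlation between pairwise labels and the relationship between classifier weights and classifier selection, a weighted stacked ensemble multi-label classification method based on sparse regularization is proposed. A sparsity-regularized weighted stacking ensemble model is proposed to facilitate the selection of multi-label classifiers and the construction of ensemble members. Classifier weights and label correlations are used to improve classification performance. An optimization algorithm based on accelerated proximal gradient and block coordinate descent techniques is further proposed to obtain the optimal solution efficiently. Experimental results on multiple datasets show that the method effectively achieves high-accuracy multi-label classification.
Keywords: multi-label classification; correlation; sparse regularization; weights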
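Joint Channel and Impulsive Noise Estimation Method for NOMA Systems Based on Approximate Message Passing
16
Authors: 李有明, 马冲亚, 吴永宏, 国强. 《电信科学》 (PKU Core), 2024, No. 9, pp. 44-53 (10 pages)
For the channel estimation problem in non-orthogonal multiple access (NOMA) systems under non-Gaussian impulsive noise, a joint channel and impulsive noise estimation method based on approximate message passing is proposed, exploiting the sparsity of the channel and of the impulsive noise. First, a compressed sensing equation over all subcarriers is constructed. Then, based on sparse Bayesian learning theory, a joint estimation optimization problem for the channel, impulsive noise, and data symbols is formulated. To solve this nonlinear, non-convex problem with hyperparameters, an expectation-maximization algorithm based on Gaussian generalized approximate message passing and sparse Bayesian learning theory is designed. Simulation results show that, compared with the expectation-maximization-based sparse Bayesian learning method, the proposed algorithm performs slightly worse in terms of the mean square error of channel and impulsive noise estimation and the bit error rate, but reduces algorithmic complexity by one order of magnitude.
Keywords: non-orthogonal multiple access; channel estimation; impulsive noise estimation; sparse Bayesian learning; approximate message passing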
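TEB: An Efficient SpMV Storage Format Based on Matrix Decomposition and Reconstruction on GPUs
17
Authors: 王宇华, 张宇琪, 何俊飞, 徐悦竹, 崔环宇. 《计算机科学与探索》 (CSCD, PKU Core), 2024, No. 4, pp. 1094-1108 (15 pages)
Sparse matrix-vector multiplication (SpMV) is a crucial computation in science and engineering, and CSR (compressed sparse row) is one of the most commonly used sparse matrix storage formats. When parallel SpMV is implemented on graphics processing unit (GPU) platforms, CSR stores only the nonzero elements of the sparse matrix, which avoids the redundant computation caused by zero padding and saves storage space, but it suffers from load imbalance, which wastes computing resources. To address this problem, storage formats that have performed well in recent years are studied, and a row-by-row decomposition and reorganization storage format, TEB (threshold-exchangeorder block), is proposed. The format uses a heuristic threshold selection algorithm to determine a suitable splitting threshold and combines it with a reordering-based row merging algorithm to reconstruct and decompose the sparse matrix, so that the numbers of nonzero elements in different blocks are as close as possible. In addition, combined with CUDA (compute unified device architecture) thread technology, a parallel SpMV algorithm across sub-blocks based on the TEB storage format is proposed, which allocates computing resources reasonably and resolves the load imbalance, thereby improving the efficiency of parallel SpMV. To verify the effectiveness of the TEB storage format, experiments are conducted on the NVIDIA Tesla V100 platform. The results show that, compared with the PBC (partition-block-CSR), AMF-CSR (adaptive multi-row folding of CSR), CSR-Scalar (compressed sparse row-scalar), and CSR5 (compressed sparse row 5) storage formats, TEB improves SpMV time performance by factors of 3.23, 5.83, 2.33, and 2.21 on average, and floating-point performance by factors of 3.36, 5.95, 2.29, and 2.13 on average.
Keywords: sparse matrix-vector multiplication (SpMV); reordering; CSR format; load balancing; storage format; graphics processing unit (GPU)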
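The CSR format and the load imbalance that motivates TEB can be made concrete with a small sequential sketch. It shows plain CSR only, not the TEB format or any GPU kernel; the function names and the sample matrix are hypothetical.

```python
def dense_to_csr(dense):
    """Convert a dense row-major matrix to CSR (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)      # store nonzeros only, no zero padding
                col_idx.append(j)
        row_ptr.append(len(values))   # end of this row in values/col_idx
    return values, col_idx, row_ptr


def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A x for a CSR matrix, one row at a time."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def row_nnz(row_ptr):
    """Number of nonzeros per row, i.e. the work done for each row."""
    return [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]


# Hypothetical matrix with one dense row among sparse rows.
dense = [
    [1, 0, 0, 0],
    [2, 3, 4, 5],
    [0, 0, 6, 0],
]
values, col_idx, row_ptr = dense_to_csr(dense)
y = csr_spmv(values, col_idx, row_ptr, [1, 1, 1, 1])
work = row_nnz(row_ptr)
```

If each row were assigned to its own thread, as in a scalar CSR scheme, the thread handling the second row would do four times the work of the others. That is the kind of imbalance the abstract says TEB reduces by regrouping rows into blocks with similar nonzero counts.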
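Power System Transient Stability Assessment Method Based on XGBoost-DF
18
Authors: 李楠, 张家恒. 《电测与仪表》 (PKU Core), 2024, No. 10, pp. 119-127 (9 pages)
In modern interconnected power grids, post-disturbance instability is no longer limited to a single mode, and multi-swing instability occurs frequently. To address this, a transient stability assessment method based on extreme gradient boosting and deep forest is proposed. A handcrafted feature set is constructed from clusters of bus voltage trajectories, and the feature set is encoded in a supervised manner by extreme gradient boosting. The deep forest then performs three-class classification on the sparse matrix produced by the supervised encoding, establishing a mapping between the large-scale dataset and the instability modes. Simulation analysis on the IEEE 39-bus and IEEE 140-bus systems shows that the proposed method has high accuracy and noise immunity, effectively reduces the misclassification rate for multi-swing instability, and remains robust when synchronized phasor measurement units are missing.
Keywords: transient stability assessment; multi-swing instability; extreme gradient boosting; deep forest; sparse matrix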
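NM-SpMM: A Semi-Structured Sparse Matrix Multiplication Algorithm for a Domestic Heterogeneous Vector Processor
19
Authors: 姜晶菲, 何源宏, 许金伟, 许诗瑶, 钱希福. 《计算机工程与科学》 (CSCD, PKU Core), 2024, No. 7, pp. 1141-1150 (10 pages)
Deep neural networks have achieved excellent results in natural language processing, computer vision, and other fields. With the growth in the scale of data processed by intelligent applications and the rapid development of large models, ever higher inference performance is demanded of deep neural networks, and N:M semi-structured sparsification has become one of the key techniques for balancing computing-power requirements against application quality. The domestic heterogeneous vector processor FT-M7032 offers considerable room for exploiting data parallelism and instruction parallelism in intelligent model processing. To handle the diversity of sparsity patterns in N:M semi-structured sparse model computation, a flexibly configurable sparse matrix multiplication algorithm for FT-M7032, NM-SpMM, is proposed. NM-SpMM introduces an efficient compressed offset address sparse encoding format, COA, which avoids the impact of the semi-structured parameter configuration on the memory access and computation of sparse data. Based on COA encoding, NM-SpMM applies fine-grained optimization to sparse matrix computations of different dimensions. Experimental results on a single FT-M7032 core show that NM-SpMM achieves a speedup of 1.73 to 21.00 times over dense matrix multiplication, and of 0.04 to 1.04 times over an NVIDIA V100 GPU using the cuSPARSE sparse computing library.
Keywords: deep neural network; graphics processing unit; vector processor; sparse matrix multiplication; pipeline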
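N:M semi-structured sparsity keeps at most N nonzero weights in every group of M consecutive weights. The sketch below illustrates the idea with magnitude pruning and a generic values-plus-offsets layout. It is not the COA format of the paper, whose details are not given in the abstract; the function names, the pruning rule and the sample data are assumptions, and the row length is assumed to be a multiple of M.

```python
def prune_n_m(row, n, m):
    """Keep the n largest-magnitude entries in each group of m weights.

    Returns per-group kept values and their offsets inside the group.
    Assumes len(row) is a multiple of m.
    """
    values, offsets = [], []
    for start in range(0, len(row), m):
        group = row[start:start + m]
        # Indices of the n largest magnitudes, restored to ascending order.
        by_magnitude = sorted(range(m), key=lambda k: -abs(group[k]))
        keep = sorted(by_magnitude[:n])
        values.append([group[k] for k in keep])
        offsets.append(keep)
    return values, offsets


def nm_dot(values, offsets, x, m):
    """Dot product of the compressed row with a dense vector x."""
    acc = 0
    for g, (vals, offs) in enumerate(zip(values, offsets)):
        for v, o in zip(vals, offs):
            # Group index times m plus in-group offset gives the column.
            acc += v * x[g * m + o]
    return acc


# Hypothetical 2:4 example on one weight row.
row = [4, -1, 0, 3, 1, 5, -6, 2]
values, offsets = prune_n_m(row, n=2, m=4)
result = nm_dot(values, offsets, [1, 2, 3, 4, 5, 6, 7, 8], m=4)
```

Because every group holds exactly N kept values, the compressed row has a fixed length and needs only a small offset per value, which is what makes the pattern friendlier to vector hardware than unstructured sparsity.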
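Design of a Skeleton Recognition Hardware Accelerator for Spatial-Temporal Graph Convolutional Networks
20
Authors: 谭会生, 严舒琪, 杨威. 《电子测量技术》 (PKU Core), 2024, No. 11, pp. 36-43 (8 pages)
With the continuous development of artificial intelligence, the data scale of neural networks is gradually expanding and their computational load is rising rapidly. To reduce the computational load of the spatial-temporal graph convolutional network, lower the resource consumption of its hardware implementation, and increase the processing speed of practical human skeleton recognition systems based on the spatial-temporal graph convolutional network (ST-GCN), a skeleton recognition hardware accelerator based on ST-GCN is designed and developed on a field-programmable gate array (FPGA). Structural optimization and data quantization of the original network model reduce the computation in the FPGA implementation by about 75%. Exploiting the sparsity of the adjacency matrix, an optimization method for sparse matrix multiply-accumulate operations is proposed, which reduces multiplier resource consumption by about 60%. Experimental verification on human skeleton recognition shows that, at a clock frequency of 100 MHz and relative to a CPU, the FPGA achieves a speedup of 30.53 for the ST-GCN unit and of 6.86 for human skeleton recognition overall.
Keywords: human skeleton recognition; spatial-temporal graph convolutional network (ST-GCN); hardware accelerator; field-programmable gate array (FPGA); hardware optimization of sparse matrix multiply-accumulate operations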