Abstract
This paper investigates and analyzes the XOR problem in nonlinear machine learning and describes a corresponding parallel implementation of the algorithm. Through tests on NVIDIA's mainstream Kepler-architecture GPUs, combined with performance-analysis tools, the main performance bottleneck of this class of machine learning algorithms is identified. On this basis, the functor that constitutes the dominant bottleneck is optimized: an equivalent transformation formula for the functor is derived mathematically and a new computation scheme is given. The new method greatly reduces the amount of computation on the critical path, ultimately yielding a 3.5x performance improvement.
The machine learning algorithm for the nonlinear XOR problem is explored and analyzed in this paper, and the related parallel implementations are presented. After that, we identify the performance bottleneck by testing on NVIDIA's popular Kepler-architecture GPU and applying performance-analysis tools to this kind of machine learning algorithm. Based on the analysis results, we optimize the kernel function, which is also the major performance bottleneck. Furthermore, a new mathematical formula and computational model are developed that reduce the amount of computation on the critical path compared with the original algorithm. Finally, a speedup of more than 3.5x is gained with our proposal.
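The XOR problem referenced in the abstract is the classic example of a task that no single linear unit can solve, which is why a nonlinear (multi-layer) model is required. As a minimal illustration (this is a hand-crafted sketch, not the paper's GPU implementation), a two-layer network with fixed, hand-chosen weights suffices:

```python
def step(x):
    """Heaviside step activation: 1 if the input is positive, else 0."""
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    """Two-layer perceptron computing XOR with fixed weights.

    Hidden unit h1 fires on logical OR, h2 fires on logical AND;
    the output fires when OR holds but AND does not, i.e. XOR.
    """
    h1 = step(x1 + x2 - 0.5)    # OR gate
    h2 = step(x1 + x2 - 1.5)    # AND gate
    return step(h1 - h2 - 0.5)  # OR AND (NOT AND) == XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_net(a, b))  # prints 0, 1, 1, 0 in turn
```

The hidden layer is what makes the mapping nonlinear: no choice of weights for a single unit can separate (0,1)/(1,0) from (0,0)/(1,1) with one line.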
Source
《电子测量技术》
2014, Issue 3, pp. 47-50 (4 pages)
Electronic Measurement Technology