Abstract: The paper describes an efficient direct method for solving the equation Ax = b, where A is a sparse matrix, on the Intel® Xeon Phi™ coprocessor. The main challenge on such a system is how to engage all available threads (about 240) and how to reduce OpenMP* synchronization overhead, which is very expensive for hundreds of threads. The method decomposes A into a product of lower-triangular, diagonal, and upper-triangular matrices and then solves the three resulting subsystems. The main idea is based on the hybrid parallel algorithm used in the Intel® Math Kernel Library Parallel Direct Sparse Solver for Clusters [1]. Our implementation uses a static scheduling algorithm during the factorization step to reduce OpenMP synchronization overhead. To engage all available threads effectively, a three-level parallelization approach is used. We demonstrate that our implementation performs up to 100 times better in the factorization step and up to 65 times better in overall performance on the 240 threads of the Intel® Xeon Phi™ coprocessor.
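To make the "factor, then solve three subsystems" structure concrete, here is a minimal dense sketch of A = LDU factorization followed by forward substitution (L z = b), diagonal scaling (D y = z), and backward substitution (U x = y). It is an illustration under stated assumptions, not the paper's method: the matrix is dense, no pivoting or reordering is performed (so every pivot is assumed nonzero), and none of the sparse storage, static scheduling, or three-level parallelism is reproduced. The function names `ldu_factor` and `ldu_solve` are hypothetical.

```python
def ldu_factor(A):
    """Factor a square matrix as A = L D U without pivoting.

    L is unit lower-triangular, D is diagonal (returned as a list),
    U is unit upper-triangular. Assumes all pivots D[k] are nonzero.
    """
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [[float(i == j) for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for k in range(n):
        # Pivot: a_kk minus the contributions of the already factored columns
        D[k] = A[k][k] - sum(L[k][p] * D[p] * U[p][k] for p in range(k))
        if D[k] == 0.0:
            raise ZeroDivisionError("zero pivot: pivoting would be required")
        for i in range(k + 1, n):
            # Column k of L below the diagonal
            L[i][k] = (A[i][k] - sum(L[i][p] * D[p] * U[p][k] for p in range(k))) / D[k]
            # Row k of U to the right of the diagonal
            U[k][i] = (A[k][i] - sum(L[k][p] * D[p] * U[p][i] for p in range(k))) / D[k]
    return L, D, U


def ldu_solve(L, D, U, b):
    """Solve A x = b given A = L D U, one subsystem at a time."""
    n = len(b)
    # 1) Forward substitution: L z = b (L has a unit diagonal)
    z = [0.0] * n
    for i in range(n):
        z[i] = b[i] - sum(L[i][j] * z[j] for j in range(i))
    # 2) Diagonal solve: D y = z
    y = [z[i] / D[i] for i in range(n)]
    # 3) Backward substitution: U x = y (U has a unit diagonal)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0, 2.0], [1.0, 3.0, 0.0], [2.0, 0.0, 5.0]]
    b = [1.0, 2.0, 3.0]
    L, D, U = ldu_factor(A)
    print(ldu_solve(L, D, U, b))
```

For a symmetric A, U equals the transpose of L, which is why symmetric sparse solvers typically store only L and D.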
Abstract: This paper describes a method for calculating the Schur complement of a sparse positive definite matrix A. The main idea of this approach is to represent matrix A as an elimination tree, using a reordering algorithm such as METIS, and to place the columns/rows for which the Schur complement is needed into the top node of the elimination tree. Problems arising from a degenerate part of the initial matrix can be resolved with iterative refinement. The proposed approach is close to the “multifrontal” approach implemented by Iain Duff and others in the 1980s. The Schur complement computations described in this paper are available in Intel® Math Kernel Library (Intel® MKL). We present the algorithm for Schur complement computations, experiments demonstrating a negligible increase in the number of elements in the factored matrix, and a comparison with existing alternatives.
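Placing the Schur rows/columns in the top node works because the top node is eliminated last: once every other row and column has been eliminated, the remaining, not yet factored block is exactly the Schur complement S = A22 - A21 * inv(A11) * A12, where A11 holds the eliminated variables. A minimal dense sketch of that fact follows, assuming the Schur variables are ordered last and A is symmetric positive definite so that no pivoting is needed. It uses no elimination tree, METIS, or Intel MKL, and the function name is hypothetical.

```python
def schur_by_partial_elimination(A, n1):
    """Return the Schur complement of the leading n1 x n1 block of A.

    The first n1 rows/columns are eliminated with plain Gaussian elimination
    (no pivoting, so the pivots must be nonzero, e.g. A is SPD). After those
    n1 steps the trailing block equals S = A22 - A21 * inv(A11) * A12.
    """
    n = len(A)
    W = [[float(v) for v in row] for row in A]  # work on a copy
    for k in range(n1):
        pivot = W[k][k]
        for i in range(k + 1, n):
            factor = W[i][k] / pivot
            # Update row i with a multiple of pivot row k (columns right of k)
            for j in range(k + 1, n):
                W[i][j] -= factor * W[k][j]
    # Only the trailing (n - n1) x (n - n1) block is needed.
    return [row[n1:] for row in W[n1:]]


if __name__ == "__main__":
    A = [[4.0, 2.0, 2.0], [2.0, 5.0, 1.0], [2.0, 1.0, 6.0]]
    # Schur complement with respect to the last two variables
    print(schur_by_partial_elimination(A, 1))
```

In a sparse solver the same elimination is organized by the elimination tree so that only structurally nonzero updates are performed, and the top node's dense block is left unfactored.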
Abstract: With the increasing importance of cloud services worldwide, cloud infrastructure and platform management has become critical for cloud service providers. This paper proposes a novel architecture for an intelligent server management framework. In this framework, the communication layer is based on the Extensible Messaging and Presence Protocol (XMPP), which was developed for instant messaging and has proven highly mature and well suited to mobile and large-scale deployments owing to its extensibility and efficiency. The proposed architecture can simplify server management and increase flexibility and scalability when managing hundreds of thousands of servers in the cloud era.
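As a rough illustration of how an XMPP communication layer can carry management traffic, the sketch below builds and parses an `<iq type="set">` stanza (the XMPP request/response stanza defined in RFC 6120) whose payload is a command element. The paper's actual message schema is not given here: the namespace `urn:example:server-mgmt`, the `command` element, its `action` attribute, the function names, and the JIDs are all hypothetical. Stream setup, authentication, and the reply stanza are omitted; only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for a server-management payload (not part of XMPP)
MGMT_NS = "urn:example:server-mgmt"


def build_command_stanza(sender, target, action, stanza_id):
    """Serialize an <iq type='set'> stanza carrying one management command."""
    iq = ET.Element("iq", {"type": "set", "id": stanza_id, "from": sender, "to": target})
    # The payload element declares its own namespace, as XMPP extensions do
    ET.SubElement(iq, "command", {"xmlns": MGMT_NS, "action": action})
    return ET.tostring(iq, encoding="unicode")


def parse_command_stanza(xml_text):
    """Return (target JID, action) from a stanza built above."""
    iq = ET.fromstring(xml_text)
    cmd = iq.find(f"{{{MGMT_NS}}}command")  # namespace-qualified lookup
    if iq.tag != "iq" or iq.get("type") != "set" or cmd is None:
        raise ValueError("not a management command stanza")
    return iq.get("to"), cmd.get("action")


if __name__ == "__main__":
    stanza = build_command_stanza(
        "manager@example.com/console", "node42@example.com/agent", "reboot", "cmd-1"
    )
    print(stanza)
    print(parse_command_stanza(stanza))
```

An `iq` stanza suits commands better than a `message` stanza because XMPP requires every `iq` of type `set` to receive a `result` or `error` reply, which gives the manager a per-server acknowledgement.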
Funding: Supported by the European Union Horizon 2020 research and innovation program under the CPSoSAware project (grant no. 871738) and by Science Foundation Ireland, grant no. 12/RC/2289-P2, Insight Centre for Data Analytics.
Abstract: Artificial intelligence (AI) algorithms achieve outstanding results in many application domains, such as computer vision and natural language processing. The performance of AI models is the outcome of complex and costly model architecture design and training processes. Hence, it is paramount for model owners to protect their AI models from piracy (model cloning, illegitimate distribution, and use). Intellectual property (IP) protection mechanisms have been applied to AI models, and in particular to deep neural networks, to verify model ownership. This study surveys state-of-the-art AI model ownership protection techniques and reports their pros and cons. The majority of previous works focus on watermarking, while more advanced methods such as fingerprinting and attestation are promising but not yet explored in depth. The study concludes by discussing possible research directions in the area.
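To illustrate what watermark-based ownership verification involves, the sketch below implements one common scheme from this family, trigger-set (backdoor) watermarking: the owner trains the model to emit secret labels on a private set of trigger inputs, and later claims ownership if a suspect model reproduces those labels. It is not taken from the surveyed works; the function name, the 0.9 threshold, and the toy data are assumptions, and practical schemes usually replace the fixed threshold with a statistical test against the chance agreement rate.

```python
from typing import Callable, Hashable, Sequence, Tuple


def verify_trigger_set_watermark(
    predict: Callable[[object], Hashable],
    trigger_set: Sequence[Tuple[object, Hashable]],
    threshold: float = 0.9,
) -> Tuple[bool, float]:
    """Claim ownership if the suspect model reproduces the owner's secret
    trigger-set labels on at least `threshold` of the trigger inputs.

    Returns (ownership_claimed, match_rate).
    """
    if not trigger_set:
        raise ValueError("trigger set must not be empty")
    matches = sum(1 for x, secret_label in trigger_set if predict(x) == secret_label)
    rate = matches / len(trigger_set)
    return rate >= threshold, rate


if __name__ == "__main__":
    # Toy trigger set: secret inputs paired with deliberately unusual labels
    triggers = [("t1", "cat"), ("t2", "truck"), ("t3", "frog"), ("t4", "ship")]
    stolen = dict(triggers)  # stands in for a copy of the watermarked model
    independent = {"t1": "dog", "t2": "truck", "t3": "bird", "t4": "car"}
    print(verify_trigger_set_watermark(stolen.get, triggers))
    print(verify_trigger_set_watermark(independent.get, triggers))
```

The verifier only needs black-box query access (`predict`), which is why trigger-set schemes are popular for checking suspected clones deployed behind an API.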
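Abstract: A finite element model of laser metal deposition (LMD) additive manufacturing of 316L stainless steel was built in ABAQUS. Numerical simulations combining the element birth-and-death technique with a double-ellipsoid moving heat source were used to study the temperature field during single-track, single-layer LMD, as well as the influence of different process parameters on the temperature field and on the temperature gradients at characteristic points in different regions. The temperature evolution of the molten pool and the thermal cycling between layers during single-track, multi-layer LMD were further investigated. Experiments were designed to validate the simulation results. The results show that reducing the scanning speed or increasing the laser power enlarges the region affected by the molten pool. Of the two parameters, laser power has the greater influence on the temperature gradient, especially the longitudinal temperature gradient. As the number of LMD layers increases, and because of the reciprocating scanning path, the temperature gradient increases significantly, so the formed parts are prone to bending deformation. The overall morphology of the single-track, multi-layer thin-walled part further confirms the accuracy of the simulated LMD temperature gradients.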
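The double-ellipsoid moving heat source is commonly Goldak's model, whose volumetric flux for the front (x >= 0) and rear (x < 0) quadrants is sketched below. Assumptions, since the abstract gives no details: x is measured from the moving source centre along the scan direction (x = X - v*t for a source moving along +X at speed v), the part occupies z <= 0, Q is the absorbed power (absorption efficiency times laser power), and the front and rear fractions satisfy f_f + f_r = 2. The function names and all numerical values are hypothetical; in ABAQUS such a flux is usually supplied through a user subroutine, which is not reproduced here.

```python
import math


def double_ellipsoid_flux(x, y, z, Q, a_f, a_r, b, c, f_f):
    """Goldak double-ellipsoid volumetric heat flux.

    x: distance from the source centre along the scan direction
       (x >= 0 is the front quadrant, x < 0 the rear quadrant)
    y: lateral distance; z: depth (the part is assumed to occupy z <= 0)
    Q: absorbed power; a_f, a_r, b, c: ellipsoid semi-axes
    f_f: front heat fraction; the rear fraction is f_r = 2 - f_f,
    so the flux integrated over the half-space z <= 0 equals Q.
    """
    f_r = 2.0 - f_f
    a, f = (a_f, f_f) if x >= 0 else (a_r, f_r)
    peak = 6.0 * math.sqrt(3.0) * f * Q / (a * b * c * math.pi * math.sqrt(math.pi))
    return peak * math.exp(-3.0 * (x / a) ** 2 - 3.0 * (y / b) ** 2 - 3.0 * (z / c) ** 2)


def moving_source_flux(X, Y, Z, t, v, **params):
    """Flux at fixed coordinates (X, Y, Z) at time t for a source that starts
    at X = 0 and moves along +X at constant speed v (hypothetical helper)."""
    return double_ellipsoid_flux(X - v * t, Y, Z, **params)


if __name__ == "__main__":
    # Hypothetical SI values: 1 kW laser, 40 % absorption, 10 mm/s scan speed
    params = dict(Q=0.4 * 1000.0, a_f=1.0e-3, a_r=2.0e-3, b=1.5e-3, c=1.0e-3, f_f=0.6)
    print(moving_source_flux(5.0e-3, 0.0, -0.5e-3, 0.5, 0.01, **params))
```

Because the rear semi-axis a_r is usually taken longer than a_f, the rear quadrant spreads its share of the heat over a larger volume, giving the elongated melt-pool tail seen behind a moving laser.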
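Abstract: A local, lightweight intelligent assistant system for course teaching is proposed and implemented. Using the IPEX-LLM (Intel PyTorch extension for large language models) acceleration library, the system efficiently deploys and runs, on devices with limited computing resources, a large language model fine-tuned with the QLoRA (quantized low-rank adaptation) framework. Combined with retrieval-augmented generation, it supports flexible course-specific customization of four main functional modules: intelligent question answering, intelligent question generation, syllabus generation, and generation of teaching presentation documents. While helping teachers improve the quality and efficiency of lesson preparation and teaching and protecting data privacy, the system also supports personalized student learning and provides real-time feedback. In performance experiments, taking the integrated and optimized ChatGLM3-6B model as an example, the system needs only 4.08 s to process a 64-token output task, demonstrating its capability for fast inference in resource-constrained environments. In practical case studies, a functional comparison with the native ChatGLM-6B and ChatGPT 4.0 further shows that the system offers superior accuracy and practicality.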
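The retrieval-augmented generation step can be sketched as: score course passages against the question, keep the top k, and prepend them to the prompt sent to the fine-tuned model. This is a minimal illustration only. The abstract does not describe the system's retriever, embedding model, or prompt template, so this sketch substitutes plain bag-of-words cosine similarity; the function names `retrieve` and `build_prompt` and the prompt wording are hypothetical, and the call to the LLM itself is omitted.

```python
import math
import re
from collections import Counter


def _tokens(text):
    """Lower-case word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def _cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query (stable on ties)."""
    q = Counter(_tokens(query))
    ranked = sorted(passages, key=lambda p: _cosine(q, Counter(_tokens(p))), reverse=True)
    return ranked[:k]


def build_prompt(query, passages, k=2):
    """Prepend the retrieved course material to the student's question."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return f"Answer using the course material below.\n{context}\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    course_notes = [
        "QLoRA fine-tunes a 4-bit quantized model with low-rank adapters.",
        "The syllabus covers sorting algorithms and graphs.",
        "Retrieval-augmented generation adds retrieved text to the prompt.",
    ]
    print(build_prompt("How does QLoRA fine-tune a quantized model?", course_notes, k=1))
```

Keeping retrieval separate from generation is what allows course-specific customization without retraining: swapping the course notes changes the answers, while the fine-tuned model stays the same.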