27 articles found
Neutronic calculations of the China dual-functional lithium–lead test blanket module with the parallel discrete ordinates code Hydra (cited by: 2)
1
Authors: Guang-Chun Zhang, Jie Liu, Liang-Zhi Cao, Hong-Chun Wu, Xian-Bao Yuan. Nuclear Science and Techniques (SCIE, CAS, CSCD), 2020, No. 8, pp. 1-12 (12 pages)
The China dual-functional lithium–lead test blanket module (DFLL-TBM) is a liquid LiPb blanket concept developed by the Institute of Nuclear Energy Safety Technology of the Chinese Academy of Sciences for testing in ITER to validate relevant tritium breeding and shielding technologies. In this study, neutronic calculations of DFLL-TBM were carried out using a massively parallel three-dimensional transport code, Hydra, with the Fusion Evaluated Nuclear Data Library/MG. Hydra was developed by the Nuclear Engineering Computational Physics Lab based on the discrete ordinates method and has been devoted to neutronic analysis and shielding evaluation for nuclear facilities. An in-house Monte Carlo code (MCX) was employed to verify the discretized calculation model used by Hydra for the DFLL-TBM calculations. The results showed two key aspects: (1) in most material zones, Hydra solutions agree with the reference MCX results to within 1%, and the maximal relative difference of the neutron flux is merely 3%, demonstrating the correctness of the calculation model; (2) while the current DFLL-TBM design meets the operation shielding requirement of ITER for 4 years, it does not satisfy the tritium self-sufficiency requirement. Compared to the two-step approach, Hydra achieves higher accuracy because it does not rely on the homogenization technique during the calculation process. The parallel efficiency tests of Hydra using the DFLL-TBM model also showed that this code maintains a high parallel efficiency on O(100) processors and, as a result, is able to significantly improve computing performance through parallelization. Parameter studies have been carried out by varying the thickness of the beryllium armor layer and the tritium breeding zone to understand the influence of the beryllium layer and breeding zone thickness on tritium breeding performance. This establishes a foundation for further improvement in the tritium production performance of DFLL-TBM.
Keywords: discrete ordinates method; DFLL-TBM; neutronic analysis; tritium breeding performance
GPU acceleration of subgraph isomorphism search in large scale graph (cited by: 1)
2
Authors: 杨博, 卢凯, 高颖慧, 王小平, 徐凯. Journal of Central South University (SCIE, EI, CAS, CSCD), 2015, No. 6, pp. 2238-2249 (12 pages)
A novel framework for parallel subgraph isomorphism on GPUs is proposed, named GPUSI, which consists of GPU region exploration and GPU subgraph matching. GPUSI iteratively enumerates subgraph instances and solves the subgraph isomorphism in a divide-and-conquer fashion. The framework relies completely on graph traversal and avoids the explicit join operation. Moreover, to improve its performance, a task-queue based method and the virtual-CSR graph structure are used to balance the workload among warps, and a warp-centric programming model is used to balance the workload among threads in a warp. A prototype of GPUSI is implemented, and comprehensive experiments of various graph isomorphism operations are carried out on diverse large graphs. The experiments clearly demonstrate that GPUSI has good scalability and can achieve a speed-up of 1.4–2.6 compared to the state-of-the-art solutions.
Keywords: graph structure; GPU; graph isomorphism; search; comprehensive experiments; isomorphism problem; region exploration; divide and conquer
Scalability of 3D deterministic particle transport on the Intel MIC architecture (cited by: 1)
3
Authors: 王庆林, 刘杰, 龚春叶, 邢座程. Nuclear Science and Techniques (SCIE, CAS, CSCD), 2015, No. 5, pp. 88-97 (10 pages)
The key to large-scale parallel solutions of deterministic particle transport problems is single-node computation performance. Hence, single-node computation is often parallelized on multi-core or many-core computer architectures. However, the number of on-chip cores grows quickly with the scale-down of feature size in semiconductor technology. In this paper, we present a scalability investigation of one-energy-group time-independent deterministic discrete ordinates neutron transport in 3D Cartesian geometry (Sweep3D) on Intel's Many Integrated Core (MIC) architecture, which can currently provide up to 62 cores with four hardware threads per core and will offer up to 72 cores in the future. The parallel programming model OpenMP and vector intrinsic functions are used to exploit thread parallelism and vector parallelism for the discrete ordinates method, respectively. The results on a 57-core MIC coprocessor show that the implementation of Sweep3D on MIC has good scalability in performance. In addition, the application of the Roofline model to assess the implementation and a performance comparison between MIC and the Tesla K20C Graphics Processing Unit (GPU) are also reported.
Keywords: computer architecture; scalability; particle transport; 3D geometry; Intel; MIC; discrete ordinates method; computing performance
Fast garment simulation with aid of hybrid bones
4
Authors: 吴博, 陈寅, 徐凯, 程志全, 熊岳山. Journal of Central South University (SCIE, EI, CAS, CSCD), 2015, No. 6, pp. 2218-2226 (9 pages)
A data-driven method was proposed to realistically animate garments on human poses in reduced space. Firstly, a gradient based method was extended to generate motion sequences, and garments were simulated on the sequences as training data. Based on these examples, the proposed method can quickly output realistic garments on new poses. The framework is divided into an offline phase and an online phase. During the offline phase, based on linear blend skinning (LBS), rigid bones and flex bones were estimated for human bodies and garments, respectively. Then, rigid bone weight maps on garment vertices were learned from examples. In the online phase, new human poses were treated as input to estimate rigid bone transformations. Then, both rigid bones and flex bones were used to drive garments to fit the new poses. Finally, a novel formulation was also proposed to efficiently deal with garment-body penetration. Experiments show that the method is fast and accurate. The intersection artifacts are removed quickly and the final garment results are quite realistic.
Keywords: garment simulation; hybrid; data-driven; motion sequence; example-based; input estimation; training data; flex
Improving vertex-frontier based GPU breadth-first search
5
Authors: 杨博, 卢凯, 高颖慧, 徐凯, 王小平, 程志权. Journal of Central South University (SCIE, EI, CAS), 2014, No. 10, pp. 3828-3836 (9 pages)
Breadth-first search (BFS) is an important kernel for graph traversal and has been used by many graph processing applications. Extensive studies have been devoted to boosting the performance of BFS. As the most effective solution, GPU acceleration achieves the state-of-the-art result of 3.3×10^9 traversed edges per second on an NVIDIA Tesla C2050 GPU. A novel vertex frontier based GPU BFS algorithm is proposed, and its main features are three-fold. Firstly, to obtain a better workload balance for irregular graphs, a virtual-queue task decomposition and mapping strategy is introduced for vertex frontier expansion. Secondly, a global duplicate detection scheme is proposed to remove duplicate vertices from the vertex frontier effectively. Finally, a GPU-based bottom-up BFS approach is employed to process large frontiers. The experimental results demonstrate that the algorithm can achieve a 10% improvement over the state-of-the-art method on diverse graphs. In particular, it exhibits a 2-3 times speedup on low-diameter and scale-free graphs over the state of the art on an NVIDIA Tesla K20c GPU, reaching a peak traversal rate of 11.2×10^9 edges/s.
Keywords: breadth-first search; GPU; vertex; NVIDIA Tesla; graphics processing; BFS; load balancing
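The vertex-frontier idea in the abstract above can be illustrated with a minimal CPU sketch in Python. This is not the paper's GPU algorithm: the function name `bfs_levels` and the adjacency-dictionary graph format are assumptions made for illustration, and a Python set stands in for the duplicate-removal step that the paper performs with its own detection scheme.

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS; returns the hop distance of every reachable vertex.

    adj is a hypothetical adjacency dictionary: vertex -> iterable of neighbours.
    """
    dist = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        # A set keeps each newly discovered vertex once, even if several
        # frontier vertices reach it in the same level (duplicate removal).
        next_frontier = set()
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in dist:
                    next_frontier.add(v)
        for v in next_frontier:
            dist[v] = level
        frontier = list(next_frontier)
    return dist


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
    print(bfs_levels(graph, 0))
```

Vertex 3 is reached from both 1 and 2 in the same level, which is the situation where a frontier without duplicate removal would process it twice.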
Experimental verification of the parasitic bipolar amplification effect in PMOS single event transients
6
Authors: 何益百, 陈书明. Chinese Physics B (SCIE, EI, CAS, CSCD), 2014, No. 7, pp. 775-779 (5 pages)
The contribution of parasitic bipolar amplification to SETs is experimentally verified using two P-hit target chains in the normal layout and in the special layout. For PMOSs in the normal layout, the single-event charge collection is composed of diffusion, drift, and the parasitic bipolar effect, while for PMOSs in the special layout, the parasitic bipolar junction transistor cannot turn on. Heavy ion experimental results show that PMOSs without parasitic bipolar amplification have a 21.4% decrease in the average SET pulse width and roughly a 40.2% reduction in the SET cross-section.
Keywords: single event effect; single event transient; parasitic bipolar amplification; heavy ion experiments
Implementation of ternary Shor's algorithm based on vibrational states of an ion in anharmonic potential
7
Authors: 刘威, 陈书明, 张见, 吴春旺, 吴伟, 陈平形. Chinese Physics B (SCIE, EI, CAS, CSCD), 2015, No. 3, pp. 157-165 (9 pages)
It is widely believed that Shor's factoring algorithm provides a driving force to boost quantum computing research. However, a serious obstacle to its binary implementation is the large number of quantum gates. Non-binary quantum computing is an efficient way to reduce the required number of elemental gates. Here, we propose optimization schemes for the implementation of Shor's algorithm and take a ternary version for factorizing 21 as an example. The optimized factorization is achieved by a two-qutrit quantum circuit, which consists of only two single-qutrit gates and one ternary controlled-NOT gate. This two-qutrit quantum circuit is then encoded into the nine lower vibrational states of an ion trapped in a weakly anharmonic potential. Optimal control theory (OCT) is employed to derive the manipulation electric field for transferring the encoded states. The ternary Shor's algorithm can be implemented in one single step. Numerical simulation results show that the accuracy of the state transformations is about 0.9919.
Keywords: ternary Shor's algorithm; anharmonic ion trapping; optimal control theory; vibrational state
A regularized magnetotelluric inversion with a minimum support gradient constraint
8
Authors: Junjun Zhou, Xiangyun Hu, Tiaojie Xiao. Earthquake Science, 2020, No. 3, pp. 130-140 (11 pages)
Magnetotelluric (MT) inversion is an ill-posed problem, and the standard way to address it is through regularization, by adding a stabilizing functional to the data objective functional in order to obtain a stable solution. The traditional stabilizing functionals, in which a low-order differential operator is used, yield a smooth solution that may not be appropriate when anomalies occur in block patterns. In some cases the focused imaging of a sharp electrical boundary is necessary. Even though various experiments have used stabilizing functionals that are suitable to obtain a clear and sharp boundary, such as the minimum support (MS) and the minimum gradient support (MGS) functionals, there are still some limitations in practice. In this paper, the minimum support gradient (MSG) is proposed as the stabilizing functional. Under the uniform regularization framework, a regularized inversion with a variety of stabilizing functionals is performed and the inversion results are compared. This study shows that MSG inversion yields not only a clearly focused result but also a stable and robust one.
Keywords: magnetotelluric; focus inversion; sharp boundary; regularization
DIPP—An LLC Replacement Policy for On-chip Dynamic Heterogeneous Multi-core Architecture
9
Authors: Zhang Yang, Xing Zuocheng, Ma Xiao. 《国际计算机前沿大会会议论文集》, 2015, No. 1, pp. 112-113 (2 pages)
As the big data era arrives, it brings new challenges to massive data processing. Combining a GPU and a CPU on one chip is the trend for relieving the pressure of large-scale computing. We found that GPU and CPU programs have different memory access characteristics. The most important one is that GPU programs include a large number of threads, which leads to a higher cache access frequency than that of CPU programs. Although the LRU policy favors programs with high memory access frequency, GPU programs cannot obtain a corresponding performance boost even when more cache resources are provided. The LRU policy is therefore not suitable for heterogeneous multi-core processors. Based on the different memory access characteristics of GPU and CPU programs, this paper proposes a dynamic LLC replacement policy, DIPP (Dynamic Insertion / Promotion Policy), for heterogeneous multi-core processors. The core idea of the replacement policy is to reduce the miss rate of the program and enhance overall system performance by limiting the cache resources that the GPU can acquire and reducing the thread interference between programs. Experiments compare the DIPP replacement policy with LRU, and the results are discussed by class of GPU program. Friendly programs improve by 23.29% in average performance (using the arithmetic mean), programs with large working sets improve by 13.95%, compute-intensive programs improve by 9.66%, and stream-class programs improve by 3.8%.
Keywords: big data; heterogeneous multicore; replacement policy; DIPP
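The fifth paradigm of scientific research: intelligence-driven materials design as an example (cited by: 4)
10
Authors: Can Leng, Zhuo Tang, Yi-Ge Zhou, Zean Tian, Wei-Qing Huang, Jie Liu, Keqin Li, Kenli Li. Engineering (SCIE, EI, CAS, CSCD), 2023, No. 5, pp. 126-137, I0003, I0004 (14 pages)
Science is entering a new era, the fifth paradigm, whose main feature is regarded as the integration of knowledge across different fields: intelligence-driven work in the computational community built on ubiquitous machine learning systems. Here, we illustrate the essence of the fifth paradigm through a typical platform case specifically designed for catalytic materials and built on the Tianhe-1 supercomputer system, with the aim of promoting the cultivation of the fifth paradigm in other fields. The fifth-paradigm platform mainly comprises automatic model construction (raw data extraction), automatic fingerprint construction (neural network feature selection), and repeated iterations that connect knowledge across disciplines ("volcano plots"). Alongside this breakdown, the performance of the architecture implemented in the iterations is evaluated. The discussion shows that an intelligence-driven fifth-paradigm platform can greatly simplify and improve extremely tedious and challenging research work. It can also realize mutual feedback between numerical computation and machine learning, by compensating for the lack of samples in machine learning and by replacing some numerical calculations that are limited by insufficient computing resources, thereby accelerating exploration. In data-driven disciplines, the synergy of interdisciplinary experts and the sharply growing demand for dynamic data remain a challenge. We believe that this glimpse of the fifth-paradigm platform can pave the way for its application in other fields.
Keywords: machine learning; automatic construction; Tianhe-1; data-driven; neural network; dynamic data; knowledge integration; scientific research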
MilkyWay-2 supercomputer: system and application (cited by: 34)
11
Authors: Xiangke LIAO, Liquan XIAO, Canqun YANG, Yutong LU. Frontiers of Computer Science (SCIE, EI, CSCD), 2014, No. 3, pp. 345-356 (12 pages)
On June 17, 2013, the MilkyWay-2 (Tianhe-2) supercomputer was crowned as the fastest supercomputer in the world on the 41st TOP500 list. This paper provides an overview of the MilkyWay-2 project and describes the design of its hardware and software systems. The key architecture features of MilkyWay-2 are highlighted, including neo-heterogeneous compute nodes integrating commodity off-the-shelf processors and accelerators that share a similar instruction set architecture, powerful networks that employ proprietary interconnection chips to support massively parallel message-passing communications, a proprietary 16-core processor designed for scientific computing, and efficient software stacks that provide a high performance file system, an emerging programming model for heterogeneous systems, and intelligent system administration. We perform extensive evaluation with wide-ranging applications, from the LINPACK and Graph500 benchmarks to massively parallel software deployed in the system.
Keywords: MilkyWay-2 supercomputer; petaflops computing; neo-heterogeneous architecture; interconnect network; heterogeneous programming model; system management; benchmark optimization; performance evaluation
FAAD: an unsupervised fast and accurate anomaly detection method for a multi-dimensional sequence over data stream (cited by: 1)
12
Authors: Bin LI, Yi-jie WANG, Dong-sheng YANG, Yong-mou LI, Xing-kong MA. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2019, No. 3, pp. 388-404 (17 pages)
Recently, sequence anomaly detection has been widely used in many fields. Sequence data in these fields are usually multi-dimensional over the data stream. It is a challenge to design an anomaly detection method for a multi-dimensional sequence over the data stream that satisfies the requirements of accuracy and high speed, for two reasons: (1) redundant dimensions in sequence data and a large state space lead to a poor ability for sequence modeling; (2) anomaly detection cannot adapt to the high-speed nature of the data stream, especially when concept drift occurs, which reduces the detection rate. On one hand, most existing methods of sequence anomaly detection focus on the single-dimension sequence. On the other hand, some studies concerning multi-dimensional sequences concentrate mainly on the static database rather than the data stream. To improve the performance of anomaly detection for a multi-dimensional sequence over the data stream, we propose a novel unsupervised fast and accurate anomaly detection (FAAD) method which includes three algorithms. First, a method called "information calculation and minimum spanning tree cluster" is adopted to reduce redundant dimensions. Second, to speed up model construction and ensure the detection rate for the sequence over the data stream, we propose a method called "random sampling and subsequence partitioning based on the index probabilistic suffix tree." Last, the method called "anomaly buffer based on model dynamic adjustment" dramatically reduces the effects of concept drift in the data stream. FAAD is implemented on the streaming platform Storm to detect multi-dimensional log audit data. Compared with the existing anomaly detection methods, FAAD performs well in detection rate and speed without being affected by concept drift.
Keywords: data stream; multi-dimensional sequence; anomaly detection; concept drift; feature selection
Detailed and clock-driven simulation for HPC interconnection network
13
Authors: Wenhao ZHOU, Juan CHEN, Chen CUI, Qian WANG, Dezun DONG, Yuhua TANG. Frontiers of Computer Science (SCIE, EI, CSCD), 2016, No. 5, pp. 797-811 (15 pages)
The performance and energy consumption of high performance computing (HPC) interconnection networks are of great significance to the whole supercomputer, and building an HPC interconnection network simulation platform is very important for research on HPC software and hardware technologies. To effectively evaluate the performance and energy consumption of HPC interconnection networks, this article designs and implements a detailed and clock-driven HPC interconnection network simulation platform, called HPC-NetSim. HPC-NetSim uses application-driven workloads and inherits the characteristics of the detailed and flexible cycle-accurate network simulator. Besides, it offers a large set of configurable network parameters in terms of topology and routing, and supports on/off states for routers. We compare the simulated execution time with the real execution time of a Tianhe-2 subsystem, and the mean error is only 2.7%. In addition, we simulate the network behaviors with different network structures and low-power modes. The results are also consistent with the theoretical analyses.
Keywords: high performance computing; clock-driven simulation; interconnection network; BookSim
An Efficient and Flexible Deterministic Framework for Multithreaded Programs (cited by: 1)
14
Authors: 卢凯, 周旭, 王小平, Tom Bergan, 陈沉. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2015, No. 1, pp. 42-56 (15 pages)
Determinism is very useful to multithreaded programs in debugging, testing, etc. Many deterministic approaches have been proposed, such as deterministic multithreading (DMT) and deterministic replay. However, these systems either are inefficient or target a single purpose, which is not flexible. In this paper, we propose an efficient and flexible deterministic framework for multithreaded programs. Our framework implements determinism in two steps: relaxed determinism and strong determinism. Relaxed determinism solves data races efficiently by using a proper weak memory consistency model. After that, we implement strong determinism by solving lock contentions deterministically. Since we can apply different approaches for these two steps independently, our framework provides a spectrum of deterministic choices, including a nondeterministic system (fast), a weak deterministic system (fast and conditionally deterministic), a DMT system, and a deterministic replay system. Our evaluation shows that the DMT configuration of this framework could even outperform a state-of-the-art DMT system.
Keywords: determinism; multithreading; framework; flexible
High Performance Interconnect Network for Tianhe System (cited by: 19)
15
Authors: 廖湘科, 庞征, 王克非, 卢宇彤, 谢旻, 夏军, 董德尊, 所光. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2015, No. 2, pp. 259-272 (14 pages)
In this paper, we present the Tianhe-2 interconnect network and message passing services. We describe the architecture of the router and network interface chips, and highlight a set of hardware and software features effectively supporting high performance communications, ranging over remote direct memory access, collective optimization, hardware-enabled reliable end-to-end communication, user-level message passing services, etc. Measured hardware performance results are also presented.
Keywords: Tianhe-2 supercomputer; interconnect network; router architecture; network interface architecture; user-level message passing
The TH Express high performance interconnect networks (cited by: 15)
16
Authors: Zhengbin PANG, Min XIE, Jun ZHANG, Yi ZHENG, Guibin WANG, Dezun DONG, Guang SUO. Frontiers of Computer Science (SCIE, EI, CSCD), 2014, No. 3, pp. 357-366 (10 pages)
The interconnection network plays an important role in scalable high performance computer (HPC) systems. The TH Express-2 interconnect has been used in the MilkyWay-2 system to provide high-bandwidth and low-latency interprocessor communications, and continuous efforts are devoted to the development of our proprietary interconnect. This paper describes the state of the art of our proprietary interconnect, with particular emphasis on the design of the network interface. Several key features are introduced, such as user-level communication, remote direct memory access, offload collective operation, and hardware reliable end-to-end communication. The design of a low-level message passing infrastructure and upper-level message passing services is also proposed. The preliminary performance results demonstrate the efficiency of the TH interconnect interface.
Keywords: HPC; network interface chip (NIC); TH Express interconnect; offload collective operation
OHTMA: an optimized heuristic topology-aware mapping algorithm on the Tianhe-3 exascale supercomputer prototype (cited by: 2)
17
Authors: Yi-shui LI, Xin-hai CHEN, Jie LIU, Bo YANG, Chun-ye GONG, Xin-biao GAN, Sheng-guo LI, Han XU. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2020, No. 6, pp. 939-949 (11 pages)
With the rapid increase of the size of applications and the complexity of the supercomputer architecture, topology-aware process mapping becomes increasingly important. High communication cost has become a dominant constraint on the performance of applications running on the supercomputer. To avoid a bad mapping strategy, which can lead to poor communication performance, we propose an optimized heuristic topology-aware mapping algorithm (OHTMA). The algorithm attempts to minimize the hop-byte metric that we use to measure the mapping results. OHTMA incorporates a new greedy heuristic method and pair-exchange-based optimization. It reduces the number of long-distance communications and effectively enhances the locality of the communication. Experimental results on the Tianhe-3 exascale supercomputer prototype indicate that OHTMA can significantly reduce the communication costs.
Keywords: high-performance computing; topology mapping; heuristic algorithm
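The hop-byte metric and the pair-exchange step named in the abstract above can be sketched as follows. This is a generic illustration, not OHTMA itself: the abstract does not define the metric, so the sketch assumes the common form (bytes exchanged times hop distance, summed over communicating pairs), and the ring topology, the traffic dictionary, and the names `hop_bytes`, `ring_hops` and `pair_exchange` are all hypothetical.

```python
def hop_bytes(traffic, mapping, hops):
    """Sum of bytes * hop distance over all communicating process pairs."""
    return sum(nbytes * hops(mapping[src], mapping[dst])
               for (src, dst), nbytes in traffic.items())


def ring_hops(num_nodes):
    """Hop distance on a bidirectional ring (an assumed toy topology)."""
    def hops(a, b):
        d = abs(a - b)
        return min(d, num_nodes - d)
    return hops


def pair_exchange(traffic, mapping, hops):
    """Swap the nodes of two processes whenever the swap strictly lowers the cost."""
    mapping = dict(mapping)
    best = hop_bytes(traffic, mapping, hops)
    procs = list(mapping)
    improved = True
    while improved:
        improved = False
        for i in range(len(procs)):
            for j in range(i + 1, len(procs)):
                p, q = procs[i], procs[j]
                mapping[p], mapping[q] = mapping[q], mapping[p]
                cost = hop_bytes(traffic, mapping, hops)
                if cost < best:
                    best, improved = cost, True
                else:
                    # Undo a swap that did not help.
                    mapping[p], mapping[q] = mapping[q], mapping[p]
    return mapping, best


if __name__ == "__main__":
    traffic = {(0, 2): 100, (1, 3): 100}      # bytes between process pairs
    identity = {p: p for p in range(4)}       # process -> node on a 4-node ring
    hops = ring_hops(4)
    print(hop_bytes(traffic, identity, hops))
    print(pair_exchange(traffic, identity, hops))
```

Because a swap is kept only when the cost strictly decreases, the loop terminates; it finds a local optimum, which is why a heuristic of this kind is usually paired with a good initial placement.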
Iaso: an autonomous fault-tolerant management system for supercomputers (cited by: 1)
18
Authors: Kai LU, Xiaoping WANG, Gen LI, Ruibo WANG, Wanqing CHI, Yongpeng LIU, Hongwei TANG, Hua FENG, Yinghui GAO. Frontiers of Computer Science (SCIE, EI, CSCD), 2014, No. 3, pp. 378-390 (13 pages)
As system scale increases, the inherent reliability of supercomputers becomes lower and lower. The cost of fault handling and task recovery increases so rapidly that the reliability issue will soon harm the usability of supercomputers. This issue is referred to as the "reliability wall", which is regarded as a critical problem for current and future supercomputers. To address this problem, we propose an autonomous fault-tolerant system, named Iaso, in the MilkyWay-2 system. Iaso introduces the concept of autonomous management in supercomputers. With autonomous management, the computer itself, rather than manpower, takes charge of the fault management work. Iaso automatically manages the whole lifecycle of faults, including fault detection, fault diagnosis, fault isolation, and task recovery. Iaso endows the MilkyWay-2 system with autonomous features such as self-awareness, self-diagnosis, self-healing, and self-protection. With the help of Iaso, the cost of fault handling in supercomputers is reduced from several hours to a few seconds. Iaso greatly improves the usability and reliability of the MilkyWay-2 system.
Keywords: supercomputer; autonomous management; fault tolerant; fault management; MilkyWay-2 system
Merge-Weighted Dynamic Time Warping for Speech Recognition (cited by: 1)
19
Authors: 张湘莉兰, 骆志刚, 李明. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2014, No. 6, pp. 1072-1082 (11 pages)
Obtaining training material for rarely used English words and common given names from countries where English is not spoken is difficult due to excessive time, storage and cost factors. By considering personal privacy, language-independent (LI) with lightweight speaker-dependent (SD) automatic speech recognition (ASR) is a convenient option to solve the problem. The dynamic time warping (DTW) algorithm is the state-of-the-art algorithm for small-footprint SD ASR for real-time applications with limited storage and small vocabularies. These applications include voice dialing on mobile devices, menu-driven recognition, and voice control on vehicles and robotics. However, traditional DTW has several limitations, such as high computational complexity, constraint-induced coarse approximation, and inaccuracy problems. In this paper, we introduce the merge-weighted dynamic time warping (MWDTW) algorithm. This method defines a template confidence index for measuring the similarity between merged training data and testing data, while following the core DTW process. MWDTW is simple, efficient, and easy to implement. With extensive experiments on three representative SD speech recognition datasets, we demonstrate that our method significantly outperforms DTW, DTW on merged speech data, and the hidden Markov model (HMM), and is also six times faster than DTW overall.
Keywords: merge-weighted dynamic time warping; natural language processing; speech recognition and synthesis; template confidence index
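For readers unfamiliar with the core DTW process that the abstract above builds on, a minimal sketch of classic DTW follows. It is the textbook dynamic program, not MWDTW: the template confidence index and the merging of training data are not reproduced here, and the function name `dtw_distance` and the use of scalar samples with an absolute-difference cost are assumptions for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest alignment cost of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a[i-1] repeats b[j-1]
                                     cost[i][j - 1],      # b[j-1] repeats a[i-1]
                                     cost[i - 1][j - 1])  # one-to-one step
    return cost[n][m]


if __name__ == "__main__":
    # The second sequence holds the value 2 for one extra sample.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

The table has (n+1)(m+1) cells, which is the quadratic cost that the abstract lists among the limitations of traditional DTW.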
Transfer learning for deep neural network-based partial differential equations solving (cited by: 1)
20
Authors: Xinhai Chen, Chunye Gong, Qian Wan, Liang Deng, Yunbo Wan, Yang Liu, Bo Chen, Jie Liu. Advances in Aerodynamics, 2021, No. 1, pp. 635-648 (14 pages)
Deep neural networks (DNNs) have recently shown great potential in solving partial differential equations (PDEs). The success of neural network-based surrogate models is attributed to their ability to learn a rich set of solution-related features. However, training DNNs usually requires many tedious iterations to converge and a very large amount of training data, which hinders the application of these models to complex physical contexts. To address this problem, we propose to apply the transfer learning approach to DNN-based PDE solving tasks. In our work, we create pairs of transfer experiments on Helmholtz and Navier-Stokes equations by constructing subtasks with different source terms and Reynolds numbers. We also conduct a series of experiments to investigate the degree of generality of the features between different equations. Our results demonstrate that despite differences in underlying PDE systems, the transfer methodology can lead to a significant improvement in the accuracy of the predicted solutions and achieve a maximum performance boost of 97.3% on widely used surrogate models.
Keywords: deep neural network; partial differential equation; surrogate model; transfer learning