Research of NS dataflow mechanism and its analyzer implementation
1
Authors: 金烨 樊隽 《Journal of Southeast University(English Edition)》 EI CAS 2003, Issue 1, pp. 44-48 (5 pages)
This paper analyzes the main elements of the NS network simulator, takes a detailed look at dataflow management in a link, a node, and an agent, respectively, and introduces the information described by its trace file. Based on an analysis of how different packets are transported and handled in NS, a dataflow state machine is proposed together with the events that trigger its state transitions, and a dataflow analyzer is designed and implemented according to it. As the state machine runs, the analyzer can compute statistics on the total traffic of a specified dataflow and produce a general fluctuation diagram. Finally, a concrete example is used to test its performance.
Keywords: network simulation NS simulator dataflow analysis state machine
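Research on a Dataflow-Oriented Hybrid Resource Scheduling Framework for Heterogeneous Clusters Cited by: 3
2
Authors: 汤小春 赵全 符莹 朱紫钰 丁朝 胡小雪 李战怀 《软件学报》 EI CSCD PKU Core 2022, Issue 12, pp. 4704-4726 (23 pages)
The Dataflow model unifies batch processing and stream processing in big data computing. However, existing cluster resource scheduling frameworks for big data computing target either stream processing or batch processing, and are not suited to batch and stream jobs sharing cluster resources. In addition, when GPUs are used for big data analytics, the lack of an effective way to decouple CPU and GPU resources lowers resource utilization. Based on an analysis of existing cluster resource scheduling frameworks, this work designs and implements HRM, a hybrid resource scheduling framework that is aware of batch and stream processing applications. Built on a shared-state architecture, HRM combines optimistic and pessimistic locking protocols to meet the different resource requirements of stream and batch jobs. On compute nodes, it provides flexible binding of CPU and GPU resources and uses a queue stacking technique, which not only meets the real-time requirements of stream jobs but also reduces feedback latency and enables GPU sharing. In simulations of large-scale job scheduling, the scheduling latency of HRM is only about 75% of that of a centralized scheduling framework. In tests with real workloads, when batch and stream processing share a cluster, HRM raises CPU utilization by more than 25%; with the fine-grained job scheduling method, GPU utilization improves by more than 2x and job completion time falls by about 50%.
Keywords: dataflow model; batch processing; stream processing; job awareness; CPU-GPU; queue stacking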
Sub Farm Interface of the ATLAS Dataflow System
3
Authors: 刘尉悦 安琪 Maria Lorenza FERRER 《Plasma Science and Technology》 SCIE EI CAS CSCD 2006, Issue 3, pp. 355-357 (3 pages)
The Sub Farm Interface (SFI) is the event builder of the ATLAS (A Toroidal LHC ApparatuS) Dataflow System. It receives event fragments from the Read Out System, builds full events and sends complete events to the Event Filter for high level event selection. This paper describes the implementation of the Sub Farm Interface. Furthermore, this paper introduces some issues on SFI optimization and the monitoring service inside SFI.
Keywords: nuclear physics event building data acquisition MONITORING dataflow
Accelerating hybrid and compact neural networks targeting perception and control domains with coarse-grained dataflow reconfiguration
4
Authors: Zheng Wang Libing Zhou Wenting Xie Weiguang Chen Jinyuan Su Wenxuan Chen Anhua Du Shanliao Li Minglan Liang Yuejin Lin Wei Zhao Yanze Wu Tianfu Sun Wenqi Fang Zhibin Yu 《Journal of Semiconductors》 EI CAS CSCD 2020, Issue 2, pp. 29-41 (13 pages)
Driven by continuous scaling of nanoscale semiconductor technologies, the past years have witnessed the progressive advancement of machine learning techniques and applications. Recently, dedicated machine learning accelerators, especially for neural networks, have attracted the research interests of computer architects and VLSI designers. State-of-the-art accelerators increase performance by deploying a huge amount of processing elements, but still face the issue of degraded resource utilization across hybrid and non-standard algorithmic kernels. In this work, we exploit the properties of important neural network kernels for both perception and control to propose a reconfigurable dataflow processor, which adjusts the patterns of data flowing, functionalities of processing elements and on-chip storages according to network kernels. In contrast to state-of-the-art fine-grained data flowing techniques, the proposed coarse-grained dataflow reconfiguration approach enables extensive sharing of computing and storage resources. Three hybrid networks for MobileNet, deep reinforcement learning and sequence classification are constructed and analyzed with customized instruction sets and toolchain. A test chip has been designed and fabricated under UMC 65 nm CMOS technology, with the measured power consumption of 7.51 mW under 100 MHz frequency on a die size of 1.8×1.8 mm^2.
Keywords: CMOS technology digital integrated circuits neural networks dataflow architecture
Scheduling optimization for upstream dataflows in edge computing
5
Authors: Haohao Wang Mengmeng Sun Lianming Zhang Pingping Dong Yehua Wei Jing Mei 《Digital Communications and Networks》 SCIE CSCD 2023, Issue 6, pp. 1448-1457 (10 pages)
Edge computing can alleviate the problem of insufficient computational resources for the user equipment, improve the network processing environment, and promote the user experience. Edge computing is well known as a prospective method for the development of the Internet of Things (IoT). However, with the development of smart terminals, much more time is required for scheduling the terminal high-intensity upstream dataflow in the edge server than for scheduling the downstream dataflow. In this paper, we study the scheduling strategy for upstream dataflows in edge computing networks and introduce a three-tier edge computing network architecture. We propose a Time-Slicing Self-Adaptive Scheduling (TSAS) algorithm based on the hierarchical queue, which can reduce the queuing delay of the dataflow, improve the timeliness of dataflow processing and achieve an efficient and reasonable performance of dataflow scheduling. The experimental results show that the TSAS algorithm can reduce latency, minimize energy consumption, and increase system throughput.
Keywords: Edge computing Time-slicing dataflow scheduling Dynamic analysis
An Efficient Network-on-Chip Router for Dataflow Architecture Cited by: 6
6
Authors: Xiao-Wei Shen Xiao-Chun Ye Xu Tan Da Wang Lunkai Zhang Wen-Ming Li Zhi-Min Zhang Dong-Rui Fan Ning-Hui Sun 《Journal of Computer Science & Technology》 SCIE EI CSCD 2017, Issue 1, pp. 11-25 (15 pages)
Dataflow architecture has shown its advantages in many high-performance computing cases. In dataflow computing, a large amount of data is frequently transferred among processing elements through the network-on-chip (NoC). Thus the router design has a significant impact on the performance of dataflow architecture. Common routers are designed for control-flow multi-core architecture and we find they are not suitable for dataflow architecture. In this work, we analyze and extract the features of data transfers in NoCs of dataflow architecture: multiple destinations, high injection rate, and performance sensitive to delay. Based on the three features, we propose a novel and efficient NoC router for dataflow architecture. The proposed router supports multi-destination; thus it can transfer data with multiple destinations in a single transfer. Moreover, the router adopts output buffer to maximize throughput and adopts non-flit packets to minimize transfer delay. Experimental results show that the proposed router can improve the performance of dataflow architecture by 3.6x over a state-of-the-art router.
Keywords: multi-destination ROUTER NETWORK-ON-CHIP dataflow architecture high-performance computing
A Non-Stop Double Buffering Mechanism for Dataflow Architecture Cited by: 4
7
Authors: Xu Tan Xiao-Wei Shen Xiao-Chun Ye Da Wang Dong-Rui Fan Lunkai Zhang Wen-Ming Li Zhi-Min Zhang Zhi-Min Tang 《Journal of Computer Science & Technology》 SCIE EI CSCD 2018, Issue 1, pp. 145-157 (13 pages)
Double buffering is an effective mechanism to hide the latency of data transfers between on-chip and off-chip memory. However, in dataflow architecture, the swapping of two buffers during the execution of many tiles decreases the performance because of repetitive filling and draining of the dataflow accelerator. In this work, we propose a non-stop double buffering mechanism for dataflow architecture. The proposed non-stop mechanism assigns tiles to the processing element array without stopping the execution of processing elements through optimizing control logic in dataflow architecture. Moreover, we propose a work-flow program to cooperate with the non-stop double buffering mechanism. After optimizations both on control logic and on work-flow program, the filling and draining of the array needs to be done only once across the execution of all tiles belonging to the same dataflow graph. Experimental results show that the proposed double buffering mechanism for dataflow architecture achieves a 16.2% average efficiency improvement over that without the optimization.
Keywords: non-stop double buffering dataflow architecture high-performance computing
A Pipelining Loop Optimization Method for Dataflow Architecture Cited by: 2
8
Authors: Xu Tan Xiao-Chun Ye Xiao-Wei Shen Yuan-Chao Xu Da Wang Lunkai Zhang Wen-Ming Li Dong-Rui Fan Zhi-Min Tang 《Journal of Computer Science & Technology》 SCIE EI CSCD 2018, Issue 1, pp. 116-130 (15 pages)
With the coming of the exascale supercomputing era, power efficiency has become the most important obstacle to building an exascale system. Dataflow architecture has a native advantage in achieving high power efficiency for scientific applications. However, state-of-the-art dataflow architectures fail to exploit high parallelism for loop processing. To address this issue, we propose a pipelining loop optimization method (PLO), which makes iterations in loops flow in the processing element (PE) array of a dataflow accelerator. This method consists of two techniques, architecture-assisted hardware iteration and instruction-assisted software iteration. In the hardware iteration execution model, an on-chip loop controller is designed to generate loop indexes, reducing the complexity of the computing kernel and laying a good foundation for pipelining execution. In the software iteration execution model, additional loop instructions are presented to solve the iteration dependency problem. Via these two techniques, the average number of instructions ready to execute per cycle is increased to keep the floating-point units busy. Simulation results show that our proposed method outperforms the static and dynamic loop execution models in floating-point efficiency by 2.45x and 1.1x on average, respectively, while the hardware cost of these two techniques is acceptable.
Keywords: dataflow model control-flow model loop optimization exascale computing scientific application
Dataflow Management in the Internet of Things: Sensing, Control, and Security Cited by: 6
9
Authors: Dawei Wei Huansheng Ning Feifei Shi Yueliang Wan Jiabo Xu Shunkun Yang Li Zhu 《Tsinghua Science and Technology》 SCIE EI CAS CSCD 2021, Issue 6, pp. 918-930 (13 pages)
The pervasiveness of the smart Internet of Things (IoTs) enables many electric sensors and devices to be connected and generates a large amount of dataflow. Compared with traditional big data, the streaming dataflow is faced with representative challenges, such as high speed, strong variability, rough continuity, and demanding timeliness, which pose severe tests of its efficient management. In this paper, we provide an overall review of IoT dataflow management. We first analyze the key challenges faced with IoT dataflow and initially overview the related techniques in dataflow management, spanning dataflow sensing, mining, control, security, privacy protection, etc. Then, we illustrate and compare representative tools or platforms for IoT dataflow management. In addition, promising application scenarios, such as smart cities, smart transportation, and smart manufacturing, are elaborated, which will provide significant guidance for further research. The management of IoT dataflow is also an important area, which merits in-depth discussions and further study.
Keywords: Internet of Things(IoTs) dataflow SECURITY PRIVACY MANAGEMENT
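Wire Design in the Dataflow Visual Language LabScene
10
Authors: 随阳轶 林君 范永开 张晓拓 《计算机工程》 CAS CSCD PKU Core 2007, Issue 24, pp. 13-15 (3 pages)
As an important part of a dataflow visual language editor, a wire must on the one hand represent a logical data dependency and on the other hand present itself in a physical form that is easy to understand. To integrate these two properties, this paper proposes the concept of a wire tree: the structure of the wire tree represents the logical property and its nodes represent the physical property, so that the two properties are both related to and independent of each other. This provides complete logical information for parsing and execution while keeping wires easy to edit and optimize. The design has been implemented in LabScene, a dataflow visual language for virtual instrument development.
Keywords: dataflow visual language; LabScene language; wire; virtual instrument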
A Dataflow-Oriented Programming Interface for Named Data Networking Cited by: 1
11
Authors: Li-Jing Wang Yong-Qiang Lv Ilya Moiseenko Dong-Sheng Wang 《Journal of Computer Science & Technology》 SCIE EI CSCD 2018, Issue 1, pp. 158-168 (11 pages)
Inheriting from a data-driven communication pattern rather than a location-driven pattern, named data networking (NDN) offers better support to network-layer dataflow. However, the application developers have to handle complex tasks, such as data segmentation, packet verification, and flow control, due to the lack of proper transport-layer protocols over the network layer. In this study, we design a dataflow-oriented programming interface to provide transport strategies for NDN, which greatly improves the efficiency in developing applications. This interface presents two application data unit (ADU) retrieval strategies according to different data publishing patterns, in which it adopts an adaptive ADU pipelining algorithm to control the dataflow based on the current network status and data generation rate. The interface also offers network measurement strategies to monitor an abundance of critical metrics influencing the application performance. We verify the functionality and performance of our interface by implementing a video streaming application spanning 11 time zones over the worldwide NDN testbed. Our experiments show that the interface can efficiently support developing high-performance and dataflow-driven NDN applications.
Keywords: named data networking (NDN) dataflow network architecture and design transport-layer protocol
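Automatic Wiring in LabScene, a DFVPL for Virtual Instruments
12
Authors: 随阳轶 林君 范永开 《吉林大学学报(工学版)》 EI CAS CSCD PKU Core 2007, Issue 5, pp. 1203-1208 (6 pages)
Most current dataflow visual programming languages (DFVPLs) either provide no automatic wiring or provide automatic wiring that does not accurately capture the user's design intent during wiring. To address this, an automatic wiring framework is proposed for DFVPLs in the virtual instrument domain. An automatic wiring method is designed by integrating factors such as wire distance, logical relationships, and type matching, and by exploiting the logical order in which virtual instrument software is built. The design has been implemented in LabScene, an independently developed DFVPL for virtual instrument development, and can serve as a reference for automatic wiring design in DFVPLs for other domains.
Keywords: computer software; dataflow visual programming language; automatic wiring; LabScene; virtual instrument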
Simulation and Improvement of the Processing Subsystem of the Manchester Dataflow Computer
13
Authors: 来智勇 郑守淇 《Journal of Computer Science & Technology》 SCIE EI CSCD 1995, Issue 6, pp. 557-563 (7 pages)
The Manchester dataflow computer is a famous dynamic dataflow computer. It is centralized in architecture and simple in organization. Its overhead for communication and scheduling is very small. Its efficiency declines as the number of processing elements in the processing subsystem increases. Several articles evaluated its performance and presented improved methods. The authors studied its processing subsystem and carried out a simulation. The simulation results show that the efficiency of the processing subsystem drops dramatically when the average number of instruction execution microcycles becomes small and the maximum instruction execution rate is nearly attained. Two improved methods are presented to overcome this disadvantage. The improved processing subsystem, with a cheap distributor made up of a bus and a two-level fixed priority circuit, possesses almost full efficiency no matter whether the average number of instruction execution microcycles is large or small, and even if the maximum instruction execution rate is approached.
Keywords: dataflow computer processing subsystem DISTRIBUTOR efficiency maximum instruction execution rate
A General-Purpose Many-Accelerator Architecture Based on Dataflow Graph Clustering of Applications
14
Authors: 陈鹏 张磊 韩银和 陈云霁 《Journal of Computer Science & Technology》 SCIE EI CSCD 2014, Issue 2, pp. 239-246 (8 pages)
The combination of growing transistor counts and limited power budget within a silicon die leads to the utilization wall problem (a.k.a. "Dark Silicon"), that is, only a small fraction of a chip can run at full speed during a period of time. Designing accelerators for specific applications or algorithms is considered to be one of the most promising approaches to improving energy-efficiency. However, most current design methods for accelerators are dedicated to certain applications or algorithms, which greatly constrains their applicability. In this paper, we propose a novel general-purpose many-accelerator architecture. Our contributions are two-fold. Firstly, we propose to cluster dataflow graphs (DFGs) of hotspot basic blocks (BBs) in applications. The DFG clusters are then used for accelerator design. This is because a DFG is the largest program unit which is not specific to a certain application. We analyze 17 benchmarks in SPEC CPU 2006, acquire over 300 DFG hotspots using the LLVM compiler tool, and divide them into 15 clusters based on graph similarity. Secondly, we introduce a function instruction set architecture (FISC) and illustrate how DFG accelerators can be integrated with a processor core and how they can be used by applications. Our results show that the proposed DFG clustering and FISC design can speed up SPEC benchmarks 6.2X on average.
Keywords: dataflow graph many-accelerator CLUSTERING function instruction set architecture
Accelerating Data Transfer in Dataflow Architectures Through a Look-Ahead Acknowledgment Mechanism
15
Authors: Yu-Jing Feng De-Jian Li Xu Tan Xiao-Chun Ye Dong-Rui Fan Wen-Ming Li Da Wang Hao Zhang Zhi-Min Tang 《Journal of Computer Science & Technology》 SCIE EI CSCD 2022, Issue 4, pp. 942-959 (18 pages)
The dataflow architecture, which is characterized by a lack of a redundant unified control logic, has been shown to have an advantage over the control-flow architecture as it improves the computational performance and power efficiency, especially of applications used in high-performance computing (HPC). Importantly, the high computational efficiency of systems using the dataflow architecture is achieved by allowing program kernels to be activated in a simultaneous manner. Therefore, a proper acknowledgment mechanism is required to distinguish the data that logically belongs to different contexts. Possible solutions include the tagged-token matching mechanism in which the data is sent before acknowledgments are received but retried after rejection, or a handshake mechanism in which the data is only sent after acknowledgments are received. However, these mechanisms are characterized by both inefficient data transfer and increased area cost. Good performance of the dataflow architecture depends on the efficiency of data transfer. In order to optimize the efficiency of data transfer in existing dataflow architectures with a minimal increase in area and power cost, we propose a Look-Ahead Acknowledgment (LAA) mechanism. LAA accelerates the execution flow by speculatively acknowledging ahead without penalties. Our simulation analysis based on a handshake mechanism shows that our LAA increases the average utilization of computational units by 23.9%, with a reduction in the average execution time by 17.4% and an increase in the average power efficiency of dataflow processors by 22.4%. Crucially, our novel approach results in a relatively small increase in the area and power consumption of the on-chip logic of less than 0.9%. In conclusion, the evaluation results suggest that Look-Ahead Acknowledgment is an effective improvement for data transfer in existing dataflow architectures.
Keywords: dataflow model control-flow model high-performance computing application data transfer power efficiency
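On the Coordinated Protection of Personal Information between Hong Kong and the Mainland Cited by: 1
16
Authors: 曹旭东 桑栩 《港澳研究》 CSSCI 2024, Issue 1, pp. 38-49, 94, 95 (14 pages)
The digital era offers a historic opportunity for the coordinated protection of personal information between Hong Kong and the Mainland. Compared with the "unified-standard regulatory model" advocated by the European Union, the regulatory model of "seeking common ground while preserving differences" that has emerged in recent regional trade agreements (RTAs) is better suited to reconciling the differences between the personal information protection regimes of Hong Kong and the Mainland while maintaining Hong Kong's distinctive status and advantages. Advancing coordinated protection in the digital era calls for three things: balancing development and security, by establishing the principle of free cross-border flow of personal information between Hong Kong and the Mainland together with exception clauses that act as a safety valve; seeking common ground while preserving differences, by respecting the distinctiveness of Hong Kong's personal information protection regime while fully coordinating the basic principles and framework of the two jurisdictions' personal information protection laws; and strengthening exchange and cooperation, by building a more flexible and diverse institutional mechanism for coordinated cross-border protection of personal information.
Keywords: regional trade agreements; cross-border data flows; coordinated protection of personal information; regulatory model of seeking common ground while preserving differences; unified-standard regulatory model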
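Accelerating Radar Signal Modulation Recognition Based on Dataflow Architecture
17
Authors: 黄湘松 王振 潘大鹏 《实验技术与管理》 CAS PKU Core 2024, Issue 5, pp. 23-30 (8 pages)
In radar electronic warfare, quickly and accurately recognizing the modulation of enemy radar signals is critical to gaining a tactical advantage, yet conventional recognition methods that rely on graphics processing units (GPUs) struggle to meet the low-latency requirements of this scenario. To address this, the paper designs an acceleration system for radar signal modulation recognition based on a dataflow architecture (DF). The system binarizes the weights of a convolutional neural network to reduce model parameters, which eases deployment of the algorithm on a field-programmable gate array (FPGA), and uses a dataflow architecture to speed up the recognition process. Experimental results show that, while maintaining overall recognition accuracy, the system achieves a 44.43x inference speedup over an i7-11800H CPU and a 2.59x speedup over an RTX 3050Ti GPU, with a system power consumption of only 1.724 W.
Keywords: modulation recognition; deep learning; dataflow architecture; binarized neural network; hardware deployment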
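Exploring a Many-Core Dataflow Hardware Architecture Based on the Actor Model
18
Authors: 张家豪 邓金易 尹首一 魏少军 胡杨 《计算机工程与科学》 CSCD PKU Core 2024, Issue 6, pp. 959-967 (9 pages)
Distributed training of ultra-large-scale AI models challenges the communication capability and scalability of chip architectures. Wafer-scale chips integrate a large number of compute cores and an interconnect network on a single wafer, achieving very high compute density and communication performance, which makes them an ideal choice for training ultra-large-scale AI models. AMCoDA is a many-core dataflow hardware architecture based on the Actor model. It aims to exploit the high parallelism, asynchronous message passing, and high scalability of the Actor parallel programming model to realize distributed training of AI models on wafer-scale chips. The design of AMCoDA spans three levels: the computation model, the execution model, and the hardware architecture. Experiments show that AMCoDA broadly supports the various parallelism modes and collective communication patterns used in distributed training, and can flexibly and efficiently deploy and execute complex distributed training strategies.
Keywords: wafer-scale chip; distributed training; Actor model; many-core dataflow architecture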
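An Efficient Task-Flow Parallel System on the New-Generation Sunway Processor
19
Authors: 傅游 杜雷明 高希然 陈莉 《计算机科学》 CSCD PKU Core 2024, Issue 12, pp. 137-146 (10 pages)
Compared with its predecessor Sunway TaihuLight, China's independently developed new-generation Sunway supercomputer has a more powerful memory system and higher compute density, but its main programming model is still the Bulk Synchronous Parallelism (BSP) model. The Sequential Task Flow (STF) model uses dataflow information to automatically extract task parallelism from serial programs and achieves asynchronous parallelism through fine-grained synchronization between tasks; compared with the global synchronization of BSP, it offers higher parallelism and better load balance. The STF model gives users a new option for using the Sunway platform efficiently. On many-core systems, however, the runtime overhead of the STF model directly affects the performance of parallel programs. This paper first analyzes two features of the new-generation Sunway processor that affect an efficient implementation of the STF model. It then exploits features unique to the processor architecture to propose an agent-based dataflow graph construction mechanism that meets the model's graph-building needs, and a lock-free centralized task scheduling mechanism that reduces scheduling overhead. Finally, based on these techniques, an efficient task-flow parallel system is implemented for the AceMesh model. Experiments show that the system has a clear advantage over the conventional runtime support, with a speedup of up to 2.37x in fine-grained task scenarios; AceMesh outperforms the OpenACC model on the Sunway platform, with a speedup of up to 2.07x on typical applications.
Keywords: sequential task flow model; heterogeneous many-core parallelism; task scheduling; dataflow parallelism; bulk synchronous model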
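SQL Injection Vulnerability Detection Based on Taint Analysis Cited by: 1
20
Authors: 王国峰 唐云善 徐立飞 《信息技术》 2024, Issue 2, pp. 185-190 (6 pages)
SQL injection vulnerabilities pose a serious risk to the database systems behind Web applications; once such a vulnerability is exploited, the resulting losses can be immeasurable. To address this, a detection method for SQL injection vulnerabilities based on taint analysis is proposed. The method uses three-address code as its intermediate representation and, based on the characteristics of SQL injection vulnerabilities, defines taint dataflow values and taint propagation rules for forward analysis. It runs an iterative dataflow algorithm over the program's control flow graph and performs security checks during the computation, yielding every sink that receives tainted data. By traversing this set of tainted sinks, it reports the locations of SQL injection vulnerabilities. Comparative experiments confirm the effectiveness of the method.
Keywords: SQL injection; static vulnerability detection; dataflow analysis; taint analysis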
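The forward taint analysis outlined in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the instruction format `(dest, op, args)`, the operation names `source`, `sanitize`, and `sink`, and the function name `find_tainted_sinks` are all hypothetical, chosen only to show how taint sets are propagated iteratively over a control flow graph and checked at sinks.

```python
from collections import deque


def find_tainted_sinks(blocks, successors):
    """Forward taint analysis over a control flow graph (illustrative only).

    blocks:     {block_name: [(dest, op, args), ...]} in a toy three-address form.
    successors: {block_name: [successor_block_name, ...]}.
    Returns a sorted list of (block_name, instruction_index) for every sink
    that may receive tainted data.
    """
    # Dataflow value: the set of variables tainted on entry to each block.
    taint_in = {name: set() for name in blocks}
    # Visit every block at least once, then revisit blocks whose input grows.
    worklist = deque(blocks)
    flagged = set()

    while worklist:
        name = worklist.popleft()
        tainted = set(taint_in[name])
        for index, (dest, op, args) in enumerate(blocks[name]):
            if op == "sink":
                # Security check performed during the fixed-point computation.
                if any(arg in tainted for arg in args):
                    flagged.add((name, index))
            elif op == "source":
                tainted.add(dest)          # untrusted input taints dest
            elif op == "sanitize":
                tainted.discard(dest)      # sanitizer output is clean
            elif any(arg in tainted for arg in args):
                tainted.add(dest)          # taint propagates through the op
            else:
                tainted.discard(dest)      # dest overwritten with clean data
        for succ in successors.get(name, []):
            # Join is set union: a variable is tainted if tainted on any path.
            if not tainted <= taint_in[succ]:
                taint_in[succ] |= tainted
                worklist.append(succ)

    return sorted(flagged)


# Hypothetical program: B0 builds a query from user input, B1 sanitizes it,
# and B2 executes it. B2 is reachable from B0 both directly and through B1.
example_blocks = {
    "B0": [("uid", "source", ()), ("q", "concat", ("prefix", "uid"))],
    "B1": [("q", "sanitize", ("q",))],
    "B2": [(None, "sink", ("q",))],
}
example_successors = {"B0": ["B1", "B2"], "B1": ["B2"]}
print(find_tainted_sinks(example_blocks, example_successors))  # [('B2', 0)]
```

Because the join is a union, the sink in `B2` is reported as long as one path reaches it without passing through the sanitizer; a real analyzer would also need call handling and a richer taint lattice, which this sketch leaves out.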