期刊文献+
共找到48,635篇文章
< 1 2 250 >
每页显示 20 50 100
Research on Multi-Core Processor Analysis for WCET Estimation
1
作者 LUO Haoran HU Shuisong +2 位作者 WANG Wenyong TANG Yuke ZHOU Junwei 《ZTE Communications》 2024年第1期87-94,共8页
Real-time system timing analysis is crucial for estimating the worst-case execution time(WCET)of a program.To achieve this,static or dynamic analysis methods are used,along with targeted modeling of the actual hardwar... Real-time system timing analysis is crucial for estimating the worst-case execution time(WCET)of a program.To achieve this,static or dynamic analysis methods are used,along with targeted modeling of the actual hardware system.This literature review focuses on calculating WCET for multi-core processors,providing a survey of traditional methods used for static and dynamic analysis and highlighting the major challenges that arise from different program execution scenarios on multi-core platforms.This paper outlines the strengths and weaknesses of current methodologies and offers insights into prospective areas of research on multi-core analysis.By presenting a comprehensive analysis of the current state of research on multi-core processor analysis for WCET estimation,this review aims to serve as a valuable resource for researchers and practitioners in the field. 展开更多
关键词 real-time system worst-case execution time(WCET) multi-core analysis
下载PDF
An MPI parallel DEM-IMB-LBM framework for simulating fluid-solid interaction problems 被引量:2
2
作者 Ming Xia Liuhong Deng +3 位作者 Fengqiang Gong Tongming Qu Y.T.Feng Jin Yu 《Journal of Rock Mechanics and Geotechnical Engineering》 SCIE CSCD 2024年第6期2219-2231,共13页
The high-resolution DEM-IMB-LBM model can accurately describe pore-scale fluid-solid interactions,but its potential for use in geotechnical engineering analysis has not been fully unleashed due to its prohibitive comp... The high-resolution DEM-IMB-LBM model can accurately describe pore-scale fluid-solid interactions,but its potential for use in geotechnical engineering analysis has not been fully unleashed due to its prohibitive computational costs.To overcome this limitation,a message passing interface(MPI)parallel DEM-IMB-LBM framework is proposed aimed at enhancing computation efficiency.This framework utilises a static domain decomposition scheme,with the entire computation domain being decomposed into multiple subdomains according to predefined processors.A detailed parallel strategy is employed for both contact detection and hydrodynamic force calculation.In particular,a particle ID re-numbering scheme is proposed to handle particle transitions across sub-domain interfaces.Two benchmarks are conducted to validate the accuracy and overall performance of the proposed framework.Subsequently,the framework is applied to simulate scenarios involving multi-particle sedimentation and submarine landslides.The numerical examples effectively demonstrate the robustness and applicability of the MPI parallel DEM-IMB-LBM framework. 展开更多
关键词 Discrete element method(DEM) Lattice Boltzmann method(LBM) Immersed moving boundary(IMB) multi-cores parallelization Message passing interface(MPI) CPU Submarine landslides
下载PDF
Hybridization of Metaheuristics Based Energy Efficient Scheduling Algorithm for Multi-Core Systems
3
作者 J.Jean Justus U.Sakthi +4 位作者 K.Priyadarshini B.Thiyaneswaran Masoud Alajmi Marwa Obayya Manar Ahmed Hamza 《Computer Systems Science & Engineering》 SCIE EI 2023年第1期205-219,共15页
The developments of multi-core systems(MCS)have considerably improved the existing technologies in thefield of computer architecture.The MCS comprises several processors that are heterogeneous for resource capacities,... The developments of multi-core systems(MCS)have considerably improved the existing technologies in thefield of computer architecture.The MCS comprises several processors that are heterogeneous for resource capacities,working environments,topologies,and so on.The existing multi-core technology unlocks additional research opportunities for energy minimization by the use of effective task scheduling.At the same time,the task scheduling process is yet to be explored in the multi-core systems.This paper presents a new hybrid genetic algorithm(GA)with a krill herd(KH)based energy-efficient scheduling techni-que for multi-core systems(GAKH-SMCS).The goal of the GAKH-SMCS tech-nique is to derive scheduling tasks in such a way to achieve faster completion time and minimum energy dissipation.The GAKH-SMCS model involves a multi-objectivefitness function using four parameters such as makespan,processor utilization,speedup,and energy consumption to schedule tasks proficiently.The performance of the GAKH-SMCS model has been validated against two datasets namely random dataset and benchmark dataset.The experimental outcome ensured the effectiveness of the GAKH-SMCS model interms of makespan,pro-cessor utilization,speedup,and energy consumption.The overall simulation results depicted that the presented GAKH-SMCS model achieves energy effi-ciency by optimal task scheduling process in MCS. 展开更多
关键词 Task scheduling energy efficiency multi-core systems fitness function MAKESPAN
下载PDF
Shared Cache Based on Content Addressable Memory in a Multi-Core Architecture
4
作者 Allam Abumwais Mahmoud Obaid 《Computers, Materials & Continua》 SCIE EI 2023年第3期4951-4963,共13页
Modern shared-memory multi-core processors typically have shared Level 2(L2)or Level 3(L3)caches.Cache bottlenecks and replacement strategies are the main problems of such architectures,where multiple cores try to acc... Modern shared-memory multi-core processors typically have shared Level 2(L2)or Level 3(L3)caches.Cache bottlenecks and replacement strategies are the main problems of such architectures,where multiple cores try to access the shared cache simultaneously.The main problem in improving memory performance is the shared cache architecture and cache replacement.This paper documents the implementation of a Dual-Port Content Addressable Memory(DPCAM)and a modified Near-Far Access Replacement Algorithm(NFRA),which was previously proposed as a shared L2 cache layer in a multi-core processor.Standard Performance Evaluation Corporation(SPEC)Central Processing Unit(CPU)2006 benchmark workloads are used to evaluate the benefit of the shared L2 cache layer.Results show improved performance of the multicore processor’s DPCAM and NFRA algorithms,corresponding to a higher number of concurrent accesses to shared memory.The new architecture significantly increases system throughput and records performance improvements of up to 8.7%on various types of SPEC 2006 benchmarks.The miss rate is also improved by about 13%,with some exceptions in the sphinx3 and bzip2 benchmarks.These results could open a new window for solving the long-standing problems with shared cache in multi-core processors. 展开更多
关键词 multi-core processor shared cache content addressable memory dual port CAM replacement algorithm benchmark program
下载PDF
Multi-core based parallel computing technique for content-based image retrieval 被引量:1
5
作者 陈文浩 方昱春 +1 位作者 姚继锋 张武 《Journal of Shanghai University(English Edition)》 2010年第1期55-59,共5页
In this paper, we propose a parallel computing technique for content-based image retrieval (CBIR) system. This technique is mainly used for single node with multi-core processor, which is different from those based ... In this paper, we propose a parallel computing technique for content-based image retrieval (CBIR) system. This technique is mainly used for single node with multi-core processor, which is different from those based on cluster or network computing architecture. Due to its specific applications (such as medical image processing) and the harsh terms of hardware resource requirement, the CBIR system has been prevented from being widely used. With the increasing volume of the image database, the widespread use of multi-core processors, and the requirement of the retrieval accuracy and speed, we need to achieve a retrieval strategy which is based on multi-core processor to make the retrieval faster and more convenient than before. Experimental results demonstrate that this parallel architecture can significantly improve the performance of retrieval system. In addition, we also propose an efficient parallel technique with the combinations of the cluster and the multi-core techniques, which is supposed to gear to the new trend of the cloud computing. 展开更多
关键词 content-based image retrieval (CBIR) parallel computing SHARED-MEMORY feature extraction similarity comparison
下载PDF
Performance Enhancement of XML Parsing Using Regression and Parallelism
6
作者 Muhammad Ali Minhaj Ahmad Khan 《Computer Systems Science & Engineering》 2024年第2期287-303,共17页
The Extensible Markup Language(XML)files,widely used for storing and exchanging information on the web require efficient parsing mechanisms to improve the performance of the applications.With the existing Document Obj... The Extensible Markup Language(XML)files,widely used for storing and exchanging information on the web require efficient parsing mechanisms to improve the performance of the applications.With the existing Document Object Model(DOM)based parsing,the performance degrades due to sequential processing and large memory requirements,thereby requiring an efficient XML parser to mitigate these issues.In this paper,we propose a Parallel XML Tree Generator(PXTG)algorithm for accelerating the parsing of XML files and a Regression-based XML Parsing Framework(RXPF)that analyzes and predicts performance through profiling,regression,and code generation for efficient parsing.The PXTG algorithm is based on dividing the XML file into n parts and producing n trees in parallel.The profiling phase of the RXPF framework produces a dataset by measuring the performance of various parsing models including StAX,SAX,DOM,JDOM,and PXTG on different cores by using multiple file sizes.The regression phase produces the prediction model,based on which the final code for efficient parsing of XML files is produced through the code generation phase.The RXPF framework has shown a significant improvement in performance varying from 9.54%to 32.34%over other existing models used for parsing XML files. 展开更多
关键词 Regression parallel parsing multi-cores XML
下载PDF
Parallel Implementation of the CCSDS Turbo Decoder on GPU
7
作者 Liu Zhanxian Liu Rongke +3 位作者 Zhang Haijun Wang Ning Sun Lei Wang Jianquan 《China Communications》 SCIE CSCD 2024年第10期70-77,共8页
This paper presents a software turbo decoder on graphics processing units(GPU).Unlike previous works,the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Syste... This paper presents a software turbo decoder on graphics processing units(GPU).Unlike previous works,the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Systems(CCSDS)standard.However,the information frame lengths of the CCSDS turbo codes are not suitable for flexible sub-frame parallelism design.To mitigate this issue,we propose a padding method that inserts several bits before the information frame header.To obtain low-latency performance and high resource utilization,two-level intra-frame parallelisms and an efficient data structure are considered.The presented Max-Log-Map decoder can be adopted to decode the Long Term Evolution(LTE)turbo codes with only small modifications.The proposed CCSDS turbo decoder at 10 iterations on NVIDIA RTX3070 achieves about 150 Mbps and 50Mbps throughputs for the code rates 1/6 and 1/2,respectively. 展开更多
关键词 CCSDS CUDA GPU parallel decoding turbo codes
下载PDF
Evaluation on Configuration Stiffness of Overconstrained 2R1T Parallel Mechanisms
8
作者 Xuejian Ma Zhenghe Xu +3 位作者 Yundou Xu Yu Wang Jiantao Yao Yongsheng Zhao 《Chinese Journal of Mechanical Engineering》 SCIE EI CAS CSCD 2024年第3期62-82,共21页
Currently,two rotations and one translation(2R1T)three-degree-of-freedom(DOF)parallel mechanisms(PMs)are widely applied in five-DOF hybrid machining robots.However,there is a lack of an effective method to evaluate th... Currently,two rotations and one translation(2R1T)three-degree-of-freedom(DOF)parallel mechanisms(PMs)are widely applied in five-DOF hybrid machining robots.However,there is a lack of an effective method to evaluate the configuration stiffness of mechanisms during the mechanism design stage.It is a challenge to select appropriate 2R1T PMs with excellent stiffness performance during the design stage.Considering the operational status of 2R1T PMs,the bending and torsional stiffness are considered as indices to evaluate PMs'configuration stiffness.Subsequently,a specific method is proposed to calculate these stiffness indices.Initially,the various types of structural and driving stiffness for each branch are assessed and their specific values defined.Subsequently,a rigid-flexible coupled force model for the over-constrained 2R1T PM is established,and the proposed evaluation method is used to analyze the configuration stiffness of the five 2R1T PMs in the entire workspace.Finally,the driving force and constraint force of each branch in the whole working space are calculated to further elucidate the stiffness evaluating results by using the proposed method above.The obtained results demonstrate that the bending and torsional stiffness of the 2RPU/UPR/RPR mechanism along the x and y-directions are larger than the other four mechanisms. 展开更多
关键词 parallel mechanism STIFFNESS Over-constrained Three degrees of freedom
下载PDF
Configuration and Kinematics of a 3-DOF Generalized Spherical Parallel Mechanism for Ankle Rehabilitation
9
作者 Jianjun Zhang Shuai Yang +2 位作者 Chenglei Liu Xiaohui Wang Shijie Guo 《Chinese Journal of Mechanical Engineering》 SCIE EI CAS CSCD 2024年第1期176-188,共13页
The kinematic equivalent model of an existing ankle-rehabilitation robot is inconsistent with the anatomical structure of the human ankle,which influences the rehabilitation effect.Therefore,this study equates the hum... The kinematic equivalent model of an existing ankle-rehabilitation robot is inconsistent with the anatomical structure of the human ankle,which influences the rehabilitation effect.Therefore,this study equates the human ankle to the UR model and proposes a novel three degrees of freedom(3-DOF)generalized spherical parallel mechanism for ankle rehabilitation.The parallel mechanism has two spherical centers corresponding to the rotation centers of tibiotalar and subtalar joints.Using screw theory,the mobility of the parallel mechanism,which meets the requirements of the human ankle,is analyzed.The inverse kinematics are presented,and singularities are identified based on the Jacobian matrix.The workspaces of the parallel mechanism are obtained through the search method and compared with the motion range of the human ankle,which shows that the parallel mechanism can meet the motion demand of ankle rehabilitation.Additionally,based on the motion-force transmissibility,the performance atlases are plotted in the parameter optimal design space,and the optimum parameter is obtained according to the demands of practical applications.The results show that the parallel mechanism can meet the motion requirements of ankle rehabilitation and has excellent kinematic performance in its rehabilitation range,which provides a theoretical basis for the prototype design and experimental verification. 展开更多
关键词 Ankle rehabilitation parallel mechanism Kinematic analysis Parameter optimization
下载PDF
An efficient parallel algorithm of variational nodal method for heterogeneous neutron transport problems
10
作者 Han Yin Xiao-Jing Liu Teng-Fei Zhang 《Nuclear Science and Techniques》 SCIE EI CAS CSCD 2024年第4期29-45,共17页
The heterogeneous variational nodal method(HVNM)has emerged as a potential approach for solving high-fidelity neutron transport problems.However,achieving accurate results with HVNM in large-scale problems using high-... The heterogeneous variational nodal method(HVNM)has emerged as a potential approach for solving high-fidelity neutron transport problems.However,achieving accurate results with HVNM in large-scale problems using high-fidelity models has been challenging due to the prohibitive computational costs.This paper presents an efficient parallel algorithm tailored for HVNM based on the Message Passing Interface standard.The algorithm evenly distributes the response matrix sets among processors during the matrix formation process,thus enabling independent construction without communication.Once the formation tasks are completed,a collective operation merges and shares the matrix sets among the processors.For the solution process,the problem domain is decomposed into subdomains assigned to specific processors,and the red-black Gauss-Seidel iteration is employed within each subdomain to solve the response matrix equation.Point-to-point communication is conducted between adjacent subdomains to exchange data along the boundaries.The accuracy and efficiency of the parallel algorithm are verified using the KAIST and JRR-3 test cases.Numerical results obtained with multiple processors agree well with those obtained from Monte Carlo calculations.The parallelization of HVNM results in eigenvalue errors of 31 pcm/-90 pcm and fission rate RMS errors of 1.22%/0.66%,respectively,for the 3D KAIST problem and the 3D JRR-3 problem.In addition,the parallel algorithm significantly reduces computation time,with an efficiency of 68.51% using 36 processors in the KAIST problem and 77.14% using 144 processors in the JRR-3 problem. 展开更多
关键词 Neutron transport Variational nodal method parallelIZATION KAIST JRR-3
下载PDF
Parallel scheduling strategy of web-based spatial computing tasks in multi-core environment
11
作者 郭明强 Huang Ying Xie Zhong 《High Technology Letters》 EI CAS 2014年第4期395-400,共6页
In order to improve the concurrent access performance of the web-based spatial computing system in cluster,a parallel scheduling strategy based on the multi-core environment is proposed,which includes two levels of pa... In order to improve the concurrent access performance of the web-based spatial computing system in cluster,a parallel scheduling strategy based on the multi-core environment is proposed,which includes two levels of parallel processing mechanisms.One is that it can evenly allocate tasks to each server node in the cluster and the other is that it can implement the load balancing inside a server node.Based on the strategy,a new web-based spatial computing model is designed in this paper,in which,a task response ratio calculation method,a request queue buffer mechanism and a thread scheduling strategy are focused on.Experimental results show that the new model can fully use the multi-core computing advantage of each server node in the concurrent access environment and improve the average hits per second,average I/O Hits,CPU utilization and throughput.Using speed-up ratio to analyze the traditional model and the new one,the result shows that the new model has the best performance.The performance of the multi-core server nodes in the cluster is optimized;the resource utilization and the parallel processing capabilities are enhanced.The more CPU cores you have,the higher parallel processing capabilities will be obtained. 展开更多
关键词 parallel scheduling strategy the web-based spatial computing model multi-core environment load balancing
下载PDF
Type Synthesis of Self-Alignment Parallel Ankle Rehabilitation Robot with Suitable Passive Degrees of Freedom
12
作者 Ya Liu Wenjuan Lu +3 位作者 Dabao Fan Weijian Tan Bo Hu Daxing Zeng 《Chinese Journal of Mechanical Engineering》 SCIE EI CAS CSCD 2024年第1期160-175,共16页
The current parallel ankle rehabilitation robot(ARR)suffers from the problem of difficult real-time alignment of the human-robot joint center of rotation,which may lead to secondary injuries to the patient.This study ... The current parallel ankle rehabilitation robot(ARR)suffers from the problem of difficult real-time alignment of the human-robot joint center of rotation,which may lead to secondary injuries to the patient.This study investigates type synthesis of a parallel self-alignment ankle rehabilitation robot(PSAARR)based on the kinematic characteristics of ankle joint rotation center drift from the perspective of introducing"suitable passive degrees of freedom(DOF)"with a suitable number and form.First,the self-alignment principle of parallel ARR was proposed by deriving conditions for transforming a human-robot closed chain(HRCC)formed by an ARR and human body into a kinematic suitable constrained system and introducing conditions of"decoupled"and"less limb".Second,the relationship between the self-alignment principle and actuation wrenches(twists)of PSAARR was analyzed with the velocity Jacobian matrix as a"bridge".Subsequently,the type synthesis conditions of PSAARR were proposed.Third,a PSAARR synthesis method was proposed based on the screw theory and type of PSAARR synthesis conducted.Finally,an HRCC kinematic model was established to verify the self-alignment capability of the PSAARR.In this study,93 types of PSAARR limb structures were synthesized and the self-alignment capability of a human-robot joint axis was verified through kinematic analysis,which provides a theoretical basis for the design of such an ARR. 展开更多
关键词 Ankle rehabilitation robot SELF-ALIGNMENT parallel mechanism Type synthesis Screw theory
下载PDF
MPI/OpenMP-Based Parallel Solver for Imprint Forming Simulation
13
作者 Yang Li Jiangping Xu +2 位作者 Yun Liu Wen Zhong Fei Wang 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024年第7期461-483,共23页
In this research,we present the pure open multi-processing(OpenMP),pure message passing interface(MPI),and hybrid MPI/OpenMP parallel solvers within the dynamic explicit central difference algorithm for the coining pr... In this research,we present the pure open multi-processing(OpenMP),pure message passing interface(MPI),and hybrid MPI/OpenMP parallel solvers within the dynamic explicit central difference algorithm for the coining process to address the challenge of capturing fine relief features of approximately 50 microns.Achieving such precision demands the utilization of at least 7 million tetrahedron elements,surpassing the capabilities of traditional serial programs previously developed.To mitigate data races when calculating internal forces,intermediate arrays are introduced within the OpenMP directive.This helps ensure proper synchronization and avoid conflicts during parallel execution.Additionally,in the MPI implementation,the coins are partitioned into the desired number of regions.This division allows for efficient distribution of computational tasks across multiple processes.Numerical simulation examples are conducted to compare the three solvers with serial programs,evaluating correctness,acceleration ratio,and parallel efficiency.The results reveal a relative error of approximately 0.3%in forming force among the parallel and serial solvers,while the predicted insufficient material zones align with experimental observations.Additionally,speedup ratio and parallel efficiency are assessed for the coining process simulation.The pureMPI parallel solver achieves a maximum acceleration of 9.5 on a single computer(utilizing 12 cores)and the hybrid solver exhibits a speedup ratio of 136 in a cluster(using 6 compute nodes and 12 cores per compute node),showing the strong scalability of the hybrid MPI/OpenMP programming model.This approach effectively meets the simulation requirements for commemorative coins with intricate relief patterns. 展开更多
关键词 Hybrid MPI/OpenMP parallel computing MPI OPENMP imprint forming
下载PDF
Joint computation offloading and parallel scheduling to maximize delay-guarantee in cooperative MEC systems
14
作者 Mian Guo Mithun Mukherjee +3 位作者 Jaime Lloret Lei Li Quansheng Guan Fei Ji 《Digital Communications and Networks》 SCIE CSCD 2024年第3期693-705,共13页
The growing development of the Internet of Things(IoT)is accelerating the emergence and growth of new IoT services and applications,which will result in massive amounts of data being generated,transmitted and pro-cess... The growing development of the Internet of Things(IoT)is accelerating the emergence and growth of new IoT services and applications,which will result in massive amounts of data being generated,transmitted and pro-cessed in wireless communication networks.Mobile Edge Computing(MEC)is a desired paradigm to timely process the data from IoT for value maximization.In MEC,a number of computing-capable devices are deployed at the network edge near data sources to support edge computing,such that the long network transmission delay in cloud computing paradigm could be avoided.Since an edge device might not always have sufficient resources to process the massive amount of data,computation offloading is significantly important considering the coop-eration among edge devices.However,the dynamic traffic characteristics and heterogeneous computing capa-bilities of edge devices challenge the offloading.In addition,different scheduling schemes might provide different computation delays to the offloaded tasks.Thus,offloading in mobile nodes and scheduling in the MEC server are coupled to determine service delay.This paper seeks to guarantee low delay for computation intensive applica-tions by jointly optimizing the offloading and scheduling in such an MEC system.We propose a Delay-Greedy Computation Offloading(DGCO)algorithm to make offloading decisions for new tasks in distributed computing-enabled mobile devices.A Reinforcement Learning-based Parallel Scheduling(RLPS)algorithm is further designed to schedule offloaded tasks in the multi-core MEC server.With an offloading delay broadcast mechanism,the DGCO and RLPS cooperate to achieve the goal of delay-guarantee-ratio maximization.Finally,the simulation results show that our proposal can bound the end-to-end delay of various tasks.Even under slightly heavy task load,the delay-guarantee-ratio given by DGCO-RLPS can still approximate 95%,while that given by benchmarked algorithms is reduced to intolerable value.The simulation results are demonstrated the effective-ness of DGCO-RLPS for delay guarantee in MEC. 展开更多
关键词 Edge computing Computation offloading parallel scheduling Mobile-edge cooperation Delay guarantee
下载PDF
THE NONLINEAR STABILITY OF PLANE PARALLEL SHEAR FLOWS WITH RESPECT TO TILTED PERTURBATIONS
15
作者 许兰喜 关芳芳 《Acta Mathematica Scientia》 SCIE CSCD 2024年第3期1036-1045,共10页
The nonlinear stability of plane parallel shear flows with respect to tilted perturbations is studied by energy methods.Tilted perturbation refers to the fact that perturbations form an angleθ∈(0,π/2)with the direc... The nonlinear stability of plane parallel shear flows with respect to tilted perturbations is studied by energy methods.Tilted perturbation refers to the fact that perturbations form an angleθ∈(0,π/2)with the direction of the basic flows.By defining an energy functional,it is proven that plane parallel shear flows are unconditionally nonlinearly exponentially stable for tilted streamwise perturbation when the Reynolds number is below a certain critical value and the boundary conditions are either rigid or stress-free.In the case of stress-free boundaries,by taking advantage of the poloidal-toroidal decomposition of a solenoidal field to define energy functionals,it can be even shown that plane parallel shear flows are unconditionally nonlinearly exponentially stable for all Reynolds numbers,where the tilted perturbation can be either spanwise or streamwise. 展开更多
关键词 plane parallel shear flows energy method energy functional nonlinear stability Reynolds number
下载PDF
Static Analysis Techniques for Fixing Software Defects in MPI-Based Parallel Programs
16
作者 Norah Abdullah Al-Johany Sanaa Abdullah Sharaf +1 位作者 Fathy Elbouraey Eassa Reem Abdulaziz Alnanih 《Computers, Materials & Continua》 SCIE EI 2024年第5期3139-3173,共35页
The Message Passing Interface (MPI) is a widely accepted standard for parallel computing on distributed memorysystems.However, MPI implementations can contain defects that impact the reliability and performance of par... The Message Passing Interface (MPI) is a widely accepted standard for parallel computing on distributed memorysystems.However, MPI implementations can contain defects that impact the reliability and performance of parallelapplications. Detecting and correcting these defects is crucial, yet there is a lack of published models specificallydesigned for correctingMPI defects. To address this, we propose a model for detecting and correcting MPI defects(DC_MPI), which aims to detect and correct defects in various types of MPI communication, including blockingpoint-to-point (BPTP), nonblocking point-to-point (NBPTP), and collective communication (CC). The defectsaddressed by the DC_MPI model include illegal MPI calls, deadlocks (DL), race conditions (RC), and messagemismatches (MM). To assess the effectiveness of the DC_MPI model, we performed experiments on a datasetconsisting of 40 MPI codes. The results indicate that the model achieved a detection rate of 37 out of 40 codes,resulting in an overall detection accuracy of 92.5%. Additionally, the execution duration of the DC_MPI modelranged from 0.81 to 1.36 s. These findings show that the DC_MPI model is useful in detecting and correctingdefects in MPI implementations, thereby enhancing the reliability and performance of parallel applications. TheDC_MPImodel fills an important research gap and provides a valuable tool for improving the quality ofMPI-basedparallel computing systems. 展开更多
关键词 High-performance computing parallel computing software engineering software defect message passing interface DEADLOCK
下载PDF
Dynamical Artificial Bee Colony for Energy-Efficient Unrelated Parallel Machine Scheduling with Additional Resources and Maintenance
17
作者 Yizhuo Zhu Shaosi He Deming Lei 《Computers, Materials & Continua》 SCIE EI 2024年第10期843-866,共24页
Unrelated parallel machine scheduling problem(UPMSP)is a typical scheduling one and UPMSP with various reallife constraints such as additional resources has been widely studied;however,UPMSP with additional resources,... Unrelated parallel machine scheduling problem(UPMSP)is a typical scheduling one and UPMSP with various reallife constraints such as additional resources has been widely studied;however,UPMSP with additional resources,maintenance,and energy-related objectives is seldom investigated.The Artificial Bee Colony(ABC)algorithm has been successfully applied to various production scheduling problems and demonstrates potential search advantages in solving UPMSP with additional resources,among other factors.In this study,an energy-efficient UPMSP with additional resources and maintenance is considered.A dynamical artificial bee colony(DABC)algorithm is presented to minimize makespan and total energy consumption simultaneously.Three heuristics are applied to produce the initial population.Employed bee swarm and onlooker bee swarm are constructed.Computing resources are shifted from the dominated solutions to non-dominated solutions in each swarm when the given condition is met.Dynamical employed bee phase is implemented by computing resource shifting and solution migration.Computing resource shifting and feedback are used to construct dynamical onlooker bee phase.Computational experiments are conducted on 300 instances from the literature and three comparative algorithms and ABC are compared after parameter settings of all algorithms are given.The computational results demonstrate that the new strategies of DABC are effective and that DABC has promising advantages in solving the considered UPMSP. 展开更多
关键词 Artificial bee colony parallel machine scheduling ENERGY additional resource
下载PDF
Multi-Level Parallel Network for Brain Tumor Segmentation
18
作者 Juhong Tie Hui Peng 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024年第4期741-757,共17页
Accurate automatic segmentation of gliomas in various sub-regions,including peritumoral edema,necrotic core,and enhancing and non-enhancing tumor core from 3D multimodal MRI images,is challenging because of its highly... Accurate automatic segmentation of gliomas in various sub-regions,including peritumoral edema,necrotic core,and enhancing and non-enhancing tumor core from 3D multimodal MRI images,is challenging because of its highly heterogeneous appearance and shape.Deep convolution neural networks(CNNs)have recently improved glioma segmentation performance.However,extensive down-sampling such as pooling or stridden convolution in CNNs significantly decreases the initial image resolution,resulting in the loss of accurate spatial and object parts information,especially information on the small sub-region tumors,affecting segmentation performance.Hence,this paper proposes a novel multi-level parallel network comprising three different level parallel subnetworks to fully use low-level,mid-level,and high-level information and improve the performance of brain tumor segmentation.We also introduce the Combo loss function to address input class imbalance and false positives and negatives imbalance in deep learning.The proposed method is trained and validated on the BraTS 2020 training and validation dataset.On the validation dataset,ourmethod achieved a mean Dice score of 0.907,0.830,and 0.787 for the whole tumor,tumor core,and enhancing tumor core,respectively.Compared with state-of-the-art methods,the multi-level parallel network has achieved competitive results on the validation dataset. 展开更多
关键词 Convolution neural network brain tumor segmentation parallel network
下载PDF
Solution for Output Coordination Equations of Several Typical Parallel Six-Dimensional Acceleration Sensing Mechanisms
19
作者 ZHANG Xianzhu YOU Jingjing ZHANG Yuanwei 《Transactions of Nanjing University of Aeronautics and Astronautics》 EI CSCD 2024年第S01期96-102,共7页
Aiming at the problem that it is difficult to generate the dynamic decoupling equation of the parallel six-dimensional acceleration sensing mechanism,two typical parallel six-dimensional acceleration sensing mechanism... Aiming at the problem that it is difficult to generate the dynamic decoupling equation of the parallel six-dimensional acceleration sensing mechanism,two typical parallel six-dimensional acceleration sensing mechanisms are taken as examples.By analyzing the scale constraint relationship between the hinge points on the mass block and the hinge points on the base of the sensing mechanism,a new method for establishing the dynamic equation of the sensing mechanism is proposed.Firstly,based on the scale constraint relationship between the hinge points on the mass block and the hinge points on the base of the sensing mechanism,the expression of the branch rod length is obtained.The inherent constraint relationship between the branches is excavated and the branch coordination closed chain of the“12-6”configuration is constructed.The output coordination equation of the sensing mechanism is successfully derived.Secondly,the dynamic equations of“12-4”and“12-6”configurations are constructed by the Newton-Euler method,and the forward decoupling equations of the two configurations are solved by combining the dynamic equations and the output coordination equations.Finally,the virtual prototype experiment is carried out,and the maximum reference errors of the forward decoupling equations of the two configuration sensing mechanisms are 4.23%and 6.53%,respectively.The results show that the proposed method is effective and feasible,and meets the real-time requirements. 展开更多
关键词 six-dimensional acceleration sensor parallel mechanism topological configuration coordination equation dynamics
下载PDF
Effective Capacity of URLLC over Parallel Fading Channels with Imperfect Channel State Information
20
作者 Peng Hongsen Tao Meixia 《China Communications》 SCIE CSCD 2024年第5期45-63,共19页
This paper investigates the effective capacity of a point-to-point ultra-reliable low latency communication(URLLC)transmission over multiple parallel sub-channels at finite blocklength(FBL)with imperfect channel state... This paper investigates the effective capacity of a point-to-point ultra-reliable low latency communication(URLLC)transmission over multiple parallel sub-channels at finite blocklength(FBL)with imperfect channel state information(CSI).Based on reasonable assumptions and approximations,we derive the effective capacity as a function of the pilot length,decoding error probability,transmit power and the sub-channel number.Then we reveal significant impact of the above parameters on the effective capacity.A closed-form lower bound of the effective capacity is derived and an alternating optimization based algorithm is proposed to find the optimal pilot length and decoding error probability.Simulation results validate our theoretical analysis and show that the closedform lower bound is very tight.In addition,through the simulations of the optimized effective capacity,insights for pilot length and decoding error probability optimization are provided to evaluate the optimal parameters in realistic systems. 展开更多
关键词 effective capacity finite blocklength regime imperfect CSI parallel fading channels URLLC
下载PDF
上一页 1 2 250 下一页 到第
使用帮助 返回顶部