Real-time capabilities and computational efficiency are provided by parallel image processing utilizing OpenMP. However, race conditions can affect the accuracy and reliability of the outcomes. This paper highlights the importance of addressing race conditions in parallel image processing, specifically focusing on color inverse filtering using OpenMP. We considered three solutions to race conditions, each with distinct characteristics: #pragma omp atomic, which protects individual memory operations for fine-grained control; #pragma omp critical, which protects entire code blocks for exclusive access; and #pragma omp parallel sections reduction, which employs a reduction clause for safe aggregation of values across threads. Our findings show that the produced images were unaffected by race conditions. However, resolving the race conditions in the code makes it significantly faster, especially when it is executed on multiple cores.
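As a rough illustration of the reduction idea named in the abstract above, the following Python sketch inverts an 8-bit RGB image in row chunks. It is not the paper's OpenMP code: the function names, the toy image, and the use of a thread pool are all assumptions made for illustration. Each worker returns a private partial sum that is combined only after all workers finish, so no shared accumulator is ever updated concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def invert_rows(rows):
    """Invert one chunk of rows; return the inverted rows and a private partial sum."""
    inverted = [[tuple(255 - c for c in px) for px in row] for row in rows]
    partial = sum(c for row in inverted for px in row for c in px)
    return inverted, partial


def invert_image(image, workers=4):
    """Color-inverse filter over row chunks, with a reduction over partial sums."""
    # Split the rows into roughly equal chunks, one per worker (hypothetical scheme).
    chunk = max(1, (len(image) + workers - 1) // workers)
    chunks = [image[i:i + chunk] for i in range(0, len(image), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(invert_rows, chunks))  # map preserves chunk order
    inverted = [row for rows, _ in results for row in rows]
    # Reduction step: combine the per-chunk partial sums after the workers are done.
    total = sum(partial for _, partial in results)
    return inverted, total


# Toy 2x2 RGB image (made-up values).
image = [[(0, 128, 255), (10, 20, 30)], [(255, 255, 255), (1, 2, 3)]]
inverted, total = invert_image(image, workers=2)
print(inverted[0][0], total)
```

Because every worker writes only to its own chunk and its own partial sum, the result does not depend on the number of workers, which is the property a reduction clause is meant to guarantee.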
In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored the field of parallel image processing, with a focus on the grayscale conversion of colorful images. Our approach involved integrating OpenMP into our framework for parallelization to execute a critical image processing task: grayscale conversion. By using OpenMP, we strategically enhanced the overall performance of the conversion process by distributing the workload across multiple threads. The primary objectives of our project revolved around optimizing computation time and improving overall efficiency, particularly in the task of grayscale conversion of colorful images. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency with an increasing number of cores. This underscored the importance of a carefully optimized parallelization strategy, considering factors like load balancing and minimizing communication overhead. Despite challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP’s effectiveness in accelerating image manipulation tasks.
Ray tracing is a computer graphics method that renders images realistically. As the name suggests, this technique primarily traces the path of light rays interacting with objects in a scene [1], permitting the calculation of lighting and reflection effects [2]. As ray tracing is a time-consuming process, the need for parallelization arises. One downside of parallelization is the existence of race conditions. In this work, we explore and experiment with different well-known solutions for this race condition. Starting with the introduction and the background section, a brief overview of the topic is followed by a detailed account of how race conditions may occur in the ray tracing algorithm. Continuing with the methods and results section, we used OpenMP to parallelize the ray tracing algorithm with the directives and clauses critical, atomic, and firstprivate. We conclude that neither critical nor atomic is an efficient solution for producing a good-quality picture, whereas firstprivate succeeded in producing a high-quality picture.
This study explores the application of parallel algorithms to enhance large-scale sorting, focusing on the QuickSort method. Implemented in both sequential and parallel forms, the paper provides a detailed comparison of their performance. The study investigates the efficacy of both techniques through the lens of array generation and pivot selection to manage datasets of varying sizes. It documents the performance metrics, recording 16,499.2 milliseconds for the serial implementation and 16,339 milliseconds for the parallel implementation when sorting an array, as measured with the C++ chrono library. These results suggest that while the performance gains of the parallel approach over its serial counterpart are not immediately pronounced for smaller datasets, the benefits are expected to be more substantial as the dataset size increases.
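To put the two timings quoted in the abstract above in perspective, the short Python sketch below computes the speedup and the relative time saved. Only the two millisecond values come from the abstract; the function names are hypothetical, and the usual definition of speedup as serial time divided by parallel time is assumed.

```python
def speedup(serial_ms, parallel_ms):
    """Speedup under the usual definition: serial time / parallel time."""
    return serial_ms / parallel_ms


def time_saved_percent(serial_ms, parallel_ms):
    """Share of the serial run time saved by the parallel run, in percent."""
    return 100.0 * (serial_ms - parallel_ms) / serial_ms


serial_ms = 16499.2    # serial QuickSort time reported in the abstract
parallel_ms = 16339.0  # parallel QuickSort time reported in the abstract

s = speedup(serial_ms, parallel_ms)
saved = time_saved_percent(serial_ms, parallel_ms)
print(f"speedup = {s:.4f}, time saved = {saved:.2f}%")
```

Under these assumptions the speedup is only about 1.01, or roughly 1% of the run time, which is consistent with the abstract's remark that the gain is not pronounced at this dataset size.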
Building on a recently proposed model for calculating constant electromagnetic field values, the present article explores the electromagnetic field configuration generated by parallel electrical wires. This requires a reevaluation of the drawing procedure for constructing field curves with constant field values around multiple parallel conducting wires. To achieve this, we employ methods akin to those used for creating contours on topographical maps, ensuring a consistent numerical field value along the entire length of the field curves. Subsequent calculations will be conducted for scenarios where the wires are not parallel.
The high-resolution DEM-IMB-LBM model can accurately describe pore-scale fluid-solid interactions, but its potential for use in geotechnical engineering analysis has not been fully unleashed due to its prohibitive computational costs. To overcome this limitation, a message passing interface (MPI) parallel DEM-IMB-LBM framework is proposed, aimed at enhancing computation efficiency. This framework utilises a static domain decomposition scheme, with the entire computation domain being decomposed into multiple subdomains according to predefined processors. A detailed parallel strategy is employed for both contact detection and hydrodynamic force calculation. In particular, a particle ID re-numbering scheme is proposed to handle particle transitions across subdomain interfaces. Two benchmarks are conducted to validate the accuracy and overall performance of the proposed framework. Subsequently, the framework is applied to simulate scenarios involving multi-particle sedimentation and submarine landslides. The numerical examples effectively demonstrate the robustness and applicability of the MPI parallel DEM-IMB-LBM framework.
This paper presents a software turbo decoder on graphics processing units (GPU). Unlike previous works, the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Systems (CCSDS) standard. However, the information frame lengths of the CCSDS turbo codes are not suitable for flexible sub-frame parallelism design. To mitigate this issue, we propose a padding method that inserts several bits before the information frame header. To obtain low-latency performance and high resource utilization, two-level intra-frame parallelisms and an efficient data structure are considered. The presented Max-Log-Map decoder can be adopted to decode the Long Term Evolution (LTE) turbo codes with only small modifications. The proposed CCSDS turbo decoder at 10 iterations on NVIDIA RTX3070 achieves about 150 Mbps and 50 Mbps throughputs for the code rates 1/6 and 1/2, respectively.
The current parallel ankle rehabilitation robot (ARR) suffers from the problem of difficult real-time alignment of the human-robot joint center of rotation, which may lead to secondary injuries to the patient. This study investigates type synthesis of a parallel self-alignment ankle rehabilitation robot (PSAARR) based on the kinematic characteristics of ankle joint rotation center drift, from the perspective of introducing "suitable passive degrees of freedom (DOF)" with a suitable number and form. First, the self-alignment principle of the parallel ARR was proposed by deriving conditions for transforming a human-robot closed chain (HRCC) formed by an ARR and the human body into a kinematically suitably constrained system and by introducing the conditions of "decoupled" and "less limb". Second, the relationship between the self-alignment principle and the actuation wrenches (twists) of the PSAARR was analyzed with the velocity Jacobian matrix as a "bridge". Subsequently, the type synthesis conditions of the PSAARR were proposed. Third, a PSAARR synthesis method was proposed based on screw theory, and the type synthesis of the PSAARR was conducted. Finally, an HRCC kinematic model was established to verify the self-alignment capability of the PSAARR. In this study, 93 types of PSAARR limb structures were synthesized and the self-alignment capability of a human-robot joint axis was verified through kinematic analysis, which provides a theoretical basis for the design of such an ARR.
The heterogeneous variational nodal method (HVNM) has emerged as a potential approach for solving high-fidelity neutron transport problems. However, achieving accurate results with HVNM in large-scale problems using high-fidelity models has been challenging due to the prohibitive computational costs. This paper presents an efficient parallel algorithm tailored for HVNM based on the Message Passing Interface standard. The algorithm evenly distributes the response matrix sets among processors during the matrix formation process, thus enabling independent construction without communication. Once the formation tasks are completed, a collective operation merges and shares the matrix sets among the processors. For the solution process, the problem domain is decomposed into subdomains assigned to specific processors, and the red-black Gauss-Seidel iteration is employed within each subdomain to solve the response matrix equation. Point-to-point communication is conducted between adjacent subdomains to exchange data along the boundaries. The accuracy and efficiency of the parallel algorithm are verified using the KAIST and JRR-3 test cases. Numerical results obtained with multiple processors agree well with those obtained from Monte Carlo calculations. The parallelization of HVNM results in eigenvalue errors of 31 pcm/-90 pcm and fission rate RMS errors of 1.22%/0.66%, respectively, for the 3D KAIST problem and the 3D JRR-3 problem. In addition, the parallel algorithm significantly reduces computation time, with an efficiency of 68.51% using 36 processors in the KAIST problem and 77.14% using 144 processors in the JRR-3 problem.
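The efficiency figures in the abstract above can be translated into implied speedups with a few lines of Python. This is a sketch under a stated assumption: parallel efficiency is taken as speedup divided by the number of processors, a common definition that the abstract itself does not spell out. The function name and the dictionary layout are hypothetical; only the efficiencies and processor counts come from the abstract.

```python
def implied_speedup(efficiency, processors):
    """Speedup implied by efficiency = speedup / processors (assumed definition)."""
    return efficiency * processors


# (efficiency, processor count) as quoted in the abstract.
cases = {"KAIST": (0.6851, 36), "JRR-3": (0.7714, 144)}

speedups = {name: implied_speedup(e, p) for name, (e, p) in cases.items()}
for name, value in speedups.items():
    print(f"{name}: implied speedup of about {value:.1f}")
```

Under that assumption, the reported efficiencies correspond to speedups of roughly 24.7 on 36 processors and 111.1 on 144 processors.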
Currently, two rotations and one translation (2R1T) three-degree-of-freedom (DOF) parallel mechanisms (PMs) are widely applied in five-DOF hybrid machining robots. However, there is a lack of an effective method to evaluate the configuration stiffness of mechanisms during the mechanism design stage, so it is a challenge to select appropriate 2R1T PMs with excellent stiffness performance at that stage. Considering the operational status of 2R1T PMs, the bending and torsional stiffness are adopted as indices to evaluate the configuration stiffness of PMs. Subsequently, a specific method is proposed to calculate these stiffness indices. Initially, the various types of structural and driving stiffness for each branch are assessed and their specific values defined. Subsequently, a rigid-flexible coupled force model for the over-constrained 2R1T PM is established, and the proposed evaluation method is used to analyze the configuration stiffness of five 2R1T PMs in the entire workspace. Finally, the driving force and constraint force of each branch in the whole workspace are calculated to further elucidate the stiffness evaluation results obtained with the proposed method. The obtained results demonstrate that the bending and torsional stiffness of the 2RPU/UPR/RPR mechanism along the x- and y-directions are larger than those of the other four mechanisms.
The kinematic equivalent model of an existing ankle-rehabilitation robot is inconsistent with the anatomical structure of the human ankle, which influences the rehabilitation effect. Therefore, this study equates the human ankle to the UR model and proposes a novel three degrees of freedom (3-DOF) generalized spherical parallel mechanism for ankle rehabilitation. The parallel mechanism has two spherical centers corresponding to the rotation centers of the tibiotalar and subtalar joints. Using screw theory, the mobility of the parallel mechanism, which meets the requirements of the human ankle, is analyzed. The inverse kinematics are presented, and singularities are identified based on the Jacobian matrix. The workspaces of the parallel mechanism are obtained through the search method and compared with the motion range of the human ankle, which shows that the parallel mechanism can meet the motion demand of ankle rehabilitation. Additionally, based on the motion-force transmissibility, the performance atlases are plotted in the parameter optimal design space, and the optimum parameter is obtained according to the demands of practical applications. The results show that the parallel mechanism can meet the motion requirements of ankle rehabilitation and has excellent kinematic performance in its rehabilitation range, which provides a theoretical basis for the prototype design and experimental verification.
The nonlinear stability of plane parallel shear flows with respect to tilted perturbations is studied by energy methods. Tilted perturbation refers to the fact that perturbations form an angle θ ∈ (0, π/2) with the direction of the basic flows. By defining an energy functional, it is proven that plane parallel shear flows are unconditionally nonlinearly exponentially stable for tilted streamwise perturbation when the Reynolds number is below a certain critical value and the boundary conditions are either rigid or stress-free. In the case of stress-free boundaries, by taking advantage of the poloidal-toroidal decomposition of a solenoidal field to define energy functionals, it can even be shown that plane parallel shear flows are unconditionally nonlinearly exponentially stable for all Reynolds numbers, where the tilted perturbation can be either spanwise or streamwise.
In this research, we present the pure open multi-processing (OpenMP), pure message passing interface (MPI), and hybrid MPI/OpenMP parallel solvers within the dynamic explicit central difference algorithm for the coining process to address the challenge of capturing fine relief features of approximately 50 microns. Achieving such precision demands the utilization of at least 7 million tetrahedron elements, surpassing the capabilities of traditional serial programs previously developed. To mitigate data races when calculating internal forces, intermediate arrays are introduced within the OpenMP directive. This helps ensure proper synchronization and avoid conflicts during parallel execution. Additionally, in the MPI implementation, the coins are partitioned into the desired number of regions. This division allows for efficient distribution of computational tasks across multiple processes. Numerical simulation examples are conducted to compare the three solvers with serial programs, evaluating correctness, acceleration ratio, and parallel efficiency. The results reveal a relative error of approximately 0.3% in forming force among the parallel and serial solvers, while the predicted insufficient material zones align with experimental observations. Additionally, speedup ratio and parallel efficiency are assessed for the coining process simulation. The pure MPI parallel solver achieves a maximum acceleration of 9.5 on a single computer (utilizing 12 cores) and the hybrid solver exhibits a speedup ratio of 136 in a cluster (using 6 compute nodes and 12 cores per compute node), showing the strong scalability of the hybrid MPI/OpenMP programming model. This approach effectively meets the simulation requirements for commemorative coins with intricate relief patterns.
The Message Passing Interface (MPI) is a widely accepted standard for parallel computing on distributed memory systems. However, MPI implementations can contain defects that impact the reliability and performance of parallel applications. Detecting and correcting these defects is crucial, yet there is a lack of published models specifically designed for correcting MPI defects. To address this, we propose a model for detecting and correcting MPI defects (DC_MPI), which aims to detect and correct defects in various types of MPI communication, including blocking point-to-point (BPTP), nonblocking point-to-point (NBPTP), and collective communication (CC). The defects addressed by the DC_MPI model include illegal MPI calls, deadlocks (DL), race conditions (RC), and message mismatches (MM). To assess the effectiveness of the DC_MPI model, we performed experiments on a dataset consisting of 40 MPI codes. The results indicate that the model achieved a detection rate of 37 out of 40 codes, resulting in an overall detection accuracy of 92.5%. Additionally, the execution duration of the DC_MPI model ranged from 0.81 to 1.36 s. These findings show that the DC_MPI model is useful in detecting and correcting defects in MPI implementations, thereby enhancing the reliability and performance of parallel applications. The DC_MPI model fills an important research gap and provides a valuable tool for improving the quality of MPI-based parallel computing systems.
The growing development of the Internet of Things (IoT) is accelerating the emergence and growth of new IoT services and applications, which will result in massive amounts of data being generated, transmitted and processed in wireless communication networks. Mobile Edge Computing (MEC) is a desirable paradigm for processing IoT data in a timely manner to maximize its value. In MEC, a number of computing-capable devices are deployed at the network edge near data sources to support edge computing, such that the long network transmission delay of the cloud computing paradigm can be avoided. Since an edge device might not always have sufficient resources to process the massive amount of data, computation offloading that exploits the cooperation among edge devices is highly important. However, the dynamic traffic characteristics and heterogeneous computing capabilities of edge devices make offloading challenging. In addition, different scheduling schemes might provide different computation delays to the offloaded tasks. Thus, offloading in mobile nodes and scheduling in the MEC server are coupled in determining service delay. This paper seeks to guarantee low delay for computation-intensive applications by jointly optimizing the offloading and scheduling in such an MEC system. We propose a Delay-Greedy Computation Offloading (DGCO) algorithm to make offloading decisions for new tasks in distributed computing-enabled mobile devices. A Reinforcement Learning-based Parallel Scheduling (RLPS) algorithm is further designed to schedule offloaded tasks in the multi-core MEC server. With an offloading delay broadcast mechanism, the DGCO and RLPS cooperate to achieve the goal of delay-guarantee-ratio maximization. Finally, the simulation results show that our proposal can bound the end-to-end delay of various tasks. Even under a slightly heavy task load, the delay-guarantee ratio given by DGCO-RLPS can still approximate 95%, while that given by the benchmarked algorithms is reduced to an intolerable value. The simulation results demonstrate the effectiveness of DGCO-RLPS for delay guarantee in MEC.
This paper investigates the effective capacity of a point-to-point ultra-reliable low latency communication (URLLC) transmission over multiple parallel sub-channels at finite blocklength (FBL) with imperfect channel state information (CSI). Based on reasonable assumptions and approximations, we derive the effective capacity as a function of the pilot length, decoding error probability, transmit power and the sub-channel number. Then we reveal the significant impact of the above parameters on the effective capacity. A closed-form lower bound of the effective capacity is derived and an alternating optimization based algorithm is proposed to find the optimal pilot length and decoding error probability. Simulation results validate our theoretical analysis and show that the closed-form lower bound is very tight. In addition, through the simulations of the optimized effective capacity, insights for pilot length and decoding error probability optimization are provided to evaluate the optimal parameters in realistic systems.
Low-Earth-Orbit satellite constellation networks (LEO-SCN) can provide low-cost, large-scale, flexible coverage wireless communication services. High dynamics and large topological sizes characterize LEO-SCN. Protocol development and application testing of LEO-SCN are challenging to carry out in a natural environment. Simulation platforms are a more effective means of technology demonstration. Currently available simulators have a single function and limited simulation scale. There needs to be a simulator for full-featured simulation. In this paper, we apply the parallel discrete-event simulation technique to the simulation of LEO-SCN to support large-scale complex system simulation at the packet level. To solve the problem that single-process programs cannot cope with complex simulations containing numerous entities, we propose a parallel mechanism and the algorithms LP-NM and LP-YAWNS for synchronization. In the experiment, we use ns-3 to verify the acceleration ratio and efficiency of the above algorithms. The results show that our proposed mechanism can provide parallel simulation engine support for the LEO-SCN.
Neutron-skin thickness is a key parameter for a neutron-rich nucleus; however, it is difficult to determine. In the framework of the Lanzhou Quantum Molecular Dynamics (LQMD) model, a possible probe for the neutron-skin thickness ($\delta_{np}$) of neutron-rich $^{48}$Ca was studied in the 140A MeV $^{48}$Ca + $^{9}$Be projectile fragmentation reaction based on the parallel momentum distribution ($p_{\parallel}$) of the residual fragments. A Fermi-type density distribution was employed to initiate the neutron density distributions in the LQMD simulations. A combined Gaussian function with different width parameters for the left side ($\Gamma_L$) and the right side ($\Gamma_R$) of the distribution was used to describe the $p_{\parallel}$ of the residual fragments. Taking neutron-rich sulfur isotopes as examples, $\Gamma_L$ shows a sensitive correlation with $\delta_{np}$ of $^{48}$Ca, and is proposed as a probe for determining the neutron-skin thickness of the projectile nucleus.
Objective: This study aimed to analyze the clinical efficacy of the Jianpi Shengxue tablet for treating renal anemia. Methods: A total of 200 patients with renal anemia from December 2020 to December 2022 were enrolled and randomly divided into two groups. Patients in the control group were treated with polysaccharide-iron complex, and those in the experimental group were administered Jianpi Shengxue tablet. After 8 weeks of continuous treatment, the therapeutic outcomes regarding anemia were compared between the two groups. Results: After treatment, the red blood cell (RBC) count, hematocrit (HCT), reticulocyte percentage (RET), ferritin (SF), serum iron (SI), transferrin saturation (TSAT), and serum albumin (ALB) all increased (P<0.01), and the clinical symptom score and total iron binding capacity decreased (P<0.01) in the experimental group. Moreover, the improvements in RBC, HCT, RET, SF, SI, TSAT, ALB, and clinical symptoms (fatigue, anorexia, dull skin complexion, numbness of hands and feet) in the experimental group were significantly greater than those in the control group (P<0.05). The total effective rate for treating renal anemia was significantly higher in the experimental group than in the control group (P<0.01). Conclusion: The Jianpi Shengxue tablet demonstrates efficacy in treating renal anemia, leading to significant improvements in the laboratory examination results and clinical symptoms of patients with renal anemia.
Establishing an elastostatic stiffness model for overconstrained parallel manipulators (PMs), particularly those with overconstrained subclosed loops, while ensuring numerical stability poses a challenge. This study addresses this issue by proposing a systematic elastostatic stiffness model based on matrix structural analysis (MSA) and independent displacement coordinates (IDCs) extraction techniques. To begin, the closed-loop PM is transformed into an open-loop PM by eliminating constraints. A subassembly element is then introduced, which considers the flexibility of both rods and joints. This approach helps circumvent the numerical instability typically encountered with traditional constraint equations. The IDCs and analytical constraint equations of nodes constrained by various joints are summarized in the appendix, utilizing multipoint constraint theory and singularity analysis, all unified within a single coordinate frame. Subsequently, the open-loop mechanism is efficiently closed by referencing the constraint equations presented in the appendix, alongside its elastostatic model. The proposed method is efficient in both modeling and computation because the constraint equations are comprehensively summarized in the appendix, eliminating the need for additional equations. An example with overconstrained subclosed loops demonstrates the application of the proposed method. In conclusion, the model proposed in this study enriches the theory of elastostatic stiffness modeling of PMs and provides an effective solution to the stiffness modeling challenges they present.
文摘Real-time capabilities and computational efficiency are provided by parallel image processing utilizing OpenMP. However, race conditions can affect the accuracy and reliability of the outcomes. This paper highlights the importance of addressing race conditions in parallel image processing, specifically focusing on color inverse filtering using OpenMP. We considered three solutions to solve race conditions, each with distinct characteristics: #pragma omp atomic: Protects individual memory operations for fine-grained control. #pragma omp critical: Protects entire code blocks for exclusive access. #pragma omp parallel sections reduction: Employs a reduction clause for safe aggregation of values across threads. Our findings show that the produced images were unaffected by race condition. However, it becomes evident that solving the race conditions in the code makes it significantly faster, especially when it is executed on multiple cores.
文摘In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored the field of parallel image processing, with a focus on the grayscale conversion of colorful images. Our approach involved integrating OpenMP into our framework for parallelization to execute a critical image processing task: grayscale conversion. By using OpenMP, we strategically enhanced the overall performance of the conversion process by distributing the workload across multiple threads. The primary objectives of our project revolved around optimizing computation time and improving overall efficiency, particularly in the task of grayscale conversion of colorful images. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency with an increasing number of cores. This underscored the importance of a carefully optimized parallelization strategy, considering factors like load balancing and minimizing communication overhead. Despite challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP’s effectiveness in accelerating image manipulation tasks.
Abstract: Ray tracing is a computer graphics method that renders images realistically. As the name suggests, the technique traces the paths of light rays as they interact with objects in a scene [1], which permits the calculation of lighting and reflection effects [2]. Because ray tracing is time-consuming, parallelization is needed. One downside of parallelization is the appearance of race conditions. In this work, we explore and experiment with different well-known solutions to these race conditions. The introduction and background sections give a brief overview of the topic, followed by a detailed account of how race conditions may occur in the ray tracing algorithm. In the methods and results sections, we use OpenMP to parallelize the ray tracing algorithm with the compiler directives critical, atomic, and firstprivate. We conclude that critical and atomic are not efficient solutions for producing a good-quality picture, whereas firstprivate succeeded in producing a high-quality picture.
Abstract: This study explores the application of parallel algorithms to large-scale sorting, focusing on the QuickSort method. The algorithm is implemented in both sequential and parallel forms, and the paper provides a detailed comparison of their performance. The efficacy of both techniques is examined in terms of array generation and pivot selection for datasets of varying sizes. Timing with the C++ chrono library recorded 16,499.2 milliseconds for the serial implementation and 16,339 milliseconds for the parallel implementation when sorting an array. These results suggest that while the performance gain of the parallel approach over its serial counterpart is not pronounced for smaller datasets, the benefit is expected to be more substantial as the dataset size increases.
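One common way to parallelize QuickSort is with OpenMP tasks, sketched below in C. The abstract does not say how the paper's parallel version is structured, so this is an assumed design: the names `parallel_quicksort`, `quicksort_range`, and `TASK_CUTOFF`, the middle-element pivot, and the cutoff value are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed threshold: subarrays shorter than this are sorted in the
   current task, to avoid paying task overhead for tiny ranges. */
enum { TASK_CUTOFF = 1000 };

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Sort a[lo..hi] (inclusive) with a Hoare-style partition around the
   middle element, then recurse on the two halves as separate tasks. */
static void quicksort_range(int *a, long lo, long hi)
{
    if (lo >= hi) return;
    int pivot = a[lo + (hi - lo) / 2];
    long i = lo, j = hi;
    while (i <= j) {
        while (a[i] < pivot) ++i;
        while (a[j] > pivot) --j;
        if (i <= j) { swap_int(&a[i], &a[j]); ++i; --j; }
    }
    /* The halves a[lo..j] and a[i..hi] do not overlap, so the two
       tasks never touch the same element and need no locking. */
    #pragma omp task if (j - lo > TASK_CUTOFF)
    quicksort_range(a, lo, j);
    #pragma omp task if (hi - i > TASK_CUTOFF)
    quicksort_range(a, i, hi);
    #pragma omp taskwait
}

/* Entry point: one thread starts the recursion, the rest of the team
   picks up the tasks it spawns. */
void parallel_quicksort(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n < 2) return;
    #pragma omp parallel
    #pragma omp single nowait
    quicksort_range(a, 0, (long)n - 1);
}
```

With small arrays, task creation and scheduling can cost about as much as the sorting itself, which would be consistent with the near-identical serial and parallel timings reported in the abstract; the cutoff is one knob for trading that overhead against available parallelism.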
Abstract: Building on a recently proposed model for calculating constant electro-magnetic field values, the present article explores the electro-magnetic field configuration generated by parallel electrical wires. This requires a reevaluation of the drawing procedure for constructing field curves with constant field values around multiple parallel electrically conducting wires. To achieve this, we employ methods akin to those used for creating contours on topographical maps, ensuring a consistent numerical field value along the entire length of each field curve. Subsequent calculations will be conducted for scenarios where the wires are not parallel.
Funding: Financially supported by the National Natural Science Foundation of China (Grant Nos. 12072217 and 42077254) and the Natural Science Foundation of Hunan Province, China (Grant No. 2022JJ30567).
Abstract: The high-resolution DEM-IMB-LBM model can accurately describe pore-scale fluid-solid interactions, but its potential for geotechnical engineering analysis has not been fully realized because of its prohibitive computational cost. To overcome this limitation, a message passing interface (MPI) parallel DEM-IMB-LBM framework is proposed to enhance computational efficiency. The framework uses a static domain decomposition scheme, in which the entire computational domain is decomposed into multiple subdomains according to the predefined processors. A detailed parallel strategy is employed for both contact detection and hydrodynamic force calculation. In particular, a particle ID re-numbering scheme is proposed to handle particle transitions across subdomain interfaces. Two benchmarks are conducted to validate the accuracy and overall performance of the proposed framework. Subsequently, the framework is applied to simulate multi-particle sedimentation and submarine landslides. The numerical examples demonstrate the robustness and applicability of the MPI parallel DEM-IMB-LBM framework.
Funding: Supported by the Fundamental Research Funds for the Central Universities (FRF-TP20-062A1) and the Guangdong Basic and Applied Basic Research Foundation (2021A1515110070).
Abstract: This paper presents a software turbo decoder on graphics processing units (GPU). Unlike previous works, the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Systems (CCSDS) standard. However, the information frame lengths of the CCSDS turbo codes are not suitable for flexible sub-frame parallelism design. To mitigate this issue, we propose a padding method that inserts several bits before the information frame header. To obtain low latency and high resource utilization, two-level intra-frame parallelism and an efficient data structure are considered. The presented Max-Log-Map decoder can be adopted to decode Long Term Evolution (LTE) turbo codes with only small modifications. At 10 iterations on an NVIDIA RTX3070, the proposed CCSDS turbo decoder achieves throughputs of about 150 Mbps and 50 Mbps for code rates 1/6 and 1/2, respectively.
Funding: Supported by Key Scientific Research Platforms and Projects of Guangdong Regular Institutions of Higher Education of China (Grant No. 2022KCXTD033), Guangdong Provincial Natural Science Foundation of China (Grant No. 2023A1515012103), Guangdong Provincial Scientific Research Capacity Improvement Project of Key Developing Disciplines of China (Grant No. 2021ZDJS084), and National Natural Science Foundation of China (Grant No. 52105009).
Abstract: Current parallel ankle rehabilitation robots (ARRs) suffer from difficulty in aligning the human-robot joint center of rotation in real time, which may lead to secondary injuries to the patient. This study investigates the type synthesis of a parallel self-alignment ankle rehabilitation robot (PSAARR) based on the kinematic characteristics of ankle joint rotation center drift, from the perspective of introducing "suitable passive degrees of freedom (DOF)" of a suitable number and form. First, the self-alignment principle of the parallel ARR is proposed by deriving conditions for transforming the human-robot closed chain (HRCC) formed by an ARR and the human body into a kinematically suitable constrained system and by introducing the conditions of "decoupled" and "less limb". Second, the relationship between the self-alignment principle and the actuation wrenches (twists) of the PSAARR is analyzed, with the velocity Jacobian matrix as a "bridge". Subsequently, the type synthesis conditions of the PSAARR are proposed. Third, a PSAARR synthesis method is proposed based on screw theory, and the type synthesis of the PSAARR is conducted. Finally, an HRCC kinematic model is established to verify the self-alignment capability of the PSAARR. In this study, 93 types of PSAARR limb structures were synthesized, and the self-alignment capability of a human-robot joint axis was verified through kinematic analysis, which provides a theoretical basis for the design of such an ARR.
Funding: Supported by the National Key Research and Development Program of China (No. 2020YFB1901900), the National Natural Science Foundation of China (Nos. U20B2011, 12175138), and the Shanghai Rising-Star Program.
Abstract: The heterogeneous variational nodal method (HVNM) has emerged as a potential approach for solving high-fidelity neutron transport problems. However, achieving accurate results with HVNM in large-scale problems using high-fidelity models has been challenging because of the prohibitive computational cost. This paper presents an efficient parallel algorithm tailored for HVNM based on the Message Passing Interface standard. The algorithm evenly distributes the response matrix sets among processors during the matrix formation process, enabling independent construction without communication. Once the formation tasks are completed, a collective operation merges and shares the matrix sets among the processors. For the solution process, the problem domain is decomposed into subdomains assigned to specific processors, and red-black Gauss-Seidel iteration is employed within each subdomain to solve the response matrix equation. Point-to-point communication is conducted between adjacent subdomains to exchange data along the boundaries. The accuracy and efficiency of the parallel algorithm are verified using the KAIST and JRR-3 test cases. Numerical results obtained with multiple processors agree well with those obtained from Monte Carlo calculations. The parallelized HVNM yields eigenvalue errors of 31 pcm and -90 pcm and fission rate RMS errors of 1.22% and 0.66% for the 3D KAIST problem and the 3D JRR-3 problem, respectively. In addition, the parallel algorithm significantly reduces computation time, with an efficiency of 68.51% using 36 processors in the KAIST problem and 77.14% using 144 processors in the JRR-3 problem.
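The red-black ordering mentioned above is easiest to see on a toy problem. The C sketch below applies it to a one-dimensional model equation, -u[i-1] + 2u[i] - u[i+1] = f[i], and is only an assumed illustration of the ordering: it is not the HVNM response matrix equation, it uses OpenMP threads inside one address space rather than MPI subdomains, and the name `red_black_sweeps` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Red-black Gauss-Seidel for -u[i-1] + 2*u[i] - u[i+1] = f[i], i = 1..n.
   u has n + 2 entries; u[0] and u[n+1] hold fixed boundary values.
   Points of one color depend only on points of the other color, so all
   updates within a color are independent and can run in parallel. */
void red_black_sweeps(double *u, const double *f, size_t n, int sweeps)
{
    assert(u != NULL && f != NULL);
    for (int s = 0; s < sweeps; ++s) {
        for (long color = 1; color <= 2; ++color) {
            /* color 1 updates odd indices, color 2 updates even ones */
            #pragma omp parallel for
            for (long i = color; i <= (long)n; i += 2)
                u[i] = 0.5 * (u[i - 1] + u[i + 1] + f[i]);
        }
    }
}
```

In a distributed setting, the values exchanged between the two color passes are exactly the neighbor values on subdomain boundaries, which is where the point-to-point communication described in the abstract would occur.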
Funding: Supported by National Natural Science Foundation of China (Grant Nos. 51875495, U2037202) and Hebei Provincial Science and Technology Project (Grant No. 206Z1805G).
Abstract: Currently, two-rotation, one-translation (2R1T) three-degree-of-freedom (DOF) parallel mechanisms (PMs) are widely applied in five-DOF hybrid machining robots. However, an effective method for evaluating the configuration stiffness of mechanisms during the design stage is lacking, so selecting 2R1T PMs with excellent stiffness performance at that stage is a challenge. Considering the operational status of 2R1T PMs, the bending and torsional stiffness are taken as indices to evaluate the configuration stiffness of PMs. Subsequently, a specific method is proposed to calculate these stiffness indices. Initially, the various types of structural and driving stiffness for each branch are assessed and their specific values defined. Then, a rigid-flexible coupled force model for the over-constrained 2R1T PM is established, and the proposed evaluation method is used to analyze the configuration stiffness of five 2R1T PMs over the entire workspace. Finally, the driving force and constraint force of each branch over the entire workspace are calculated to further explain the stiffness evaluation results obtained with the proposed method. The results demonstrate that the bending and torsional stiffness of the 2RPU/UPR/RPR mechanism along the x- and y-directions are larger than those of the other four mechanisms.
Funding: Supported by National Natural Science Foundation of China (Grant No. 52075145), S&T Program of Hebei Province of China (Grant Nos. 20281805Z, E2020103001), and Central Government Guides Basic Research Projects of Local Science and Technology Development Funds of China (Grant No. 206Z1801G).
Abstract: The kinematic equivalent model of existing ankle-rehabilitation robots is inconsistent with the anatomical structure of the human ankle, which influences the rehabilitation effect. Therefore, this study equates the human ankle to the UR model and proposes a novel three-degree-of-freedom (3-DOF) generalized spherical parallel mechanism for ankle rehabilitation. The parallel mechanism has two spherical centers corresponding to the rotation centers of the tibiotalar and subtalar joints. Using screw theory, the mobility of the parallel mechanism, which meets the requirements of the human ankle, is analyzed. The inverse kinematics are presented, and singularities are identified based on the Jacobian matrix. The workspaces of the parallel mechanism are obtained through the search method and compared with the motion range of the human ankle, which shows that the parallel mechanism can meet the motion demands of ankle rehabilitation. Additionally, based on the motion-force transmissibility, performance atlases are plotted in the parameter optimal design space, and the optimum parameters are obtained according to the demands of practical applications. The results show that the parallel mechanism has excellent kinematic performance within its rehabilitation range, which provides a theoretical basis for prototype design and experimental verification.
Funding: Supported by the National Natural Science Foundation of China (21627813).
Abstract: The nonlinear stability of plane parallel shear flows with respect to tilted perturbations is studied by energy methods. A tilted perturbation is one that forms an angle θ ∈ (0, π/2) with the direction of the basic flow. By defining an energy functional, it is proven that plane parallel shear flows are unconditionally nonlinearly exponentially stable to tilted streamwise perturbations when the Reynolds number is below a certain critical value and the boundary conditions are either rigid or stress-free. In the case of stress-free boundaries, by using the poloidal-toroidal decomposition of a solenoidal field to define energy functionals, it can even be shown that plane parallel shear flows are unconditionally nonlinearly exponentially stable for all Reynolds numbers, where the tilted perturbation can be either spanwise or streamwise.
Funding: Supported by the fund from Shenyang Mint Company Limited (No. 20220056), the Senior Talent Foundation of Jiangsu University (No. 19JDG022), and the Taizhou City Double Innovation and Entrepreneurship Talent Program (No. Taizhou Human Resources Office [2022] No. 22).
Abstract: In this research, we present pure open multi-processing (OpenMP), pure message passing interface (MPI), and hybrid MPI/OpenMP parallel solvers within the dynamic explicit central difference algorithm for the coining process, to address the challenge of capturing fine relief features of approximately 50 microns. Achieving such precision demands at least 7 million tetrahedron elements, surpassing the capabilities of the serial programs developed previously. To avoid data races when calculating internal forces, intermediate arrays are introduced within the OpenMP directive, which ensures proper synchronization and avoids conflicts during parallel execution. In the MPI implementation, the coins are partitioned into the desired number of regions, which allows computational tasks to be distributed efficiently across multiple processes. Numerical simulation examples are conducted to compare the three solvers with serial programs in terms of correctness, speedup ratio, and parallel efficiency. The results reveal a relative error of approximately 0.3% in forming force between the parallel and serial solvers, while the predicted insufficient material zones align with experimental observations. The pure MPI parallel solver achieves a maximum speedup of 9.5 on a single computer (using 12 cores), and the hybrid solver exhibits a speedup ratio of 136 on a cluster (using 6 compute nodes and 12 cores per compute node), showing the strong scalability of the hybrid MPI/OpenMP programming model. This approach effectively meets the simulation requirements for commemorative coins with intricate relief patterns.
Funding: The Deanship of Scientific Research at King Abdulaziz University, Jeddah, Saudi Arabia, under Grant No. RG-12-611-43.
Abstract: The Message Passing Interface (MPI) is a widely accepted standard for parallel computing on distributed memory systems. However, MPI implementations can contain defects that impact the reliability and performance of parallel applications. Detecting and correcting these defects is crucial, yet there is a lack of published models specifically designed for correcting MPI defects. To address this, we propose a model for detecting and correcting MPI defects (DC_MPI), which aims to detect and correct defects in various types of MPI communication, including blocking point-to-point (BPTP), nonblocking point-to-point (NBPTP), and collective communication (CC). The defects addressed by the DC_MPI model include illegal MPI calls, deadlocks (DL), race conditions (RC), and message mismatches (MM). To assess the effectiveness of the DC_MPI model, we performed experiments on a dataset consisting of 40 MPI codes. The model detected defects in 37 of the 40 codes, an overall detection accuracy of 92.5%. Additionally, the execution duration of the DC_MPI model ranged from 0.81 to 1.36 s. These findings show that the DC_MPI model is useful in detecting and correcting defects in MPI implementations, thereby enhancing the reliability and performance of parallel applications. The DC_MPI model fills an important research gap and provides a valuable tool for improving the quality of MPI-based parallel computing systems.
Funding: Supported in part by the National Natural Science Foundation of China under Grants 61901128 and 62273109, and by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (21KJB510032).
Abstract: The growing development of the Internet of Things (IoT) is accelerating the emergence and growth of new IoT services and applications, which will result in massive amounts of data being generated, transmitted, and processed in wireless communication networks. Mobile Edge Computing (MEC) is a desirable paradigm for processing IoT data in a timely manner to maximize its value. In MEC, a number of computing-capable devices are deployed at the network edge near data sources to support edge computing, so that the long network transmission delay of the cloud computing paradigm can be avoided. Since an edge device might not always have sufficient resources to process the massive amount of data, computation offloading that exploits cooperation among edge devices is highly important. However, the dynamic traffic characteristics and heterogeneous computing capabilities of edge devices make offloading challenging. In addition, different scheduling schemes might impose different computation delays on the offloaded tasks. Thus, offloading in mobile nodes and scheduling in the MEC server jointly determine the service delay. This paper seeks to guarantee low delay for computation-intensive applications by jointly optimizing offloading and scheduling in such an MEC system. We propose a Delay-Greedy Computation Offloading (DGCO) algorithm to make offloading decisions for new tasks in distributed computing-enabled mobile devices. A Reinforcement Learning-based Parallel Scheduling (RLPS) algorithm is further designed to schedule offloaded tasks in the multi-core MEC server. With an offloading delay broadcast mechanism, DGCO and RLPS cooperate to maximize the delay-guarantee ratio. Finally, the simulation results show that our proposal can bound the end-to-end delay of various tasks. Even under a slightly heavy task load, the delay-guarantee ratio given by DGCO-RLPS still approaches 95%, while that given by the benchmark algorithms drops to an intolerable value. The simulation results demonstrate the effectiveness of DGCO-RLPS for delay guarantee in MEC.
Funding: Supported by the National Natural Science Foundation of China under Grant 61941106.
Abstract: This paper investigates the effective capacity of a point-to-point ultra-reliable low latency communication (URLLC) transmission over multiple parallel sub-channels at finite blocklength (FBL) with imperfect channel state information (CSI). Based on reasonable assumptions and approximations, we derive the effective capacity as a function of the pilot length, decoding error probability, transmit power, and number of sub-channels. We then reveal the significant impact of these parameters on the effective capacity. A closed-form lower bound of the effective capacity is derived, and an alternating optimization based algorithm is proposed to find the optimal pilot length and decoding error probability. Simulation results validate our theoretical analysis and show that the closed-form lower bound is very tight. In addition, simulations of the optimized effective capacity provide insights into pilot length and decoding error probability optimization for evaluating the optimal parameters in realistic systems.
Funding: Supported by the Jiangsu Provincial Key Research and Development Program (No. BE20210132), the Zhejiang Provincial Key Research and Development Program (No. 2021C01040), and the team of S-SET.
Abstract: Low-Earth-Orbit satellite constellation networks (LEO-SCN) can provide low-cost, large-scale wireless communication services with flexible coverage. LEO-SCN are characterized by high dynamics and large topologies, so protocol development and application testing are difficult to carry out in a real environment. Simulation platforms are a more effective means of technology demonstration. Currently available simulators have a single function and a limited simulation scale, and a full-featured simulator is needed. In this paper, we apply the parallel discrete-event simulation technique to LEO-SCN to support packet-level simulation of large-scale complex systems. To solve the problem that single-process programs cannot cope with complex simulations containing numerous entities, we propose a parallel mechanism and the algorithms LP-NM and LP-YAWNS for synchronization. In the experiment, we use ns-3 to verify the speedup ratio and efficiency of these algorithms. The results show that the proposed mechanism can provide parallel simulation engine support for LEO-SCN.
Funding: The National Natural Science Foundation of China (Nos. 12375123, 11975091, and 12305130), the Natural Science Foundation of Henan Province (No. 242300421048), China Postdoctoral Science Foundation (No. 2023M731016), and Henan Postdoctoral Foundation (No. HN2022164).
Abstract: Neutron-skin thickness is a key parameter for a neutron-rich nucleus; however, it is difficult to determine. In the framework of the Lanzhou Quantum Molecular Dynamics (LQMD) model, a possible probe for the neutron-skin thickness ($\delta_{np}$) of neutron-rich $^{48}$Ca was studied in the 140A MeV $^{48}$Ca + $^{9}$Be projectile fragmentation reaction, based on the parallel momentum distribution ($p_{\parallel}$) of the residual fragments. A Fermi-type density distribution was employed to initialize the neutron density distributions in the LQMD simulations. A combined Gaussian function with different width parameters for the left side ($\Gamma_{L}$) and the right side ($\Gamma_{R}$) of the distribution was used to describe the $p_{\parallel}$ of the residual fragments. Taking neutron-rich sulfur isotopes as examples, $\Gamma_{L}$ shows a sensitive correlation with $\delta_{np}$ of $^{48}$Ca and is proposed as a probe for determining the neutron-skin thickness of the projectile nucleus.
Funding: Financially supported by the National Natural Science Foundation of China (No. 82170701).
Abstract: Objective: This study aimed to analyze the clinical efficacy of the Jianpi Shengxue tablet for treating renal anemia. Methods: A total of 200 patients with renal anemia from December 2020 to December 2022 were enrolled and randomly divided into two groups. Patients in the control group were treated with polysaccharide-iron complex, and those in the experimental group were administered the Jianpi Shengxue tablet. After 8 weeks of continuous treatment, the therapeutic outcomes regarding anemia were compared between the two groups. Results: After treatment, the red blood cell (RBC) count, hematocrit (HCT), reticulocyte percentage (RET), ferritin (SF), serum iron (SI), transferrin saturation (TSAT), and serum albumin (ALB) all increased (P<0.01), and the clinical symptom score and total iron binding capacity decreased (P<0.01) in the experimental group. Moreover, the improvements in RBC, HCT, RET, SF, SI, TSAT, ALB, and clinical symptoms (fatigue, anorexia, dull skin complexion, numbness of hands and feet) in the experimental group were significantly greater than those in the control group (P<0.05). The total effective rate for treating renal anemia was significantly higher in the experimental group than in the control group (P<0.01). Conclusion: The Jianpi Shengxue tablet demonstrates efficacy in treating renal anemia, leading to significant improvements in the laboratory examination results and clinical symptoms of patients with renal anemia.
Funding: Supported by National Natural Science Foundation of China (Grant No. 52275036) and Key Research and Development Project of the Jiaxing Science and Technology Bureau (Grant No. 2022BZ10004).
Abstract: Establishing an elastostatic stiffness model for over-constrained parallel manipulators (PMs), particularly those with over-constrained subclosed loops, while ensuring numerical stability is a challenge. This study addresses the issue by proposing a systematic elastostatic stiffness model based on matrix structural analysis (MSA) and independent displacement coordinates (IDCs) extraction techniques. First, the closed-loop PM is transformed into an open-loop PM by eliminating constraints. A subassembly element is then introduced that considers the flexibility of both rods and joints. This approach circumvents the numerical instability typically encountered with traditional constraint equations. The IDCs and analytical constraint equations of nodes constrained by various joints are summarized in the appendix, using multipoint constraint theory and singularity analysis, all unified within a single coordinate frame. Subsequently, the open-loop mechanism is closed efficiently by referencing the constraint equations presented in the appendix, alongside its elastostatic model. The proposed method is efficient in both modeling and computation because the constraint equations are comprehensively summarized in the appendix, eliminating the need for additional equations. An example with over-constrained subclosed loops demonstrates the application of the proposed method. In conclusion, the proposed model enriches the theory of elastostatic stiffness modeling of PMs and provides an effective solution to the stiffness modeling challenges they present.