A new file assignment strategy for parallel I/O on cluster computing systems, named the heuristic file sorted assignment algorithm, was proposed. Based on load balancing, it assigns files with similar service times to the same disk. First, the files are sorted in descending order of service time and stored in the set I; then, when files are to be assigned, one disk of a cluster node is selected at random; finally, consecutive files are taken in order from the set I and placed on that disk until the disk reaches its maximum load. The experimental results show that the new strategy improves performance by 20.2% when the system load is light and by 31.6% when the load is heavy. Moreover, the higher the data access rate, the more pronounced the performance improvement obtained by the heuristic file sorted assignment algorithm.
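The three assignment steps can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function name `sorted_file_assignment`, the `max_load` parameter (modeled here as a cap on the summed service times per disk), and the rule "stop at the first file that no longer fits" are assumptions, since the abstract does not define the load maximum precisely.

```python
import random
from collections import deque


def sorted_file_assignment(service_times, num_disks, max_load, seed=None):
    """Hypothetical sketch: assign files to disks by descending service time.

    service_times: dict file_id -> expected service time
    max_load: assumed per-disk load limit, as a sum of service times
    Returns dict disk_id -> list of file_ids.
    """
    rng = random.Random(seed)
    # Step 1: the set I, sorted by service time in descending order.
    pending = deque(sorted(service_times, key=service_times.get, reverse=True))
    free_disks = list(range(num_disks))
    assignment = {d: [] for d in range(num_disks)}
    while pending and free_disks:
        # Step 2: select a not-yet-filled disk at random.
        disk = free_disks.pop(rng.randrange(len(free_disks)))
        load = 0.0
        # Step 3: take consecutive files from I until the disk is full.
        while pending and load + service_times[pending[0]] <= max_load:
            f = pending.popleft()
            assignment[disk].append(f)
            load += service_times[f]
    if pending:
        raise ValueError("disk capacity exhausted before all files were assigned")
    return assignment
```

Because files are taken consecutively from the sorted set, each disk ends up holding files with neighbouring (similar) service times, which is the property the strategy relies on; only which disk receives which group is random.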
The high-resolution DEM-IMB-LBM model can accurately describe pore-scale fluid-solid interactions, but its potential for geotechnical engineering analysis has not been fully realized because of its prohibitive computational cost. To overcome this limitation, a message passing interface (MPI) parallel DEM-IMB-LBM framework is proposed to improve computational efficiency. The framework uses a static domain decomposition scheme in which the entire computational domain is decomposed into multiple subdomains according to the predefined number of processors. A detailed parallel strategy is employed for both contact detection and hydrodynamic force calculation. In particular, a particle ID re-numbering scheme is proposed to handle particles that cross subdomain interfaces. Two benchmarks are conducted to validate the accuracy and overall performance of the proposed framework. The framework is then applied to simulate multi-particle sedimentation and submarine landslides. The numerical examples demonstrate the robustness and applicability of the MPI parallel DEM-IMB-LBM framework.
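As a rough illustration of static decomposition combined with local re-numbering, the sketch below splits a 1D domain into equal slabs (one per rank) and gives each rank its own local particle IDs. The slab geometry and the helpers `owner_rank`, `decompose`, and `global_to_local` are hypothetical; the paper's actual subdomain layout and re-numbering rules are not described in the abstract.

```python
def owner_rank(x, x_min, x_max, num_ranks):
    """Rank that owns coordinate x under a static slab decomposition along x."""
    width = (x_max - x_min) / num_ranks
    rank = int((x - x_min) // width)
    return min(max(rank, 0), num_ranks - 1)  # clamp particles sitting on the outer walls


def decompose(positions, x_min, x_max, num_ranks):
    """positions: dict global_id -> x coordinate.

    Returns one list per rank; a particle's local ID is its index in that
    list, so each list doubles as the local -> global ID map.
    """
    ranks = [[] for _ in range(num_ranks)]
    for gid in sorted(positions):  # deterministic local numbering
        ranks[owner_rank(positions[gid], x_min, x_max, num_ranks)].append(gid)
    return ranks


def global_to_local(ranks):
    """Inverse map: global_id -> (owning rank, local ID)."""
    return {gid: (r, lid) for r, gids in enumerate(ranks) for lid, gid in enumerate(gids)}
```

Calling `decompose` again after particles move re-numbers everything; a real MPI code would pack and send only the particles whose owner changed, but the ownership test would be the same.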
This paper presents a software turbo decoder on graphics processing units (GPUs). Unlike previous works, the proposed decoding architecture mainly targets the turbo codes of the Consultative Committee for Space Data Systems (CCSDS) standard. However, the information frame lengths of CCSDS turbo codes are not suited to a flexible sub-frame parallel design. To mitigate this issue, we propose a padding method that inserts several bits before the information frame header. To obtain low latency and high resource utilization, two levels of intra-frame parallelism and an efficient data structure are considered. The presented Max-Log-MAP decoder can also decode Long Term Evolution (LTE) turbo codes with only small modifications. At 10 iterations on an NVIDIA RTX 3070, the proposed CCSDS turbo decoder achieves throughputs of about 150 Mbps and 50 Mbps for code rates 1/6 and 1/2, respectively.
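The padding idea reduces to rounding the frame length up to a multiple of the sub-frame count. In the sketch below, the `align` granularity and the choice of zero-valued pad bits are assumptions: a real decoder would treat the inserted bits as known values, for example by assigning them high-confidence LLRs. The example length 1784 is one of the CCSDS information block lengths.

```python
def padding_bits(frame_len, num_subframes, align=1):
    """Bits to prepend so frame_len + pad splits into num_subframes equal
    sub-frames whose length is a multiple of `align` (an assumed
    hardware-friendly granularity)."""
    unit = num_subframes * align
    return (-frame_len) % unit


def pad_frame(bits, num_subframes, align=1):
    """Prepend known zero bits before the frame header, then split evenly."""
    pad = padding_bits(len(bits), num_subframes, align)
    padded = [0] * pad + list(bits)
    sub_len = len(padded) // num_subframes
    subframes = [padded[i * sub_len:(i + 1) * sub_len] for i in range(num_subframes)]
    return pad, subframes
```

For a 1784-bit frame and 32 sub-frames, 8 bits are prepended, giving 32 sub-frames of 56 bits each.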
Currently, two-rotation one-translation (2R1T) three-degree-of-freedom (DOF) parallel mechanisms (PMs) are widely applied in five-DOF hybrid machining robots. However, there is no effective method for evaluating the configuration stiffness of mechanisms during the mechanism design stage, which makes it challenging to select 2R1T PMs with excellent stiffness performance at that stage. Considering the operating conditions of 2R1T PMs, bending and torsional stiffness are taken as indices for evaluating the configuration stiffness of PMs, and a specific method is proposed to calculate these indices. First, the various types of structural and driving stiffness of each branch are assessed and their specific values defined. Next, a rigid-flexible coupled force model for the over-constrained 2R1T PM is established, and the proposed evaluation method is used to analyze the configuration stiffness of five 2R1T PMs over the entire workspace. Finally, the driving force and constraint force of each branch over the whole workspace are calculated with the proposed method to further explain the stiffness evaluation results. The results demonstrate that the bending and torsional stiffness of the 2RPU/UPR/RPR mechanism along the x- and y-directions are larger than those of the other four mechanisms.
The kinematic equivalent model of an existing ankle-rehabilitation robot is inconsistent with the anatomical structure of the human ankle, which affects the rehabilitation outcome. Therefore, this study models the human ankle as a UR model and proposes a novel three-degree-of-freedom (3-DOF) generalized spherical parallel mechanism for ankle rehabilitation. The parallel mechanism has two spherical centers corresponding to the rotation centers of the tibiotalar and subtalar joints. Using screw theory, the mobility of the parallel mechanism is analyzed and shown to meet the requirements of the human ankle. The inverse kinematics are presented, and singularities are identified from the Jacobian matrix. The workspace of the parallel mechanism is obtained by a search method and compared with the range of motion of the human ankle, which shows that the mechanism can meet the motion demands of ankle rehabilitation. Additionally, based on motion-force transmissibility, performance atlases are plotted in the parameter design space, and the optimal parameters are selected according to the demands of practical applications. The results show that the parallel mechanism meets the motion requirements of ankle rehabilitation and has excellent kinematic performance within its rehabilitation range, which provides a theoretical basis for prototype design and experimental verification.
The heterogeneous variational nodal method (HVNM) has emerged as a promising approach for solving high-fidelity neutron transport problems. However, obtaining accurate HVNM results for large-scale problems with high-fidelity models has been challenging because of the prohibitive computational cost. This paper presents an efficient parallel algorithm for HVNM based on the Message Passing Interface (MPI) standard. During matrix formation, the algorithm distributes the response matrix sets evenly among processors, so each processor constructs its share independently without communication. Once formation is complete, a collective operation merges the matrix sets and shares them among the processors. For the solution process, the problem domain is decomposed into subdomains assigned to specific processors, and red-black Gauss-Seidel iteration is used within each subdomain to solve the response matrix equation. Point-to-point communication between adjacent subdomains exchanges data along their boundaries. The accuracy and efficiency of the parallel algorithm are verified with the KAIST and JRR-3 test cases. Numerical results obtained with multiple processors agree well with Monte Carlo results. For the 3D KAIST and 3D JRR-3 problems, respectively, the parallelized HVNM yields eigenvalue errors of 31 pcm and -90 pcm and fission rate RMS errors of 1.22% and 0.66%. In addition, the parallel algorithm significantly reduces computation time, with a parallel efficiency of 68.51% using 36 processors for the KAIST problem and 77.14% using 144 processors for the JRR-3 problem.
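The point of red-black ordering is that all points of one colour depend only on points of the other colour, so each colour can be updated in parallel, and only boundary values need to be exchanged between subdomains. The sketch below shows the ordering on a generic 1D model problem, -u'' = f with u(0) = u(1) = 0. It is a swapped-in textbook example, not the HVNM response matrix equation, and it runs serially.

```python
def red_black_gauss_seidel(f, tol=1e-10, max_iter=10_000):
    """Solve -u'' = f on (0, 1), u(0) = u(1) = 0, with red-black Gauss-Seidel.

    f: source values at the n interior points x_i = i / (n + 1), i = 1..n.
    Returns (interior solution, number of sweeps).
    """
    n = len(f)
    h = 1.0 / (n + 1)
    u = [0.0] * (n + 2)  # u[0] and u[n + 1] hold the boundary values
    for sweep in range(max_iter):
        max_change = 0.0
        for start in (1, 2):  # red (odd) points first, then black (even) points
            # Points of one colour only read the other colour, so this loop
            # could be split across workers without changing the result.
            for i in range(start, n + 1, 2):
                new = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i - 1])
                max_change = max(max_change, abs(new - u[i]))
                u[i] = new
        if max_change < tol:
            return u[1:-1], sweep + 1
    return u[1:-1], max_iter
```

With f = 2, the exact solution x(1 - x) is reproduced by the discrete scheme, so the converged values can be checked directly.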
Extensible Markup Language (XML) files, widely used for storing and exchanging information on the web, require efficient parsing mechanisms to improve application performance. With existing Document Object Model (DOM)-based parsing, performance degrades because of sequential processing and large memory requirements, so an efficient XML parser is needed to mitigate these issues. In this paper, we propose a Parallel XML Tree Generator (PXTG) algorithm for accelerating the parsing of XML files and a Regression-based XML Parsing Framework (RXPF) that analyzes and predicts performance through profiling, regression, and code generation for efficient parsing. The PXTG algorithm divides the XML file into n parts and produces n trees in parallel. The profiling phase of RXPF produces a dataset by measuring the performance of various parsing models, including StAX, SAX, DOM, JDOM, and PXTG, on different numbers of cores and with multiple file sizes. The regression phase produces the prediction model, from which the code generation phase produces the final code for efficient parsing of XML files. RXPF shows a significant performance improvement, ranging from 9.54% to 32.34%, over other existing models for parsing XML files.
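A minimal split/parse/merge sketch of the "n parts, n trees" idea is shown below. It assumes a flat document of the form `<root><item>…</item>…</root>` in which the record tag is neither nested nor self-closing, so records can be located with a regular expression. `split_records` and `parallel_parse` are hypothetical names, not the PXTG implementation. In CPython, threads may not give a real speedup for this CPU-bound work; the sketch only shows the structure.

```python
import re
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def split_records(xml_text, record_tag, n):
    """Split a flat document into at most n chunks at record boundaries.
    Assumes record_tag elements are neither nested nor self-closing."""
    tag = re.escape(record_tag)
    records = re.findall(rf"<{tag}\b.*?</{tag}>", xml_text, flags=re.S)
    size = max(1, -(-len(records) // n))  # ceiling division
    return ["".join(records[i:i + size]) for i in range(0, len(records), size)]


def parse_chunk(chunk):
    # Wrap each chunk so it is a well-formed document on its own.
    return ET.fromstring(f"<chunk>{chunk}</chunk>")


def parallel_parse(xml_text, record_tag, n=4, root_tag="root"):
    chunks = split_records(xml_text, record_tag, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        trees = list(pool.map(parse_chunk, chunks))  # one partial tree per chunk
    root = ET.Element(root_tag)
    for tree in trees:  # merge the partial trees in document order
        root.extend(list(tree))
    return root
```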
Current parallel ankle rehabilitation robots (ARRs) have difficulty aligning the human and robot joint centers of rotation in real time, which may cause secondary injuries to the patient. This study investigates the type synthesis of a parallel self-alignment ankle rehabilitation robot (PSAARR), based on the kinematic characteristic that the rotation center of the ankle joint drifts, by introducing "suitable passive degrees of freedom (DOF)" of suitable number and form. First, the self-alignment principle of parallel ARRs is proposed by deriving the conditions for transforming the human-robot closed chain (HRCC), formed by an ARR and the human body, into a kinematically suitable constrained system and by introducing the "decoupled" and "less limb" conditions. Second, the relationship between the self-alignment principle and the actuation wrenches (twists) of the PSAARR is analyzed with the velocity Jacobian matrix as a "bridge", and the type synthesis conditions of the PSAARR are then proposed. Third, a PSAARR synthesis method based on screw theory is proposed, and the type synthesis of the PSAARR is conducted. Finally, an HRCC kinematic model is established to verify the self-alignment capability of the PSAARR. In this study, 93 types of PSAARR limb structures are synthesized, and the self-alignment capability of the human-robot joint axis is verified through kinematic analysis, which provides a theoretical basis for the design of such ARRs.
In this research, we present pure open multi-processing (OpenMP), pure message passing interface (MPI), and hybrid MPI/OpenMP parallel solvers within the dynamic explicit central difference algorithm for the coining process, to address the challenge of capturing fine relief features of approximately 50 microns. Achieving such precision requires at least 7 million tetrahedral elements, which exceeds the capabilities of previously developed serial programs. To avoid data races when calculating internal forces, intermediate arrays are introduced within the OpenMP directive, ensuring proper synchronization and avoiding conflicts during parallel execution. In the MPI implementation, the coins are partitioned into the desired number of regions, which allows computational tasks to be distributed efficiently across multiple processes. Numerical simulation examples compare the three solvers with the serial programs in terms of correctness, speedup ratio, and parallel efficiency. The results reveal a relative error of approximately 0.3% in forming force between the parallel and serial solvers, and the predicted insufficient-material zones agree with experimental observations. For the coining process simulation, the pure MPI parallel solver achieves a maximum speedup of 9.5 on a single computer (using 12 cores), and the hybrid solver achieves a speedup ratio of 136 on a cluster (6 compute nodes with 12 cores each), showing the strong scalability of the hybrid MPI/OpenMP programming model. This approach effectively meets the simulation requirements for commemorative coins with intricate relief patterns.
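The "intermediate arrays" technique amounts to a privatize-then-reduce pattern. Each worker scatters element contributions into its own private force array, so no two workers write the same memory, and the private arrays are summed at the end. The sketch below illustrates the pattern with Python threads and one scalar force per node. The element format and the name `assemble_forces` are assumptions; a real solver stores three force components per node and would use OpenMP threads.

```python
from concurrent.futures import ThreadPoolExecutor


def assemble_forces(elements, num_nodes, num_workers=4):
    """elements: list of (node_ids, per_node_forces) pairs, one per element.

    Each worker accumulates into its own private array, so shared nodes
    never cause write conflicts; the private arrays are summed afterwards.
    """
    chunks = [elements[w::num_workers] for w in range(num_workers)]

    def work(chunk):
        local = [0.0] * num_nodes  # intermediate (private) array for this worker
        for nodes, forces in chunk:
            for node, value in zip(nodes, forces):
                local[node] += value  # scatter-add without races
        return local

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(work, chunks))
    # Reduction step: combine the private arrays into the global force vector.
    return [sum(values) for values in zip(*partials)]
```

Two tetrahedra sharing nodes 1 to 3 give the same result as a serial loop, whatever the number of workers.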
The growing development of the Internet of Things (IoT) is accelerating the emergence and growth of new IoT services and applications, which will result in massive amounts of data being generated, transmitted, and processed in wireless communication networks. Mobile Edge Computing (MEC) is a desirable paradigm for processing IoT data in a timely manner to maximize its value. In MEC, computing-capable devices are deployed at the network edge near data sources to support edge computing, so that the long network transmission delay of the cloud computing paradigm can be avoided. Since an edge device may not always have sufficient resources to process massive amounts of data, computation offloading that exploits cooperation among edge devices is very important. However, the dynamic traffic characteristics and heterogeneous computing capabilities of edge devices make offloading challenging. In addition, different scheduling schemes may yield different computation delays for the offloaded tasks. Thus, offloading at mobile nodes and scheduling at the MEC server jointly determine the service delay. This paper seeks to guarantee low delay for computation-intensive applications by jointly optimizing offloading and scheduling in such an MEC system. We propose a Delay-Greedy Computation Offloading (DGCO) algorithm that makes offloading decisions for new tasks in distributed computing-enabled mobile devices. A Reinforcement Learning-based Parallel Scheduling (RLPS) algorithm is further designed to schedule offloaded tasks on the multi-core MEC server. With an offloading delay broadcast mechanism, DGCO and RLPS cooperate to maximize the delay-guarantee ratio. Finally, the simulation results show that our proposal can bound the end-to-end delay of various tasks. Even under a slightly heavy task load, the delay-guarantee ratio achieved by DGCO-RLPS remains close to 95%, whereas that of the benchmark algorithms drops to intolerable values. The simulation results demonstrate the effectiveness of DGCO-RLPS for delay guarantees in MEC.
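A delay-greedy rule picks, for each new task, the executor with the smallest estimated delay. The sketch below uses a simple assumed delay model of transmission plus queueing plus computation, with the queueing term standing in for the value learned through the delay broadcast. The `Executor` fields and the model itself are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Executor:
    name: str
    cpu_hz: float                # processing speed in CPU cycles per second
    queue_delay_s: float         # current waiting time, e.g. from a delay broadcast (assumed)
    uplink_bps: Optional[float]  # None means local execution (no transmission)


def estimated_delay(task_bits, task_cycles, ex):
    """Assumed delay model: transmission + queueing + computation."""
    tx = 0.0 if ex.uplink_bps is None else task_bits / ex.uplink_bps
    return tx + ex.queue_delay_s + task_cycles / ex.cpu_hz


def delay_greedy_offload(task_bits, task_cycles, executors):
    """Greedy rule: run the task wherever its estimated delay is smallest."""
    return min(executors, key=lambda ex: estimated_delay(task_bits, task_cycles, ex))
```

Under this model, a computation-heavy task is offloaded to a faster server despite the upload cost, while a light task stays local.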
The nonlinear stability of plane parallel shear flows with respect to tilted perturbations is studied by energy methods. A tilted perturbation is one that forms an angle θ ∈ (0, π/2) with the direction of the basic flow. By defining an energy functional, it is proven that plane parallel shear flows are unconditionally nonlinearly exponentially stable with respect to tilted streamwise perturbations when the Reynolds number is below a certain critical value, for either rigid or stress-free boundary conditions. For stress-free boundaries, by using the poloidal-toroidal decomposition of a solenoidal field to define energy functionals, it can further be shown that plane parallel shear flows are unconditionally nonlinearly exponentially stable for all Reynolds numbers, where the tilted perturbation can be either spanwise or streamwise.
The Message Passing Interface (MPI) is a widely accepted standard for parallel computing on distributed-memory systems. However, MPI implementations can contain defects that impair the reliability and performance of parallel applications. Detecting and correcting these defects is crucial, yet there is a lack of published models specifically designed for correcting MPI defects. To address this, we propose a model for detecting and correcting MPI defects (DC_MPI) in various types of MPI communication, including blocking point-to-point (BPTP), nonblocking point-to-point (NBPTP), and collective communication (CC). The defects addressed by the DC_MPI model include illegal MPI calls, deadlocks (DL), race conditions (RC), and message mismatches (MM). To assess the effectiveness of the DC_MPI model, we performed experiments on a dataset of 40 MPI codes. The model detected defects in 37 of the 40 codes, an overall detection accuracy of 92.5%. The execution time of the DC_MPI model ranged from 0.81 to 1.36 s. These findings show that the DC_MPI model is useful for detecting and correcting defects in MPI implementations, thereby improving the reliability and performance of parallel applications. The DC_MPI model fills an important research gap and provides a valuable tool for improving the quality of MPI-based parallel computing systems.
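One of the defect classes, message mismatch, can be illustrated with a toy static check over a recorded list of point-to-point calls. The check matches sends to receives by (source, destination, tag) and reports leftovers on either side. This is a hypothetical simplification: it ignores `MPI_ANY_SOURCE`/`MPI_ANY_TAG`, communicators, and ordering, and it is not the DC_MPI model.

```python
from collections import Counter


def find_unmatched(calls):
    """calls: list of (op, rank, peer, tag) with op in {"send", "recv"}.

    For a send, rank is the sender and peer the destination; for a recv,
    rank is the receiver and peer the expected source. Returns
    (unmatched_sends, unmatched_recvs) as Counters keyed by (src, dst, tag).
    """
    sends, recvs = Counter(), Counter()
    for op, rank, peer, tag in calls:
        if op == "send":
            sends[(rank, peer, tag)] += 1
        elif op == "recv":
            recvs[(peer, rank, tag)] += 1
        else:
            raise ValueError(f"unknown op {op!r}")
    # Counter subtraction keeps only positive counts, i.e. the leftovers.
    return sends - recvs, recvs - sends
```

A send with tag 8 paired with a receive expecting tag 9 shows up on both sides. With blocking calls, that mismatch would hang the receiver.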
The unrelated parallel machine scheduling problem (UPMSP) is a typical scheduling problem, and UPMSP with various real-life constraints, such as additional resources, has been widely studied; however, UPMSP with additional resources, maintenance, and energy-related objectives has seldom been investigated. The Artificial Bee Colony (ABC) algorithm has been successfully applied to various production scheduling problems and shows potential search advantages for solving UPMSP with additional resources, among other factors. In this study, an energy-efficient UPMSP with additional resources and maintenance is considered. A dynamical artificial bee colony (DABC) algorithm is presented to minimize makespan and total energy consumption simultaneously. Three heuristics are applied to produce the initial population. An employed bee swarm and an onlooker bee swarm are constructed. In each swarm, computing resources are shifted from dominated solutions to non-dominated solutions when a given condition is met. The dynamical employed bee phase is implemented through computing resource shifting and solution migration, and computing resource shifting and feedback are used to construct the dynamical onlooker bee phase. Computational experiments are conducted on 300 instances from the literature, and DABC is compared with three comparative algorithms and ABC after the parameter settings of all algorithms are given. The computational results demonstrate that the new strategies of DABC are effective and that DABC has promising advantages in solving the considered UPMSP.
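Shifting computing resources from dominated to non-dominated solutions first requires the standard Pareto-dominance test for the two minimized objectives, makespan and total energy consumption. The sketch below shows only that test and the split it induces. DABC's resource-shifting condition and migration rules are not reproduced, and the function names are illustrative.

```python
def dominates(a, b):
    """a, b: (makespan, energy) tuples; both objectives are minimized.
    a dominates b if it is no worse in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def split_by_dominance(solutions):
    """Return (non_dominated, dominated) lists of objective tuples."""
    non_dominated, dominated = [], []
    for s in solutions:
        target = dominated if any(dominates(o, s) for o in solutions) else non_dominated
        target.append(s)
    return non_dominated, dominated
```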
To address the difficulty of generating the dynamic decoupling equation of parallel six-dimensional acceleration sensing mechanisms, two typical parallel six-dimensional acceleration sensing mechanisms are taken as examples. By analyzing the scale constraint relationship between the hinge points on the mass block and those on the base of the sensing mechanism, a new method for establishing the dynamic equation of the sensing mechanism is proposed. First, based on this scale constraint relationship, expressions for the branch rod lengths are obtained. The inherent constraint relationship between the branches is uncovered, a branch-coordination closed chain of the “12-6” configuration is constructed, and the output coordination equation of the sensing mechanism is derived. Second, the dynamic equations of the “12-4” and “12-6” configurations are constructed by the Newton-Euler method, and the forward decoupling equations of both configurations are solved by combining the dynamic equations with the output coordination equations. Finally, a virtual prototype experiment is carried out; the maximum reference errors of the forward decoupling equations of the two sensing mechanism configurations are 4.23% and 6.53%, respectively. The results show that the proposed method is effective and feasible and meets real-time requirements.
Accurate automatic segmentation of gliomas into sub-regions, including peritumoral edema, necrotic core, and enhancing and non-enhancing tumor core, from 3D multimodal MRI images is challenging because of their highly heterogeneous appearance and shape. Deep convolutional neural networks (CNNs) have recently improved glioma segmentation performance. However, extensive down-sampling such as pooling or strided convolution in CNNs significantly reduces the initial image resolution, causing the loss of accurate spatial and object-part information, especially for small tumor sub-regions, which degrades segmentation performance. Hence, this paper proposes a novel multi-level parallel network comprising three parallel subnetworks at different levels that fully exploit low-level, mid-level, and high-level information to improve brain tumor segmentation. We also introduce the Combo loss function to address input class imbalance and the imbalance between false positives and false negatives in deep learning. The proposed method is trained and validated on the BraTS 2020 training and validation datasets. On the validation dataset, our method achieved mean Dice scores of 0.907, 0.830, and 0.787 for the whole tumor, tumor core, and enhancing tumor core, respectively. Compared with state-of-the-art methods, the multi-level parallel network achieves competitive results on the validation dataset.
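The Combo loss combines a weighted cross-entropy term with a Dice term. The sketch below uses one common binary formulation, alpha · weighted-BCE - (1 - alpha) · Dice, on flattened voxel lists. The `alpha`, `beta`, `smooth`, and `eps` defaults are illustrative assumptions, and the paper's exact multi-class form and settings may differ.

```python
import math


def dice(t, p, smooth=1.0):
    """Soft Dice coefficient between binary targets t and probabilities p."""
    inter = sum(ti * pi for ti, pi in zip(t, p))
    return (2 * inter + smooth) / (sum(t) + sum(p) + smooth)


def combo_loss(t, p, alpha=0.5, beta=0.5, eps=1e-7):
    """t: binary targets, p: predicted foreground probabilities (flattened).

    alpha balances the cross-entropy and Dice terms; beta > 0.5 weights the
    foreground (t = 1) term more, penalizing false negatives more than
    false positives.
    """
    n = len(t)
    wce = -sum(
        beta * ti * math.log(max(pi, eps))
        + (1 - beta) * (1 - ti) * math.log(max(1 - pi, eps))
        for ti, pi in zip(t, p)
    ) / n
    return alpha * wce - (1 - alpha) * dice(t, p)
```

A perfect prediction gives the minimum value -(1 - alpha), and less confident predictions increase the loss.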
This paper investigates the effective capacity of point-to-point ultra-reliable low-latency communication (URLLC) transmission over multiple parallel sub-channels at finite blocklength (FBL) with imperfect channel state information (CSI). Based on reasonable assumptions and approximations, we derive the effective capacity as a function of the pilot length, decoding error probability, transmit power, and number of sub-channels, and we reveal the significant impact of these parameters on the effective capacity. A closed-form lower bound on the effective capacity is derived, and an alternating-optimization-based algorithm is proposed to find the optimal pilot length and decoding error probability. Simulation results validate our theoretical analysis and show that the closed-form lower bound is very tight. In addition, simulations of the optimized effective capacity provide insights into pilot length and decoding error probability optimization, which help determine the optimal parameters in realistic systems.
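The finite-blocklength analyses of this kind usually build on the normal approximation of the achievable rate, R ≈ C - sqrt(V/n) · Q⁻¹(ε). Here C = log2(1 + γ) and V = (1 - (1 + γ)⁻²)(log2 e)² for an AWGN channel with SNR γ. The sketch below evaluates that approximation per sub-channel and sums it over parallel sub-channels. It assumes perfect CSI and omits the higher-order log(n)/(2n) term; the paper's imperfect-CSI model, pilot overhead, and effective-capacity expression are not reproduced.

```python
import math
from statistics import NormalDist


def fbl_rate(snr, n, eps):
    """Normal-approximation rate (bits per channel use) of an AWGN sub-channel
    at blocklength n and decoding error probability eps (perfect CSI assumed)."""
    c = math.log2(1 + snr)                                   # Shannon capacity
    v = (1 - 1 / (1 + snr) ** 2) * math.log2(math.e) ** 2    # channel dispersion
    q_inv = NormalDist().inv_cdf(1 - eps)                    # Q^{-1}(eps)
    return max(0.0, c - math.sqrt(v / n) * q_inv)


def parallel_rate(snrs, n, eps):
    """Sum of per-sub-channel rates, each coded separately with error eps."""
    return sum(fbl_rate(s, n, eps) for s in snrs)
```

The rate approaches capacity as n grows and drops as the reliability target ε becomes stricter, which is the trade-off the pilot-length and error-probability optimization works against.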
Low-Earth-orbit satellite constellation networks (LEO-SCN) can provide low-cost, large-scale, flexible-coverage wireless communication services. LEO-SCN are characterized by high dynamics and large topologies. Protocol development and application testing for LEO-SCN are difficult to carry out in a real environment, so simulation platforms are a more effective means of technology demonstration. However, currently available simulators offer limited functionality and simulation scale, and a full-featured simulator is needed. In this paper, we apply the parallel discrete-event simulation technique to LEO-SCN to support large-scale, complex system simulation at the packet level. Because single-process programs cannot cope with complex simulations containing numerous entities, we propose a parallel mechanism and the synchronization algorithms LP-NM and LP-YAWNS. In the experiments, we use ns-3 to verify the speedup ratio and efficiency of these algorithms. The results show that the proposed mechanism can provide parallel simulation engine support for LEO-SCN.
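YAWNS is a classic conservative, window-based synchronization protocol. Each window starts at the global minimum next-event time T and ends at T plus the lookahead. Because every cross-LP event is delayed by at least the lookahead, no logical process (LP) can receive a new event inside the window, so all LPs may process their window events independently and then exchange messages at a barrier. The sketch below runs plain YAWNS serially; the paper's LP-YAWNS variant may differ, and the `handler` interface is a hypothetical choice.

```python
import heapq
from itertools import count


def run_yawns(initial_events, num_lps, lookahead, handler, end_time):
    """initial_events: list of (lp, time, payload).
    handler(lp, t, payload) returns a list of (dst_lp, delay, payload);
    every delay must be >= lookahead. Returns the processed (t, lp, payload)."""
    if lookahead <= 0:
        raise ValueError("lookahead must be positive")
    seq = count()  # tie-breaker so payloads are never compared
    queues = [[] for _ in range(num_lps)]
    for lp, t, payload in initial_events:
        heapq.heappush(queues[lp], (t, next(seq), payload))
    log = []
    while True:
        pending = [q[0][0] for q in queues if q]
        if not pending or min(pending) >= end_time:
            return log
        # Window [T, T + lookahead): no LP can receive a new event inside it.
        window_end = min(min(pending) + lookahead, end_time)
        outgoing = []
        for lp, q in enumerate(queues):  # LPs are independent within a window
            while q and q[0][0] < window_end:
                t, _, payload = heapq.heappop(q)
                log.append((t, lp, payload))
                for dst, delay, new_payload in handler(lp, t, payload):
                    if delay < lookahead:
                        raise ValueError("event delay is below the lookahead")
                    outgoing.append((dst, t + delay, new_payload))
        for dst, t, payload in outgoing:  # barrier: exchange generated events
            heapq.heappush(queues[dst], (t, next(seq), payload))
```

In a parallel engine, the inner loop over LPs would run on separate processes, and the barrier would be a collective operation.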
A novel image encryption scheme based on parallel compressive sensing and edge-detection embedding is proposed to improve visual security. First, the plain image is sparsely represented using the discrete wavelet transform. Then, the coefficient matrix is scrambled with the Fisher–Yates shuffle and compressed with parallel compressive sensing to obtain a size-reduced image. Next, to increase the security of the proposed algorithm, the compressed image is re-encrypted through permutation and diffusion to obtain a noise-like secret image. Finally, an adaptive embedding method based on edge detection for different carrier images is proposed to generate a visually meaningful cipher image. To improve the plaintext sensitivity of the algorithm, the counter mode is combined with a hash function to generate keys for the chaotic systems. Additionally, an effective permutation method is designed to scramble the pixels of the compressed image in the re-encryption stage. The simulation results and analyses demonstrate that the proposed algorithm performs well in terms of visual security and decryption quality.
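A key-driven Fisher–Yates shuffle can be sketched by replacing the usual PRNG draw at step i with a value from a chaotic sequence. Here that sequence comes from the logistic map x ← r·x·(1 - x). The logistic map, the key values `x0` and `r`, and the burn-in length are assumptions for illustration; the paper derives its chaotic keys from a hash in counter mode, which is not reproduced.

```python
def logistic_sequence(x0, r, n, burn_in=100):
    """n values of the logistic map after discarding burn_in transient steps."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1 - x)
    out = []
    for _ in range(n):
        x = r * x * (1 - x)
        out.append(x)
    return out


def chaotic_fisher_yates(n, x0=0.3141, r=3.99):
    """Permutation of range(n): Fisher-Yates with chaotic, key-driven indices."""
    perm = list(range(n))
    xs = logistic_sequence(x0, r, n)
    for i in range(n - 1, 0, -1):
        j = min(int(xs[i] * (i + 1)), i)  # index in [0, i]
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def scramble(pixels, perm):
    return [pixels[p] for p in perm]


def unscramble(scrambled, perm):
    out = [None] * len(perm)
    for k, p in enumerate(perm):
        out[p] = scrambled[k]
    return out
```

The same key always reproduces the same permutation, which is what allows the receiver to invert the scrambling.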
Abstract: A new file assignment strategy for parallel I/O on cluster computing systems, named the heuristic file sorted assignment algorithm, was proposed. Based on load balancing, it assigns files with similar service times to the same disk. First, the files are sorted in descending order of service time and stored in the set I; then, when files are to be assigned, one disk of a cluster node is selected at random; finally, consecutive files are taken in order from the set I and placed on that disk until the disk reaches its maximum load. The experimental results show that the new strategy improves performance by 20.2% when the system load is light and by 31.6% when the load is heavy. Moreover, the higher the data access rate, the more pronounced the performance improvement obtained by the heuristic file sorted assignment algorithm.
Funding: This work has been supported in part by the National High-Tech Research and Development Plan of China under Grant No. 2002BA711A08 and by the Natural Science Foundation of Hunan Province under Grant No. 03JJY4054.
Funding: Financially supported by the National Natural Science Foundation of China (Grant Nos. 12072217 and 42077254) and the Natural Science Foundation of Hunan Province, China (Grant No. 2022JJ30567).
Abstract: The high-resolution DEM-IMB-LBM model can accurately describe pore-scale fluid-solid interactions, but its potential for geotechnical engineering analysis has not been fully realized because of its prohibitive computational cost. To overcome this limitation, a message passing interface (MPI) parallel DEM-IMB-LBM framework is proposed to improve computational efficiency. The framework uses a static domain decomposition scheme in which the entire computational domain is decomposed into multiple subdomains according to the predefined number of processors. A detailed parallel strategy is employed for both contact detection and hydrodynamic force calculation. In particular, a particle ID re-numbering scheme is proposed to handle particles that cross subdomain interfaces. Two benchmarks are conducted to validate the accuracy and overall performance of the proposed framework. The framework is then applied to simulate multi-particle sedimentation and submarine landslides. The numerical examples demonstrate the robustness and applicability of the MPI parallel DEM-IMB-LBM framework.
Funding: Supported by the Fundamental Research Funds for the Central Universities (FRF-TP20-062A1) and the Guangdong Basic and Applied Basic Research Foundation (2021A1515110070).
Abstract: This paper presents a software turbo decoder on graphics processing units (GPUs). Unlike previous works, the proposed decoding architecture mainly targets the turbo codes of the Consultative Committee for Space Data Systems (CCSDS) standard. However, the information frame lengths of CCSDS turbo codes are not suited to a flexible sub-frame parallel design. To mitigate this issue, we propose a padding method that inserts several bits before the information frame header. To obtain low latency and high resource utilization, two levels of intra-frame parallelism and an efficient data structure are considered. The presented Max-Log-MAP decoder can also decode Long Term Evolution (LTE) turbo codes with only small modifications. At 10 iterations on an NVIDIA RTX 3070, the proposed CCSDS turbo decoder achieves throughputs of about 150 Mbps and 50 Mbps for code rates 1/6 and 1/2, respectively.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 51875495, U2037202) and the Hebei Provincial Science and Technology Project (Grant No. 206Z1805G).
Abstract: Currently, two-rotation one-translation (2R1T) three-degree-of-freedom (DOF) parallel mechanisms (PMs) are widely applied in five-DOF hybrid machining robots. However, there is no effective method for evaluating the configuration stiffness of mechanisms during the mechanism design stage, which makes it challenging to select 2R1T PMs with excellent stiffness performance at that stage. Considering the operating conditions of 2R1T PMs, bending and torsional stiffness are taken as indices for evaluating the configuration stiffness of PMs, and a specific method is proposed to calculate these indices. First, the various types of structural and driving stiffness of each branch are assessed and their specific values defined. Next, a rigid-flexible coupled force model for the over-constrained 2R1T PM is established, and the proposed evaluation method is used to analyze the configuration stiffness of five 2R1T PMs over the entire workspace. Finally, the driving force and constraint force of each branch over the whole workspace are calculated with the proposed method to further explain the stiffness evaluation results. The results demonstrate that the bending and torsional stiffness of the 2RPU/UPR/RPR mechanism along the x- and y-directions are larger than those of the other four mechanisms.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52075145), the S&T Program of Hebei Province of China (Grant Nos. 20281805Z, E2020103001), and the Central Government Guides Basic Research Projects of Local Science and Technology Development Funds of China (Grant No. 206Z1801G).
Abstract: The kinematic equivalent model of an existing ankle rehabilitation robot is inconsistent with the anatomical structure of the human ankle, which affects the rehabilitation outcome. Therefore, this study models the human ankle as a UR model and proposes a novel three-degree-of-freedom (3-DOF) generalized spherical parallel mechanism for ankle rehabilitation. The parallel mechanism has two spherical centers corresponding to the rotation centers of the tibiotalar and subtalar joints. Using screw theory, the mobility of the parallel mechanism is analyzed and shown to meet the requirements of the human ankle. The inverse kinematics is presented, and singularities are identified based on the Jacobian matrix. The workspace of the parallel mechanism is obtained with a search method and compared with the motion range of the human ankle, showing that the mechanism can meet the motion demands of ankle rehabilitation. Additionally, based on motion-force transmissibility, performance atlases are plotted in the parameter design space, and the optimal parameters are selected according to the demands of practical applications. The results show that the parallel mechanism meets the motion requirements of ankle rehabilitation and has excellent kinematic performance within its rehabilitation range, providing a theoretical basis for prototype design and experimental verification.
Funding: This work was supported by the National Key Research and Development Program of China (No. 2020YFB1901900), the National Natural Science Foundation of China (Nos. U20B2011, 12175138), and the Shanghai Rising-Star Program.
Abstract: The heterogeneous variational nodal method (HVNM) has emerged as a potential approach for solving high-fidelity neutron transport problems. However, achieving accurate HVNM results for large-scale problems with high-fidelity models has been challenging because of the prohibitive computational cost. This paper presents an efficient parallel algorithm for HVNM based on the Message Passing Interface (MPI) standard. The algorithm evenly distributes the response matrix sets among processors during matrix formation, so that each processor constructs its share independently without communication. Once the formation tasks are completed, a collective operation merges the matrix sets and shares them among the processors. For the solution process, the problem domain is decomposed into subdomains assigned to specific processors, and red-black Gauss-Seidel iteration is employed within each subdomain to solve the response matrix equation. Point-to-point communication between adjacent subdomains exchanges data along their boundaries. The accuracy and efficiency of the parallel algorithm are verified using the KAIST and JRR-3 test cases. Numerical results obtained with multiple processors agree well with Monte Carlo reference results. The parallelized HVNM yields eigenvalue errors of 31 pcm and -90 pcm and fission rate RMS errors of 1.22% and 0.66% for the 3D KAIST and 3D JRR-3 problems, respectively. In addition, the parallel algorithm significantly reduces computation time, achieving a parallel efficiency of 68.51% with 36 processors for the KAIST problem and 77.14% with 144 processors for the JRR-3 problem.
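Red-black ordering is what allows half of the unknowns to be updated at once: each point of one colour depends only on neighbours of the other colour. The minimal serial sketch below uses the 2D Poisson model problem with zero Dirichlet boundaries as a stand-in for the HVNM response matrix equation, which is not reproduced here, and it omits the boundary exchange between subdomains. The function names are illustrative.

```python
# Red-black Gauss-Seidel for -(u_xx + u_yy) = f on a uniform n x n grid with
# zero Dirichlet boundaries (a model problem, not the HVNM equations).


def red_black_gauss_seidel(f, h, sweeps):
    n = len(f)
    u = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        # Update all "red" points ((i + j) even), then all "black" points.
        # Points of one colour only read the other colour, so each half-sweep
        # could be distributed across processors without ordering conflicts.
        for colour in (0, 1):
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 == colour:
                        u[i][j] = 0.25 * (
                            u[i - 1][j] + u[i + 1][j]
                            + u[i][j - 1] + u[i][j + 1]
                            + h * h * f[i][j]
                        )
    return u


def max_residual(u, f, h):
    """Largest pointwise residual of the 5-point discretisation."""
    n = len(u)
    worst = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            lap = (4 * u[i][j] - u[i - 1][j] - u[i + 1][j]
                   - u[i][j - 1] - u[i][j + 1]) / (h * h)
            worst = max(worst, abs(lap - f[i][j]))
    return worst
```

In a domain-decomposed version, each processor would run the colour sweeps on its own subdomain and exchange its boundary layer with neighbours between half-sweeps.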
Abstract: Extensible Markup Language (XML) files, widely used for storing and exchanging information on the web, require efficient parsing mechanisms to improve application performance. With existing Document Object Model (DOM)-based parsing, performance degrades due to sequential processing and large memory requirements, so an efficient XML parser is needed to mitigate these issues. In this paper, we propose a Parallel XML Tree Generator (PXTG) algorithm for accelerating the parsing of XML files and a Regression-based XML Parsing Framework (RXPF) that analyzes and predicts performance through profiling, regression, and code generation for efficient parsing. The PXTG algorithm divides the XML file into n parts and produces n trees in parallel. The profiling phase of RXPF produces a dataset by measuring the performance of various parsing models, including StAX, SAX, DOM, JDOM, and PXTG, on different numbers of cores and with multiple file sizes. The regression phase produces a prediction model, from which the code generation phase produces the final code for efficient parsing of XML files. RXPF shows performance improvements ranging from 9.54% to 32.34% over other existing models for parsing XML files.
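The "divide into n parts, build n trees" idea can be sketched with Python's standard `xml.etree.ElementTree` and `concurrent.futures`. This is a simplified illustration, not PXTG itself: it assumes the document has no XML declaration and is a single root whose children are a flat run of `<record>` elements, so it can be cut safely at `</record>` boundaries. The helper names are hypothetical. Under CPython, a thread pool may not yield real parsing speedup; a process pool would be needed for that.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def split_records(xml_text: str, n: int, tag: str = "record"):
    """Cut the body between the root tags into at most n chunks of whole records."""
    start = xml_text.index(">") + 1          # end of the opening root tag
    end = xml_text.rindex("</")              # start of the closing root tag
    body = xml_text[start:end]
    close = f"</{tag}>"
    records = [r + close for r in body.split(close) if r.strip()]
    size = max(1, -(-len(records) // n))     # ceiling division
    return ["".join(records[i:i + size]) for i in range(0, len(records), size)]


def parse_chunk(chunk: str) -> ET.Element:
    # Wrap each chunk in a synthetic root so it is well-formed on its own
    return ET.fromstring(f"<part>{chunk}</part>")


def parallel_parse(xml_text: str, n: int):
    """Parse the n chunks concurrently, returning one tree per chunk in order."""
    chunks = split_records(xml_text, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(parse_chunk, chunks))
```

A real splitter would need to track nesting depth, comments and CDATA sections to find safe cut points in arbitrary XML.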
Funding: Supported by the Key Scientific Research Platforms and Projects of Guangdong Regular Institutions of Higher Education of China (Grant No. 2022KCXTD033), the Guangdong Provincial Natural Science Foundation of China (Grant No. 2023A1515012103), the Guangdong Provincial Scientific Research Capacity Improvement Project of Key Developing Disciplines of China (Grant No. 2021ZDJS084), and the National Natural Science Foundation of China (Grant No. 52105009).
Abstract: The current parallel ankle rehabilitation robot (ARR) suffers from the difficulty of aligning the human and robot joint centers of rotation in real time, which may cause secondary injuries to the patient. This study investigates the type synthesis of a parallel self-alignment ankle rehabilitation robot (PSAARR), based on the drift of the ankle joint rotation center, from the perspective of introducing "suitable passive degrees of freedom (DOF)" in a suitable number and form. First, the self-alignment principle of the parallel ARR was proposed by deriving the conditions for transforming a human-robot closed chain (HRCC), formed by an ARR and the human body, into a kinematically suitable constrained system and by introducing the "decoupled" and "less limb" conditions. Second, the relationship between the self-alignment principle and the actuation wrenches (twists) of the PSAARR was analyzed using the velocity Jacobian matrix as a "bridge", and the type synthesis conditions of the PSAARR were then proposed. Third, a PSAARR synthesis method based on screw theory was proposed, and the type synthesis of the PSAARR was conducted. Finally, an HRCC kinematic model was established to verify the self-alignment capability of the PSAARR. In this study, 93 types of PSAARR limb structures were synthesized, and the self-alignment capability of the human-robot joint axis was verified through kinematic analysis, providing a theoretical basis for the design of such ARRs.
Funding: This work was supported by the fund from Shenyang Mint Company Limited (No. 20220056), the Senior Talent Foundation of Jiangsu University (No. 19JDG022), and the Taizhou City Double Innovation and Entrepreneurship Talent Program (No. Taizhou Human Resources Office [2022] No. 22).
Abstract: In this research, we present pure open multi-processing (OpenMP), pure message passing interface (MPI), and hybrid MPI/OpenMP parallel solvers within the dynamic explicit central difference algorithm for the coining process, to address the challenge of capturing fine relief features of approximately 50 microns. Achieving such precision demands at least 7 million tetrahedral elements, which exceeds the capabilities of previously developed serial programs. To avoid data races when calculating internal forces, intermediate arrays are introduced within the OpenMP directive, ensuring proper synchronization and avoiding conflicts during parallel execution. In the MPI implementation, the coins are partitioned into the desired number of regions, which allows computational tasks to be distributed efficiently across multiple processes. Numerical simulation examples compare the three solvers with the serial programs in terms of correctness, speedup ratio, and parallel efficiency. The results reveal a relative error of approximately 0.3% in forming force between the parallel and serial solvers, while the predicted insufficient-material zones align with experimental observations. For the coining process simulation, the pure MPI parallel solver achieves a maximum speedup of 9.5 on a single computer (using 12 cores), and the hybrid solver achieves a speedup ratio of 136 on a cluster (using 6 compute nodes with 12 cores per node), demonstrating the strong scalability of the hybrid MPI/OpenMP programming model. This approach effectively meets the simulation requirements for commemorative coins with intricate relief patterns.
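The race arises because neighbouring elements share nodes, so two threads may add into the same entry of the global force vector at once. A common remedy, and a plausible reading of "intermediate arrays", is to give each thread a private accumulation array and sum the arrays afterwards. The sketch below shows that pattern with Python threads standing in for OpenMP threads; the mesh, element forces and function name are toy assumptions, and the solver's actual data layout is not described in the abstract.

```python
from concurrent.futures import ThreadPoolExecutor


def assemble_forces(elements, elem_forces, num_nodes, num_threads):
    """elements[e] lists the node ids of element e; elem_forces[e] the matching nodal contributions."""
    # Round-robin split of the element list among the threads
    chunks = [range(t, len(elements), num_threads) for t in range(num_threads)]

    def work(elem_ids):
        local = [0.0] * num_nodes            # private intermediate array: no shared writes
        for e in elem_ids:
            for node, f in zip(elements[e], elem_forces[e]):
                local[node] += f
        return local

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(work, chunks))

    # Reduction step: combine the private arrays into the global force vector
    return [sum(p[i] for p in partials) for i in range(num_nodes)]
```

The trade-off is extra memory (one array per thread) in exchange for lock-free accumulation; OpenMP's array `reduction` clause expresses the same idea declaratively in C/C++/Fortran.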
Funding: This work was supported in part by the National Natural Science Foundation of China under Grants 61901128 and 62273109, and by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (21KJB510032).
Abstract: The growing development of the Internet of Things (IoT) is accelerating the emergence and growth of new IoT services and applications, which will result in massive amounts of data being generated, transmitted, and processed in wireless communication networks. Mobile Edge Computing (MEC) is a desirable paradigm for processing IoT data in a timely manner to maximize its value. In MEC, computing-capable devices are deployed at the network edge near data sources to support edge computing, so that the long network transmission delay of the cloud computing paradigm can be avoided. Since an edge device may not always have sufficient resources to process massive amounts of data, computation offloading that exploits cooperation among edge devices is important. However, the dynamic traffic characteristics and heterogeneous computing capabilities of edge devices complicate offloading. In addition, different scheduling schemes may yield different computation delays for the offloaded tasks. Thus, offloading at mobile nodes and scheduling at the MEC server are coupled in determining the service delay. This paper seeks to guarantee low delay for computation-intensive applications by jointly optimizing offloading and scheduling in such an MEC system. We propose a Delay-Greedy Computation Offloading (DGCO) algorithm to make offloading decisions for new tasks in distributed computing-enabled mobile devices. A Reinforcement Learning-based Parallel Scheduling (RLPS) algorithm is further designed to schedule offloaded tasks on the multi-core MEC server. With an offloading delay broadcast mechanism, DGCO and RLPS cooperate to maximize the delay-guarantee ratio. Finally, the simulation results show that our proposal can bound the end-to-end delay of various tasks. Even under slightly heavy task load, the delay-guarantee ratio achieved by DGCO-RLPS still approaches 95%, whereas that of the benchmark algorithms drops to intolerable values. The simulation results demonstrate the effectiveness of DGCO-RLPS for delay guarantees in MEC.
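A delay-greedy offloading rule picks, for each new task, the execution site with the smallest estimated completion delay. The sketch below illustrates that rule under a simple delay model of transfer time plus current queueing delay plus processing time; this model, the `Node` fields and the function names are hypothetical, and the paper's DGCO delay estimates and the RL-based server scheduling are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_hz: float          # processing speed in cycles per second
    queue_delay_s: float   # current waiting time, e.g. learned from a delay broadcast
    rate_bps: float        # uplink rate to this node (inf for the local device)


def estimated_delay(task_bits, task_cycles, node):
    """Hypothetical delay model: transfer + queueing + processing."""
    transfer = 0.0 if node.rate_bps == float("inf") else task_bits / node.rate_bps
    return transfer + node.queue_delay_s + task_cycles / node.cpu_hz


def delay_greedy_choice(task_bits, task_cycles, nodes):
    """Choose the candidate with the smallest estimated completion delay."""
    return min(nodes, key=lambda n: estimated_delay(task_bits, task_cycles, n))
```

For example, with a 1 GHz local CPU and a 10 GHz edge server behind a 10 Mbps link with 10 ms of queueing, a light task (10^7 cycles) stays local, while a heavy one (10^9 cycles) is offloaded.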
Funding: This work was supported by the National Natural Science Foundation of China (21627813).
Abstract: The nonlinear stability of plane parallel shear flows with respect to tilted perturbations is studied by energy methods. A tilted perturbation is one that forms an angle θ ∈ (0, π/2) with the direction of the basic flow. By defining an energy functional, it is proven that plane parallel shear flows are unconditionally nonlinearly exponentially stable for tilted streamwise perturbations when the Reynolds number is below a certain critical value and the boundary conditions are either rigid or stress-free. In the case of stress-free boundaries, by using the poloidal-toroidal decomposition of a solenoidal field to define energy functionals, it can even be shown that plane parallel shear flows are unconditionally nonlinearly exponentially stable for all Reynolds numbers, where the tilted perturbation can be either spanwise or streamwise.
Funding: This work was funded by the Deanship of Scientific Research at King Abdulaziz University, Jeddah, Saudi Arabia, under Grant No. RG-12-611-43.
Abstract: The Message Passing Interface (MPI) is a widely accepted standard for parallel computing on distributed memory systems. However, MPI implementations can contain defects that impact the reliability and performance of parallel applications. Detecting and correcting these defects is crucial, yet there is a lack of published models specifically designed for correcting MPI defects. To address this, we propose a model for detecting and correcting MPI defects (DC_MPI), which aims to detect and correct defects in various types of MPI communication, including blocking point-to-point (BPTP), nonblocking point-to-point (NBPTP), and collective communication (CC). The defects addressed by the DC_MPI model include illegal MPI calls, deadlocks (DL), race conditions (RC), and message mismatches (MM). To assess the effectiveness of the DC_MPI model, we performed experiments on a dataset of 40 MPI codes. The model detected the defects in 37 of the 40 codes, an overall detection accuracy of 92.5%. The execution time of the DC_MPI model ranged from 0.81 to 1.36 s. These findings show that the DC_MPI model is useful for detecting and correcting defects in MPI implementations, thereby enhancing the reliability and performance of parallel applications. The DC_MPI model fills an important research gap and provides a valuable tool for improving the quality of MPI-based parallel computing systems.
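One standard way to detect deadlock among blocking point-to-point calls is to build a wait-for graph (rank r is blocked waiting on rank s) and look for a cycle. A classic case is two ranks that each post a blocking `MPI_Recv` from the other before sending. The sketch below is a generic illustration of that technique, assuming each blocked rank waits on exactly one peer; the abstract does not say how DC_MPI detects deadlocks, and `find_deadlock` is a hypothetical helper.

```python
def find_deadlock(waits_for):
    """waits_for maps each blocked rank to the single rank it waits on.

    Returns the ranks forming a wait cycle (a deadlock), or None if none exists.
    """
    for start in waits_for:
        path = []
        rank = start
        # Follow wait edges until reaching a non-blocked rank or revisiting one
        while rank in waits_for and rank not in path:
            path.append(rank)
            rank = waits_for[rank]
        if rank in path:                       # walked back onto the path: cycle
            return path[path.index(rank):]
    return None
```

Ranks that merely wait on a deadlocked cycle (rank 3 in the last test case) are reported through the cycle they feed into rather than as part of it.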
Funding: This work was supported by the National Natural Science Foundation of China (grant number 61573264).
Abstract: The unrelated parallel machine scheduling problem (UPMSP) is a typical scheduling problem, and UPMSP with various real-life constraints such as additional resources has been widely studied; however, UPMSP with additional resources, maintenance, and energy-related objectives is seldom investigated. The Artificial Bee Colony (ABC) algorithm has been successfully applied to various production scheduling problems and shows potential search advantages for UPMSP with additional resources, among other factors. In this study, an energy-efficient UPMSP with additional resources and maintenance is considered. A dynamical artificial bee colony (DABC) algorithm is presented to minimize makespan and total energy consumption simultaneously. Three heuristics are applied to produce the initial population. An employed bee swarm and an onlooker bee swarm are constructed. Computing resources are shifted from dominated solutions to non-dominated solutions in each swarm when a given condition is met. The dynamical employed bee phase is implemented through computing resource shifting and solution migration, and the dynamical onlooker bee phase is constructed through computing resource shifting and feedback. Computational experiments are conducted on 300 instances from the literature, in which DABC is compared with ABC and three other algorithms after the parameter settings of all algorithms are given. The computational results demonstrate that the new strategies of DABC are effective and that DABC has promising advantages in solving the considered UPMSP.
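Shifting effort from dominated to non-dominated solutions first requires classifying each solution by Pareto dominance over the two minimization objectives (makespan, total energy consumption). The sketch below shows only that classification step, with objective vectors as plain tuples; how DABC then reallocates computing resources is not reproduced, and the function names are illustrative.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(solutions):
    """Return the solutions not dominated by any other (the current Pareto front)."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]
```

Because no solution dominates itself, the filter needs no explicit self-exclusion, and duplicate objective vectors are both kept.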
Funding: This work was supported in part by the National Natural Science Foundation of China (No. 51405237).
Abstract: Since it is difficult to derive the dynamic decoupling equations of parallel six-dimensional acceleration sensing mechanisms, two typical parallel six-dimensional acceleration sensing mechanisms are taken as examples. By analyzing the scale constraint relationship between the hinge points on the mass block and those on the base of the sensing mechanism, a new method for establishing the dynamic equations of the sensing mechanism is proposed. First, based on this scale constraint relationship, the expression for the branch rod length is obtained. The inherent constraint relationship between the branches is identified, the branch coordination closed chain of the “12-6” configuration is constructed, and the output coordination equation of the sensing mechanism is derived. Second, the dynamic equations of the “12-4” and “12-6” configurations are constructed with the Newton-Euler method, and the forward decoupling equations of the two configurations are solved by combining the dynamic equations with the output coordination equations. Finally, a virtual prototype experiment is carried out, and the maximum reference errors of the forward decoupling equations for the two configurations are 4.23% and 6.53%, respectively. The results show that the proposed method is effective and feasible and meets real-time requirements.
Funding: This work was supported by the Sichuan Science and Technology Program (No. 2019YJ0356).
Abstract: Accurate automatic segmentation of gliomas into their sub-regions, including peritumoral edema, necrotic core, and enhancing and non-enhancing tumor core, from 3D multimodal MRI images is challenging because of their highly heterogeneous appearance and shape. Deep convolutional neural networks (CNNs) have recently improved glioma segmentation performance. However, extensive down-sampling such as pooling or strided convolution in CNNs significantly decreases the initial image resolution, resulting in the loss of accurate spatial and object-part information, especially for small tumor sub-regions, which degrades segmentation performance. Hence, this paper proposes a novel multi-level parallel network comprising three parallel subnetworks at different levels to fully use low-level, mid-level, and high-level information and improve brain tumor segmentation performance. We also introduce the Combo loss function to address input class imbalance and the imbalance between false positives and false negatives in deep learning. The proposed method is trained and validated on the BraTS 2020 training and validation datasets. On the validation dataset, our method achieved mean Dice scores of 0.907, 0.830, and 0.787 for the whole tumor, tumor core, and enhancing tumor core, respectively. Compared with state-of-the-art methods, the multi-level parallel network achieves competitive results on the validation dataset.
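For reference, the Dice score reported above and a Combo loss can be written for flattened binary masks as follows. The loss follows the commonly cited form, alpha times a weighted binary cross-entropy minus (1 - alpha) times Dice, where beta weights the positive-class term (values above 0.5 penalize false negatives more). The alpha, beta and eps values are illustrative assumptions, since the paper's settings are not given, and a training implementation would operate on tensors rather than Python lists.

```python
import math


def dice(pred, target, eps=1e-7):
    """Soft Dice coefficient between predicted probabilities and a binary mask."""
    inter = sum(p * t for p, t in zip(pred, target))
    return (2 * inter + eps) / (sum(pred) + sum(target) + eps)


def combo_loss(pred, target, alpha=0.5, beta=0.5, eps=1e-7):
    """alpha * weighted cross-entropy - (1 - alpha) * Dice (lower is better)."""
    n = len(pred)
    wce = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1 - eps)            # keep log() finite
        wce -= beta * t * math.log(p) + (1 - beta) * (1 - t) * math.log(1 - p)
    return alpha * wce / n - (1 - alpha) * dice(pred, target, eps)
```

The cross-entropy term gives stable per-voxel gradients, while the Dice term counteracts the dominance of background voxels in small tumor sub-regions.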
Funding: This work was supported by the National Natural Science Foundation of China under grant 61941106.
Abstract: This paper investigates the effective capacity of point-to-point ultra-reliable low-latency communication (URLLC) transmission over multiple parallel sub-channels at finite blocklength (FBL) with imperfect channel state information (CSI). Based on reasonable assumptions and approximations, we derive the effective capacity as a function of the pilot length, decoding error probability, transmit power, and number of sub-channels, and then reveal the significant impact of these parameters on the effective capacity. A closed-form lower bound of the effective capacity is derived, and an alternating-optimization-based algorithm is proposed to find the optimal pilot length and decoding error probability. Simulation results validate our theoretical analysis and show that the closed-form lower bound is very tight. In addition, simulations of the optimized effective capacity provide insights into how to choose the pilot length and decoding error probability in realistic systems.
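Two standard building blocks underlie this kind of analysis. The first is the normal approximation of the achievable rate at blocklength n and error probability eps, R ≈ C - sqrt(V/n) Q⁻¹(eps), with C = log2(1 + snr) and V = (1 - 1/(1 + snr)²)(log2 e)² for an AWGN channel. The second is the effective capacity under a QoS exponent theta, -(1/(theta n)) ln E[eps + (1 - eps) exp(-theta n R)]. The sketch below assumes perfect CSI and a single sub-channel, and approximates the expectation over fading with a list of SNR samples; the paper's pilot-based imperfect-CSI model and parallel sub-channels are not reproduced.

```python
import math
from statistics import NormalDist


def fbl_rate(snr, n, eps):
    """Normal-approximation rate (bits/channel use) for AWGN at blocklength n."""
    c = math.log2(1 + snr)
    v = (1 - 1 / (1 + snr) ** 2) * math.log2(math.e) ** 2
    q_inv = NormalDist().inv_cdf(1 - eps)      # inverse Q-function
    return max(0.0, c - math.sqrt(v / n) * q_inv)


def effective_capacity(snr_samples, n, eps, theta):
    """Effective capacity (bits/channel use), averaging over the given SNR samples."""
    terms = [eps + (1 - eps) * math.exp(-theta * n * fbl_rate(s, n, eps))
             for s in snr_samples]
    return -math.log(sum(terms) / len(terms)) / (theta * n)
```

The eps term inside the expectation models a failed block as delivering no bits, which is why the effective capacity stays below the instantaneous FBL rate even for a fixed SNR.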
Funding: This work was supported by the Jiangsu Provincial Key Research and Development Program (No. BE20210132), the Zhejiang Provincial Key Research and Development Program (No. 2021C01040), and the team of S-SET.
Abstract: Low-Earth-orbit satellite constellation networks (LEO-SCN) can provide low-cost, large-scale, flexible-coverage wireless communication services. LEO-SCN are characterized by high dynamics and large topologies. Protocol development and application testing for LEO-SCN are difficult to carry out in a real environment, so simulation platforms are a more effective means of technology demonstration. Currently available simulators offer limited functionality and simulation scale, and a simulator capable of full-featured simulation is still needed. In this paper, we apply the parallel discrete-event simulation technique to LEO-SCN to support large-scale, packet-level simulation of complex systems. To address the problem that single-process programs cannot cope with complex simulations containing numerous entities, we propose a parallel mechanism and two synchronization algorithms, LP-NM and LP-YAWNS. In the experiments, we use ns-3 to verify the speedup and efficiency of these algorithms. The results show that the proposed mechanism can provide parallel simulation engine support for LEO-SCN.
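The classic YAWNS protocol, on which LP-YAWNS is presumably based, synchronizes logical processes (LPs) in windows. All LPs agree on the earliest pending event time; every event earlier than that time plus the lookahead can then be processed in parallel, because any event generated inside the window carries a timestamp of at least its cause's time plus the lookahead. The sketch below shows one such round with per-LP heaps of `(timestamp, event)` pairs. It is the textbook idea rather than the paper's LP-YAWNS variant, and it leaves out executing events and scheduling the new events they create.

```python
import heapq


def yawns_round(lp_queues, lookahead):
    """One synchronisation round.

    lp_queues: one heap of (timestamp, event) per LP.
    Returns (events safe to process per LP, window end).
    """
    next_times = [q[0][0] for q in lp_queues if q]
    if not next_times:
        return [[] for _ in lp_queues], None
    # In a real engine this minimum is a global reduction across processes
    window_end = min(next_times) + lookahead
    processed = []
    for q in lp_queues:
        batch = []
        while q and q[0][0] < window_end:
            batch.append(heapq.heappop(q))
        processed.append(batch)
    return processed, window_end
```

A larger lookahead (for example, the minimum propagation delay between satellites) widens each window and reduces synchronization overhead.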
Funding: This work was supported by the Key Area R&D Program of Guangdong Province (Grant No. 2022B0701180001), the National Natural Science Foundation of China (Grant No. 61801127), the Science Technology Planning Project of Guangdong Province, China (Grant Nos. 2019B010140002 and 2020B111110002), and the Guangdong-Hong Kong-Macao Joint Innovation Field Project (Grant No. 2021A0505080006).
Abstract: A novel image encryption scheme based on parallel compressive sensing and edge detection embedding is proposed to improve visual security. First, the plain image is sparsely represented using the discrete wavelet transform. Then, the coefficient matrix is scrambled with the Fisher–Yates shuffle and compressed with parallel compressive sensing to obtain a size-reduced image. Subsequently, to increase the security of the algorithm, the compressed image is re-encrypted through permutation and diffusion to obtain a noise-like secret image. Finally, an adaptive embedding method based on edge detection for different carrier images is proposed to generate a visually meaningful cipher image. To improve the plaintext sensitivity of the algorithm, the counter mode is combined with a hash function to generate keys for the chaotic systems. Additionally, an effective permutation method is designed to scramble the pixels of the compressed image in the re-encryption stage. Simulation results and analyses demonstrate that the proposed algorithm performs well in terms of visual security and decryption quality.
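The scrambling step can be illustrated with a keyed Fisher–Yates shuffle of the flattened coefficient matrix, together with the inverse permutation needed for decryption. In this sketch, `random.Random` seeded with an integer key is a stand-in assumption for the chaotic-system key stream used in the paper, and it is not cryptographically secure; the function names are illustrative.

```python
import random


def fisher_yates_perm(n, key):
    """Keyed Fisher-Yates shuffle of the indices 0..n-1."""
    rng = random.Random(key)
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i)                  # uniform in 0 <= j <= i
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def scramble(values, key):
    """Position dst of the output takes the value at perm[dst]."""
    perm = fisher_yates_perm(len(values), key)
    return [values[p] for p in perm]


def unscramble(values, key):
    """Invert scramble() by regenerating the same permutation from the key."""
    perm = fisher_yates_perm(len(values), key)
    out = [None] * len(values)
    for dst, src in enumerate(perm):
        out[src] = values[dst]
    return out
```

Because the permutation is regenerated from the key, the receiver only needs the key, not the permutation itself, to restore the coefficient order.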