There are some new results on the photovoltaic transient response of the new effect. We propose a theoretical model that explains the effect, and the theoretical calculations agree with the experimental results.
A code recently developed by the authors for counting and computing the eigenvalues of a complex tridiagonal matrix, as well as the roots of a complex polynomial, that lie in a given region of the complex plane is modified to run in parallel on multi-core machines. A basic characteristic of this code, which ultimately points to its parallelization, is that it proceeds in three steps: 1) partitioning the given region into an appropriate number of subregions; 2) counting the eigenvalues in each subregion; and 3) computing the (already counted) eigenvalues in each subregion. Consequently, in theory, the whole code parallelizes ideally. We carry out several numerical experiments with random complex tridiagonal matrices, as well as random complex polynomials, in order to study the behaviour of the parallel code, especially the degree to which it deviates from theoretical expectations.
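The partition-and-count structure described above is what makes the code parallelize well. The sketch below illustrates only those two stages for the polynomial case, using Python with NumPy. It counts roots with the argument principle (the winding number of p(z) along each box boundary), which is an assumed stand-in: the abstract does not say which counting technique the authors use, and the tridiagonal-matrix path is not shown. The function names and the 2-by-2 grid are illustrative.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def count_roots_in_rect(coeffs, x0, x1, y0, y1, samples_per_edge=2000):
    """Count zeros of the polynomial `coeffs` inside the box [x0, x1] x [y0, y1].

    Argument principle: winding number of p(z) around 0 while z runs
    counter-clockwise along the box boundary. Assumes no root lies on,
    or very close to, the boundary.
    """
    t = np.linspace(0.0, 1.0, samples_per_edge, endpoint=False)
    z = np.concatenate([
        x0 + (x1 - x0) * t + 1j * y0,    # bottom edge, left to right
        x1 + 1j * (y0 + (y1 - y0) * t),  # right edge, upwards
        x1 - (x1 - x0) * t + 1j * y1,    # top edge, right to left
        x0 + 1j * (y1 - (y1 - y0) * t),  # left edge, downwards
    ])
    w = np.polyval(coeffs, z)
    # Phase increment between consecutive samples (the last one wraps to the first).
    dphase = np.angle(np.roll(w, -1) / w)
    return int(round(dphase.sum() / (2.0 * np.pi)))


def count_in_subregions(coeffs, x0, x1, y0, y1, nx, ny):
    """Partition the region into nx * ny boxes and count roots in each box concurrently."""
    xs = np.linspace(x0, x1, nx + 1)
    ys = np.linspace(y0, y1, ny + 1)
    boxes = [(xs[i], xs[i + 1], ys[j], ys[j + 1]) for i in range(nx) for j in range(ny)]
    with ThreadPoolExecutor() as pool:
        counts = list(pool.map(lambda b: count_roots_in_rect(coeffs, *b), boxes))
    return dict(zip(boxes, counts))


roots = [0.3 + 0.2j, -0.4 - 0.6j, 0.7 - 0.3j, 1.5 + 1.5j]
counts = count_in_subregions(np.poly(roots), -1.0, 1.0, -1.0, 1.0, 2, 2)
print(sum(counts.values()))  # 3: the root at 1.5 + 1.5j lies outside the square
```

Each box is counted independently, so the counting stage needs no communication; a real implementation would then refine each box that contains roots (stage 3).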
The Fourier transform is important to numerous applications in science and engineering. However, its usefulness is hampered by its computational expense. In this paper, in an attempt to develop a faster method for computing Fourier transforms, the authors present parallel implementations of two new algorithms developed for the type IV Discrete Cosine Transform (DCT-IV), which support the new interleaved fast Fourier transform method. The authors discuss the realizations of their implementations using two paradigms. The first involved commodity equipment and the Message-Passing Interface (MPI) library. The second utilized the RapidMind development platform and the Cell Broadband Engine (BE) processor. These experiments indicate that the authors' rotation-based algorithm is preferable to their lifting-based algorithm on the platforms tested, with their MPI implementation demonstrating increased efficiency for large data sets. Finally, the authors outline future work by discussing an architecture-oriented method for computing DCT-IVs that promises further optimization. The results indicate a promising new direction in the search for efficient ways to compute Fourier transforms.
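Fast DCT-IV algorithms, such as the rotation-based and lifting-based ones compared above, are usually checked against a direct evaluation of the definition. The following is a minimal O(N²) reference in plain Python, not either of the authors' algorithms. It uses the unnormalized definition X[k] = Σ x[n] cos(π/N (n + 1/2)(k + 1/2)), for which applying the transform twice returns the input scaled by N/2.

```python
import math


def dct4(x):
    """Direct O(N^2) DCT-IV: X[k] = sum_n x[n] * cos(pi/N * (n + 1/2) * (k + 1/2))."""
    n_pts = len(x)
    return [
        sum(x[n] * math.cos(math.pi / n_pts * (n + 0.5) * (k + 0.5)) for n in range(n_pts))
        for k in range(n_pts)
    ]


x = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5, -1.0, 2.0]
y = dct4(dct4(x))
# The unnormalized DCT-IV is its own inverse up to the factor N/2.
print([round(v * 2 / len(x), 12) for v in y])
```

A fast parallel implementation can be validated by comparing its output with `dct4` on random inputs of the same length.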
The high-resolution DEM-IMB-LBM model can accurately describe pore-scale fluid-solid interactions, but its potential for geotechnical engineering analysis has not been fully realized because of its prohibitive computational cost. To overcome this limitation, a message passing interface (MPI) parallel DEM-IMB-LBM framework is proposed to improve computational efficiency. The framework utilises a static domain decomposition scheme in which the entire computational domain is decomposed into multiple subdomains according to the predefined processors. A detailed parallel strategy is employed for both contact detection and hydrodynamic force calculation. In particular, a particle ID renumbering scheme is proposed to handle particles that cross subdomain interfaces. Two benchmarks are conducted to validate the accuracy and overall performance of the proposed framework. The framework is then applied to simulate multi-particle sedimentation and submarine landslides. These numerical examples demonstrate the robustness and applicability of the MPI parallel DEM-IMB-LBM framework.
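To make the particle ID renumbering idea concrete, here is a small sketch that assumes slab-shaped subdomains along x, each keeping contiguous local IDs plus a global-to-local map that is rebuilt after particles move. The function names, the slab geometry and the rebuild-from-scratch strategy are illustrative assumptions; the abstract does not describe the authors' actual scheme or its MPI data exchange.

```python
import bisect


def owner(x, bounds):
    """Index of the slab subdomain [bounds[i], bounds[i+1]) that contains coordinate x."""
    i = bisect.bisect_right(bounds, x) - 1
    return min(max(i, 0), len(bounds) - 2)  # clamp particles sitting on the outer walls


def renumber(positions, bounds):
    """Rebuild contiguous local IDs for every subdomain.

    positions: dict mapping global particle ID -> x coordinate.
    Returns one (local_to_global, global_to_local) pair per subdomain.
    """
    n_sub = len(bounds) - 1
    members = [[] for _ in range(n_sub)]
    for gid in sorted(positions):  # sorted so that the numbering is deterministic
        members[owner(positions[gid], bounds)].append(gid)
    return [(gids, {g: local for local, g in enumerate(gids)}) for gids in members]


bounds = [0.0, 0.5, 1.0]
pos = {10: 0.1, 11: 0.6, 12: 0.4, 13: 0.9}
before = renumber(pos, bounds)
pos[12] = 0.7  # particle 12 crosses the interface at x = 0.5
after = renumber(pos, bounds)
print(before[0][0], after[0][0], after[1][0])  # [10, 12] [10] [11, 12, 13]
```

Keeping local IDs contiguous lets each process store its particles in dense arrays, while the global-to-local map handles lookups for particles arriving from a neighbour.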
Hydraulic-electric rock fragmentation (HERF) plays a significant role in improving the efficiency of high-voltage pulse rock breaking. However, the underlying mechanism of HERF remains unclear. In this study, taking into account rock heterogeneity, microscopic thermodynamic properties and shockwave time-domain waveforms, and based on the shockwave model, digital imaging technology and the discrete element method, cyclic-loading numerical simulations of HERF are carried out by coupling electrical, thermal and solid mechanics under different formation temperatures, confining pressures, initial peak voltages, electrode bit diameters and numbers of loadings. Meanwhile, a HERF discharge system is used to conduct laboratory experiments with various electrical parameters, and the resulting broken pits are numerically reconstructed to obtain their geometric parameters. The results show that the completely broken area consists of powdery rock debris. In the pre-broken zone, the mineral cementation of the rock determines the transition from type CⅠ cracks to type CⅡ and type CⅢ cracks. Furthermore, the peak pressure of the shockwave increases with the initial peak voltage but decreases with the electrode bit diameter, while the wavefront time shortens. Moreover, increasing the well depth, formation temperature and confining pressure can both augment and inhibit HERF, but once the confining pressure exceeds a threshold of 60 MPa for the 152.40, 215.90 and 228.60 mm electrode bits, or 40 MPa for the 309.88 mm electrode bit, HERF is promoted. Additionally, for the same kind of rock, the volume and width of the broken pit increase with higher initial peak voltage, and rock fissures promote HERF. Finally, the electrode drill bit with a 215.90 mm diameter is more suitable for drilling pink granite. This research contributes to a better microscopic understanding of HERF and provides valuable insights for electrode bit selection, as well as for the optimization of circuit parameters in HERF technology.
To address the problems of single encryption algorithms, such as low encryption efficiency, and of unreliable metadata in static data storage on big data platforms in cloud computing environments, we propose a Hadoop-based secure storage scheme for big data. First, to distribute the NameNode service from a single server to multiple servers, we combine the HDFS federation and HDFS high-availability mechanisms and use the ZooKeeper distributed coordination mechanism to coordinate the nodes, achieving dual-channel storage. Then, we improve the ECC encryption algorithm for encrypting ordinary data, and adopt a homomorphic encryption algorithm to encrypt data that needs to be computed on. To accelerate encryption, we adopt a dual-thread encryption mode. Finally, an HDFS control module is designed to combine the encryption algorithm with the storage model. Experimental results show that the proposed solution solves the single-point-of-failure problem for metadata, performs well in terms of metadata reliability, and provides server fault tolerance. With the improved encryption algorithm integrated into the dual-channel storage mode, encrypted storage efficiency improves by 27.6% on average.
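The abstract does not name its homomorphic scheme. As an assumed example of the property it relies on (computing on ciphertexts), the sketch below implements textbook Paillier encryption, which is additively homomorphic, with deliberately tiny primes. It is a toy for illustration only: the key size is insecure, and nothing here reflects the paper's improved ECC algorithm or its dual-thread mode.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key pair from two small primes, using the g = n + 1 variant."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid because g = n + 1
    return n, (lam, mu)


def encrypt(n, m, rng=random):
    """c = g^m * r^n mod n^2 with a random r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, sk, c):
    """m = L(c^lambda mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


n, sk = keygen(61, 53)
total = add_encrypted(n, encrypt(n, 15), encrypt(n, 27))
print(decrypt(n, sk, total))  # 42
```

The server holding `total` never sees 15, 27 or 42 in the clear; only the key holder can decrypt the sum.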
Most neural network architectures are designed based on human experience, which requires a long and tedious trial-and-error process. Neural architecture search (NAS) attempts to discover effective architectures without human intervention. Evolutionary algorithms (EAs) for NAS can find better solutions than human-designed architectures by exploring a large search space of possible architectures. Using multiobjective EAs for NAS, optimal neural architectures that meet various performance criteria can be explored and discovered efficiently. Furthermore, hardware-accelerated NAS methods can improve the efficiency of NAS. While existing reviews have mainly focused on different strategies for completing NAS, few studies have explored the use of EAs for NAS. In this paper, we summarize and explore the use of EAs for NAS, as well as large-scale multiobjective optimization strategies and hardware-accelerated NAS methods. NAS performs well in healthcare applications such as medical image analysis, disease diagnosis classification and health monitoring. EAs for NAS can automate the search process and optimize multiple objectives simultaneously for a given healthcare task. Deep neural networks have been used successfully in healthcare, but they lack interpretability. Moreover, medical data is highly sensitive, and privacy leaks are frequently reported in the healthcare industry. To address these problems in healthcare, we propose an interpretable neuroevolution framework based on federated learning that addresses both search efficiency and privacy protection. We also point out future research directions for evolutionary NAS. Overall, for researchers who want to use EAs to optimize neural networks (NNs) in healthcare, we analyze the advantages and disadvantages of doing so to provide detailed guidance, and propose an interpretable privacy-preserving framework for healthcare applications.
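Multiobjective EAs for NAS rank candidate architectures by Pareto dominance rather than by a single score. The minimal sketch below shows only that selection step, for two objectives to be minimized (validation error and parameter count in millions). The candidate names and numbers are made up, and a full EA would add mutation, crossover and training-based evaluation around it.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Names of the non-dominated architectures. candidates: dict name -> (error, params_m)."""
    return sorted(
        name for name, obj in candidates.items()
        if not any(dominates(other, obj) for o_name, other in candidates.items() if o_name != name)
    )


archs = {
    "A": (0.10, 5.0),
    "B": (0.08, 9.0),
    "C": (0.12, 6.0),  # dominated by A: higher error and more parameters
    "D": (0.15, 2.0),
}
print(pareto_front(archs))  # ['A', 'B', 'D']
```

The front keeps trade-offs (B is most accurate, D is smallest) instead of collapsing them into one weighted score, which is why multiobjective EAs suit hardware- or latency-constrained searches.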
Porous materials offer significant advantages for absorbing radioactive isotopes in nuclear waste streams. To improve absorption efficiency in nuclear waste treatment, a thorough understanding of the diffusion-advection process within porous structures is essential for material design. In this study, we present advancements in the volumetric lattice Boltzmann method (VLBM) for modeling and simulating the pore-scale diffusion-advection of radioactive isotopes within geopolymer porous structures. These structures are created using the phase field method (PFM) to precisely control pore architectures. In our VLBM approach, we introduce a concentration field for an isotope that is seamlessly coupled with the velocity field, and solve it through the time evolution of its particle population function. To address the computational intensity of the coupled lattice Boltzmann equations for the velocity and concentration fields, we implement graphics processing unit (GPU) parallelization. The developed model is validated by examining the flow and diffusion fields in porous structures. Good agreement is observed both between the velocity fields from VLBM and the multiphysics object-oriented simulation environment (MOOSE), and between the concentration fields from VLBM and the finite difference method (FDM). Furthermore, we investigate the effects of background flow, species diffusivity and porosity on diffusion-advection behavior by varying the background flow velocity, diffusion coefficient and pore volume fraction, respectively. All three parameters influence the diffusion-advection process. Increased background flow and diffusivity markedly accelerate the process, owing to greater advection intensity and enhanced diffusion capability, respectively. Conversely, increasing the porosity has a less significant effect, causing a slight slowdown of the diffusion-advection process due to the expanded pore volume. This comprehensive parametric study provides valuable insights into the kinetics of isotope uptake in porous structures, facilitating the development of porous materials for nuclear waste treatment applications.
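The VLBM itself is not reproduced here, but the core of any lattice Boltzmann advection-diffusion solver is a collide-and-stream loop on a particle population function whose zeroth moment is the concentration. The sketch below is a standard D1Q2 BGK scheme on a periodic 1-D lattice with a uniform background velocity, written with NumPy. It has no porous geometry, no volumetric treatment and no GPU code, and the parameter values are arbitrary lattice units. Because collisions conserve both the local concentration and the total flux, the pulse center drifts by u per step (away from the periodic boundary) while it spreads diffusively.

```python
import numpy as np


def lbm_advection_diffusion_1d(c0, u, tau, steps):
    """D1Q2 BGK lattice Boltzmann for a passive concentration on a periodic 1-D lattice.

    Lattice velocities are +1 and -1 (lattice units), the background flow u is
    uniform, and the relaxation time tau controls the diffusivity.
    """
    omega = 1.0 / tau
    f_pos = 0.5 * c0 * (1.0 + u)  # start from the equilibrium populations
    f_neg = 0.5 * c0 * (1.0 - u)
    for _ in range(steps):
        c = f_pos + f_neg                                      # concentration = zeroth moment
        f_pos = f_pos + omega * (0.5 * c * (1.0 + u) - f_pos)  # BGK collision
        f_neg = f_neg + omega * (0.5 * c * (1.0 - u) - f_neg)
        f_pos = np.roll(f_pos, 1)                              # stream to the right
        f_neg = np.roll(f_neg, -1)                             # stream to the left
    return f_pos + f_neg


x = np.arange(400, dtype=float)
c_init = np.exp(-((x - 150.0) ** 2) / (2 * 5.0**2))  # Gaussian pulse of isotope concentration
c_end = lbm_advection_diffusion_1d(c_init, u=0.1, tau=1.0, steps=100)


def center_of_mass(c):
    return (x * c).sum() / c.sum()


shift = center_of_mass(c_end) - center_of_mass(c_init)
print(round(shift, 6))  # the pulse drifts by u * steps = 10 lattice units
```

In a coupled solver like the one described, `u` would come from the flow-field populations at each node instead of being a constant, and solid nodes would need a boundary rule.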
This paper presents a software turbo decoder on graphics processing units (GPUs). Unlike previous works, the proposed decoding architecture for turbo codes focuses mainly on the Consultative Committee for Space Data Systems (CCSDS) standard. However, the information frame lengths of CCSDS turbo codes are not suitable for a flexible sub-frame parallelism design. To mitigate this issue, we propose a padding method that inserts several bits before the information frame header. To obtain low latency and high resource utilization, two levels of intra-frame parallelism and an efficient data structure are considered. The presented Max-Log-MAP decoder can be adapted to decode Long Term Evolution (LTE) turbo codes with only small modifications. At 10 iterations on an NVIDIA RTX 3070, the proposed CCSDS turbo decoder achieves throughputs of about 150 Mbps and 50 Mbps for code rates 1/6 and 1/2, respectively.
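One simple way to realize the padding idea is to prepend the minimum number of filler bits that makes the frame length a multiple of the number of sub-frames, then slice the frame into equal sub-frames for parallel decoding. The sketch assumes zero-valued filler bits and 32 sub-frames, both hypothetical choices; 1784 bits is one of the CCSDS turbo information block lengths.

```python
def pad_for_subframes(info_bits, n_subframes, pad_bit=0):
    """Prepend just enough filler bits for the frame to split into equal sub-frames."""
    n_pad = (-len(info_bits)) % n_subframes
    padded = [pad_bit] * n_pad + list(info_bits)
    sub_len = len(padded) // n_subframes
    subframes = [padded[i * sub_len:(i + 1) * sub_len] for i in range(n_subframes)]
    return n_pad, subframes


frame = [1, 0] * 892  # 1784 information bits
n_pad, subs = pad_for_subframes(frame, 32)
print(n_pad, len(subs), len(subs[0]))  # 8 32 56
```

Placing the filler before the header means the decoder knows where the padding is and can discard it after decoding.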
Current parallel ankle rehabilitation robots (ARRs) have difficulty aligning the human and robot joint centers of rotation in real time, which may cause secondary injuries to the patient. This study investigates the type synthesis of a parallel self-alignment ankle rehabilitation robot (PSAARR), based on the kinematic characteristics of ankle joint rotation center drift, from the perspective of introducing "suitable passive degrees of freedom (DOF)" of a suitable number and form. First, the self-alignment principle of parallel ARRs was proposed by deriving the conditions for transforming the human-robot closed chain (HRCC) formed by an ARR and the human body into a kinematically suitable constrained system, and by introducing "decoupled" and "less limb" conditions. Second, the relationship between the self-alignment principle and the actuation wrenches (twists) of the PSAARR was analyzed, with the velocity Jacobian matrix serving as a "bridge". Subsequently, the type synthesis conditions for the PSAARR were proposed. Third, a PSAARR synthesis method based on screw theory was proposed, and PSAARR type synthesis was conducted. Finally, an HRCC kinematic model was established to verify the self-alignment capability of the PSAARR. In this study, 93 types of PSAARR limb structures were synthesized, and the self-alignment capability with respect to the human-robot joint axis was verified through kinematic analysis, which provides a theoretical basis for the design of such ARRs.
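Screw-theory type synthesis rests on the reciprocal product between twists and wrenches: a constraint wrench is one whose reciprocal product with every twist the limb permits is zero. The sketch below computes that product for zero-pitch screws in Plücker coordinates; the helper names and example lines are illustrative and are not taken from the paper.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def line_screw(direction, point):
    """Zero-pitch screw (s; r x s) along the line through `point` with direction s."""
    return direction, cross(point, direction)


def reciprocal_product(twist, wrench):
    """Twist (w; v) and wrench (f; m): w . m + v . f. Zero means the wrench does no work."""
    (w, v), (f, m) = twist, wrench
    return dot(w, m) + dot(v, f)


revolute_z = line_screw((0, 0, 1), (0, 0, 0))          # rotation about the z axis
force_through_axis = line_screw((1, 0, 0), (0, 0, 0))  # force line intersecting the axis
force_offset = line_screw((1, 0, 0), (0, 1, 0))        # force line offset by 1 along y
print(reciprocal_product(revolute_z, force_through_axis))  # 0
print(reciprocal_product(revolute_z, force_offset))        # -1
```

The first force cannot drive the revolute joint (it intersects the axis), so it is a candidate constraint wrench; the offset force produces a moment about z and is not.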
The heterogeneous variational nodal method (HVNM) has emerged as a promising approach for solving high-fidelity neutron transport problems. However, achieving accurate results with HVNM in large-scale problems using high-fidelity models has been challenging because of the prohibitive computational cost. This paper presents an efficient parallel algorithm tailored for HVNM based on the Message Passing Interface (MPI) standard. The algorithm evenly distributes the response matrix sets among processors during matrix formation, so that each processor constructs its share independently without communication. Once the formation tasks are completed, a collective operation merges the matrix sets and shares them among the processors. For the solution process, the problem domain is decomposed into subdomains assigned to specific processors, and red-black Gauss-Seidel iteration is employed within each subdomain to solve the response matrix equation. Point-to-point communication is conducted between adjacent subdomains to exchange data along their boundaries. The accuracy and efficiency of the parallel algorithm are verified using the KAIST and JRR-3 test cases. Numerical results obtained with multiple processors agree well with those obtained from Monte Carlo calculations. The parallelized HVNM yields eigenvalue errors of 31 pcm and -90 pcm and fission rate RMS errors of 1.22% and 0.66% for the 3D KAIST problem and the 3D JRR-3 problem, respectively. In addition, the parallel algorithm significantly reduces computation time, with a parallel efficiency of 68.51% using 36 processors for the KAIST problem and 77.14% using 144 processors for the JRR-3 problem.
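The red-black ordering is what lets Gauss-Seidel sweeps run in parallel: points of one colour depend only on points of the other colour. The sketch below applies it to a 2-D Laplace equation with a 5-point stencil as a stand-in, since the abstract does not give the response matrix equation. Grid size and iteration count are arbitrary, and the boundary data u = i + j is chosen because it is an exact discrete solution.

```python
import numpy as np


def red_black_gauss_seidel(u, iterations):
    """Solve the 2-D Laplace equation on the interior of `u` (boundary values stay fixed).

    Interior points are coloured like a chessboard. Every red point's 5-point
    stencil touches only black points (and vice versa), so all points of one
    colour can be updated simultaneously.
    """
    n, m = u.shape
    ii, jj = np.meshgrid(np.arange(1, n - 1), np.arange(1, m - 1), indexing="ij")
    red = (ii + jj) % 2 == 0
    black = ~red
    inner = u[1:-1, 1:-1]  # a view: writes go straight into u
    for _ in range(iterations):
        for mask in (red, black):
            avg = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:])
            inner[mask] = avg[mask]
    return u


n = 16
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
exact = (i + j).astype(float)  # harmonic, so it also solves the discrete problem
u = exact.copy()
u[1:-1, 1:-1] = 0.0            # keep boundary values, zero initial guess
red_black_gauss_seidel(u, 1000)
print(np.max(np.abs(u - exact)) < 1e-8)  # True
```

In a domain-decomposed run, each process would update its own red points, exchange boundary values with neighbours, then update its black points.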
Currently, two-rotation and one-translation (2R1T) three-degree-of-freedom (DOF) parallel mechanisms (PMs) are widely applied in five-DOF hybrid machining robots. However, there is no effective method for evaluating the configuration stiffness of mechanisms during the mechanism design stage, which makes it challenging to select 2R1T PMs with excellent stiffness performance at that stage. Considering the operational status of 2R1T PMs, bending and torsional stiffness are taken as indices for evaluating the configuration stiffness of PMs, and a specific method is proposed to calculate these indices. First, the various types of structural and driving stiffness for each branch are assessed and their specific values defined. Next, a rigid-flexible coupled force model for the over-constrained 2R1T PM is established, and the proposed evaluation method is used to analyze the configuration stiffness of five 2R1T PMs over the entire workspace. Finally, the driving force and constraint force of each branch over the whole workspace are calculated with the proposed method to further elucidate the stiffness evaluation results. The results demonstrate that the bending and torsional stiffness of the 2RPU/UPR/RPR mechanism along the x- and y-directions are larger than those of the other four mechanisms.
The kinematic equivalent model of an existing ankle rehabilitation robot is inconsistent with the anatomical structure of the human ankle, which affects the rehabilitation outcome. Therefore, this study models the human ankle as the UR model and proposes a novel three-degree-of-freedom (3-DOF) generalized spherical parallel mechanism for ankle rehabilitation. The parallel mechanism has two spherical centers corresponding to the rotation centers of the tibiotalar and subtalar joints. Using screw theory, the mobility of the parallel mechanism is analyzed and shown to meet the requirements of the human ankle. The inverse kinematics are presented, and singularities are identified based on the Jacobian matrix. The workspace of the parallel mechanism is obtained through a search method and compared with the range of motion of the human ankle, which shows that the mechanism can meet the motion demands of ankle rehabilitation. Additionally, based on motion-force transmissibility, performance atlases are plotted in the optimal parameter design space, and the optimum parameters are obtained according to the demands of practical applications. The results show that the parallel mechanism meets the motion requirements of ankle rehabilitation and has excellent kinematic performance within its rehabilitation range, which provides a theoretical basis for prototype design and experimental verification.
The efficiency of computing the dynamic response is an important consideration for compliant mechanisms used in posture adjustment. This paper studies low-order dynamic modeling of a 2R1T compliant parallel mechanism. The mechanism, with two out-of-plane rotational degrees of freedom (DoFs) and one lifting DoF, plays an important role in posture adjustment. Based on elastic beam theory, the stiffness matrix and mass matrix of the beam element are established, taking the moment of inertia into account. To improve solution efficiency, a low-order dynamic model of the mechanism is established based on a modified modal synthesis method. First, each branch of the RPR-type mechanism is treated as a substructure. Then, a set of assumed modes of each substructure is obtained using the Craig-Bampton (C-B) method. Finally, the dynamic equation of the whole mechanism is established by assembling the substructures. A dynamic experiment is conducted to verify the dynamic characteristics of the compliant mechanism.
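As a reference point for the beam-element step, the sketch below builds the standard stiffness and consistent mass matrices of a 2-node Euler-Bernoulli beam element (transverse displacement and rotation at each node). It is a textbook element, not the paper's formulation: the paper states that it accounts for the moment of inertia, which this sketch omits, and the material and section values are hypothetical.

```python
import numpy as np


def euler_bernoulli_beam(E, I, rho, A, L):
    """Stiffness and consistent mass matrices of a 2-node planar beam element.

    DOFs: [w1, theta1, w2, theta2] (transverse displacement, rotation).
    Rotary inertia is NOT included in the mass matrix.
    """
    k = E * I / L**3 * np.array([
        [12, 6 * L, -12, 6 * L],
        [6 * L, 4 * L**2, -6 * L, 2 * L**2],
        [-12, -6 * L, 12, -6 * L],
        [6 * L, 2 * L**2, -6 * L, 4 * L**2],
    ])
    m = rho * A * L / 420 * np.array([
        [156, 22 * L, 54, -13 * L],
        [22 * L, 4 * L**2, 13 * L, -3 * L**2],
        [54, 13 * L, 156, -22 * L],
        [-13 * L, -3 * L**2, -22 * L, 4 * L**2],
    ])
    return k, m


k, m = euler_bernoulli_beam(E=200e9, I=1e-8, rho=7850.0, A=1e-4, L=0.05)
rigid_translation = np.array([1.0, 0.0, 1.0, 0.0])
print(np.allclose(k @ rigid_translation, 0.0))  # True: rigid motion stores no strain energy
print(rigid_translation @ m @ rigid_translation)  # equals rho * A * L, the element mass
```

Element matrices like these are assembled per branch to form the substructure models that a Craig-Bampton reduction then condenses to a few interface and modal coordinates.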
Evapotranspiration is an important parameter for characterizing the water cycle of ecosystems. To understand the evapotranspiration and energy balance of a subalpine forest in the southeastern Qinghai-Tibet Plateau, an open-path eddy covariance system was set up to monitor the forest from November 2020 to October 2021 in a core area of the Three Parallel Rivers region of the Qinghai-Tibet Plateau. The results show that evapotranspiration followed a daily cycle, with its maximum occurring between 11:00 and 15:00. Environmental factors had significant effects on evapotranspiration; among them, net radiation had the greatest effect (R² = 0.487) and relative humidity the least (R² = 0.001). The energy fluxes varied considerably between seasons, and sensible heat flux accounted for the main part of the turbulent energy. The energy balance ratio in the dormant season was lower than that in the growing season, and there is an energy imbalance at the site on an annual time scale.
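The abstract does not define its energy balance ratio. A common definition for eddy covariance sites is EBR = Σ(H + LE) / Σ(Rn − G), with sensible heat flux H, latent heat flux LE, net radiation Rn and soil heat flux G, all in W m⁻². The sketch below assumes that definition, ignores canopy storage terms, and uses made-up half-hourly values.

```python
def energy_balance_ratio(h, le, rn, g):
    """EBR = sum(H + LE) / sum(Rn - G) over the same set of flux records (W m^-2)."""
    turbulent = sum(a + b for a, b in zip(h, le))
    available = sum(r - s for r, s in zip(rn, g))
    return turbulent / available


ebr = energy_balance_ratio(h=[120.0, 80.0], le=[150.0, 60.0], rn=[400.0, 200.0], g=[30.0, 20.0])
print(round(ebr, 3))  # 0.745
```

An EBR below 1 means the measured turbulent fluxes do not close the available energy, which is the kind of imbalance the site shows on an annual scale.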
The nonlinear stability of plane parallel shear flows with respect to tilted perturbations is studied by energy methods. A tilted perturbation is one that forms an angle θ ∈ (0, π/2) with the direction of the basic flow. By defining an energy functional, it is proven that plane parallel shear flows are unconditionally nonlinearly exponentially stable with respect to tilted streamwise perturbations when the Reynolds number is below a certain critical value, for either rigid or stress-free boundary conditions. In the case of stress-free boundaries, by using the poloidal-toroidal decomposition of a solenoidal field to define the energy functionals, it can even be shown that plane parallel shear flows are unconditionally nonlinearly exponentially stable for all Reynolds numbers, where the tilted perturbation can be either spanwise or streamwise.
In this research, we present pure open multi-processing (OpenMP), pure message passing interface (MPI) and hybrid MPI/OpenMP parallel solvers within the dynamic explicit central difference algorithm for the coining process, to address the challenge of capturing fine relief features of approximately 50 microns. Achieving such precision requires at least 7 million tetrahedral elements, which exceeds the capabilities of the traditional serial programs developed previously. To avoid data races when calculating internal forces, intermediate arrays are introduced within the OpenMP directive, ensuring proper synchronization and avoiding conflicts during parallel execution. In the MPI implementation, the coins are partitioned into the desired number of regions, which allows computational tasks to be distributed efficiently across multiple processes. Numerical simulation examples are conducted to compare the three solvers with serial programs in terms of correctness, speedup ratio and parallel efficiency. The results reveal a relative error of approximately 0.3% in forming force between the parallel and serial solvers, and the predicted insufficient material zones align with experimental observations. Speedup ratio and parallel efficiency are also assessed for the coining process simulation. The pure MPI parallel solver achieves a maximum speedup of 9.5 on a single computer (using 12 cores), and the hybrid solver exhibits a speedup ratio of 136 on a cluster (using 6 compute nodes with 12 cores each), showing the strong scalability of the hybrid MPI/OpenMP programming model. This approach effectively meets the simulation requirements for commemorative coins with intricate relief patterns.
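The intermediate-array idea can be sketched as follows: instead of letting threads add element contributions directly into one shared nodal force vector (where two elements sharing a node would race), each worker scatters into its own private array and the arrays are summed at the end. This is a Python/NumPy illustration under assumed data layouts (one force component per tetrahedron node, random connectivity), not the authors' OpenMP code.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def assemble_serial(conn, elem_forces, n_nodes):
    """Scatter per-element nodal forces into the global force vector (reference result)."""
    f = np.zeros(n_nodes)
    np.add.at(f, conn.ravel(), elem_forces.ravel())  # unbuffered: repeated node IDs accumulate
    return f


def assemble_private(conn, elem_forces, n_nodes, n_workers=4):
    """Each worker scatters into its own intermediate array; the arrays are summed afterwards."""
    chunks = np.array_split(np.arange(len(conn)), n_workers)

    def work(idx):
        local = np.zeros(n_nodes)  # private to this worker, so there are no shared writes
        np.add.at(local, conn[idx].ravel(), elem_forces[idx].ravel())
        return local

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(work, chunks))
    return np.sum(partials, axis=0)  # reduction step


rng = np.random.default_rng(0)
conn = rng.integers(0, 50, size=(200, 4))  # 200 tetrahedra, 4 node IDs each
forces = rng.normal(size=(200, 4))         # one force component per element node
print(np.allclose(assemble_serial(conn, forces, 50), assemble_private(conn, forces, 50)))  # True
```

The cost is one extra nodal array per worker plus a final reduction, which is usually cheap compared with the element loop.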
The Message Passing Interface (MPI) is a widely accepted standard for parallel computing on distributed memory systems. However, MPI implementations can contain defects that impact the reliability and performance of parallel applications. Detecting and correcting these defects is crucial, yet there is a lack of published models specifically designed for correcting MPI defects. To address this, we propose a model for detecting and correcting MPI defects (DC_MPI), which aims to detect and correct defects in various types of MPI communication, including blocking point-to-point (BPTP), nonblocking point-to-point (NBPTP) and collective communication (CC). The defects addressed by the DC_MPI model include illegal MPI calls, deadlocks (DL), race conditions (RC) and message mismatches (MM). To assess the effectiveness of the DC_MPI model, we performed experiments on a dataset of 40 MPI codes. The model detected defects in 37 of the 40 codes, an overall detection accuracy of 92.5%. The execution time of the DC_MPI model ranged from 0.81 to 1.36 s. These findings show that the DC_MPI model is useful for detecting and correcting defects in MPI implementations, thereby enhancing the reliability and performance of parallel applications. The DC_MPI model fills an important research gap and provides a valuable tool for improving the quality of MPI-based parallel computing systems.
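Deadlock detection for blocking point-to-point calls is often framed as finding a cycle in a wait-for graph. The sketch below assumes each rank is blocked on at most one other rank; the dictionary input format is hypothetical and is not how DC_MPI represents programs, which the abstract does not describe.

```python
def find_deadlock(waits_on):
    """Return a cycle of ranks that block on each other, or None.

    waits_on maps each rank to the rank it is blocked on in a blocking
    point-to-point call (e.g. a receive from that source), or None if it can proceed.
    """
    for start in waits_on:
        path = []
        rank = start
        while rank is not None and rank not in path:
            path.append(rank)
            rank = waits_on.get(rank)
        if rank is not None:  # revisited a rank on the current path: cycle found
            return path[path.index(rank):]
    return None


# Ranks 0 and 1 both post a blocking receive from each other first: a classic deadlock.
print(find_deadlock({0: 1, 1: 0, 2: None}))  # [0, 1]
print(find_deadlock({0: 1, 1: None}))        # None
```

A typical correction for the first case is to reorder one rank's send and receive, or to use a combined send-receive or nonblocking calls.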
The growing development of the Internet of Things (IoT) is accelerating the emergence and growth of new IoT services and applications, which will result in massive amounts of data being generated, transmitted and processed in wireless communication networks. Mobile Edge Computing (MEC) is a desirable paradigm for processing IoT data in a timely manner to maximize its value. In MEC, a number of computing-capable devices are deployed at the network edge near data sources to support edge computing, so that the long network transmission delay of the cloud computing paradigm can be avoided. Since an edge device might not always have sufficient resources to process the massive amount of data, computation offloading that exploits cooperation among edge devices is very important. However, the dynamic traffic characteristics and heterogeneous computing capabilities of edge devices make offloading challenging. In addition, different scheduling schemes may give offloaded tasks different computation delays. Thus, offloading at the mobile nodes and scheduling at the MEC server jointly determine the service delay. This paper seeks to guarantee low delay for computation-intensive applications by jointly optimizing offloading and scheduling in such an MEC system. We propose a Delay-Greedy Computation Offloading (DGCO) algorithm to make offloading decisions for new tasks in distributed computing-enabled mobile devices. A Reinforcement Learning-based Parallel Scheduling (RLPS) algorithm is further designed to schedule offloaded tasks on the multi-core MEC server. With an offloading delay broadcast mechanism, DGCO and RLPS cooperate to maximize the delay-guarantee ratio. Finally, the simulation results show that our proposal can bound the end-to-end delay of various tasks. Even under a slightly heavy task load, the delay-guarantee ratio achieved by DGCO-RLPS still approaches 95%, while that of the benchmark algorithms drops to an intolerable level. The simulation results demonstrate the effectiveness of DGCO-RLPS for delay guarantees in MEC.
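The delay-greedy idea behind DGCO can be illustrated with a simple rule: for each candidate node, estimate the transmission delay plus queueing backlog plus computation time, and send the task to the minimum. The delay model, the `Node` fields and the numbers below are assumptions for illustration; the abstract gives neither DGCO's actual estimator nor the RLPS scheduler.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_hz: float      # processing speed in CPU cycles per second
    backlog_s: float   # delay from work already queued on the node (assumed known)
    uplink_bps: float  # 0 for the local device (no transmission needed)


def estimated_delay(node, task_bits, task_cycles):
    transmit = task_bits / node.uplink_bps if node.uplink_bps > 0 else 0.0
    return transmit + node.backlog_s + task_cycles / node.cpu_hz


def delay_greedy_choice(nodes, task_bits, task_cycles):
    """Pick the node with the smallest estimated end-to-end delay for this task."""
    return min(nodes, key=lambda nd: estimated_delay(nd, task_bits, task_cycles))


nodes = [
    Node("local", cpu_hz=1e9, backlog_s=0.5, uplink_bps=0),
    Node("edge1", cpu_hz=1e10, backlog_s=0.05, uplink_bps=1e7),
    Node("edge2", cpu_hz=2e10, backlog_s=0.8, uplink_bps=1e7),
]
print(delay_greedy_choice(nodes, task_bits=1e6, task_cycles=1e9).name)  # edge1
```

The `backlog_s` values are where a broadcast of current offloading delays, as the abstract describes, would feed into each device's decision.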
Funding (MPI parallel DEM-IMB-LBM framework): This work was financially supported by the National Natural Science Foundation of China (Grant Nos. 12072217 and 42077254) and the Natural Science Foundation of Hunan Province, China (Grant No. 2022JJ30567).
Funding: Supported by the National Natural Science Foundation of China (Nos. 52034006, 52004229, 52225401, and 52274231); the Regional Innovation Cooperation Project of Sichuan Province (No. 2022YFQ0059); the Science and Technology Cooperation Project of the CNPC-SWPU Innovation Alliance (No. 2020CX040301); the Natural Science Foundation of Sichuan Province (No. 2023NSFSC0431); the Science and Technology Strategic Cooperation Project between Nanchong City and Southwest Petroleum University (No. SXHZ004); and the Research and Innovation Fund for Graduate Students of Southwest Petroleum University (No. 2022KYCX058).
Abstract: Hydraulic-electric rock fragmentation (HERF) plays a significant role in improving the efficiency of high-voltage pulse rock breaking. However, the underlying mechanism of HERF remains unclear. In this study, cyclic-loading numerical simulations of HERF are performed by coupling electrical, thermal, and solid mechanics. The simulations account for rock heterogeneity, microscopic thermodynamic properties, and shockwave time-domain waveforms, and are based on a shockwave model, digital imaging technology, and the discrete element method. They cover different formation temperatures, confining pressures, initial peak voltages, electrode bit diameters, and numbers of loading cycles. In addition, laboratory experiments with various electrical parameters are conducted on a HERF discharge system, and the resulting broken pits are numerically reconstructed to obtain their geometric parameters. The results show that the completely broken zone consists of powdery rock debris. In the pre-broken zone, the mineral cementation of the rock determines the transition from type CⅠ cracks to type CⅡ and type CⅢ cracks. The peak pressure of the shockwave increases with initial peak voltage but decreases with electrode bit diameter, while the wave front time decreases. Moreover, with increasing well depth, the higher formation temperature augments HERF and the higher confining pressure inhibits it. However, once the confining pressure exceeds a threshold of 60 MPa for the 152.40, 215.90, and 228.60 mm electrode bits, or 40 MPa for the 309.88 mm electrode bit, HERF is promoted. Additionally, for the same kind of rock, the volume and width of the broken pit increase with higher initial peak voltage, and rock fissures promote HERF. Finally, the 215.90 mm diameter electrode bit is more suitable for drilling pink granite. This research contributes to a better microscopic understanding of HERF and provides valuable insights for electrode bit selection and for the optimization of circuit parameters in HERF technology.
Abstract: Big data platforms in cloud computing environments face two problems in static data storage. A single encryption algorithm gives low encryption efficiency, and metadata storage is unreliable. To address these problems, we propose a Hadoop-based secure storage scheme for big data. First, to distribute the NameNode service from a single server across multiple servers, we combine the HDFS federation and HDFS high-availability mechanisms and use the ZooKeeper distributed coordination mechanism to coordinate the nodes, achieving dual-channel storage. Then, we improve the ECC encryption algorithm to encrypt ordinary data and adopt a homomorphic encryption algorithm to encrypt data that must be computed on. To accelerate encryption, we adopt a dual-thread encryption mode. Finally, an HDFS control module is designed to combine the encryption algorithm with the storage model. Experimental results show that the proposed solution eliminates the metadata single point of failure, performs well in terms of metadata reliability, and provides server fault tolerance. The improved encryption algorithm integrates the dual-channel storage mode, and encryption storage efficiency improves by 27.6% on average.
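The abstract only names the "dual-thread encryption mode". The sketch below illustrates that structure: the data is split in half and each half is processed on its own worker thread. The cipher here is a toy placeholder (XOR with a SHA-256 counter-mode keystream), used only so that the example runs. It is neither the paper's improved ECC scheme nor its homomorphic scheme, and it should not be treated as production cryptography. All function names are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def _keystream(key: bytes, nonce: int, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode (placeholder, NOT the paper's ECC scheme)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(key + nonce.to_bytes(8, "big") + counter.to_bytes(8, "big")).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:length])


def _xor_chunk(job):
    key, nonce, chunk = job
    ks = _keystream(key, nonce, len(chunk))
    return bytes(a ^ b for a, b in zip(chunk, ks))


def dual_thread_crypt(data: bytes, key: bytes) -> bytes:
    """Split data into two halves and process each half on its own thread.

    XOR with a keystream is symmetric, so the same call encrypts and decrypts.
    Each half uses a distinct nonce (0 or 1) so the halves never share keystream bytes.
    """
    mid = len(data) // 2
    jobs = [(key, 0, data[:mid]), (key, 1, data[mid:])]
    with ThreadPoolExecutor(max_workers=2) as pool:
        parts = list(pool.map(_xor_chunk, jobs))  # map preserves input order
    return b"".join(parts)
```

Because the ciphertext has the same length as the plaintext, decryption splits at the same midpoint and recovers the original bytes. In CPython, the actual speedup of two threads depends on how much of the cipher releases the GIL, so the example shows the structure rather than a performance claim.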
Funding: Supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 61976242; in part by the Natural Science Fund of Hebei Province for Distinguished Young Scholars under Grant No. F2021202010; in part by the Fundamental Scientific Research Funds for Interdisciplinary Team of Hebei University of Technology under Grant No. JBKYTD2002; by the Science and Technology Project of Hebei Education Department under Grant No. JZX2023007; and by the 2022 Interdisciplinary Postgraduate Training Program of Hebei University of Technology under Grant No. HEBUT-YXKJC-2022122.
Abstract: Most neural network architectures are designed from human experience, which requires a long and tedious trial-and-error process. Neural architecture search (NAS) attempts to find effective architectures without human intervention. Evolutionary algorithms (EAs) for NAS can find better solutions than human-designed architectures by exploring a large space of possible architectures. With multiobjective EAs for NAS, optimal neural architectures that meet various performance criteria can be explored and discovered efficiently. Furthermore, hardware-accelerated NAS methods can improve the efficiency of NAS. While existing reviews have mainly focused on different strategies for completing NAS, few studies have explored the use of EAs for NAS. In this paper, we summarize and explore the use of EAs for NAS, as well as large-scale multiobjective optimization strategies and hardware-accelerated NAS methods. NAS performs well in healthcare applications such as medical image analysis, disease diagnosis classification, and health monitoring. EAs for NAS can automate the search process and optimize multiple objectives simultaneously for a given healthcare task. Deep neural networks have been used successfully in healthcare, but they lack interpretability. Medical data are highly sensitive, and privacy leaks are frequently reported in the healthcare industry. To address these problems, we propose an interpretable neuroevolution framework for healthcare, based on federated learning, that addresses both search efficiency and privacy protection. We also point out future research directions for evolutionary NAS. Overall, for researchers who want to use EAs to optimize NNs in healthcare, we analyze the advantages and disadvantages of doing so to provide detailed guidance, and we propose an interpretable privacy-preserving framework for healthcare applications.
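Multiobjective EAs for NAS typically keep the architectures that are not dominated on all objectives at once. This non-dominated (Pareto) set is the first front in schemes such as NSGA-II. The minimal sketch below shows only that selection step. It assumes each candidate is scored by two objectives to be minimised, such as validation error and parameter count. The candidate names and scores are hypothetical, and the sketch is not the framework proposed in the paper.

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly better in one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the sorted names of non-dominated architectures.

    candidates: dict name -> tuple of objectives to minimise, e.g. (error, params_in_millions).
    """
    return sorted(
        name
        for name, obj in candidates.items()
        if not any(dominates(other, obj) for o_name, other in candidates.items() if o_name != name)
    )
```

For example, with `{"A": (0.10, 2.0), "B": (0.08, 9.0), "C": (0.12, 6.0), "D": (0.09, 4.0)}`, only `C` is dominated (by `A`), so the front is `["A", "B", "D"]`. An EA would then breed the next generation mainly from this front.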
Funding: Supported as part of the Center for Hierarchical Waste Form Materials, an Energy Frontier Research Center funded by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, under Award No. DE-SC0016574.
Abstract: Porous materials offer significant advantages for absorbing radioactive isotopes from nuclear waste streams. To improve absorption efficiency in nuclear waste treatment, material design requires a thorough understanding of the diffusion-advection process within porous structures. In this study, we present advancements in the volumetric lattice Boltzmann method (VLBM) for modeling and simulating pore-scale diffusion-advection of radioactive isotopes within geopolymer porous structures. These structures are created using the phase field method (PFM) to precisely control pore architectures. In our VLBM approach, we introduce a concentration field for an isotope that is seamlessly coupled with the velocity field and solved through the time evolution of its particle population function. To handle the computational intensity of the coupled lattice Boltzmann equations for the velocity and concentration fields, we implement graphics processing unit (GPU) parallelization. The model is validated by examining the flow and diffusion fields in porous structures. Good agreement is observed both between the velocity fields from VLBM and the multiphysics object-oriented simulation environment (MOOSE), and between the concentration fields from VLBM and the finite difference method (FDM). Furthermore, we investigate the effects of background flow, species diffusivity, and porosity on diffusion-advection behavior by varying the background flow velocity, diffusion coefficient, and pore volume fraction, respectively. All three parameters influence the diffusion-advection process. Increased background flow and diffusivity markedly accelerate the process, because of stronger advection and enhanced diffusion capability, respectively. Conversely, increasing the porosity has a smaller effect, slightly slowing the diffusion-advection process because of the expanded pore volume. This comprehensive parametric study provides valuable insights into the kinetics of isotope uptake in porous structures, facilitating the development of porous materials for nuclear waste treatment applications.
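To make the "particle population function" for a concentration field concrete, the sketch below shows a standard D1Q3 BGK lattice Boltzmann scheme for 1-D advection-diffusion. It uses periodic boundaries and lattice units, with the diffusion coefficient D = c_s²(τ − ½) and c_s² = 1/3. It is a textbook LBM, not the authors' volumetric LBM. It has no porous geometry, no coupling to a velocity-field LBM, and no GPU parallelism. The function name and parameters are illustrative.

```python
def lbm_advection_diffusion_1d(conc, u, tau, steps):
    """D1Q3 BGK lattice Boltzmann for dC/dt + u dC/dx = D d2C/dx2 (periodic, lattice units).

    D = cs^2 * (tau - 0.5) with cs^2 = 1/3. Returns the concentration after `steps`.
    """
    weights = (2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0)  # rest, +1, -1 directions
    vel = (0, 1, -1)
    cs2 = 1.0 / 3.0
    n = len(conc)
    # Initialise each population at its equilibrium for the given concentration.
    g = [[w * c * (1.0 + e * u / cs2) for c in conc] for w, e in zip(weights, vel)]
    for _ in range(steps):
        # Macroscopic concentration is the zeroth moment of the populations.
        c_now = [sum(g[i][x] for i in range(3)) for x in range(n)]
        # BGK collision: relax each population toward its local equilibrium.
        for i, (w, e) in enumerate(zip(weights, vel)):
            for x in range(n):
                geq = w * c_now[x] * (1.0 + e * u / cs2)
                g[i][x] -= (g[i][x] - geq) / tau
        # Periodic streaming: population i moves e cells per step.
        g = [[g[i][(x - e) % n] for x in range(n)] for i, e in enumerate(vel)]
    return [sum(g[i][x] for i in range(3)) for x in range(n)]
```

Collision conserves the local concentration and streaming only moves populations, so the total mass is conserved while an initial spike spreads and drifts with `u`. Every lattice site is updated independently within a step, which is why such schemes map well to GPUs.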
Funding: Supported by the Fundamental Research Funds for the Central Universities (FRF-TP20-062A1) and the Guangdong Basic and Applied Basic Research Foundation (2021A1515110070).
Abstract: This paper presents a software turbo decoder on graphics processing units (GPUs). Unlike previous works, the proposed decoding architecture focuses mainly on the turbo codes of the Consultative Committee for Space Data Systems (CCSDS) standard. However, the information frame lengths of CCSDS turbo codes are not well suited to a flexible sub-frame parallelism design. To mitigate this issue, we propose a padding method that inserts several bits before the information frame header. To achieve low latency and high resource utilization, two levels of intra-frame parallelism and an efficient data structure are used. With only small modifications, the presented Max-Log-MAP decoder can also decode Long Term Evolution (LTE) turbo codes. At 10 iterations on an NVIDIA RTX 3070, the proposed CCSDS turbo decoder achieves throughputs of about 150 Mbps and 50 Mbps for code rates 1/6 and 1/2, respectively.
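The padding idea reduces to simple arithmetic: prepend just enough bits that the frame length becomes a multiple of the number of parallel sub-frames. The sketch below assumes zero-valued pad bits and equal-length sub-frames. The abstract does not specify the pad value or how the decoder treats the padded positions, and the function names are hypothetical.

```python
def pad_for_subframes(info_bits, n_subframes):
    """Prepend zero bits so that the frame splits into n_subframes equal sub-frames.

    Returns (padded_frame, pad_len). Using zeros as pad bits is an assumption of this sketch.
    """
    pad_len = (-len(info_bits)) % n_subframes
    return [0] * pad_len + list(info_bits), pad_len


def split_subframes(frame, n_subframes):
    """Cut an already padded frame into equal, contiguous sub-frames."""
    size = len(frame) // n_subframes
    return [frame[i * size:(i + 1) * size] for i in range(n_subframes)]
```

For example, a 1784-bit CCSDS information block split into 64 sub-frames needs 8 pad bits, giving 1792 bits and 28 bits per sub-frame. Each sub-frame could then be assigned to its own group of GPU threads.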
Funding: Supported by the Key Scientific Research Platforms and Projects of Guangdong Regular Institutions of Higher Education of China (Grant No. 2022KCXTD033); the Guangdong Provincial Natural Science Foundation of China (Grant No. 2023A1515012103); the Guangdong Provincial Scientific Research Capacity Improvement Project of Key Developing Disciplines of China (Grant No. 2021ZDJS084); and the National Natural Science Foundation of China (Grant No. 52105009).
Abstract: Current parallel ankle rehabilitation robots (ARRs) have difficulty aligning the human and robot joint centers of rotation in real time, which may cause secondary injuries to the patient. This study investigates the type synthesis of a parallel self-alignment ankle rehabilitation robot (PSAARR). The synthesis is based on the kinematic characteristic that the ankle joint's rotation center drifts, and it introduces "suitable passive degrees of freedom (DOF)" of suitable number and form. First, the self-alignment principle of parallel ARRs was proposed by deriving the conditions for transforming the human-robot closed chain (HRCC), formed by an ARR and the human body, into a suitably constrained kinematic system, and by introducing the conditions of "decoupled" and "less limb". Second, the relationship between the self-alignment principle and the actuation wrenches (twists) of the PSAARR was analyzed using the velocity Jacobian matrix as a "bridge". On this basis, the type synthesis conditions of the PSAARR were proposed. Third, a PSAARR synthesis method based on screw theory was proposed, and the type synthesis of the PSAARR was conducted. Finally, an HRCC kinematic model was established to verify the self-alignment capability of the PSAARR. In this study, 93 types of PSAARR limb structures were synthesized, and the self-alignment capability of the human-robot joint axis was verified through kinematic analysis, providing a theoretical basis for the design of such ARRs.
Funding: Supported by the National Key Research and Development Program of China (No. 2020YFB1901900), the National Natural Science Foundation of China (Nos. U20B2011 and 12175138), and the Shanghai Rising-Star Program.
Abstract: The heterogeneous variational nodal method (HVNM) has emerged as a promising approach for solving high-fidelity neutron transport problems. However, obtaining accurate HVNM results for large-scale problems with high-fidelity models has been challenging because of the prohibitive computational cost. This paper presents an efficient parallel algorithm for HVNM based on the Message Passing Interface standard. During matrix formation, the algorithm distributes the response matrix sets evenly among the processors, so each processor constructs its share independently without communication. Once the formation tasks are complete, a collective operation merges the matrix sets and shares them among all processors. For the solution process, the problem domain is decomposed into subdomains assigned to specific processors, and red-black Gauss-Seidel iteration is used within each subdomain to solve the response matrix equation. Point-to-point communication between adjacent subdomains exchanges data along their boundaries. The accuracy and efficiency of the parallel algorithm are verified on the KAIST and JRR-3 test cases. Numerical results obtained with multiple processors agree well with Monte Carlo calculations. For the 3D KAIST and 3D JRR-3 problems, respectively, the parallel HVNM yields eigenvalue errors of 31 pcm and -90 pcm and fission rate RMS errors of 1.22% and 0.66%. In addition, the parallel algorithm significantly reduces computation time, reaching a parallel efficiency of 68.51% with 36 processors on the KAIST problem and 77.14% with 144 processors on the JRR-3 problem.
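Red-black ordering is what makes Gauss-Seidel parallel-friendly. Unknowns are coloured alternately, so every "red" update reads only "black" neighbours and vice versa, and all points of one colour can be updated concurrently. The sketch below applies this idea to a 1-D model problem, the tridiagonal system tridiag(-1, 2, -1) u = b with zero boundary values. The model problem stands in for the response matrix equation, which it is not.

```python
def red_black_gauss_seidel(b, sweeps):
    """Solve tridiag(-1, 2, -1) u = b with zero Dirichlet ends by red-black Gauss-Seidel.

    Within one colour, every update reads only neighbours of the other colour,
    so all points of that colour could be updated in parallel.
    """
    n = len(b)
    u = [0.0] * n
    for _ in range(sweeps):
        for colour in (0, 1):  # 0 = "red" (even indices), 1 = "black" (odd indices)
            for i in range(colour, n, 2):
                left = u[i - 1] if i > 0 else 0.0
                right = u[i + 1] if i < n - 1 else 0.0
                u[i] = (b[i] + left + right) / 2.0
    return u
```

With `b = [1.0] * 5`, the iteration converges to the exact solution `[2.5, 4.0, 4.5, 4.0, 2.5]`. As a related piece of arithmetic, parallel efficiency is E = T₁ / (p·Tₚ), so the reported 68.51% on 36 processors corresponds to a speedup of about 24.7, and 77.14% on 144 processors to a speedup of about 111.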
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 51875495 and U2037202) and the Hebei Provincial Science and Technology Project (Grant No. 206Z1805G).
Abstract: Two-rotation, one-translation (2R1T) three-degree-of-freedom (DOF) parallel mechanisms (PMs) are currently widely used in five-DOF hybrid machining robots. However, there is no effective method for evaluating the configuration stiffness of such mechanisms during the design stage, which makes it difficult to select 2R1T PMs with excellent stiffness performance at that stage. Considering the operating conditions of 2R1T PMs, bending and torsional stiffness are used as indices of a PM's configuration stiffness, and a specific method is proposed to calculate these indices. First, the various structural and driving stiffnesses of each branch are assessed and their values defined. Next, a rigid-flexible coupled force model of the over-constrained 2R1T PM is established, and the proposed evaluation method is used to analyze the configuration stiffness of five 2R1T PMs over the entire workspace. Finally, the driving and constraint forces of each branch are calculated over the whole workspace to further explain the stiffness evaluation results. The results show that the bending and torsional stiffness of the 2RPU/UPR/RPR mechanism along the x- and y-directions are larger than those of the other four mechanisms.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52075145), the S&T Program of Hebei Province of China (Grant Nos. 20281805Z and E2020103001), and the Central Government Guides Basic Research Projects of Local Science and Technology Development Funds of China (Grant No. 206Z1801G).
Abstract: The kinematic equivalent model of existing ankle rehabilitation robots is inconsistent with the anatomical structure of the human ankle, which affects the rehabilitation outcome. Therefore, this study models the human ankle as a UR model and proposes a novel three-degree-of-freedom (3-DOF) generalized spherical parallel mechanism for ankle rehabilitation. The parallel mechanism has two spherical centers corresponding to the rotation centers of the tibiotalar and subtalar joints. Using screw theory, the mobility of the parallel mechanism is analyzed and shown to meet the requirements of the human ankle. The inverse kinematics are presented, and singularities are identified from the Jacobian matrix. The workspaces of the parallel mechanism are obtained with a search method and compared with the range of motion of the human ankle, which shows that the mechanism can meet the motion demands of ankle rehabilitation. Additionally, performance atlases based on motion-force transmissibility are plotted in the optimal parameter design space, and the optimal parameters are selected according to practical requirements. The results show that the parallel mechanism meets the motion requirements of ankle rehabilitation and has excellent kinematic performance within its rehabilitation range, providing a theoretical basis for prototype design and experimental verification.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 51975007).
Abstract: The efficiency of computing the dynamic response is an important concern for compliant mechanisms used in posture adjustment. This paper studies low-order dynamic modeling of a 2R1T compliant parallel mechanism. The mechanism, which has two out-of-plane rotational degrees of freedom (DoFs) and one lifting DoF, plays an important role in posture adjustment. Based on elastic beam theory, the stiffness and mass matrices of the beam element are established, taking the moment of inertia into account. To improve solution efficiency, a low-order dynamic model of the mechanism is established using a modified modal synthesis method. First, each branch of the RPR-type mechanism is treated as a substructure. Next, a set of assumed modes for each substructure is obtained using the Craig-Bampton (C-B) method. Finally, the dynamic equation of the whole mechanism is established by assembling the substructures. A dynamic experiment is conducted to verify the dynamic characteristics of the compliant mechanism.
Funding: Supported by the CAS "Light of West China" Program (2021XBZG-XBQNXZ-A-007), the National Natural Science Foundation of China (31971436), and the State Key Laboratory of Cryospheric Science, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences (SKLCS-OP-2021-06).
Abstract: Evapotranspiration is an important parameter for characterizing the water cycle of ecosystems. To understand the evapotranspiration and energy balance of a subalpine forest in the southeastern Qinghai-Tibet Plateau, an open-path eddy covariance system was set up to monitor the forest from November 2020 to October 2021. The site lies in a core area of the Three Parallel Rivers region of the Qinghai-Tibet Plateau. The results show that evapotranspiration peaked daily, with the maximum occurring between 11:00 and 15:00. Environmental factors had significant effects on evapotranspiration. Among them, net radiation had the greatest effect (R² = 0.487) and relative humidity the least (R² = 0.001). The energy fluxes varied considerably between seasons, and sensible heat flux accounted for most of the turbulent energy. The energy balance ratio was lower in the dormant season than in the growing season, and the site shows an energy imbalance on an annual time scale.
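The energy balance ratio compares the turbulent fluxes measured by eddy covariance with the available energy. Its common form is EBR = Σ(H + LE) / Σ(Rn − G), where H is sensible heat flux, LE is latent heat flux, Rn is net radiation, and G is soil heat flux. The abstract does not say which storage terms the authors included, so the sketch below uses this common form without canopy heat storage. The flux values in the example are hypothetical.

```python
def energy_balance_ratio(h, le, rn, g):
    """EBR = sum(H + LE) / sum(Rn - G) over a period (all fluxes in the same units, e.g. W m^-2)."""
    turbulent = sum(a + b for a, b in zip(h, le))
    available = sum(a - b for a, b in zip(rn, g))
    return turbulent / available
```

For example, H = [100, 150], LE = [50, 60], Rn = [300, 350], and G = [20, 30] give 360 / 600 = 0.6. A ratio below 1 indicates that part of the available energy is not captured by the measured turbulent fluxes, which is the "energy imbalance" reported for the site.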
Funding: Supported by the National Natural Science Foundation of China (21627813).
Abstract: The nonlinear stability of plane parallel shear flows with respect to tilted perturbations is studied by energy methods. A tilted perturbation is one that forms an angle θ ∈ (0, π/2) with the direction of the basic flow. By defining an energy functional, it is proven that plane parallel shear flows are unconditionally nonlinearly exponentially stable with respect to tilted streamwise perturbations when the Reynolds number is below a certain critical value and the boundary conditions are either rigid or stress-free. For stress-free boundaries, energy functionals can also be defined using the poloidal-toroidal decomposition of a solenoidal field. With these functionals it can even be shown that plane parallel shear flows are unconditionally nonlinearly exponentially stable for all Reynolds numbers, where the tilted perturbation can be either spanwise or streamwise.
Funding: Supported by the fund from Shenyang Mint Company Limited (No. 20220056), the Senior Talent Foundation of Jiangsu University (No. 19JDG022), and the Taizhou City Double Innovation and Entrepreneurship Talent Program (No. Taizhou Human Resources Office [2022] No. 22).
Abstract: In this research, we present pure open multi-processing (OpenMP), pure message passing interface (MPI), and hybrid MPI/OpenMP parallel solvers within the dynamic explicit central difference algorithm for the coining process. These solvers address the challenge of capturing fine relief features of approximately 50 microns. Achieving such precision requires at least 7 million tetrahedral elements, which is beyond the capabilities of the serial programs previously developed. To avoid data races when calculating internal forces, intermediate arrays are introduced within the OpenMP directive, ensuring proper synchronization and avoiding conflicts during parallel execution. In the MPI implementation, the coins are partitioned into the desired number of regions, allowing computational tasks to be distributed efficiently across multiple processes. Numerical simulation examples compare the three solvers with serial programs in terms of correctness, acceleration ratio, and parallel efficiency. The results show a relative error of approximately 0.3% in forming force between the parallel and serial solvers, and the predicted insufficient-material zones agree with experimental observations. Speedup ratio and parallel efficiency are also assessed for the coining process simulation. The pure MPI parallel solver achieves a maximum speedup of 9.5 on a single computer (using 12 cores). The hybrid solver achieves a speedup ratio of 136 on a cluster (using 6 compute nodes with 12 cores each), demonstrating the strong scalability of the hybrid MPI/OpenMP programming model. This approach effectively meets the simulation requirements for commemorative coins with intricate relief patterns.
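The data race arises because neighbouring elements share nodes, so two threads may add into the same nodal force entry at once. The "intermediate array" remedy gives each worker a private nodal array and sums the arrays at the end. The sketch below shows that pattern with Python threads. The element layout (node IDs paired with a force per node) and the function name are hypothetical, and in CPython the example illustrates the pattern rather than providing the speedup an OpenMP code in C or Fortran would obtain.

```python
from concurrent.futures import ThreadPoolExecutor


def assemble_forces(elements, n_nodes, n_threads=4):
    """Scatter element forces to nodes without a data race.

    elements: list of (node_ids, force_per_node) pairs (hypothetical layout).
    Each worker writes only to its own private array; the arrays are summed at the end.
    """
    chunks = [elements[t::n_threads] for t in range(n_threads)]

    def work(chunk):
        local = [0.0] * n_nodes  # private "intermediate array" for this worker
        for nodes, f in chunk:
            for node in nodes:
                local[node] += f
        return local

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(work, chunks))
    return [sum(col) for col in zip(*partials)]  # reduction over the private arrays
```

The result matches a serial assembly regardless of thread count. The price is one extra nodal array per thread, which is usually affordable. For the reported figures, parallel efficiency is speedup divided by cores, so the pure MPI run (9.5 on 12 cores) is about 79% efficient.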
Funding: Funded by the Deanship of Scientific Research at King Abdulaziz University, Jeddah, Saudi Arabia, under Grant No. RG-12-611-43.
Abstract: The Message Passing Interface (MPI) is a widely accepted standard for parallel computing on distributed memory systems. However, MPI implementations can contain defects that impact the reliability and performance of parallel applications. Detecting and correcting these defects is crucial, yet there is a lack of published models specifically designed for correcting MPI defects. To address this, we propose a model for detecting and correcting MPI defects (DC_MPI). It aims to detect and correct defects in various types of MPI communication, including blocking point-to-point (BPTP), nonblocking point-to-point (NBPTP), and collective communication (CC). The defects addressed by the DC_MPI model include illegal MPI calls, deadlocks (DL), race conditions (RC), and message mismatches (MM). To assess the effectiveness of the DC_MPI model, we performed experiments on a dataset of 40 MPI codes. The model achieved a detection rate of 37 out of 40 codes, an overall detection accuracy of 92.5%. The execution time of the DC_MPI model ranged from 0.81 to 1.36 s. These findings show that the DC_MPI model is useful for detecting and correcting defects in MPI implementations, thereby enhancing the reliability and performance of parallel applications. The DC_MPI model fills an important research gap and provides a valuable tool for improving the quality of MPI-based parallel computing systems.
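A classic blocking point-to-point deadlock occurs when rank 0 and rank 1 each call a blocking send (for example `MPI_Ssend`) to the other before posting their receives. Each rank then waits on the other forever. The sketch below detects such cases by looking for a cycle in a wait-for graph, which is one common way to frame the problem. It assumes each rank has at most one outstanding blocking call and that the rank it waits on is known. It is a hypothetical illustration, not the DC_MPI model.

```python
def find_deadlock(wait_for):
    """Return the ranks forming a cycle in a wait-for graph, or None if there is no cycle.

    wait_for: dict rank -> rank it is blocked on (at most one outstanding blocking call per rank).
    """
    for start in wait_for:
        path = []
        node = start
        while node in wait_for and node not in path:
            path.append(node)
            node = wait_for[node]
        if node in path:  # walked back onto the current path: cycle found
            return path[path.index(node):]
    return None
```

`{0: 1, 1: 0}` (two ranks sending to each other first) yields the cycle `[0, 1]`. A chain such as `{0: 1, 1: 2}` returns `None`, because rank 2 is not blocked and can eventually release rank 1.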
Funding: Supported in part by the National Natural Science Foundation of China under Grants 61901128 and 62273109, and by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (21KJB510032).
Abstract: The growing development of the Internet of Things (IoT) is accelerating the emergence of new IoT services and applications, which will generate massive amounts of data to be transmitted and processed in wireless communication networks. Mobile Edge Computing (MEC) is a desirable paradigm for processing IoT data in a timely manner to maximize its value. In MEC, computing-capable devices are deployed at the network edge near data sources to support edge computing, avoiding the long network transmission delay of the cloud computing paradigm. An edge device may not always have sufficient resources to process such massive amounts of data, so computation offloading that exploits cooperation among edge devices is important. However, the dynamic traffic characteristics and heterogeneous computing capabilities of edge devices make offloading challenging. In addition, different scheduling schemes may yield different computation delays for the offloaded tasks. Thus, offloading at the mobile nodes and scheduling at the MEC server jointly determine the service delay. This paper seeks to guarantee low delay for computation-intensive applications by jointly optimizing offloading and scheduling in such an MEC system. We propose a Delay-Greedy Computation Offloading (DGCO) algorithm that makes offloading decisions for new tasks on distributed computing-enabled mobile devices. A Reinforcement Learning-based Parallel Scheduling (RLPS) algorithm is further designed to schedule offloaded tasks on the multi-core MEC server. With an offloading-delay broadcast mechanism, DGCO and RLPS cooperate to maximize the delay-guarantee ratio. Finally, the simulation results show that our proposal can bound the end-to-end delay of various tasks. Even under a somewhat heavy task load, the delay-guarantee ratio achieved by DGCO-RLPS still approaches 95%, while that of the benchmark algorithms drops to intolerable values. These results demonstrate the effectiveness of DGCO-RLPS for delay guarantee in MEC.
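The abstract does not give DGCO's delay model. The sketch below illustrates the general delay-greedy idea under a deliberately simple, hypothetical model. The estimated delay is transmission time (zero when the task runs locally), plus queueing delay, plus computation time, and each new task goes to the node with the smallest estimate. It also computes the delay-guarantee ratio, the fraction of tasks that finish within their deadlines. All names, dictionary keys, and numbers are assumptions for illustration, and the RLPS scheduler is not modelled.

```python
def estimate_delay(task, node):
    """Estimated delay in seconds = transmission + queueing + computation (hypothetical model)."""
    tx = 0.0 if node["local"] else task["bits"] / node["rate_bps"]
    return tx + node["queue_s"] + task["cycles"] / node["cpu_hz"]


def greedy_offload(task, nodes):
    """Pick the name of the node with the smallest estimated delay."""
    return min(nodes, key=lambda name: estimate_delay(task, nodes[name]))


def delay_guarantee_ratio(delays, deadlines):
    """Fraction of tasks that finished within their deadline."""
    met = sum(d <= dl for d, dl in zip(delays, deadlines))
    return met / len(delays)
```

For a task of 10⁶ bits and 2×10⁹ CPU cycles, a local 1 GHz core needs about 2 s. An edge server reached at 10 Mbps with 0.05 s of queueing and a 10 GHz processor needs about 0.1 + 0.05 + 0.2 = 0.35 s, so the greedy rule offloads the task. In a real system, the queueing term would come from the broadcast delay information rather than a fixed value.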