Hyperparameter tuning is a key step in developing high-performing machine learning models, but searching large hyperparameter spaces requires extensive computation using standard sequential methods. This work analyzes the performance gains from parallel versus sequential hyperparameter optimization. Using scikit-learn's RandomizedSearchCV, this project tuned a Random Forest classifier for fake news detection via randomized grid search. Setting n_jobs to -1 enabled full parallelization across CPU cores. Results show the parallel implementation achieved over 5× faster CPU times and 3× faster total run times compared to sequential tuning. However, test accuracy slightly dropped from 99.26% sequentially to 99.15% with parallelism, indicating a trade-off between evaluation efficiency and model performance. Still, the significant computational gains allow more extensive hyperparameter exploration within reasonable timeframes, outweighing the small accuracy decrease. Further analysis could better quantify this trade-off across different models, tuning techniques, tasks, and hardware.
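The parallel-versus-sequential comparison described above can be sketched with a stdlib-only stand-in for the search loop. This is a hedged illustration: `evaluate` is a toy scoring function, not the paper's Random Forest cross-validation, and in scikit-learn the equivalent switch is simply the `n_jobs` argument of `RandomizedSearchCV`.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def evaluate(params):
    # Stand-in for one cross-validated model fit; a real run would train a
    # RandomForestClassifier here and return its validation accuracy.
    n_estimators, max_depth = params
    return 0.99 - abs(n_estimators - 300) / 1e4 - abs(max_depth - 20) / 1e3

def random_search(n_candidates, parallel=False, seed=0):
    # Draw all candidate configurations up front, as RandomizedSearchCV does.
    rng = random.Random(seed)
    candidates = [(rng.randrange(50, 500), rng.randrange(2, 40))
                  for _ in range(n_candidates)]
    if parallel:
        # Mirrors n_jobs=-1: evaluate candidates concurrently. (scikit-learn
        # dispatches via joblib processes; threads suffice to illustrate.)
        with ThreadPoolExecutor() as pool:
            scores = list(pool.map(evaluate, candidates))
    else:
        scores = [evaluate(c) for c in candidates]
    best_score, best_params = max(zip(scores, candidates))
    return best_score, best_params
```

Because the candidate list is drawn before evaluation, both modes score the same configurations and return the same best parameters; only wall-clock time differs.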
It is important to efficiently support artificial intelligence (AI) applications on heterogeneous mobile platforms, especially to coordinately execute a deep neural network (DNN) model on multiple computing devices of one mobile platform. This paper proposes HOPE, an end-to-end heterogeneous inference framework running on mobile platforms to distribute the operators in a DNN model to different computing devices. The problem is formalized into an integer linear programming (ILP) problem, and a heuristic algorithm is proposed to determine the near-optimal heterogeneous execution plan. The experimental results demonstrate that HOPE can reduce inference latency by up to 36.2% (22.0% on average) compared with MOSAIC, 22.0% (10.2% on average) compared with StarPU, and 41.8% (18.4% on average) compared with μLayer.
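A minimal sketch of the placement problem HOPE solves, under strong simplifying assumptions: a toy two-device, serial-cost model of an operator chain, enumerated by brute force. The paper formalizes the real problem as an ILP solved by a heuristic, and real placements account for overlapped execution across devices.

```python
from itertools import product

def best_placement(op_costs, transfer_cost):
    """op_costs[i][d] = time of operator i on device d (e.g. CPU=0, GPU=1).

    A data transfer penalty is paid whenever consecutive operators in the
    chain run on different devices. Enumerate all 2^n assignments.
    """
    n = len(op_costs)
    best_time, best_assign = float("inf"), None
    for assign in product((0, 1), repeat=n):
        t = sum(op_costs[i][assign[i]] for i in range(n))
        t += sum(transfer_cost for i in range(n - 1)
                 if assign[i] != assign[i + 1])
        if t < best_time:
            best_time, best_assign = t, assign
    return best_time, best_assign
```

With cheap transfers the optimum splits the chain across devices; with expensive transfers it keeps the whole chain on one device, which is exactly the trade-off the ILP captures at scale.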
The high-resolution DEM-IMB-LBM model can accurately describe pore-scale fluid-solid interactions, but its potential for use in geotechnical engineering analysis has not been fully unleashed due to its prohibitive computational costs. To overcome this limitation, a message passing interface (MPI) parallel DEM-IMB-LBM framework is proposed, aimed at enhancing computation efficiency. This framework utilises a static domain decomposition scheme, with the entire computation domain being decomposed into multiple subdomains according to predefined processors. A detailed parallel strategy is employed for both contact detection and hydrodynamic force calculation. In particular, a particle ID re-numbering scheme is proposed to handle particle transitions across sub-domain interfaces. Two benchmarks are conducted to validate the accuracy and overall performance of the proposed framework. Subsequently, the framework is applied to simulate scenarios involving multi-particle sedimentation and submarine landslides. The numerical examples effectively demonstrate the robustness and applicability of the MPI parallel DEM-IMB-LBM framework.
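The static decomposition and the particle ID re-numbering idea can be illustrated with a deliberately simplified 1D sketch. The function names and the (rank, local-id) scheme here are illustrative assumptions, not the framework's actual data layout.

```python
def subdomain_of(x, domain_length, n_sub):
    # Static decomposition: equal-width strips, one per MPI rank.
    width = domain_length / n_sub
    return min(int(x / width), n_sub - 1)

def renumber(particles, domain_length, n_sub):
    """Map each global particle ID to a (rank, local_id) pair.

    Called again after particles move, so a particle crossing a sub-domain
    interface is handed to the new rank and gets a fresh local ID there.
    """
    local = {rank: [] for rank in range(n_sub)}
    for pid, x in particles:
        local[subdomain_of(x, domain_length, n_sub)].append(pid)
    return {pid: (rank, i)
            for rank, pids in local.items()
            for i, pid in enumerate(pids)}
```

In the real MPI framework the hand-off also carries the particle's state to the neighbouring process; here only the ID bookkeeping is shown.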
Traditional global sensitivity analysis (GSA) neglects the epistemic uncertainties associated with the probabilistic characteristics (i.e. distribution type and its parameters) of input rock properties emanating due to the small size of datasets while mapping the relative importance of properties to the model response. This paper proposes an augmented Bayesian multi-model inference (BMMI) coupled with GSA methodology (BMMI-GSA) to address this issue by estimating the imprecision in the moment-independent sensitivity indices of rock structures arising from the small size of input data. The methodology employs BMMI to quantify the epistemic uncertainties associated with the model type and parameters of input properties. The estimated uncertainties are propagated in estimating imprecision in moment-independent Borgonovo's indices by employing a reweighting approach on candidate probabilistic models. The proposed methodology is showcased for a rock slope prone to stress-controlled failure in the Himalayan region of India. The proposed methodology was superior to the conventional GSA (which neglects all epistemic uncertainties) and Bayesian coupled GSA (B-GSA) (which neglects model uncertainty) due to its capability to incorporate the uncertainties in both the model type and parameters of properties. Imprecise Borgonovo's indices estimated via the proposed methodology provide confidence intervals of the sensitivity indices instead of fixed-point estimates, which makes the user more informed in data collection efforts. Analyses performed with varying sample sizes suggested that the uncertainties in sensitivity indices reduce significantly with increasing sample sizes. The accurate importance ranking of properties was only possible via samples of large sizes. Further, the impact of the prior knowledge in terms of prior ranges and distributions was significant; hence, any related assumption should be made carefully.
An accurate plasma current profile has irreplaceable value for the steady-state operation of the plasma. In this study, plasma current tomography based on Bayesian inference is applied to an HL-2A device and used to reconstruct the plasma current profile. Two different Bayesian probability priors are tried, namely the Conditional Auto Regressive (CAR) prior and the Advanced Squared Exponential (ASE) kernel prior. Compared to the CAR prior, the ASE kernel prior adopts nonstationary hyperparameters and introduces the current profile of the reference discharge into the hyperparameters, which can make the shape of the current profile more flexible in space. The results indicate that the ASE prior couples more information, reduces the probability of unreasonable solutions, and achieves higher reconstruction accuracy.
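For reference, the stationary squared-exponential kernel that the ASE prior generalizes can be written in a few lines. Note the hedge: the ASE prior itself makes the amplitude and length-scale hyperparameters spatially varying and informed by a reference discharge, which is not shown here.

```python
import math

def se_kernel(x1, x2, sigma_f=1.0, ell=1.0):
    # k(x1, x2) = sigma_f^2 * exp(-(x1 - x2)^2 / (2 * ell^2))
    # sigma_f sets the prior amplitude, ell the correlation length.
    return sigma_f ** 2 * math.exp(-((x1 - x2) ** 2) / (2.0 * ell ** 2))

def covariance_matrix(xs, sigma_f=1.0, ell=1.0):
    # Gram matrix of the prior over current values at positions xs.
    return [[se_kernel(a, b, sigma_f, ell) for b in xs] for a in xs]
```

A nonstationary variant would replace the constants `sigma_f` and `ell` with functions of position, which is the flexibility the abstract attributes to the ASE prior.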
Recently, deep learning-based semantic communication has garnered widespread attention, with numerous systems designed for transmitting diverse data sources, including text, images, and speech. While efforts have been directed toward improving system performance, many studies have concentrated on enhancing the structure of the encoder and decoder. However, this often overlooks the resulting increase in model complexity, imposing additional storage and computational burdens on smart devices. Furthermore, existing work tends to prioritize explicit semantics, neglecting the potential of implicit semantics. This paper aims to easily and effectively enhance the receiver's decoding capability without modifying the encoder and decoder structures. We propose a novel semantic communication system with variational neural inference for text transmission. Specifically, we introduce a simple but effective variational neural inferer at the receiver to infer the latent semantic information within the received text. This information is then utilized to assist in the decoding process. The simulation results show a significant enhancement in system performance and improved robustness.
The Stokes production coefficient (E_6) constitutes a critical parameter within the Mellor-Yamada type (MY-type) Langmuir turbulence (LT) parameterization schemes, significantly affecting the simulation of turbulent kinetic energy, turbulent length scale, and the vertical diffusivity coefficient for turbulent kinetic energy in the upper ocean. However, the accurate determination of its value remains a pressing scientific challenge. This study adopted an innovative approach by leveraging deep learning technology to address this challenge of inferring E_6. Through the integration of the information of the turbulent length scale equation into a physics-informed neural network (PINN), we achieved an accurate and physically meaningful inference of E_6. Multiple cases were examined to assess the feasibility of PINN in this task, revealing that under optimal settings, the average mean squared error of the E_6 inference was only 0.01, attesting to the effectiveness of PINN. The optimal hyperparameter combination was identified using the Tanh activation function, along with a spatiotemporal sampling interval of 1 s and 0.1 m. This resulted in a substantial reduction in the average bias of the E_6 inference, by a factor of O(10^1) to O(10^2) compared with other combinations. This study underscores the potential application of PINN in intricate marine environments, offering a novel and efficient method for optimizing MY-type LT parameterization schemes.
Recently, weak supervision has received growing attention in the field of salient object detection due to the convenience of labelling. However, there is a large performance gap between weakly supervised and fully supervised salient object detectors because scribble annotation can only provide very limited foreground/background information. Therefore, an intuitive idea is to infer annotations that cover more complete object and background regions for training. To this end, a label inference strategy is proposed based on the assumption that pixels with similar colours and close positions should have consistent labels. Specifically, the k-means clustering algorithm is first performed on both the colours and coordinates of the original annotations, and the same labels are then assigned to points that have colours similar to the colour cluster centres and lie near the coordinate cluster centres. Next, the same annotations are further set for pixels with similar colours within each kernel neighbourhood. Extensive experiments on six benchmarks demonstrate that our method can significantly improve performance and achieve state-of-the-art results.
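The clustering step can be sketched compactly: a stdlib Lloyd's-iteration k-means on joint colour-plus-coordinate features, followed by nearest-centre label assignment. This is a simplified assumption-laden sketch; distance thresholds, feature scaling, and the kernel-neighbourhood pass from the abstract are omitted.

```python
import random

def _dist2(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def _mean(points):
    return tuple(sum(p[i] for p in points) / len(points)
                 for i in range(len(points[0])))

def kmeans(points, k, iters=20, seed=0):
    # Lloyd's iterations on feature vectors (r, g, b, x, y), scaled to [0, 1].
    centres = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: _dist2(p, centres[j]))].append(p)
        centres = [_mean(g) if g else centres[j] for j, g in enumerate(groups)]
    return centres

def infer_labels(scribbles, pixels, k=2):
    # Cluster the scribble-annotated pixels, give each centre the majority
    # label of its members, then label every remaining pixel by nearest centre.
    centres = kmeans([feat for feat, _ in scribbles], k)
    votes = [{} for _ in range(k)]
    for feat, lab in scribbles:
        j = min(range(k), key=lambda j: _dist2(feat, centres[j]))
        votes[j][lab] = votes[j].get(lab, 0) + 1
    centre_lab = [max(v, key=v.get) if v else None for v in votes]
    return [centre_lab[min(range(k), key=lambda j: _dist2(p, centres[j]))]
            for p in pixels]
```

Pixels that resemble a scribble cluster in both colour and position inherit its label, which is the paper's consistency assumption in miniature.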
Modern industrial processes are typically characterized by large scale and intricate internal relationships. Therefore, the distributed modeling process monitoring method is effective. A novel distributed monitoring scheme utilizing the Kantorovich distance-multiblock variational autoencoder (KD-MBVAE) is introduced. Firstly, given the high consistency of relevant variables within each sub-block during the change process, the variables exhibiting analogous statistical features are grouped into identical segments according to the optimal quality transfer theory. Subsequently, a variational autoencoder (VAE) model is separately established for each block, and the corresponding T^2 statistics are calculated. To further improve fault sensitivity, a novel statistic, derived from the Kantorovich distance, is introduced by analyzing model residuals from the perspective of probability distribution. The thresholds of both statistics are determined by kernel density estimation. Finally, the monitoring results for both types of statistics within all blocks are amalgamated using Bayesian inference. Additionally, a novel approach for fault diagnosis is introduced. The feasibility and efficiency of the introduced scheme are verified through two cases.
Media convergence works by processing information from different modalities and applying it to different domains. It is difficult for a conventional knowledge graph to utilise multi-media features because the introduction of a large amount of information from other modalities reduces the effectiveness of representation learning and makes knowledge graph inference less effective. To address this issue, an inference method based on the Media Convergence and Rule-guided Joint Inference model (MCRJI) has been proposed. The authors not only converge multi-media features of entities but also introduce logic rules to improve the accuracy and interpretability of link prediction. First, a multi-headed self-attention approach is used to obtain the attention of different media features of entities during semantic synthesis. Second, logic rules of different lengths are mined from the knowledge graph to learn new entity representations. Finally, knowledge graph inference is performed based on representations of entities that converge multi-media features. Numerous experimental results show that MCRJI outperforms other advanced baselines in using multi-media features and knowledge graph inference, demonstrating that MCRJI provides an excellent approach for knowledge graph inference with converged multi-media features.
This paper presents a software turbo decoder on graphics processing units (GPUs). Unlike previous works, the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Systems (CCSDS) standard. However, the information frame lengths of the CCSDS turbo codes are not suitable for flexible sub-frame parallelism design. To mitigate this issue, we propose a padding method that inserts several bits before the information frame header. To obtain low-latency performance and high resource utilization, two-level intra-frame parallelisms and an efficient data structure are considered. The presented Max-Log-MAP decoder can be adopted to decode the Long Term Evolution (LTE) turbo codes with only small modifications. The proposed CCSDS turbo decoder at 10 iterations on an NVIDIA RTX 3070 achieves about 150 Mbps and 50 Mbps throughputs for the code rates 1/6 and 1/2, respectively.
Currently, two rotations and one translation (2R1T) three-degree-of-freedom (DOF) parallel mechanisms (PMs) are widely applied in five-DOF hybrid machining robots. However, there is a lack of an effective method to evaluate the configuration stiffness of mechanisms during the mechanism design stage, so it is a challenge to select appropriate 2R1T PMs with excellent stiffness performance at this stage. Considering the operational status of 2R1T PMs, the bending and torsional stiffness are taken as indices to evaluate the PMs' configuration stiffness, and a specific method is proposed to calculate these stiffness indices. Initially, the various types of structural and driving stiffness for each branch are assessed and their specific values defined. Subsequently, a rigid-flexible coupled force model for the over-constrained 2R1T PM is established, and the proposed evaluation method is used to analyze the configuration stiffness of the five 2R1T PMs in the entire workspace. Finally, the driving force and constraint force of each branch in the whole workspace are calculated to further elucidate the stiffness evaluation results obtained with the proposed method. The results demonstrate that the bending and torsional stiffness of the 2RPU/UPR/RPR mechanism along the x- and y-directions are larger than those of the other four mechanisms.
The kinematic equivalent model of an existing ankle-rehabilitation robot is inconsistent with the anatomical structure of the human ankle, which influences the rehabilitation effect. Therefore, this study equates the human ankle to the UR model and proposes a novel three-degrees-of-freedom (3-DOF) generalized spherical parallel mechanism for ankle rehabilitation. The parallel mechanism has two spherical centers corresponding to the rotation centers of the tibiotalar and subtalar joints. Using screw theory, the mobility of the parallel mechanism, which meets the requirements of the human ankle, is analyzed. The inverse kinematics are presented, and singularities are identified based on the Jacobian matrix. The workspaces of the parallel mechanism are obtained through the search method and compared with the motion range of the human ankle, which shows that the parallel mechanism can meet the motion demand of ankle rehabilitation. Additionally, based on the motion-force transmissibility, the performance atlases are plotted in the parameter optimal design space, and the optimum parameters are obtained according to the demands of practical applications. The results show that the parallel mechanism can meet the motion requirements of ankle rehabilitation and has excellent kinematic performance in its rehabilitation range, which provides a theoretical basis for prototype design and experimental verification.
The development of the Internet of Things (IoT) has brought great convenience to people. However, some information security problems, such as privacy leakage, are caused by communicating with risky users. It is a challenge to choose reliable users with which to interact in the IoT. Therefore, trust plays a crucial role in the IoT because trust may avoid some risks. Agents usually choose reliable users with high trust to maximize their own interests based on reinforcement learning. However, trust propagation is time-consuming, and trust changes with the interaction process in social networks. To track the dynamic changes in trust values, a dynamic trust inference algorithm named Dynamic Double DQN Trust (Dy-DDQNTrust) is proposed to predict the indirect trust values of two users without direct contact with each other. The proposed algorithm simulates the interactions among users by double DQN. Firstly, the CurrentNet and TargetNet networks are used to select users for interaction, and the users with high trust are chosen to interact in future iterations. Secondly, the trust value is updated dynamically until a reliable trust path is found according to the result of the interaction. Finally, the trust value between indirect users is inferred by aggregating the opinions from multiple users through a Modified Collaborative Filtering Average-based Similarity (SMCFAvg) aggregation strategy. Experiments are carried out on the FilmTrust and Epinions datasets. Compared with TidalTrust, MoleTrust, DDQNTrust, DyTrust and the Dynamic Weighted Heuristic trust path Search algorithm (DWHS), our dynamic trust inference algorithm has higher prediction accuracy and better scalability.
The heterogeneous variational nodal method (HVNM) has emerged as a potential approach for solving high-fidelity neutron transport problems. However, achieving accurate results with HVNM in large-scale problems using high-fidelity models has been challenging due to the prohibitive computational costs. This paper presents an efficient parallel algorithm tailored for HVNM based on the Message Passing Interface standard. The algorithm evenly distributes the response matrix sets among processors during the matrix formation process, thus enabling independent construction without communication. Once the formation tasks are completed, a collective operation merges and shares the matrix sets among the processors. For the solution process, the problem domain is decomposed into subdomains assigned to specific processors, and the red-black Gauss-Seidel iteration is employed within each subdomain to solve the response matrix equation. Point-to-point communication is conducted between adjacent subdomains to exchange data along the boundaries. The accuracy and efficiency of the parallel algorithm are verified using the KAIST and JRR-3 test cases. Numerical results obtained with multiple processors agree well with those obtained from Monte Carlo calculations. The parallelization of HVNM results in eigenvalue errors of 31 pcm/-90 pcm and fission rate RMS errors of 1.22%/0.66%, respectively, for the 3D KAIST problem and the 3D JRR-3 problem. In addition, the parallel algorithm significantly reduces computation time, with an efficiency of 68.51% using 36 processors in the KAIST problem and 77.14% using 144 processors in the JRR-3 problem.
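The red-black ordering mentioned above can be illustrated on a toy 2D Poisson problem. Hedge: the paper iterates a response matrix equation, not the five-point Laplacian used here; the transferable part is the colouring, updating all "red" cells and then all "black" cells so that each colour depends only on the other, which makes the sweep easy to parallelize within a subdomain.

```python
def red_black_sweeps(u, f, h, sweeps):
    """Gauss-Seidel for -laplace(u) = f on a square grid, in red-black order.

    Boundary entries of u hold Dirichlet data and are never updated.
    Within one colour, all updates are mutually independent.
    """
    n = len(u)
    for _ in range(sweeps):
        for colour in (0, 1):                 # 0 = red, 1 = black
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 == colour:
                        u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                          + u[i][j - 1] + u[i][j + 1]
                                          + h * h * f[i][j])
    return u
```

In the parallel setting, each subdomain runs these sweeps locally and exchanges its boundary layer with neighbours between iterations via point-to-point messages.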
Extensible Markup Language (XML) files, widely used for storing and exchanging information on the web, require efficient parsing mechanisms to improve the performance of applications. With the existing Document Object Model (DOM) based parsing, performance degrades due to sequential processing and large memory requirements, thereby requiring an efficient XML parser to mitigate these issues. In this paper, we propose a Parallel XML Tree Generator (PXTG) algorithm for accelerating the parsing of XML files and a Regression-based XML Parsing Framework (RXPF) that analyzes and predicts performance through profiling, regression, and code generation for efficient parsing. The PXTG algorithm is based on dividing the XML file into n parts and producing n trees in parallel. The profiling phase of the RXPF framework produces a dataset by measuring the performance of various parsing models, including StAX, SAX, DOM, JDOM, and PXTG, on different cores using multiple file sizes. The regression phase produces the prediction model, based on which the final code for efficient parsing of XML files is produced through the code generation phase. The RXPF framework has shown a significant improvement in performance, varying from 9.54% to 32.34%, over other existing models used for parsing XML files.
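The profiling-then-regression step can be sketched with an ordinary least-squares line fit in stdlib Python. Hedge: the framework's actual regression model, features, and parser set are not specified here; this only shows the shape of the idea, predicting parse time from file size for one parser from profiled measurements.

```python
def fit_line(sizes_mb, times_ms):
    # Ordinary least squares for time ~ a + b * size.
    n = len(sizes_mb)
    mx = sum(sizes_mb) / n
    my = sum(times_ms) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(sizes_mb, times_ms))
         / sum((x - mx) ** 2 for x in sizes_mb))
    a = my - b * mx
    return a, b

def predict(model, size_mb):
    # Predicted parse time (ms) for a file of the given size.
    a, b = model
    return a + b * size_mb
```

Fitting one such model per (parser, core count) pair and choosing the parser with the smallest predicted time is the decision the code generation phase would then bake in.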
The current parallel ankle rehabilitation robot (ARR) suffers from the problem of difficult real-time alignment of the human-robot joint center of rotation, which may lead to secondary injuries to the patient. This study investigates the type synthesis of a parallel self-alignment ankle rehabilitation robot (PSAARR) based on the kinematic characteristics of ankle joint rotation center drift, from the perspective of introducing "suitable passive degrees of freedom (DOF)" with a suitable number and form. First, the self-alignment principle of the parallel ARR was proposed by deriving conditions for transforming a human-robot closed chain (HRCC), formed by an ARR and the human body, into a kinematically suitable constrained system and introducing conditions of "decoupled" and "less limb". Second, the relationship between the self-alignment principle and the actuation wrenches (twists) of the PSAARR was analyzed with the velocity Jacobian matrix as a "bridge". Subsequently, the type synthesis conditions of the PSAARR were proposed. Third, a PSAARR synthesis method was proposed based on screw theory and the type synthesis of the PSAARR conducted. Finally, an HRCC kinematic model was established to verify the self-alignment capability of the PSAARR. In this study, 93 types of PSAARR limb structures were synthesized, and the self-alignment capability of a human-robot joint axis was verified through kinematic analysis, which provides a theoretical basis for the design of such ARRs.
In this research, we present pure open multi-processing (OpenMP), pure message passing interface (MPI), and hybrid MPI/OpenMP parallel solvers within the dynamic explicit central difference algorithm for the coining process, to address the challenge of capturing fine relief features of approximately 50 microns. Achieving such precision demands the utilization of at least 7 million tetrahedron elements, surpassing the capabilities of the traditional serial programs previously developed. To mitigate data races when calculating internal forces, intermediate arrays are introduced within the OpenMP directive. This helps ensure proper synchronization and avoid conflicts during parallel execution. Additionally, in the MPI implementation, the coins are partitioned into the desired number of regions. This division allows for efficient distribution of computational tasks across multiple processes. Numerical simulation examples are conducted to compare the three solvers with serial programs, evaluating correctness, acceleration ratio, and parallel efficiency. The results reveal a relative error of approximately 0.3% in forming force among the parallel and serial solvers, while the predicted insufficient material zones align with experimental observations. Additionally, the speedup ratio and parallel efficiency are assessed for the coining process simulation. The pure MPI parallel solver achieves a maximum acceleration of 9.5 on a single computer (utilizing 12 cores), and the hybrid solver exhibits a speedup ratio of 136 in a cluster (using 6 compute nodes and 12 cores per compute node), showing the strong scalability of the hybrid MPI/OpenMP programming model. This approach effectively meets the simulation requirements for commemorative coins with intricate relief patterns.
The growing development of the Internet of Things (IoT) is accelerating the emergence and growth of new IoT services and applications, which will result in massive amounts of data being generated, transmitted and processed in wireless communication networks. Mobile Edge Computing (MEC) is a desired paradigm to timely process the data from IoT for value maximization. In MEC, a number of computing-capable devices are deployed at the network edge near data sources to support edge computing, such that the long network transmission delay in the cloud computing paradigm can be avoided. Since an edge device might not always have sufficient resources to process massive amounts of data, computation offloading is significantly important considering the cooperation among edge devices. However, the dynamic traffic characteristics and heterogeneous computing capabilities of edge devices challenge the offloading. In addition, different scheduling schemes might provide different computation delays to the offloaded tasks. Thus, offloading in mobile nodes and scheduling in the MEC server are coupled in determining service delay. This paper seeks to guarantee low delay for computation-intensive applications by jointly optimizing the offloading and scheduling in such an MEC system. We propose a Delay-Greedy Computation Offloading (DGCO) algorithm to make offloading decisions for new tasks in distributed computing-enabled mobile devices. A Reinforcement Learning-based Parallel Scheduling (RLPS) algorithm is further designed to schedule offloaded tasks in the multi-core MEC server. With an offloading delay broadcast mechanism, DGCO and RLPS cooperate to achieve the goal of delay-guarantee-ratio maximization. Finally, the simulation results show that our proposal can bound the end-to-end delay of various tasks. Even under a slightly heavy task load, the delay-guarantee-ratio given by DGCO-RLPS can still approximate 95%, while that given by benchmarked algorithms is reduced to an intolerable value. The simulation results demonstrate the effectiveness of DGCO-RLPS for delay guarantee in MEC.
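The delay-greedy decision at the heart of such an offloading scheme can be sketched as a comparison of estimated delays. Hedge: the field names and the delay model (transmission time plus remote computation versus local computation) are illustrative assumptions; the paper's DGCO additionally uses delay information broadcast by the RLPS scheduler.

```python
def offload_decision(task_cycles, local_rate, neighbours):
    """Pick where to run a task: locally, or on the neighbour with least delay.

    neighbours maps a device name to (transmission_delay, compute_rate).
    Delays are in seconds, rates in CPU cycles per second.
    """
    best_target, best_delay = "local", task_cycles / local_rate
    for name, (tx_delay, rate) in neighbours.items():
        delay = tx_delay + task_cycles / rate
        if delay < best_delay:
            best_target, best_delay = name, delay
    return best_target, best_delay
```

When the network is slow, the transmission term dominates and the task stays local, which is the intuition behind greedily bounding end-to-end delay.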
Funding: Supported by the General Program of National Natural Science Foundation of China (No. 61872043).
基金financially supported by the National Natural Science Foundation of China(Grant Nos.12072217 and 42077254)the Natural Science Foundation of Hunan Province,China(Grant No.2022JJ30567).
Abstract: Traditional global sensitivity analysis (GSA) neglects the epistemic uncertainties associated with the probabilistic characteristics (i.e., the type of distribution and its parameters) of input rock properties that emanate from the small size of datasets while mapping the relative importance of properties to the model response. This paper proposes an augmented Bayesian multi-model inference (BMMI) coupled with GSA methodology (BMMI-GSA) to address this issue by estimating the imprecision in the moment-independent sensitivity indices of rock structures arising from the small size of input data. The methodology employs BMMI to quantify the epistemic uncertainties associated with the model type and parameters of input properties. The estimated uncertainties are propagated into the imprecision of moment-independent Borgonovo's indices by employing a reweighting approach on candidate probabilistic models. The proposed methodology is showcased for a rock slope prone to stress-controlled failure in the Himalayan region of India. It proved superior to conventional GSA (which neglects all epistemic uncertainties) and to Bayesian coupled GSA (B-GSA) (which neglects model uncertainty) owing to its capability to incorporate the uncertainties in both the model type and the parameters of properties. Imprecise Borgonovo's indices estimated via the proposed methodology provide confidence intervals for the sensitivity indices instead of fixed-point estimates, which better informs data collection efforts. Analyses performed with varying sample sizes suggested that the uncertainties in sensitivity indices reduce significantly with increasing sample size, and accurate importance ranking of properties was only possible with samples of large sizes. Further, the impact of prior knowledge, in terms of prior ranges and distributions, was significant; hence, any related assumption should be made carefully.
Funding: Supported by the National MCF Energy R&D Program of China (Nos. 2018YFE0301105, 2022YFE03010002 and 2018YFE0302100), the National Key R&D Program of China (Nos. 2022YFE03070004 and 2022YFE03070000), and the National Natural Science Foundation of China (Nos. 12205195, 12075155 and 11975277).
Abstract: An accurate plasma current profile has irreplaceable value for the steady-state operation of the plasma. In this study, plasma current tomography based on Bayesian inference is applied to the HL-2A device and used to reconstruct the plasma current profile. Two different Bayesian probability priors are tried, namely the Conditional Auto-Regressive (CAR) prior and the Advanced Squared Exponential (ASE) kernel prior. Compared to the CAR prior, the ASE kernel prior adopts nonstationary hyperparameters and introduces the current profile of a reference discharge into the hyperparameters, which makes the shape of the current profile more flexible in space. The results indicate that the ASE prior couples more information, reduces the probability of unreasonable solutions, and achieves higher reconstruction accuracy.
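The building block behind the ASE prior is the squared-exponential covariance; the minimal sketch below shows only the stationary version with a single length scale, so the nonstationary hyperparameters and reference-discharge coupling of the actual ASE kernel are deliberately omitted. Parameter names are generic assumptions.

```python
# Stationary squared-exponential (SE) covariance matrix, the simplest
# relative of the ASE kernel prior described in the abstract.
import math

def se_kernel(xs, sigma=1.0, ell=1.0):
    """K[i][j] = sigma^2 * exp(-(x_i - x_j)^2 / (2 * ell^2))."""
    n = len(xs)
    return [[sigma**2 * math.exp(-((xs[i] - xs[j]) ** 2) / (2 * ell**2))
             for j in range(n)] for i in range(n)]

K = se_kernel([0.0, 1.0, 2.0], sigma=1.0, ell=1.0)
```

A nonstationary variant would let `ell` vary with position, which is what gives the current profile its extra spatial flexibility.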
Funding: Supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 62271514, in part by the Science, Technology and Innovation Commission of Shenzhen Municipality under Grant Nos. JCYJ20210324120002007 and ZDSYS20210623091807023, and in part by the State Key Laboratory of Public Big Data under Grant No. PBD2023-01.
Abstract: Recently, deep learning-based semantic communication has garnered widespread attention, with numerous systems designed for transmitting diverse data sources, including text, images, and speech. While efforts have been directed toward improving system performance, many studies have concentrated on enhancing the structure of the encoder and decoder. However, this often overlooks the resulting increase in model complexity, which imposes additional storage and computational burdens on smart devices. Furthermore, existing work tends to prioritize explicit semantics, neglecting the potential of implicit semantics. This paper aims to easily and effectively enhance the receiver's decoding capability without modifying the encoder and decoder structures. We propose a novel semantic communication system with variational neural inference for text transmission. Specifically, we introduce a simple but effective variational neural inferer at the receiver to infer the latent semantic information within the received text. This information is then utilized to assist in the decoding process. The simulation results show a significant enhancement in system performance and improved robustness.
Funding: The National Key Research and Development Program of China under contract No. 2022YFC3105002, the National Natural Science Foundation of China under contract No. 42176020, and a project from the Key Laboratory of Marine Environmental Information Technology, Ministry of Natural Resources, under contract No. 2023GFW-1047.
Abstract: The Stokes production coefficient (E_6) constitutes a critical parameter within Mellor-Yamada type (MY-type) Langmuir turbulence (LT) parameterization schemes, significantly affecting the simulation of turbulent kinetic energy, turbulent length scale, and the vertical diffusivity coefficient for turbulent kinetic energy in the upper ocean. However, the accurate determination of its value remains a pressing scientific challenge. This study adopted an innovative approach by leveraging deep learning technology to infer E_6. Through the integration of information from the turbulent length scale equation into a physics-informed neural network (PINN), we achieved an accurate and physically meaningful inference of E_6. Multiple cases were examined to assess the feasibility of PINN for this task, revealing that under optimal settings the average mean squared error of the E_6 inference was only 0.01, attesting to the effectiveness of PINN. The optimal hyperparameter combination was identified using the Tanh activation function along with a spatiotemporal sampling interval of 1 s and 0.1 m. This resulted in a substantial reduction in the average bias of the E_6 inference, by one to two orders of magnitude (O(10^1) to O(10^2) times) compared with other combinations. This study underscores the potential of applying PINNs in intricate marine environments, offering a novel and efficient method for optimizing MY-type LT parameterization schemes.
Abstract: Recently, weak supervision has received growing attention in the field of salient object detection due to the convenience of labelling. However, there is a large performance gap between weakly supervised and fully supervised salient object detectors because scribble annotations can only provide very limited foreground/background information. Therefore, an intuitive idea is to infer annotations that cover more complete object and background regions for training. To this end, a label inference strategy is proposed based on the assumption that pixels with similar colours and close positions should have consistent labels. Specifically, the k-means clustering algorithm is first performed on both the colours and coordinates of the original annotations, and the same labels are then assigned to pixels whose colours are similar to a colour cluster centre and whose coordinates are near a coordinate cluster centre. Next, the same annotations are further propagated to pixels with similar colours within each kernel neighbourhood. Extensive experiments on six benchmarks demonstrate that our method can significantly improve performance and achieve state-of-the-art results.
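The consistency assumption above can be condensed into a toy sketch: scribble pixels define per-label centroids in feature space, and an unlabelled pixel takes the label of the nearest centroid. This replaces the paper's k-means over separate colour and coordinate clusters with simple per-label means, so it is an illustration of the assumption, not the proposed strategy.

```python
# Toy label inference: per-label centroids from scribble pixels, then
# nearest-centroid assignment for unlabelled pixels. Feature vectors here
# are RGB colours only; coordinates could be appended the same way.

def centroids(scribbles):
    """scribbles: list of (label, feature_vec). Returns {label: mean vec}."""
    sums, counts = {}, {}
    for label, feat in scribbles:
        acc = sums.setdefault(label, [0.0] * len(feat))
        for i, v in enumerate(feat):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lb: [v / counts[lb] for v in acc] for lb, acc in sums.items()}

def infer_label(feat, cents):
    dist2 = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(cents, key=lambda lb: dist2(feat, cents[lb]))

scribbles = [("fg", [200, 10, 10]), ("fg", [210, 20, 12]),
             ("bg", [20, 20, 200]), ("bg", [10, 30, 220])]
cents = centroids(scribbles)
label = infer_label([205, 15, 11], cents)   # a reddish pixel
```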
Funding: Support from the National Key Research & Development Program of China (2021YFC2101100) and the National Natural Science Foundation of China (62322309, 61973119).
Abstract: Modern industrial processes are typically characterized by large scale and intricate internal relationships; therefore, distributed modeling process monitoring methods are effective. A novel distributed monitoring scheme utilizing the Kantorovich distance-multiblock variational autoencoder (KD-MBVAE) is introduced. Firstly, given the high consistency of relevant variables within each sub-block during the change process, variables exhibiting analogous statistical features are grouped into identical segments according to the optimal quality transfer theory. Subsequently, a variational autoencoder (VAE) model is established separately for each block, and the corresponding T^2 statistics are calculated. To further improve fault sensitivity, a novel statistic derived from the Kantorovich distance is introduced by analyzing model residuals from the perspective of probability distribution. The thresholds of both statistics are determined by kernel density estimation. Finally, the monitoring results for both types of statistics within all blocks are amalgamated using Bayesian inference. Additionally, a novel approach for fault diagnosis is introduced. The feasibility and efficiency of the introduced scheme are verified through two cases.
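The control limits above come from kernel density estimation; a minimal sketch of that step follows, assuming a Gaussian kernel, a fixed bandwidth, and rectangle-rule integration of the density to locate the quantile. All of these are illustrative choices, not the paper's settings.

```python
# Threshold from a Gaussian KDE: find x where the estimated CDF first
# reaches the chosen confidence level alpha.
import math

def gaussian_kde_threshold(samples, alpha=0.99, bandwidth=0.5, grid_n=2000):
    """Return x where the Gaussian-KDE CDF first reaches level alpha."""
    lo, hi = min(samples) - 3 * bandwidth, max(samples) + 3 * bandwidth
    dx = (hi - lo) / grid_n
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    cdf = 0.0
    for k in range(grid_n + 1):
        x = lo + k * dx
        pdf = norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                         for s in samples)
        cdf += pdf * dx                 # rectangle-rule CDF accumulation
        if cdf >= alpha:
            return x
    return hi                           # alpha not reached inside the grid

normal_stats = [-1.0, -0.5, 0.0, 0.5, 1.0]  # statistic under normal operation
limit = gaussian_kde_threshold(normal_stats, alpha=0.99)
```

In monitoring, a new sample whose statistic exceeds `limit` would be flagged as a potential fault.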
Funding: National College Students' Training Programs of Innovation and Entrepreneurship, Grant/Award Number: S202210022060; the CACMS Innovation Fund, Grant/Award Number: CI2021A00512; the National Natural Science Foundation of China, Grant/Award Number: 62206021.
Abstract: Media convergence works by processing information from different modalities and applying it to different domains. It is difficult for a conventional knowledge graph to utilise multi-media features because the introduction of a large amount of information from other modalities reduces the effectiveness of representation learning and makes knowledge graph inference less effective. To address this issue, an inference method based on the Media Convergence and Rule-guided Joint Inference model (MCRJI) is proposed. The authors not only converge the multi-media features of entities but also introduce logic rules to improve the accuracy and interpretability of link prediction. First, a multi-headed self-attention approach is used to obtain the attention of different media features of entities during semantic synthesis. Second, logic rules of different lengths are mined from the knowledge graph to learn new entity representations. Finally, knowledge graph inference is performed based on entity representations that converge multi-media features. Numerous experimental results show that MCRJI outperforms other advanced baselines in using multi-media features and knowledge graph inference, demonstrating that MCRJI provides an excellent approach for knowledge graph inference with converged multi-media features.
Funding: Supported by the Fundamental Research Funds for the Central Universities (FRF-TP20-062A1) and the Guangdong Basic and Applied Basic Research Foundation (2021A1515110070).
Abstract: This paper presents a software turbo decoder on graphics processing units (GPUs). Unlike previous works, the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Systems (CCSDS) standard. However, the information frame lengths of the CCSDS turbo codes are not suitable for flexible sub-frame parallelism design. To mitigate this issue, we propose a padding method that inserts several bits before the information frame header. To obtain low-latency performance and high resource utilization, two levels of intra-frame parallelism and an efficient data structure are considered. The presented Max-Log-MAP decoder can be adapted to decode Long Term Evolution (LTE) turbo codes with only small modifications. The proposed CCSDS turbo decoder at 10 iterations on an NVIDIA RTX 3070 achieves throughputs of about 150 Mbps and 50 Mbps for code rates 1/6 and 1/2, respectively.
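The padding idea reduces to simple length bookkeeping: prepend dummy bits until the frame length splits evenly into the desired number of sub-frames. The sketch below illustrates only this bookkeeping, not the decoder; the choice of zero bits and the sub-frame count are assumptions for illustration.

```python
# Pad a frame so its length is divisible into equal sub-frames.

def pad_frame(bits, n_subframes):
    """Prepend zero bits so len(bits) is divisible by n_subframes."""
    pad = (-len(bits)) % n_subframes
    return [0] * pad + list(bits)

frame = [1] * 1784            # 1784 is one CCSDS turbo information block length
padded = pad_frame(frame, 32) # hypothetical sub-frame parallelism degree
```

The decoder would discard the known padding bits after decoding, at a small rate cost.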
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 51875495 and U2037202) and the Hebei Provincial Science and Technology Project (Grant No. 206Z1805G).
Abstract: Currently, two rotations and one translation (2R1T) three-degree-of-freedom (DOF) parallel mechanisms (PMs) are widely applied in five-DOF hybrid machining robots. However, there is a lack of an effective method to evaluate the configuration stiffness of mechanisms during the mechanism design stage, making it a challenge to select appropriate 2R1T PMs with excellent stiffness performance at this stage. Considering the operational status of 2R1T PMs, bending and torsional stiffness are taken as indices to evaluate the PMs' configuration stiffness, and a specific method is proposed to calculate these stiffness indices. Initially, the various types of structural and driving stiffness for each branch are assessed and their specific values defined. Subsequently, a rigid-flexible coupled force model for the over-constrained 2R1T PM is established, and the proposed evaluation method is used to analyze the configuration stiffness of five 2R1T PMs over the entire workspace. Finally, the driving force and constraint force of each branch in the whole workspace are calculated to further elucidate the stiffness evaluation results obtained with the proposed method. The results demonstrate that the bending and torsional stiffness of the 2RPU/UPR/RPR mechanism along the x- and y-directions are larger than those of the other four mechanisms.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52075145), the S&T Program of Hebei Province of China (Grant Nos. 20281805Z and E2020103001), and the Central Government Guides Basic Research Projects of Local Science and Technology Development Funds of China (Grant No. 206Z1801G).
Abstract: The kinematic equivalent model of existing ankle-rehabilitation robots is inconsistent with the anatomical structure of the human ankle, which influences the rehabilitation effect. Therefore, this study equates the human ankle to a UR model and proposes a novel three-degrees-of-freedom (3-DOF) generalized spherical parallel mechanism for ankle rehabilitation. The parallel mechanism has two spherical centers corresponding to the rotation centers of the tibiotalar and subtalar joints. Using screw theory, the mobility of the parallel mechanism, which meets the requirements of the human ankle, is analyzed. The inverse kinematics are presented, and singularities are identified based on the Jacobian matrix. The workspaces of the parallel mechanism are obtained through a search method and compared with the motion range of the human ankle, showing that the parallel mechanism can meet the motion demands of ankle rehabilitation. Additionally, based on motion-force transmissibility, performance atlases are plotted in the parameter optimal design space, and the optimum parameters are obtained according to the demands of practical applications. The results show that the parallel mechanism can meet the motion requirements of ankle rehabilitation and has excellent kinematic performance in its rehabilitation range, which provides a theoretical basis for prototype design and experimental verification.
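The screw-theory mobility analysis is beyond a short sketch, but its classical relative, the Chebychev-Grübler-Kutzbach criterion, shows the same bookkeeping of links and joint freedoms. The spherical 3-RRR used below is a textbook example, not the paper's generalized mechanism, and the criterion can fail for over-constrained mechanisms, which is precisely where screw theory is needed.

```python
# Kutzbach mobility count: order*(n - j - 1) + sum of joint freedoms,
# with order = 6 for spatial mechanisms and 3 for planar/spherical ones.

def mobility(n_links, joint_dofs, order=6):
    """n_links includes base and platform; joint_dofs lists each joint's DOF."""
    j = len(joint_dofs)
    return order * (n_links - j - 1) + sum(joint_dofs)

# Spherical 3-RRR: base + platform + 2 links per leg = 8 links, 9 R joints.
dof_spherical = mobility(8, [1] * 9, order=3)   # spherical formula gives 3 DOF
```

Note that the spatial formula (order=6) gives a negative count for the same mechanism, illustrating why the formula's motion order must match the mechanism family.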
Funding: Supported by the National Natural Science Foundation of China (62072392 and 61972360) and the Major Scientific and Technological Innovation Projects of Shandong Province (2019522Y020131).
Abstract: The development of the Internet of Things (IoT) has brought great convenience to people. However, information security problems such as privacy leakage are caused by communicating with risky users, and choosing reliable users to interact with in the IoT is a challenge. Trust therefore plays a crucial role in the IoT because it may help avoid such risks. Agents usually choose reliable users with high trust to maximize their own interests based on reinforcement learning. However, trust propagation is time-consuming, and trust changes with the interaction process in social networks. To track the dynamic changes in trust values, a dynamic trust inference algorithm named Dynamic Double DQN Trust (Dy-DDQNTrust) is proposed to predict the indirect trust values of two users without direct contact with each other. The proposed algorithm simulates the interactions among users by double DQN. Firstly, CurrentNet and TargetNet networks are used to select users for interaction, and users with high trust are chosen to interact in future iterations. Secondly, the trust value is updated dynamically until a reliable trust path is found according to the results of the interactions. Finally, the trust value between indirect users is inferred by aggregating the opinions of multiple users through a Modified Collaborative Filtering Average-based Similarity (SMCFAvg) aggregation strategy. Experiments are carried out on the FilmTrust and Epinions datasets. Compared with TidalTrust, MoleTrust, DDQNTrust, DyTrust and the Dynamic Weighted Heuristic trust path Search algorithm (DWHS), our dynamic trust inference algorithm has higher prediction accuracy and better scalability.
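The final aggregation step can be illustrated with a plain similarity-weighted average of neighbours' trust opinions. This is only in the spirit of the SMCFAvg strategy named above; the exact modified collaborative-filtering formula is not reproduced, and all names and numbers are assumptions.

```python
# Similarity-weighted aggregation of indirect trust opinions.

def aggregate_trust(opinions, similarity):
    """opinions: {user: trust in target}; similarity: {user: weight in [0,1]}."""
    num = sum(similarity[u] * t for u, t in opinions.items())
    den = sum(similarity[u] for u in opinions)
    return num / den if den else 0.0

# Two neighbours report trust in the same target; the more similar one
# (alice) dominates the aggregate.
trust = aggregate_trust({"alice": 0.9, "bob": 0.5},
                        {"alice": 0.8, "bob": 0.2})
```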
Funding: Supported by the National Key Research and Development Program of China (No. 2020YFB1901900), the National Natural Science Foundation of China (Nos. U20B2011 and 12175138), and the Shanghai Rising-Star Program.
Abstract: The heterogeneous variational nodal method (HVNM) has emerged as a potential approach for solving high-fidelity neutron transport problems. However, achieving accurate results with HVNM in large-scale problems using high-fidelity models has been challenging due to prohibitive computational costs. This paper presents an efficient parallel algorithm tailored for HVNM based on the Message Passing Interface standard. The algorithm evenly distributes the response matrix sets among processors during the matrix formation process, enabling independent construction without communication. Once the formation tasks are completed, a collective operation merges and shares the matrix sets among the processors. For the solution process, the problem domain is decomposed into subdomains assigned to specific processors, and red-black Gauss-Seidel iteration is employed within each subdomain to solve the response matrix equation. Point-to-point communication is conducted between adjacent subdomains to exchange data along the boundaries. The accuracy and efficiency of the parallel algorithm are verified using the KAIST and JRR-3 test cases. Numerical results obtained with multiple processors agree well with those obtained from Monte Carlo calculations. The parallelization of HVNM yields eigenvalue errors of 31 pcm/-90 pcm and fission rate RMS errors of 1.22%/0.66% for the 3D KAIST problem and the 3D JRR-3 problem, respectively. In addition, the parallel algorithm significantly reduces computation time, with a parallel efficiency of 68.51% using 36 processors on the KAIST problem and 77.14% using 144 processors on the JRR-3 problem.
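The red-black Gauss-Seidel sweep mentioned above is easiest to see on a toy problem. The sketch below applies it to a 1-D Laplace equation with fixed end values rather than the response-matrix equations of HVNM; because each colour's updates are mutually independent, each pass could be done in parallel.

```python
# Red-black Gauss-Seidel on a 1-D Laplace problem: interior points relax to
# the average of their neighbours. Even ("red") points update first, then
# odd ("black") points; within a colour, updates are independent.

def red_black_gs(u, sweeps):
    """In-place relaxation of u[i] = (u[i-1] + u[i+1]) / 2; ends are fixed."""
    for _ in range(sweeps):
        for start in (1, 2):                    # red pass, then black pass
            for i in range(start, len(u) - 1, 2):
                u[i] = 0.5 * (u[i - 1] + u[i + 1])
    return u

u = [0.0, 0.0, 0.0, 0.0, 1.0]                   # boundary values 0 and 1
red_black_gs(u, 200)                             # converges to a straight line
```

In the MPI setting, the points of one colour on a subdomain boundary would be exchanged with neighbouring ranks between the two passes.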
Abstract: Extensible Markup Language (XML) files, widely used for storing and exchanging information on the web, require efficient parsing mechanisms to improve the performance of applications. With existing Document Object Model (DOM) based parsing, performance degrades due to sequential processing and large memory requirements, calling for an efficient XML parser to mitigate these issues. In this paper, we propose a Parallel XML Tree Generator (PXTG) algorithm for accelerating the parsing of XML files, and a Regression-based XML Parsing Framework (RXPF) that analyzes and predicts performance through profiling, regression, and code generation for efficient parsing. The PXTG algorithm is based on dividing the XML file into n parts and producing n trees in parallel. The profiling phase of the RXPF framework produces a dataset by measuring the performance of various parsing models, including StAX, SAX, DOM, JDOM, and PXTG, on different cores and with multiple file sizes. The regression phase produces the prediction model, based on which the final code for efficient parsing of XML files is produced through the code generation phase. The RXPF framework has shown a significant performance improvement of 9.54% to 32.34% over other existing models used for parsing XML files.
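The "divide into n parts, build n trees" idea can be sketched once the hard part, cutting the file into well-formed fragments, is assumed done. The sketch below parses pre-cut fragments concurrently with the Python standard library; this stands in for the actual PXTG algorithm, which operates on raw file splits.

```python
# Parse n well-formed XML fragments concurrently, producing n trees.
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

def parse_fragments(fragments):
    """Parse each well-formed XML fragment concurrently; return the roots."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(ET.fromstring, fragments))

fragments = ["<part><item>1</item></part>",
             "<part><item>2</item><item>3</item></part>"]
roots = parse_fragments(fragments)
total_items = sum(len(r.findall("item")) for r in roots)
```

A final stitching step would link the n partial trees back into one document tree, mirroring PXTG's tree generation.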
Funding: Supported by the Key Scientific Research Platforms and Projects of Guangdong Regular Institutions of Higher Education of China (Grant No. 2022KCXTD033), the Guangdong Provincial Natural Science Foundation of China (Grant No. 2023A1515012103), the Guangdong Provincial Scientific Research Capacity Improvement Project of Key Developing Disciplines of China (Grant No. 2021ZDJS084), and the National Natural Science Foundation of China (Grant No. 52105009).
Abstract: Current parallel ankle rehabilitation robots (ARRs) suffer from the problem of difficult real-time alignment of the human-robot joint center of rotation, which may lead to secondary injuries to the patient. This study investigates the type synthesis of a parallel self-alignment ankle rehabilitation robot (PSAARR) based on the kinematic characteristic of ankle joint rotation center drift, from the perspective of introducing "suitable passive degrees of freedom (DOF)" with a suitable number and form. First, the self-alignment principle of a parallel ARR was proposed by deriving the conditions for transforming the human-robot closed chain (HRCC), formed by an ARR and the human body, into a kinematically suitable constrained system, and by introducing "decoupled" and "fewer-limb" conditions. Second, the relationship between the self-alignment principle and the actuation wrenches (twists) of the PSAARR was analyzed with the velocity Jacobian matrix as a "bridge", and the type synthesis conditions of the PSAARR were proposed. Third, a PSAARR synthesis method was proposed based on screw theory, and type synthesis was conducted. Finally, an HRCC kinematic model was established to verify the self-alignment capability of the PSAARR. In this study, 93 types of PSAARR limb structures were synthesized, and the self-alignment capability of the human-robot joint axis was verified through kinematic analysis, providing a theoretical basis for the design of such ARRs.
Funding: Supported by the fund from Shenyang Mint Company Limited (No. 20220056), the Senior Talent Foundation of Jiangsu University (No. 19JDG022), and the Taizhou City Double Innovation and Entrepreneurship Talent Program (Taizhou Human Resources Office [2022] No. 22).
Abstract: In this research, we present pure open multi-processing (OpenMP), pure message passing interface (MPI), and hybrid MPI/OpenMP parallel solvers within a dynamic explicit central difference algorithm for the coining process, addressing the challenge of capturing fine relief features of approximately 50 microns. Achieving such precision demands at least 7 million tetrahedral elements, surpassing the capabilities of the traditional serial programs previously developed. To mitigate data races when calculating internal forces, intermediate arrays are introduced within the OpenMP directives; this ensures proper synchronization and avoids conflicts during parallel execution. Additionally, in the MPI implementation, the coins are partitioned into the desired number of regions, allowing efficient distribution of computational tasks across multiple processes. Numerical simulation examples are conducted to compare the three solvers with the serial program, evaluating correctness, speedup ratio, and parallel efficiency. The results reveal a relative error of approximately 0.3% in forming force between the parallel and serial solvers, while the predicted insufficient-material zones align with experimental observations. The pure MPI parallel solver achieves a maximum speedup of 9.5 on a single computer (utilizing 12 cores), and the hybrid solver exhibits a speedup ratio of 136 on a cluster (using 6 compute nodes with 12 cores per node), demonstrating the strong scalability of the hybrid MPI/OpenMP programming model. This approach effectively meets the simulation requirements for commemorative coins with intricate relief patterns.
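The data-race workaround above, private intermediate arrays that are summed after the parallel region, can be shown in miniature. Pure Python stands in for OpenMP here, and the element/node numbering is an illustrative assumption; the point is only the accumulate-then-reduce pattern.

```python
# Race-free force assembly: each "thread" accumulates element contributions
# into its own intermediate array, and the copies are reduced afterwards,
# instead of all threads writing into one shared nodal-force array.

def assemble_forces(n_nodes, elements, n_threads):
    """elements: list of (node_id, force). Returns assembled nodal forces."""
    partials = [[0.0] * n_nodes for _ in range(n_threads)]
    for i, (node, force) in enumerate(elements):
        partials[i % n_threads][node] += force   # each chunk -> its own array
    return [sum(p[j] for p in partials) for j in range(n_nodes)]

forces = assemble_forces(3, [(0, 1.0), (1, 2.0), (0, 0.5), (2, 1.5)], 2)
```

In OpenMP the reduction over the per-thread arrays is the serializing step; its cost is the price paid for eliminating atomic updates in the hot loop.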
Funding: Supported in part by the National Natural Science Foundation of China under Grants 61901128 and 62273109, and by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (21KJB510032).
Abstract: The growing development of the Internet of Things (IoT) is accelerating the emergence and growth of new IoT services and applications, which will result in massive amounts of data being generated, transmitted, and processed in wireless communication networks. Mobile Edge Computing (MEC) is a desirable paradigm for timely processing of IoT data for value maximization. In MEC, a number of computing-capable devices are deployed at the network edge near data sources to support edge computing, so that the long network transmission delay of the cloud computing paradigm can be avoided. Since an edge device might not always have sufficient resources to process massive amounts of data, computation offloading through cooperation among edge devices is significantly important. However, the dynamic traffic characteristics and heterogeneous computing capabilities of edge devices challenge offloading. In addition, different scheduling schemes provide different computation delays to the offloaded tasks, so offloading at mobile nodes and scheduling in the MEC server are coupled in determining service delay. This paper seeks to guarantee low delay for computation-intensive applications by jointly optimizing offloading and scheduling in such an MEC system. We propose a Delay-Greedy Computation Offloading (DGCO) algorithm to make offloading decisions for new tasks in distributed computing-enabled mobile devices. A Reinforcement Learning-based Parallel Scheduling (RLPS) algorithm is further designed to schedule offloaded tasks in the multi-core MEC server. With an offloading delay broadcast mechanism, DGCO and RLPS cooperate to achieve the goal of delay-guarantee-ratio maximization. Finally, simulation results show that our proposal can bound the end-to-end delay of various tasks. Even under a slightly heavy task load, the delay-guarantee ratio given by DGCO-RLPS can still approximate 95%, while that given by the benchmark algorithms is reduced to intolerable values. The simulation results demonstrate the effectiveness of DGCO-RLPS for delay guarantees in MEC.
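A delay-greedy offloading decision can be sketched in a few lines: a device estimates the completion delay of running a new task locally versus on each neighbour (transmission plus remote computation) and picks the minimum. The delay model, rates, and names below are illustrative assumptions, not the DGCO algorithm itself.

```python
# Hypothetical delay-greedy offloading choice among local execution and
# neighbouring edge devices, using a simple transmit-then-compute model.

def dgco_decide(task_cycles, local_rate, neighbours):
    """neighbours: {name: (tx_delay_s, cycles_per_s)}. Return (choice, delays)."""
    options = {"local": task_cycles / local_rate}
    for name, (tx, rate) in neighbours.items():
        options[name] = tx + task_cycles / rate   # transmit, then compute
    return min(options, key=options.get), options

choice, options = dgco_decide(
    task_cycles=2e9, local_rate=1e9,
    neighbours={"edge_a": (0.3, 4e9), "edge_b": (0.1, 2e9)},
)
```

The broadcast mechanism in the paper keeps the advertised delays fresh, so each device's greedy choice reflects the servers' current scheduling load.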