When building a classification model, the scenario in which the samples of one class significantly outnumber those of the other class is called data imbalance. Data imbalance biases the trained classification model in favor of the majority class (usually defined as the negative class), which can reduce the accuracy on the minority class (usually defined as the positive class) and lead to poor overall performance of the model. This article proposes a method called MSHR-FCSSVM for imbalanced data classification, which is based on a new hybrid resampling approach (MSHR) and a new fine cost-sensitive support vector machine (CS-SVM) classifier (FCSSVM). MSHR measures the separability of each negative sample through its Silhouette value, calculated using the Mahalanobis distance between samples. Based on this value, the so-called pseudo-negative samples are screened out, used to generate new positive samples through linear interpolation (over-sampling step), and finally deleted (under-sampling step). The approach thus replaces pseudo-negative samples with newly generated positive samples one by one, clearing up the inter-class overlap on the borderline without changing the overall size of the dataset. FCSSVM is an improved version of the traditional CS-SVM. It simultaneously considers the influence of both the imbalance in sample numbers and the class distribution on classification, and it finely tunes the class cost weights with the rime optimization algorithm (RIME), an efficient optimizer inspired by the physical phenomenon of rime ice, using cross-validation accuracy as the fitness function to accurately adjust the classification borderline. To verify the effectiveness of the proposed method, a series of experiments were carried out on 20 imbalanced datasets, including both mildly and extremely imbalanced ones. The experimental results show that MSHR-FCSSVM outperforms the comparison methods in most cases, and that both MSHR and FCSSVM play significant roles.
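A minimal Python sketch of the MSHR idea follows, assuming NumPy, two classes labeled 0 (negative) and 1 (positive), a single Mahalanobis metric built from the pooled covariance of all samples, and a Silhouette threshold of 0 for flagging pseudo-negative samples. The abstract does not specify the interpolation endpoints, so the rule used here (moving from the nearest positive toward the pseudo-negative sample by a random factor in [0, alpha_max)) and the names `mshr_resample` and `alpha_max` are hypothetical; this is not the authors' implementation.

```python
import numpy as np

def mahalanobis_matrix(X):
    """Pairwise Mahalanobis distances using the pooled covariance of X."""
    VI = np.linalg.pinv(np.cov(X, rowvar=False))  # pseudo-inverse for numerical stability
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.einsum("ijk,kl,ijl->ij", diff, VI, diff))

def silhouette_values(D, y):
    """Silhouette s = (b - a) / max(a, b) for each sample (two-class case)."""
    s = np.empty(len(y))
    for i in range(len(y)):
        same = y == y[i]
        same[i] = False                 # exclude the sample itself
        a = D[i, same].mean()           # mean distance to its own class
        b = D[i, y != y[i]].mean()      # mean distance to the other class
        s[i] = (b - a) / max(a, b)
    return s

def mshr_resample(X, y, threshold=0.0, alpha_max=0.5, seed=0):
    """Hypothetical MSHR sketch: replace each pseudo-negative with a new positive."""
    rng = np.random.default_rng(seed)
    D = mahalanobis_matrix(X)
    s = silhouette_values(D, y)
    pos_idx = np.flatnonzero(y == 1)
    pseudo = np.flatnonzero((y == 0) & (s < threshold))  # poorly separated negatives
    X_out, y_out = X.astype(float), y.copy()
    for i in pseudo:
        p = pos_idx[np.argmin(D[i, pos_idx])]      # nearest positive (Mahalanobis)
        alpha = rng.uniform(0.0, alpha_max)
        X_out[i] = X[p] + alpha * (X[i] - X[p])    # over-sampling: linear interpolation
        y_out[i] = 1                                # under-sampling: the negative is gone
    return X_out, y_out, pseudo
```

Because each flagged row is overwritten in place, the dataset size stays constant, which mirrors the one-by-one replacement property described in the abstract.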
Recently, anomaly detection (AD) in streaming data has gained significant attention in the research community due to its applicability in finance, business, healthcare, education, and other fields. Recent deep learning (DL) models have proven helpful in detecting and classifying anomalies. This article designs an oversampling with optimal deep learning-based streaming data classification (OS-ODLSDC) model, whose aim is to recognize and classify anomalies in streaming data. The OS-ODLSDC model first performs a preprocessing step. Since streaming data are imbalanced, the support vector machine (SVM)-based Synthetic Minority Over-sampling Technique (SVM-SMOTE) is applied for oversampling. The model then employs a bidirectional long short-term memory (BiLSTM) network for AD and classification. Finally, the root mean square propagation (RMSProp) optimizer is applied to tune the hyperparameters of the BiLSTM model. To demonstrate the performance of the OS-ODLSDC model, a wide-ranging experimental analysis is performed on three benchmark datasets: CICIDS 2018, KDD-Cup 1999, and NSL-KDD.
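For readers unfamiliar with RMSProp, the following pure-Python sketch shows its standard per-parameter update, v ← ρv + (1 − ρ)g² followed by θ ← θ − ηg/(√v + ε), applied to a toy quadratic rather than to a BiLSTM. The function name and default hyperparameters are illustrative choices, not values from the article.

```python
import math

def rmsprop_minimize(grad, x0, lr=0.01, rho=0.9, eps=1e-8, steps=1000):
    """Minimize a function given its gradient with the RMSProp update rule."""
    x = list(x0)
    v = [0.0] * len(x)  # running average of squared gradients, one per parameter
    for _ in range(steps):
        g = grad(x)
        for j in range(len(x)):
            v[j] = rho * v[j] + (1 - rho) * g[j] ** 2
            x[j] -= lr * g[j] / (math.sqrt(v[j]) + eps)  # per-parameter scaled step
    return x
```

Dividing by the root of the running squared-gradient average makes the step size roughly independent of each parameter's gradient scale, which is why both coordinates of a badly scaled quadratic converge at a similar pace.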
This study investigates the impact of landslide and non-landslide sampling strategies on the performance of landslide susceptibility assessment (LSA). The study area is the Feiyun catchment in Wenzhou City, Southeast China. Two types of landslide samples, combined with seven non-landslide sampling strategies, yielded a total of 14 scenarios. A landslide susceptibility map (LSM) was generated for each scenario using the random forest model. The receiver operating characteristic (ROC) curve and statistical indicators were calculated and used to assess the impact of the dataset sampling strategy. Higher accuracies were achieved when the landslide core was used as positive samples, combined with non-landslide samples drawn from the very-low-susceptibility zone or a buffer zone. The results reveal how landslide and non-landslide sampling strategies influence the accuracy of LSA, providing a reference for researchers aiming to obtain a more reasonable LSM.
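The area under the ROC curve used in studies like this can be computed without plotting the curve, because AUC equals the probability that a randomly chosen positive (landslide) cell receives a higher score than a randomly chosen negative (non-landslide) cell, with ties counted as one half. The quadratic-time pure-Python sketch below is illustrative; the study's own tooling is not stated.

```python
def roc_auc(labels, scores):
    """AUC as the pairwise win rate of positives over negatives (ties count 0.5)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])` gives 0.75, since three of the four positive/negative pairs are ranked correctly.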
Material identification is critical for understanding the relationship between mechanical properties and the associated mechanical functions. However, material identification is challenging, especially when the material behaves in a highly nonlinear manner, as is common in biological tissue. In this work, we identify unknown material properties in continuum solid mechanics via physics-informed neural networks (PINNs). To improve the accuracy and efficiency of PINNs, we develop efficient strategies for nonuniformly sampling observational data. We also investigate different approaches to enforcing Dirichlet-type boundary conditions (BCs) as soft or hard constraints. Finally, we apply the proposed methods to a diverse set of time-dependent and time-independent solid mechanics examples spanning linear elastic and hyperelastic material spaces. The estimated material parameters achieve relative errors of less than 1%. This work is therefore relevant to diverse applications, including optimizing structural integrity and developing novel materials.
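The difference between soft and hard enforcement of a Dirichlet BC can be sketched on a 1-D domain [0, 1] with u(0) = a and u(1) = b. In the hard form, the network output is wrapped so that the BCs hold exactly for any network; in the soft form, a weighted penalty is added to the loss. Here `net` is any callable standing in for a trained network, and the ansatz a(1 − x) + bx + x(1 − x)·net(x) is one common choice, not necessarily the one used in the article.

```python
def hard_bc(net, a, b):
    """Wrap a network so u(0) = a and u(1) = b hold exactly (hard constraint)."""
    def u(x):
        # the x * (1 - x) factor vanishes at both boundaries
        return a * (1 - x) + b * x + x * (1 - x) * net(x)
    return u

def soft_bc_penalty(net, a, b, weight=1.0):
    """Loss term that only encourages the BCs (soft constraint)."""
    return weight * ((net(0.0) - a) ** 2 + (net(1.0) - b) ** 2)
```

The hard form removes the BC term from the loss entirely at the cost of restricting the solution ansatz, whereas the soft form keeps the network unconstrained but introduces a weight that must be balanced against the PDE residual.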
Global variance reduction is a bottleneck in Monte Carlo shielding calculations. The global variance reduction problem requires the statistical error to be uniform over the entire space. This study proposes a grid-AIS method for the global variance reduction problem, based on the AIS method and implemented in the Monte Carlo program MCShield. The proposed method was validated using the VENUS-Ⅲ international benchmark problem and a self-shielding calculation example. For the VENUS-Ⅲ benchmark, the grid-AIS method significantly reduced the variance of the statistical errors of the MESH grids, from 1.08×10⁻² to 3.84×10⁻³, representing a 64.00% reduction. This demonstrates that the grid-AIS method is effective in addressing global variance reduction problems. The results of the self-shielding calculation demonstrate that the grid-AIS method produces accurate results. Moreover, the grid-AIS method was approximately one order of magnitude more computationally efficient than the AIS method and approximately two orders of magnitude more efficient than the conventional Monte Carlo method.
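One way to quantify the uniformity target of global variance reduction is to compute the relative statistical error of each mesh cell's tally and then the variance of those errors across cells; a perfectly uniform error map gives zero. The sketch below assumes each cell holds a list of independent history scores and uses the standard error of the mean divided by the mean as the relative error. Whether MCShield defines its MESH-grid statistic exactly this way is an assumption.

```python
import math
import statistics

def relative_error(scores):
    """Relative statistical error of a tally: standard error of the mean / mean."""
    mean = statistics.fmean(scores)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return sem / mean

def error_variance(cells):
    """Variance of relative errors across mesh cells (0 means perfectly uniform)."""
    return statistics.pvariance([relative_error(c) for c in cells])
```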
In this paper, we establish a new multivariate Hermite sampling series involving samples of the function itself and of its mixed and non-mixed partial derivatives of arbitrary order. This multivariate form of Hermite sampling is valid for some classes of multivariate entire functions satisfying certain growth conditions. We show that many known results in Commun Korean Math Soc, 2002, 17: 731–740; Turk J Math, 2017, 41: 387–403; and Filomat, 2020, 34: 3339–3347 are special cases of our results. Moreover, we estimate the truncation error of this sampling series based on localized sampling, without a decay assumption. Illustrative examples are also presented.
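As background, the classical univariate case of Hermite sampling can be written out. For f bandlimited to [−σ, σ], samples of f and f′ taken at spacing h = 2π/σ (twice the Nyquist spacing) reconstruct f via f(t) = Σₙ [f(tₙ) + (t − tₙ) f′(tₙ)] sinc²(σ(t − tₙ)/2), with tₙ = nh and sinc(x) = sin x / x. The sketch below truncates this series symmetrically at |n| ≤ N. It illustrates only this one-dimensional bandlimited special case, not the multivariate series or the truncation-error estimate established in the paper.

```python
import math

def sinc(x):
    """Unnormalized sinc, sin(x) / x, with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(x) / x

def hermite_sample(f, df, sigma, t, N):
    """Truncated univariate Hermite sampling series (|n| <= N) at point t."""
    h = 2 * math.pi / sigma  # twice the Nyquist spacing pi / sigma
    total = 0.0
    for n in range(-N, N + 1):
        tn = n * h
        kernel = sinc(sigma * (t - tn) / 2) ** 2
        total += (f(tn) + (t - tn) * df(tn)) * kernel
    return total
```

As a check, f(t) = sin t / t is bandlimited with σ = 1, so the truncated series should approach f(t) as N grows.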
This study presents the design of a modified attribute control chart that combines a double sampling (DS) np chart with generalized multiple dependent state (GMDS) sampling to monitor the mean life of a product based on a time-truncated life test under the Weibull distribution. The developed control chart supports the examination of variation in the mean lifespan of a particular product during manufacturing. Three levels of control limits are used: the warning control limit, the inner control limit, and the outer control limit. Together, they enhance the capability to detect variation. A genetic algorithm is used for optimization during the in-control process to establish the optimal parameters of the proposed control chart. Control chart performance is assessed using the average run length (ARL), while the influence of the model parameters on the control chart solution is assessed via sensitivity analysis based on an orthogonal experimental design with multiple linear regression. A comparative study based on the out-of-control ARL showed that the developed control chart is more sensitive in detecting process shifts while using smaller samples on average than existing control charts. Finally, to demonstrate the utility of the developed control chart, this paper applies it to simulated data with parameters drawn from a real data set.
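Two building blocks of such a design can be sketched under simplifying assumptions: the probability that an item fails before the truncation time t0 when lifetimes are Weibull with shape k and mean life μ (scale θ = μ/Γ(1 + 1/k), so p = 1 − exp(−(t0/θ)^k)), and the average run length of a plain single-sampling np chart, ARL = 1/P(signal). The double-sampling and GMDS decision rules of the proposed chart are deliberately omitted, and the function names are hypothetical.

```python
import math

def failure_prob(mean_life, shape, t0):
    """P(item fails before truncation time t0) for a Weibull lifetime with given mean."""
    scale = mean_life / math.gamma(1 + 1 / shape)  # mean = scale * Gamma(1 + 1/shape)
    return 1 - math.exp(-((t0 / scale) ** shape))

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def arl_np_chart(n, p, lcl, ucl):
    """ARL of a single-sampling np chart: 1 / P(failure count outside [lcl, ucl])."""
    below = binom_cdf(lcl - 1, n, p) if lcl > 0 else 0.0
    p_in = binom_cdf(ucl, n, p) - below
    return 1 / (1 - p_in)
```

A shorter mean life raises p, which in turn lowers the ARL; a chart that detects shifts quickly is one whose out-of-control ARL drops sharply under such a shift.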
The rapid advancement and broad application of machine learning (ML) have driven a groundbreaking revolution in computational biology. One of the most cutting-edge and important applications of ML is its integration with molecular simulations to improve the efficiency of sampling the vast conformational space of large biomolecules. This review focuses on recent studies that use ML-based techniques to explore the protein conformational landscape. We first highlight recent developments in ML-aided enhanced sampling methods, including heuristic algorithms and neural networks designed to refine the selection of reaction coordinates for constructing the bias potential or to facilitate exploration of unsampled regions of the energy landscape. We then review autoencoder-based methods that combine molecular simulations and deep learning to expand the search for protein conformations. Lastly, we discuss cutting-edge methodologies for the one-shot generation of protein conformations with precise Boltzmann weights. Collectively, this review demonstrates the promising potential of machine learning to revolutionize our insight into the complex conformational ensembles of proteins.
The advent of self-attention mechanisms in Transformer models has significantly propelled the advancement of deep learning algorithms, yielding outstanding achievements across diverse domains. Nonetheless, self-attention mechanisms falter when applied to datasets with intricate semantic content and extensive dependency structures. In response, this paper introduces a Diffusion Sampling and Label-Driven Co-attention Neural Network (DSLD), which adopts a diffusion sampling method to capture more comprehensive semantic information from the data. Additionally, the model leverages the joint correlation between labels and data when computing text representations, correcting semantic representation biases in the data and increasing the accuracy of semantic representation. Finally, the model computes the classification results by synthesizing these rich semantic representations. Experiments on seven benchmark datasets show that the proposed model achieves competitive results compared with state-of-the-art methods.
Peer-to-peer (P2P) overlay networks provide message transmission capabilities for blockchain systems, so improving data transmission efficiency in P2P networks can greatly enhance blockchain performance. However, traditional blockchain P2P networks share a common challenge: a frequent mismatch between upper-layer traffic requirements and the underlying physical network topology. This mismatch results in redundant data transmission and inefficient routing, severely constraining the scalability of blockchain systems. To address these issues, we propose FPSblo, an efficient transmission method for blockchain networks. FPSblo is inspired by the Farthest Point Sampling (FPS) algorithm, a well-established technique widely used in point cloud processing. We treat blockchain nodes as analogous to points in a point cloud and select a representative set of nodes to prioritize message forwarding, so that messages reach the network edge quickly and are evenly distributed. We compare our model with the Kadcast transmission model, a classic improvement on blockchain P2P transmission networks; the experimental results show that FPSblo reduces transmission redundancy by 34.8% and the overload rate by 37.6%. The experimental analysis shows that the FPS-BT model enhances the transmission capability of the blockchain P2P network.
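The farthest point sampling step that inspires FPSblo can be sketched as a greedy loop: start from a seed point, then repeatedly add the point whose distance to the already-selected set is largest. How blockchain nodes are mapped to coordinates (for example, by a latency embedding) is not stated in the abstract, so the plain Euclidean points below are an assumption.

```python
import math

def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: repeatedly pick the point farthest from the selected set."""
    selected = [start]
    # distance from every point to its nearest already-selected point
    dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < min(k, len(points)):
        nxt = max(range(len(points)), key=lambda i: dist[i])
        selected.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return selected
```

For points (0, 0), (1, 0), (10, 0), (5, 0) and k = 3, the selection order is indices 0, 2, 3: the far end is picked first, then the point that best fills the remaining gap.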
Disjoint sampling is critical for rigorous and unbiased evaluation of state-of-the-art (SOTA) models such as Attention Graph and Vision Transformer. When the training, validation, and test sets overlap or share data, the resulting bias inflates performance metrics and prevents accurate assessment of a model's true ability to generalize to new examples. This paper presents an innovative disjoint sampling approach for training SOTA models for Hyperspectral Image Classification (HSIC). By separating the training, validation, and test data without overlap, the proposed method enables a fairer evaluation of how well a model classifies pixels it was not exposed to during training or validation. Experiments demonstrate that the approach significantly improves a model's generalization compared with alternatives that include training and validation data in the test data. (A trivial approach tests the model on the entire hyperspectral dataset to generate the ground-truth maps; this yields higher accuracy but ultimately results in poor generalization performance.) Disjoint sampling eliminates data leakage between sets and provides reliable metrics for benchmarking progress in HSIC, which is critical for advancing SOTA models and their real-world application to large-scale land mapping with hyperspectral sensors. Overall, on the disjoint test set, the deep models achieve 96.36% accuracy on Indian Pines, 99.73% on Pavia University, 98.29% on University of Houston, 99.43% on Botswana, and 99.88% on Salinas.
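A minimal sketch of a disjoint, per-class (stratified) split of labeled pixel indices into training, validation, and test sets is shown below. The fractions, the seed, and the function name are illustrative assumptions; the point is that every labeled pixel lands in exactly one set, so no test pixel is seen during training or validation.

```python
import random

def disjoint_split(labeled_pixels, train_frac=0.1, val_frac=0.1, seed=0):
    """Stratified split of (pixel_index, class) pairs into disjoint train/val/test lists."""
    rng = random.Random(seed)
    by_class = {}
    for idx, cls in labeled_pixels:
        by_class.setdefault(cls, []).append(idx)
    train, val, test = [], [], []
    for idxs in by_class.values():
        idxs = idxs[:]              # copy before shuffling
        rng.shuffle(idxs)
        n_tr = max(1, round(train_frac * len(idxs)))
        n_va = max(1, round(val_frac * len(idxs)))
        train += idxs[:n_tr]
        val += idxs[n_tr:n_tr + n_va]
        test += idxs[n_tr + n_va:]  # everything left is held out for testing only
    return train, val, test
```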
We propose a new framework for the sampling, compression, and analysis of distributions of point sets and other geometric objects embedded in Euclidean spaces. Our approach involves constructing a tensor called the RaySense sketch, which captures the nearest neighbors from the underlying geometry of points along a set of rays. We explore various operations that can be performed on the RaySense sketch, leading to different properties and potential applications. Statistical information about the data set can be extracted from the sketch, independent of the ray set. Line integrals on point sets can be efficiently computed using the sketch. We also present several examples illustrating applications of the proposed strategy in practical scenarios.
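A simplified version of the sketch construction can be written in a few lines, assuming each ray is a finite segment from `origin` to `origin + direction`, sampled at m equispaced points, and that the sketch stores only the coordinates of the data point nearest to each sample. The brute-force nearest-neighbor search and the function name are illustrative; the paper may store additional features per sample.

```python
import math

def raysense_sketch(points, rays, m):
    """For each ray segment, record the nearest data point at m equispaced samples."""
    sketch = []
    for origin, direction in rays:
        row = []
        for j in range(m):
            t = j / (m - 1) if m > 1 else 0.0
            q = [o + t * d for o, d in zip(origin, direction)]    # sample on the ray
            nearest = min(points, key=lambda p: math.dist(p, q))  # brute-force NN
            row.append(tuple(nearest))
        sketch.append(row)
    return sketch  # nested lists of shape (number of rays, m, dimension)
```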
Physics-informed neural networks (PINNs) have become an attractive machine learning framework for obtaining solutions to partial differential equations (PDEs). PINNs embed initial, boundary, and PDE constraints into the loss function. The performance of PINNs is generally affected by both training and sampling. Training methods focus on overcoming the training difficulties caused by the special PDE residual loss of PINNs, whereas sampling methods are concerned with the location and distribution of the sampling points at which the PDE residual loss is evaluated. However, a common problem of the original PINNs is that they do not exploit temporal information during the training or sampling stages when dealing with an important category of PDEs, namely time-dependent PDEs, for which temporal information plays a key role. One method, Causal PINN, considers temporal causality at the training level but not at the sampling level, so incorporating temporal knowledge into sampling remains to be studied. To fill this gap, we propose a novel temporal causality-based adaptive sampling method that dynamically determines the sampling ratio according to both the PDE residual and temporal causality. By designing a sampling ratio, determined by both the residual loss and temporal causality, that controls the number and location of sampled points in each temporal sub-domain, we provide a practical way to incorporate temporal information into sampling. Numerical experiments on several nonlinear time-dependent PDEs, including the Cahn–Hilliard, Korteweg–de Vries, Allen–Cahn, and wave equations, show that the proposed sampling method improves performance. We demonstrate that such a relatively simple sampling method can improve prediction performance by up to two orders of magnitude compared with other methods, especially when the number of points is limited.
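The idea of letting both the residual and temporal causality set the sampling ratio can be sketched as follows. The causal weight of each temporal sub-domain follows the Causal PINN form wᵢ = exp(−ε Σ_{k<i} Lₖ), where Lₖ is the residual loss of the earlier sub-domains. The allocation rule (points proportional to wᵢ·Lᵢ, rounded by largest remainder) is a hypothetical combination, not the formula from the article.

```python
import math

def causal_weights(losses, eps=1.0):
    """w_i = exp(-eps * sum of residual losses of earlier temporal sub-domains)."""
    weights, acc = [], 0.0
    for loss in losses:
        weights.append(math.exp(-eps * acc))
        acc += loss
    return weights

def allocate_points(losses, n_points, eps=1.0):
    """Hypothetical rule: points per sub-domain proportional to w_i * L_i."""
    w = causal_weights(losses, eps)
    scores = [wi * li for wi, li in zip(w, losses)]
    total = sum(scores)
    raw = [n_points * s / total for s in scores]
    counts = [int(r) for r in raw]
    # hand out the leftover points by largest fractional part
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:n_points - sum(counts)]:
        counts[i] += 1
    return counts
```

With equal residuals, the earliest sub-domains receive the most points, because later ones are down-weighted until the earlier times are resolved.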
Dispersion fuels, known for their excellent safety performance, are widely used in advanced reactors such as high-temperature gas-cooled reactors. Compared with deterministic methods, the Monte Carlo method has more advantages in the geometric modeling of stochastic media. The explicit modeling method offers high computational accuracy at a high computational cost. The chord length sampling (CLS) method improves computational efficiency by sampling chord lengths during neutron transport from the probability density function of the matrix chord length. This study shows that the excluded-volume effect in realistic stochastic media can introduce deviations into CLS. A chord length correction approach is proposed in which the chord length correction factor is obtained by developing the Particle code based on equivalent transmission probability. Numerical analysis against reference solutions from explicit modeling in the RMC code demonstrated that CLS with the proposed correction provides good accuracy in addressing the excluded-volume effect in realistic infinite stochastic media.
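In standard CLS for spherical grains of radius r at volume packing fraction φ, the distance a neutron travels through the matrix before hitting a grain is sampled from an exponential distribution with mean λ = 4r(1 − φ)/(3φ). The sketch below samples that chord and applies a multiplicative `correction` factor as a placeholder. The article's derivation of its correction factor from equivalent transmission probability is not reproduced here, and the multiplicative form is an assumption.

```python
import math
import random

def matrix_mean_chord(radius, packing_fraction):
    """Mean matrix chord length for spherical grains: 4r(1 - phi) / (3 phi)."""
    return 4.0 * radius * (1.0 - packing_fraction) / (3.0 * packing_fraction)

def sample_matrix_chord(radius, packing_fraction, rng, correction=1.0):
    """Sample an exponential matrix chord; `correction` is a hypothetical scale factor."""
    lam = correction * matrix_mean_chord(radius, packing_fraction)
    return -lam * math.log(1.0 - rng.random())  # 1 - U lies in (0, 1], so log is finite
```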
Wideband spectrum sensing with a high-speed analog-to-digital converter (ADC) is challenging for practical systems. The Nyquist folding receiver (NYFR) is a promising scheme for cost-effective real-time spectrum sensing, but it is limited by the complexity of processing its modulated outputs. To address this, a multipath NYFR architecture with stepped sampling rates across paths is proposed. The number of digital channels in each path is designed based on the Chinese remainder theorem (CRT). The detectable frequency range is then divided into multiple frequency grids, and the Nyquist zone (NZ) of the input is obtained by sensing these grids. High-precision parameter estimation is then performed by exploiting the NYFR characteristics. Compared with existing methods, the proposed scheme overcomes the challenges of NZ estimation, information damage, heavy computation, low accuracy, and high false-alarm probability. Comparative simulation experiments verify the effectiveness of the proposed architecture.
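The Chinese remainder theorem step can be illustrated in isolation: if pairwise coprime moduli mᵢ are chosen and the residues of an unknown integer index k modulo each mᵢ are observed, k is recovered uniquely in [0, ∏mᵢ). Treating the Nyquist-zone index as that integer and the per-path channel counts as the moduli is an assumption made for illustration.

```python
def crt(residues, moduli):
    """Recover k mod prod(moduli) from k mod m_i, for pairwise coprime moduli."""
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # pow(.., -1, m) is the modular inverse (Python 3.8+)
    return x % M
```

For example, residues (2, 3, 2) modulo (3, 5, 7) identify k = 23 uniquely among the 105 possible values.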
BACKGROUND Pancreatic ductal leaks that complicate endoscopic ultrasonography-guided tissue sampling (EUS-TS) can manifest as acute pancreatitis. CASE SUMMARY A 63-year-old man presented with persistent abdominal pain and weight loss. Diagnosis: Laboratory findings revealed elevated carbohydrate antigen 19-9 (5920 U/mL) and carcinoembryonic antigen (23.7 ng/mL) levels. Magnetic resonance imaging of the pancreas revealed an ill-defined space-occupying lesion of approximately 3 cm in the inferior aspect of the pancreatic head, with severe encasement of the superior mesenteric artery. Pancreatic ductal adenocarcinoma was confirmed by pathological examination of specimens obtained by EUS-TS using the fanning method. Interventions and outcomes: The following day, the patient experienced severe abdominal pain with elevated amylase (265 U/L) and lipase (1173 U/L) levels. Computed tomography of the abdomen revealed edematous wall thickening of the second portion of the duodenum with adjacent fluid collections and a suspected leak from either the distal common bile duct or the main pancreatic duct in the head. Endoscopic retrograde cholangiopancreatography revealed dye leakage from the main pancreatic duct in the head. Therefore, a 5F, 7 cm linear plastic stent was deployed into the pancreatic duct to divert pancreatic juice. The patient's abdominal pain improved immediately after pancreatic stent insertion, and amylase and lipase levels normalized within a week. Neoadjuvant chemotherapy was then initiated. CONCLUSION Using the fanning method in EUS-TS can inadvertently damage the pancreatic duct and may lead to clinically significant pancreatitis. Placing a pancreatic stent may immediately resolve acute pancreatitis and shorten the waiting time for curative therapy. When using the fanning method during EUS-TS, ductal structures should be avoided to prevent pancreatic ductal leakage.
To enhance grain sampling efficiency, this work designs and tests a truss-type multi-rod grain sampling machine. The machine consists primarily of a truss support mechanism, a main carriage mechanism, an auxiliary carriage mechanism, sampling rods, and a PLC controller. The movement of the main carriage along the truss, the movement of the auxiliary carriage on the main carriage, and the vertical movement of the sampling rods on the auxiliary carriage are controlled through PLC programming. The machine accurately controls the position of the sampling rods, enabling random sampling with six rods to ensure comprehensive and random coverage. Sampling experiments showed that the multi-rod machine samples with six rods simultaneously, achieving a sampling frequency of 38 times per hour. The round-trip time of the sampling rods is 33 seconds per cycle, and the sampling range in the length direction reaches 18 m. This study provides valuable insights for the design of multi-rod grain sampling machines.
Background Functional mapping, despite its proven efficiency, suffers from a “chicken or egg” problem: during training, poor spatial features lead to inadequate spectral alignment and vice versa, often resulting in slow convergence, high computational costs, and learning failures, particularly when small datasets are used. Methods A novel method for dense shape correspondence is presented, in which spatial information transformed by neural networks is combined with projections onto spectral maps to overcome the “chicken or egg” challenge by selectively sampling only points with high confidence in their alignment. These points then contribute to the alignment and spectral loss terms, boosting training and accelerating convergence by a factor of five. To ensure fully unsupervised learning, the Gromov–Hausdorff distance metric was used to select the points with the maximal alignment score, that is, those with the highest confidence. Results The effectiveness of the proposed approach was demonstrated on several benchmark datasets, on which its results were superior to those of spectral and spatial-based methods. Conclusions The proposed method provides a promising new approach to dense shape correspondence, addressing key challenges in the field and offering significant advantages over current methods, including faster convergence, improved accuracy, and reduced computational costs.
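A hedged sketch of confidence-based point selection: given intrinsic distance matrices on two shapes and a candidate correspondence π, each point is scored by its worst metric distortion, maxⱼ |d_X(i, j) − d_Y(π(i), π(j))|, a local quantity related to the Gromov–Hausdorff distance, and the k least-distorted points are kept. Computing the exact Gromov–Hausdorff distance is intractable in general, so this distortion score is a stand-in and not the article's exact criterion; the function names are hypothetical.

```python
def point_distortion(DX, DY, corr):
    """Worst-case distance distortion of each point under correspondence corr."""
    n = len(corr)
    return [max(abs(DX[i][j] - DY[corr[i]][corr[j]]) for j in range(n)) for i in range(n)]

def confident_points(DX, DY, corr, k):
    """Indices of the k points whose correspondence distorts distances least."""
    dist = point_distortion(DX, DY, corr)
    return sorted(range(len(corr)), key=lambda i: dist[i])[:k]
```

Only the selected points would then feed the alignment and spectral loss terms, which is the mechanism the abstract credits for faster convergence.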
Understanding the mechanisms and risks of forest fires by building a spatial prediction model is an important means of controlling forest fires. Non-fire point data are important training data for constructing such a model, and their quality significantly affects its prediction performance. However, non-fire point data obtained with existing sampling methods generally suffer from low representativeness. This study therefore proposes a non-fire point sampling method based on geographical similarity to improve the quality of non-fire point samples. The method rests on the idea that the less similar the geographical environment of a sample point is to that of a point where a fire has already occurred, the greater the confidence that it is a non-fire sample. Yunnan Province, China, which has a high frequency of forest fires, was used as the study area. We compared the prediction performance of traditional sampling methods and the proposed method using three commonly used forest fire risk prediction models: logistic regression (LR), support vector machine (SVM), and random forest (RF). The results show that the modeling and prediction accuracies of forest fire prediction models built with the proposed sampling method are significantly higher than those built with the traditional sampling method. Specifically, in 2010 the modeling and prediction accuracies improved by 19.1% and 32.8%, respectively, and in 2020 by 13.1% and 24.3%, respectively. We therefore believe that collecting non-fire point samples based on the principle of geographical similarity is an effective way to improve the quality of forest fire samples and thus enhance forest fire risk prediction.
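The selection principle can be sketched as follows, assuming each location is described by a vector of environmental covariates, similarity per covariate is Gaussian, exp(−(x − y)²/(2s²)), and the overall similarity is the minimum across covariates (a common choice in geographic similarity work). Each candidate's score is its highest similarity to any known fire point, and the k least similar candidates are kept as non-fire samples. The covariate scales `sd` and the function names are hypothetical.

```python
import math

def geo_similarity(a, b, sd):
    """Overall similarity = minimum of per-covariate Gaussian similarities."""
    return min(math.exp(-((x - y) ** 2) / (2 * s * s)) for x, y, s in zip(a, b, sd))

def select_non_fire(candidates, fire_points, sd, k):
    """Indices of the k candidates least similar to every known fire point."""
    score = [max(geo_similarity(c, f, sd) for f in fire_points) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: score[i])[:k]
```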
Imbalanced datasets are common in practical applications, and oversampling methods using fuzzy rules have been shown to enhance the classification performance on imbalanced data by taking into account the relationships between data attributes. However, the creation of fuzzy rules typically depends on expert knowledge, which may be subjective and may not fully leverage the label information in the training data. To address this issue, a novel fuzzy rule oversampling approach based on the learning vector quantization (LVQ) algorithm is developed. In this method, the label information of the training data is used to determine the antecedent part of If-Then fuzzy rules by dynamically dividing attribute intervals using LVQ. Fuzzy rules are then generated and adjusted to calculate rule weights. Next, the number of new samples to be synthesized for each rule is computed, and minority-class samples are synthesized based on the newly generated fuzzy rules, yielding a fuzzy rule oversampling method based on LVQ. To evaluate its effectiveness, comparative experiments are conducted on 12 publicly available imbalanced datasets against five other sampling techniques, each combined with the support vector machine. The experimental results demonstrate that the proposed method significantly enhances the classification algorithm across seven performance indicators, including gains of 2.15% to 12.34% in Accuracy, 6.11% to 27.06% in G-mean, and 4.69% to 18.78% in AUC. This shows that the proposed method improves the classification performance on imbalanced data more efficiently.
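The LVQ component can be sketched with the classic LVQ1 update: the prototype nearest to a training sample moves toward it if their labels agree and away from it otherwise, with a linearly decaying learning rate. Turning trained prototypes into attribute intervals by placing cut points midway between adjacent prototype values is an assumption made for illustration; the abstract does not state the exact rule.

```python
import math

def lvq1_train(X, y, prototypes, proto_labels, lr=0.1, epochs=20):
    """Classic LVQ1: move the winning prototype toward (or away from) each sample."""
    W = [list(p) for p in prototypes]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        for x, label in zip(X, y):
            j = min(range(len(W)), key=lambda k: math.dist(W[k], x))  # winning prototype
            sign = 1.0 if proto_labels[j] == label else -1.0
            W[j] = [w + sign * rate * (xi - w) for w, xi in zip(W[j], x)]
    return W

def interval_cuts(W, dim):
    """Hypothetical rule: cut points halfway between sorted prototype values."""
    vals = sorted(w[dim] for w in W)
    return [(a + b) / 2 for a, b in zip(vals, vals[1:])]
```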
Funding: This work was supported by the Yunnan Major Scientific and Technological Projects (Grant No. 202302AD080001) and the National Natural Science Foundation of China (No. 52065033).
Abstract: When a classification model is built on data in which one class has significantly more samples than the other, the data are said to be imbalanced. Data imbalance biases the trained classifier toward the majority class (usually defined as the negative class). This can hurt accuracy on the minority class (usually defined as the positive class) and so degrade the model's overall performance. This article proposes MSHR-FCSSVM, a method for imbalanced data classification built on a new hybrid resampling approach (MSHR) and a new fine cost-sensitive support vector machine (CS-SVM) classifier (FCSSVM). MSHR measures the separability of each negative sample by its Silhouette value, computed from the Mahalanobis distances between samples. Based on this value, so-called pseudo-negative samples are screened out, used to generate new positive samples by linear interpolation (over-sampling step), and finally deleted (under-sampling step). Because pseudo-negative samples are replaced one by one with newly generated positive samples, the inter-class overlap at the borderline is cleared up without changing the overall size of the dataset. FCSSVM is an improved version of the traditional CS-SVM. It simultaneously accounts for the influence of both the imbalance in sample numbers and the class distribution on classification. It also fine-tunes the class cost weights with the RIME algorithm, an efficient optimizer based on the physical phenomenon of rime ice, using cross-validation accuracy as the fitness function so as to adjust the classification borderline accurately. To verify the effectiveness of the proposed method, a series of experiments was carried out on 20 imbalanced datasets, including both mildly and extremely imbalanced ones. The experimental results show that MSHR-FCSSVM outperforms the comparison methods in most cases, and that both MSHR and FCSSVM play significant roles.
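The MSHR resampling step can be illustrated with a minimal, dependency-free sketch. It assumes a pooled covariance matrix for the Mahalanobis distance, a hypothetical screening rule (Silhouette value below 0), and a hypothetical interpolation rule: each new positive lies on the segment from the pseudo-negative's nearest positive toward the pseudo-negative. The abstract does not specify these choices, and none of the function names come from the paper.

```python
import random

def covariance(rows):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]

def invert(m):
    """Gauss-Jordan inversion with partial pivoting; assumes m is non-singular."""
    d = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(d)] for i, row in enumerate(m)]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]

def mahalanobis(x, y, inv_cov):
    diff = [a - b for a, b in zip(x, y)]
    d = len(diff)
    q = sum(diff[i] * inv_cov[i][j] * diff[j] for i in range(d) for j in range(d))
    return max(q, 0.0) ** 0.5  # clamp tiny negative rounding errors

def silhouette(x, same_class, other_class, inv_cov):
    """(b - a) / max(a, b): a = mean distance to own class, b = to the other class."""
    a = sum(mahalanobis(x, y, inv_cov) for y in same_class) / len(same_class)
    b = sum(mahalanobis(x, y, inv_cov) for y in other_class) / len(other_class)
    return (b - a) / max(a, b)

def hybrid_resample(neg, pos, threshold=0.0, seed=0):
    """Replace each pseudo-negative with one interpolated positive (size unchanged)."""
    rng = random.Random(seed)
    inv_cov = invert(covariance(neg + pos))  # pooled covariance: an assumption
    kept_neg, new_pos = [], []
    for i, x in enumerate(neg):
        s = silhouette(x, neg[:i] + neg[i + 1:], pos, inv_cov)
        if s < threshold:  # hypothetical screening rule for pseudo-negatives
            p = min(pos, key=lambda q: mahalanobis(x, q, inv_cov))
            r = rng.random()
            new_pos.append([pj + r * (xj - pj) for pj, xj in zip(p, x)])
        else:
            kept_neg.append(x)
    return kept_neg, pos + new_pos
```

With six negatives near the origin (one of them sitting inside the positive cluster at (5.1, 5.0)) and four positives near (5.5, 5.5), only the misplaced negative is screened out, and the total number of samples stays at ten.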
Abstract: Anomaly detection (AD) in streaming data has recently gained significant attention in the research community owing to its applicability in finance, business, healthcare, education, and other fields. Recent deep learning (DL) models have proven helpful for detecting and classifying anomalies. This article designs an oversampling with optimal deep learning-based streaming data classification (OS-ODLSDC) model, whose aim is to recognize and classify anomalies in streaming data. The proposed OS-ODLSDC model first performs a preprocessing step. Because streaming data are imbalanced, the support vector machine (SVM)-based Synthetic Minority Over-sampling Technique (SVM-SMOTE) is applied for oversampling. The OS-ODLSDC model then employs a bidirectional long short-term memory (BiLSTM) network for AD and classification. Finally, the root mean square propagation (RMSProp) optimizer is applied for optimal hyperparameter tuning of the BiLSTM model. To assess the performance of the OS-ODLSDC model, a wide-ranging experimental analysis is performed on three benchmark datasets: CICIDS 2018, KDD-Cup 1999, and NSL-KDD.
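For reference, the RMSProp update as it is commonly stated keeps a running average of squared gradients, v ← ρv + (1 − ρ)g², and scales each step by it, θ ← θ − ηg / (√v + ε). The sketch below applies that rule to a toy quadratic. The learning rate and decay values are illustrative assumptions, not the settings used for the BiLSTM in the paper.

```python
def rmsprop_step(params, grads, sq_avg, lr=0.01, rho=0.9, eps=1e-8):
    """One RMSProp update; sq_avg holds the running mean of squared gradients."""
    new_params, new_sq = [], []
    for p, g, v in zip(params, grads, sq_avg):
        v = rho * v + (1.0 - rho) * g * g                  # decay squared-gradient average
        new_params.append(p - lr * g / (v ** 0.5 + eps))   # per-parameter scaled step
        new_sq.append(v)
    return new_params, new_sq

# Toy usage: minimise f(w) = (w - 3)^2, whose gradient is 2 (w - 3).
w, v = [0.0], [0.0]
for _ in range(2000):
    w, v = rmsprop_step(w, [2.0 * (w[0] - 3.0)], v)
```

Because each step is normalised by the recent gradient magnitude, the step size stays on the order of the learning rate however large the raw gradient is.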
Abstract: The aim of this study is to investigate how the sampling strategy for landslide and non-landslide samples affects the performance of landslide susceptibility assessment (LSA). The study area is the Feiyun catchment in Wenzhou City, Southeast China. Two types of landslide samples, combined with seven non-landslide sampling strategies, resulted in a total of 14 scenarios. A landslide susceptibility map (LSM) was generated for each scenario using the random forest model. The receiver operating characteristic (ROC) curve and statistical indicators were calculated and used to assess the impact of the dataset sampling strategy. The results showed that higher accuracies were achieved when the landslide core was used as positive samples, combined with non-landslide samples drawn from the very low zone or the buffer zone. These results reveal the influence of landslide and non-landslide sampling strategies on the accuracy of LSA and provide a reference for researchers aiming to obtain a more reasonable LSM.
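A ROC curve is usually summarised by its area (AUC). The AUC equals the probability that a randomly chosen positive (landslide) cell receives a higher susceptibility score than a randomly chosen negative cell, with ties counting one half (the Mann-Whitney statistic). The quadratic-time sketch below computes it directly from that definition. It is a generic illustration and does not reproduce the study's indicators.

```python
def roc_auc(labels, scores):
    """AUC as P(score_pos > score_neg), ties counted as 0.5; labels are 0/1."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])` ranks three of the four positive/negative pairs correctly and returns 0.75.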
Funding: Funded by the Cora Topolewski Cardiac Research Fund at the Children's Hospital of Philadelphia (CHOP); the Pediatric Valve Center Frontier Program at CHOP; the Additional Ventures Single Ventricle Research Fund Expansion Award; the National Institutes of Health (USA), supported by the program (Nos. NHLBI T32 HL007915 and NIH R01 HL153166); supported by the program (No. NIH R01 HL153166); and supported by the U.S. Department of Energy (No. DE-SC0022953).
Abstract: Material identification is critical for understanding the relationship between mechanical properties and the associated mechanical functions. However, it is a challenging task, especially when the material behavior is highly nonlinear, as is common in biological tissue. In this work, we identify unknown material properties in continuum solid mechanics via physics-informed neural networks (PINNs). To improve the accuracy and efficiency of PINNs, we develop efficient strategies for sampling observational data nonuniformly. We also investigate different approaches to enforcing Dirichlet-type boundary conditions (BCs) as soft or hard constraints. Finally, we apply the proposed methods to a diverse set of time-dependent and time-independent solid mechanics examples that span linear elastic and hyperelastic material spaces. The estimated material parameters achieve relative errors of less than 1%. This work is therefore relevant to diverse applications, including optimizing structural integrity and developing novel materials.
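The soft-versus-hard distinction for Dirichlet BCs can be shown in one dimension on [0, 1]. A soft constraint adds a penalty on the boundary mismatch to the loss. A hard constraint builds the BCs into the trial solution, for example u(x) = g0(1 − x) + g1·x + x(1 − x)·N(x), so that u(0) = g0 and u(1) = g1 for any network output N. This particular ansatz is a common textbook choice assumed here for illustration; the paper's constructions may differ, and `network` stands in for any trainable model.

```python
import math

def hard_bc(network, g0, g1):
    """Trial solution that satisfies u(0) = g0 and u(1) = g1 for any network."""
    def u(x):
        return g0 * (1.0 - x) + g1 * x + x * (1.0 - x) * network(x)
    return u

def soft_bc_penalty(u, g0, g1, weight=1.0):
    """Soft alternative: squared boundary mismatch added to the PINN loss."""
    return weight * ((u(0.0) - g0) ** 2 + (u(1.0) - g1) ** 2)

# Stand-in for an untrained network output.
raw_net = lambda x: math.sin(5.0 * x) + 2.0
u_hard = hard_bc(raw_net, 0.0, 0.01)
```

With the hard form the boundary penalty is identically zero, so the optimiser only has to fit the interior residual and the data. The soft form keeps the network unconstrained but introduces a penalty weight that must be balanced against the other loss terms.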
Funding: Supported by the Platform Development Foundation of the China Institute for Radiation Protection (No. YP21030101), the National Natural Science Foundation of China (General Program) (Nos. 12175114, U2167209), the National Key R&D Program of China (No. 2021YFF0603600), and the Tsinghua University Initiative Scientific Research Program (No. 20211080081).
Abstract: Global variance reduction is a bottleneck in Monte Carlo shielding calculations. The global variance reduction problem requires the statistical error to be uniform over the entire space. This study proposes a grid-AIS method for the global variance reduction problem. The method is based on the AIS method and implemented in the Monte Carlo program MCShield. It was validated using the VENUS-Ⅲ international benchmark problem and a self-shielding calculation example. For the VENUS-Ⅲ benchmark problem, the grid-AIS method significantly reduced the variance of the statistical errors over the MESH grids, from 1.08×10⁻² to 3.84×10⁻³, a 64.00% reduction. This demonstrates that the grid-AIS method is effective for global variance reduction problems. The self-shielding calculation shows that the grid-AIS method produces accurate computational results. Moreover, its computational efficiency was approximately one order of magnitude higher than that of the AIS method and approximately two orders of magnitude higher than that of the conventional Monte Carlo method.
Abstract: In this paper, we establish a new multivariate Hermite sampling series involving samples of the function itself and of its mixed and non-mixed partial derivatives of arbitrary order. This multivariate form of Hermite sampling is valid for certain classes of multivariate entire functions satisfying specific growth conditions. We show that many known results, including those in Commun Korean Math Soc, 2002, 17: 731-740; Turk J Math, 2017, 41: 387-403; and Filomat, 2020, 34: 3339-3347, are special cases of our results. Moreover, we estimate the truncation error of this sampling series based on localized sampling, without a decay assumption. Illustrative examples are also presented.
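For orientation, the simplest one-dimensional case that multivariate Hermite series of this kind generalize is the classical first-order Hermite (Jagerman-Fogel) series. It applies to a function f of exponential type σ that is square-integrable on the real line. Samples of f and f′ are taken at twice the Shannon spacing π/σ. The formula below is that classical case only, not the paper's multivariate result.

```latex
f(t) = \sum_{n \in \mathbb{Z}} \Big[ f(t_n) + (t - t_n)\, f'(t_n) \Big]
       \operatorname{sinc}^2\!\Big( \frac{\sigma t}{2} - n\pi \Big),
\qquad t_n = \frac{2n\pi}{\sigma},
\qquad \operatorname{sinc}(x) = \frac{\sin x}{x}.
```

Each kernel has a double zero at every node t_m with m ≠ n, so the series reproduces both f(t_m) and f′(t_m) at the nodes. Truncating the sum to the nodes near t is the one-dimensional analogue of the localized truncation error that the abstract mentions.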
Funding: Supported by the Science, Research and Innovation Promotion Funding (TSRI) (Grant No. FRB660012/0168), managed under Rajamangala University of Technology Thanyaburi (FRB66E0646O.4).
Abstract: This study presents the design of a modified attribute control chart based on a double sampling (DS) np chart combined with generalized multiple dependent state (GMDS) sampling. The chart monitors the mean life of a product based on a time-truncated life test employing the Weibull distribution. It supports the examination of variation in the mean lifespan of a particular product during manufacturing. Three control limit levels are used: the warning control limit, the inner control limit, and the outer control limit. Together, they enhance the capability to detect variation. A genetic algorithm can be used for optimization during the in-control process to establish the optimal parameters of the proposed control chart. Control chart performance is assessed using the average run length. The influence of the model parameters on the control chart solution is assessed via sensitivity analysis based on an orthogonal experimental design with multiple linear regression. A comparative study based on the out-of-control average run length showed that the developed control chart is more sensitive in detecting process shifts while using smaller samples on average than existing control charts. Finally, to demonstrate the utility of the developed control chart, this paper applies it to simulated data with parameters drawn from a real data set.
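The average run length (ARL) idea can be illustrated with a plain single-sampling np chart, which is much simpler than the DS-GMDS design in the paper. Assumptions:

- Each of n items is tested until t0 = a·μ0, where μ0 is the in-control mean life.
- Lifetimes are Weibull with shape k, and the true mean life is f·μ0.
- An item then fails before t0 with probability p = 1 − exp(−(a·Γ(1 + 1/k)/f)^k).
- The number of failures D ~ Binomial(n, p) is compared with integer limits [LCL, UCL].
- ARL = 1/(1 − P(LCL ≤ D ≤ UCL)).

All limits and parameters below are hypothetical.

```python
import math

def failure_prob(a, f, shape):
    """P(Weibull lifetime < a * mu0) when the true mean life is f * mu0."""
    return 1.0 - math.exp(-((a * math.gamma(1.0 + 1.0 / shape) / f) ** shape))

def prob_in_limits(lcl, ucl, n, p):
    """P(lcl <= D <= ucl) for D ~ Binomial(n, p)."""
    return sum(math.comb(n, d) * p ** d * (1.0 - p) ** (n - d)
               for d in range(max(lcl, 0), min(ucl, n) + 1))

def arl_np_chart(n, lcl, ucl, a, f, shape):
    """Average run length of a single-sampling np chart (no DS or GMDS rules)."""
    p = failure_prob(a, f, shape)
    return 1.0 / (1.0 - prob_in_limits(lcl, ucl, n, p))

arl_in = arl_np_chart(30, 1, 10, a=0.5, f=1.0, shape=2.0)   # process in control
arl_out = arl_np_chart(30, 1, 10, a=0.5, f=0.5, shape=2.0)  # mean life halved
```

A good design keeps the in-control ARL large (few false alarms) and the out-of-control ARL close to 1. The DS and GMDS rules in the paper refine exactly this trade-off while reducing the average sample size.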
Funding: Project supported by the National Key Research and Development Program of China (Grant No. 2023YFF1204402), the National Natural Science Foundation of China (Grant Nos. 12074079 and 12374208), the Natural Science Foundation of Shanghai (Grant No. 22ZR1406800), and the China Postdoctoral Science Foundation (Grant No. 2022M720815).
Abstract: The rapid advancement and broad application of machine learning (ML) have driven a groundbreaking revolution in computational biology. One of the most cutting-edge and important applications of ML is its integration with molecular simulations to improve the efficiency of sampling the vast conformational space of large biomolecules. This review focuses on recent studies that use ML-based techniques to explore the protein conformational landscape. We first highlight recent developments in ML-aided enhanced sampling methods. These include heuristic algorithms and neural networks designed either to refine the selection of reaction coordinates for constructing the bias potential or to facilitate exploration of unsampled regions of the energy landscape. Next, we review autoencoder-based methods that combine molecular simulations and deep learning to expand the search for protein conformations. Lastly, we discuss cutting-edge methodologies for the one-shot generation of protein conformations with precise Boltzmann weights. Collectively, this review demonstrates the promising potential of machine learning to revolutionize our insight into the complex conformational ensembles of proteins.
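One-shot generators obtain Boltzmann-weighted ensembles by reweighting. If a model proposes conformations x_i with known log-density log q(x_i) and the target is the Boltzmann distribution ∝ exp(−βU(x)), then self-normalised importance weights w_i ∝ exp(−βU(x_i) − log q(x_i)) correct the proposal. The sketch below implements only this generic reweighting step with a log-sum-exp shift for numerical stability. The energies and densities are placeholders, not output from any method in the review.

```python
import math

def boltzmann_weights(energies, log_q, beta=1.0):
    """Self-normalised importance weights for target exp(-beta * U) and proposal q."""
    log_w = [-beta * u - lq for u, lq in zip(energies, log_q)]
    m = max(log_w)                       # shift to avoid overflow in exp
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

def reweighted_mean(values, weights):
    """Estimate of a Boltzmann-ensemble average from weighted generated samples."""
    return sum(v * w for v, w in zip(values, weights))
```

When the proposal already matches the Boltzmann distribution, the weights are uniform. The further the generator is from the target, the more the weights spread out and the fewer effective samples remain.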
Funding: Supported by the Communication University of China (CUC230A013) and the Fundamental Research Funds for the Central Universities.
Abstract: The advent of self-attention mechanisms in Transformer models has significantly propelled the advancement of deep learning algorithms, yielding outstanding achievements across diverse domains. Nonetheless, self-attention mechanisms falter when applied to datasets with intricate semantic content and extensive dependency structures. In response, this paper introduces a Diffusion Sampling and Label-Driven Co-attention Neural Network (DSLD), which adopts a diffusion sampling method to capture more comprehensive semantic information from the data. Additionally, the model uses the joint correlation information of labels and data when computing the text representation. This corrects semantic representation biases in the data and increases the accuracy of the semantic representation. Finally, the model computes the classification results by synthesizing these rich semantic representations. Experiments on seven benchmark datasets show that the proposed model achieves competitive results compared with state-of-the-art methods.
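To make "label-driven" attention concrete, here is a minimal sketch of one generic form. Each label embedding acts as a query, scores every token representation by a scaled dot product, and pools the tokens with the resulting softmax weights into a label-specific text representation. This is standard scaled dot-product attention with labels as queries, assumed for illustration; it is not DSLD's exact co-attention or diffusion sampling module.

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def label_attention(tokens, labels):
    """For each label vector, return the attention-weighted average of token vectors."""
    d = len(tokens[0])
    out = []
    for lab in labels:
        scores = [sum(a * b for a, b in zip(lab, tok)) / math.sqrt(d) for tok in tokens]
        attn = softmax(scores)
        out.append([sum(w * tok[j] for w, tok in zip(attn, tokens)) for j in range(d)])
    return out
```

Each output row is a text representation "as seen by" one label. A classifier can then compare the rows, which is one simple way label information can enter the representation.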
Funding: This research was supported by the National Key R&D Program of China (No. 2021YFB2700800) and the GHfund B (No. 202302024490).
Abstract: Peer-to-peer (P2P) overlay networks provide message transmission capabilities for blockchain systems. Improving data transmission efficiency in P2P networks can greatly enhance the performance of blockchain systems. However, traditional blockchain P2P networks face a common challenge: the upper-layer traffic requirements often do not match the underlying physical network topology. This mismatch results in redundant data transmission and inefficient routing, severely constraining the scalability of blockchain systems. To address these pressing issues, we propose FPSblo, an efficient transmission method for blockchain networks. FPSblo is inspired by the Farthest Point Sampling (FPS) algorithm, a well-established technique widely used in point cloud image processing. In this work, we treat blockchain nodes as analogous to points in a point cloud and select a representative set of nodes to prioritize message forwarding, so that messages reach the network edge quickly and are evenly distributed. We also compare our model with the Kadcast transmission model, a classic improvement of blockchain P2P transmission networks. The experimental findings show that FPSblo reduces transmission redundancy by 34.8% and the overload rate by 37.6%. The experimental analysis shows that FPSblo enhances the transmission capabilities of the blockchain P2P network.
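Farthest point sampling is a simple greedy procedure. Start from one point, then repeatedly add the point whose distance to the already-selected set is largest, so that the selected subset spreads evenly over the whole cloud. The sketch below implements classic FPS. It assumes each node has a coordinate vector, for example a latency-based network coordinate; that input, and the use of Euclidean distance between such coordinates, are hypothetical stand-ins for whatever node distance FPSblo actually uses.

```python
import math

def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: repeatedly pick the point farthest from everything chosen so far."""
    k = min(k, len(points))
    selected = [start]
    # dist[i] = distance from point i to its nearest already-selected point
    dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=dist.__getitem__)
        selected.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return selected
```

For points at x = 0, 1, 2, and 10 on a line, starting from the first point, FPS picks the outlier at 10 next and then the point at 2. This edge-first, evenly spread behaviour is what the abstract relies on for fast message dissemination.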
Funding: Supported by the Researchers Supporting Project number (RSPD2024R848), King Saud University, Riyadh, Saudi Arabia.
Abstract: Disjoint sampling is critical for rigorous and unbiased evaluation of state-of-the-art (SOTA) models, e.g., Attention Graph and Vision Transformer. When training, validation, and test sets overlap or share data, the resulting bias inflates performance metrics and prevents accurate assessment of a model's true ability to generalize to new examples. This paper presents an innovative disjoint sampling approach for training SOTA models for Hyperspectral Image Classification (HSIC). By separating training, validation, and test data without overlap, the proposed method enables a fairer evaluation of how well a model classifies pixels it was not exposed to during training or validation. Experiments demonstrate that the approach significantly improves a model's generalization compared with alternatives that include training and validation data in the test data. (A trivial approach tests the model on the entire hyperspectral dataset to generate the ground truth maps; this produces higher accuracy but ultimately results in poor generalization performance.) Disjoint sampling eliminates data leakage between sets and provides reliable metrics for benchmarking progress in HSIC. It is critical for advancing SOTA models and their real-world application to large-scale land mapping with hyperspectral sensors. Overall, with the disjoint test set, the deep models achieve 96.36% accuracy on Indian Pines data, 99.73% on Pavia University data, 98.29% on University of Houston data, 99.43% on Botswana data, and 99.88% on Salinas data.
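A disjoint split can be sketched as a per-class shuffle of labelled pixel indices, cut into non-overlapping training, validation, and test slices. Every pixel then lands in exactly one set, and the test metrics only ever see pixels unseen during training and validation. The fractions and seed below are illustrative assumptions, not the paper's protocol.

```python
import random
from collections import defaultdict

def disjoint_split(labels, train_frac=0.1, val_frac=0.1, seed=0):
    """Per-class split of pixel indices into non-overlapping train/val/test lists."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_tr = int(round(train_frac * len(idxs)))
        n_va = int(round(val_frac * len(idxs)))
        train += idxs[:n_tr]
        val += idxs[n_tr:n_tr + n_va]
        test += idxs[n_tr + n_va:]   # everything not used for training or validation
    return train, val, test
```

Splitting per class keeps every class represented in each set. Reporting accuracy only on `test` avoids the inflated figures produced by evaluating on the full image.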
Funding: Supported by the National Science Foundation (Grant No. DMS-1440415); partially supported by a grant from the Simons Foundation, NSF Grants DMS-1720171 and DMS-2110895, and a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada.
Abstract: We propose a new framework for the sampling, compression, and analysis of distributions of point sets and other geometric objects embedded in Euclidean spaces. Our approach involves constructing a tensor called the RaySense sketch, which captures nearest neighbors from the underlying geometry of points along a set of rays. We explore various operations that can be performed on the RaySense sketch, leading to different properties and potential applications. Statistical information about the data set can be extracted from the sketch, independent of the ray set. Line integrals on point sets can be efficiently computed using the sketch. We also present several examples illustrating applications of the proposed strategy in practical scenarios.
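A simplified version of the ray-based sketch works as follows. For each ray segment, take equispaced sample points and record the nearest point of the data set at each one. The result is a tensor of shape (rays × samples × dimension). A line integral along a ray can then be approximated with the trapezoidal rule applied to values evaluated at the recorded nearest neighbours. The equispaced sampling, the stored contents, and the function names are assumptions for illustration, not the exact RaySense construction.

```python
import math

def raysense_sketch(points, rays, samples_per_ray=8):
    """For each ray (start, end), record the nearest data point at equispaced samples."""
    sketch = []
    for start, end in rays:
        row = []
        for s in range(samples_per_ray):
            t = s / (samples_per_ray - 1)
            q = [a + t * (b - a) for a, b in zip(start, end)]
            row.append(tuple(min(points, key=lambda p: math.dist(p, q))))
        sketch.append(row)
    return sketch

def ray_line_integral(values, length):
    """Trapezoidal rule over equispaced samples along a ray of the given length."""
    h = length / (len(values) - 1)
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))
```

Because the sketch stores nearest neighbours rather than raw points, its size is fixed by the number of rays and samples, independent of how many points the original set contains.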
Funding: Project supported by the Key National Natural Science Foundation of China (Grant No. 62136005) and the National Natural Science Foundation of China (Grant Nos. 61922087, 61906201, and 62006238).
Abstract: Physics-informed neural networks (PINNs) have become an attractive machine learning framework for solving partial differential equations (PDEs). PINNs embed initial, boundary, and PDE constraints into the loss function. The performance of PINNs is generally affected by both training and sampling. Training methods focus on overcoming the difficulties caused by the special PDE residual loss of PINNs. Sampling methods are concerned with the location and distribution of the points at which the PDE residual loss is evaluated. However, these original PINNs share a common problem: they make no special use of temporal information during training or sampling for an important class of PDEs, namely time-dependent PDEs, where temporal information plays a key role. One method, Causal PINN, considers temporal causality at the training level but not at the sampling level. How to incorporate temporal knowledge into sampling remains to be studied. To fill this gap, we propose a novel temporal causality-based adaptive sampling method that dynamically determines the sampling ratio according to both the PDE residual and temporal causality. This sampling ratio controls the number and location of sampled points in each temporal sub-domain, which provides a practical way to incorporate temporal information into sampling. Numerical experiments on several nonlinear time-dependent PDEs, including the Cahn–Hilliard, Korteweg–de Vries, Allen–Cahn, and wave equations, show that the proposed sampling method improves performance. We demonstrate that this relatively simple sampling method can improve prediction performance by up to two orders of magnitude compared with other methods, especially when the number of points is limited.
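One way to combine residual and causality when allocating collocation points works per temporal sub-domain i with residual loss L_i. It uses the causal weight from Causal PINN, w_i = exp(−ε·Σ_{k<i} L_k), which stays small until earlier time slices are resolved. A hypothetical allocation rule then gives each slice a share of the point budget proportional to L_i·w_i. This product rule is an assumption for illustration, not the paper's exact sampling ratio.

```python
import math

def causal_weights(residual_losses, epsilon=1.0):
    """Causal PINN weights: w_i = exp(-epsilon * sum of losses of earlier slices)."""
    weights, cum = [], 0.0
    for loss in residual_losses:
        weights.append(math.exp(-epsilon * cum))
        cum += loss
    return weights

def allocate_points(residual_losses, budget, epsilon=1.0):
    """Split a point budget across time slices in proportion to loss * causal weight."""
    w = causal_weights(residual_losses, epsilon)
    scores = [l * wi for l, wi in zip(residual_losses, w)]
    total = sum(scores)
    raw = [budget * s / total for s in scores]
    counts = [int(r) for r in raw]
    # hand the leftover points to the slices with the largest fractional parts
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:budget - sum(counts)]:
        counts[i] += 1
    return counts
```

With equal residuals in three slices, the earliest slice receives the most points, because later slices are discounted until the earlier ones are fitted.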
Abstract: Dispersion fuels, known for their excellent safety performance, are widely used in advanced reactors such as high-temperature gas-cooled reactors. Compared with deterministic methods, the Monte Carlo method has more advantages in the geometric modeling of stochastic media. The explicit modeling method offers high computational accuracy, but at a high computational cost. The chord length sampling (CLS) method can improve computational efficiency by sampling chord lengths during neutron transport from the probability density function of the matrix chord length. This study shows that the excluded-volume effect in realistic stochastic media can introduce certain deviations into CLS. A chord length correction approach is proposed, in which the chord length correction factor is obtained by developing the Particle code based on equivalent transmission probability. Numerical analysis against reference solutions from explicit modeling in the RMC code demonstrated that CLS with the proposed correction provides good accuracy in addressing the excluded-volume effect in realistic infinite stochastic media.
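In standard CLS for a random dispersion of spheres of radius r at packing fraction φ, the distance travelled in the matrix before hitting the next sphere is sampled from an exponential distribution. Its mean chord length is λ = 4r(1 − φ)/(3φ). The sketch below implements that sampling step. The `correction` multiplier is a hypothetical placeholder for the paper's chord length correction factor, whose value comes from their Particle code and is not reproduced here.

```python
import math
import random

def mean_matrix_chord(radius, packing_fraction):
    """Mean matrix chord length for randomly dispersed spheres."""
    return 4.0 * radius * (1.0 - packing_fraction) / (3.0 * packing_fraction)

def sample_matrix_chord(radius, packing_fraction, rng, correction=1.0):
    """Exponentially distributed matrix chord; 'correction' is a hypothetical factor."""
    lam = correction * mean_matrix_chord(radius, packing_fraction)
    return -lam * math.log(1.0 - rng.random())   # 1 - U lies in (0, 1], so log is defined

rng = random.Random(1)
chords = [sample_matrix_chord(0.05, 0.25, rng) for _ in range(100000)]
```

The exponential form assumes sphere positions along a flight path are uncorrelated. The finite sphere size (the excluded-volume effect) breaks this assumption, which is why a correction is needed for realistic packings.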
Funding: Supported by the Key Projects of the 2022 National Defense Science and Technology Foundation Strengthening Plan 173 (Grant No. 2022-173ZD-010) and the Equipment Pre-Research Foundation of the State Key Laboratory (Grant No. 6142101200204).
Abstract: Wideband spectrum sensing with a high-speed analog-to-digital converter (ADC) presents a challenge for practical systems. The Nyquist folding receiver (NYFR) is a promising scheme for cost-effective real-time spectrum sensing, but it is limited by the complexity of processing its modulated outputs. To address this, a multipath NYFR architecture with stepped sampling rates across the paths is proposed. The number of digital channels in each path is designed based on the Chinese remainder theorem (CRT). The detectable frequency range is then divided into multiple frequency grids, and the Nyquist zone (NZ) of the input is obtained by sensing these grids. High-precision parameter estimation is then performed by exploiting the NYFR characteristics. Compared with existing methods, the proposed scheme overcomes the challenges of NZ estimation, information damage, heavy computation, low accuracy, and high false alarm probability. Comparative simulation experiments verify the effectiveness of the proposed architecture.
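The role of the CRT can be illustrated in isolation. Suppose, hypothetically, that a path with M_i digital channels only reveals the input's frequency-grid index modulo M_i. If the channel counts are pairwise coprime, the residues from all paths determine the index uniquely below the product of the M_i, and the CRT reconstructs it. How NYFR outputs map to such residues is an assumption here; the sketch shows only the reconstruction step.

```python
from math import gcd

def crt(residues, moduli):
    """Smallest non-negative x with x = r_i (mod n_i); moduli must be pairwise coprime."""
    x, m = 0, 1
    for r, n in zip(residues, moduli):
        if gcd(m, n) != 1:
            raise ValueError("moduli must be pairwise coprime")
        # choose t so that x + m * t = r (mod n)
        t = ((r - x) * pow(m, -1, n)) % n
        x += m * t
        m *= n
    return x % m
```

For example, three paths with 3, 5, and 7 channels reporting residues 2, 3, and 2 identify grid index 23 out of 105 possible cells. This range extension is what makes the stepped per-path design attractive.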
Abstract: BACKGROUND Pancreatic ductal leaks complicating endoscopic ultrasonography-guided tissue sampling (EUS-TS) can manifest as acute pancreatitis. CASE SUMMARY A 63-year-old man presented with persistent abdominal pain and weight loss. Diagnosis: Laboratory findings revealed elevated carbohydrate antigen 19-9 (5920 U/mL) and carcinoembryonic antigen (23.7 ng/mL) levels. Magnetic resonance imaging of the pancreas revealed an approximately 3 cm ill-defined space-occupying lesion in the inferior aspect of the pancreatic head, with severe encasement of the superior mesenteric artery. Pancreatic ductal adenocarcinoma was confirmed by pathological examination of specimens obtained by EUS-TS using the fanning method. Interventions and outcomes: The following day, the patient experienced severe abdominal pain with elevated amylase (265 U/L) and lipase (1173 U/L) levels. Computed tomography of the abdomen revealed edematous wall thickening of the second portion of the duodenum with adjacent fluid collections and a suspected leak from either the distal common bile duct or the main pancreatic duct in the head. Endoscopic retrograde cholangiopancreatography revealed dye leakage from the main pancreatic duct in the head. Therefore, a 5F, 7 cm linear plastic stent was deployed into the pancreatic duct to divert the pancreatic juice. The patient's abdominal pain improved immediately after pancreatic stent insertion, and amylase and lipase levels normalized within a week. Neoadjuvant chemotherapy was then initiated. CONCLUSION Using the fanning method in EUS-TS can inadvertently damage the pancreatic duct and may lead to clinically significant pancreatitis. Placing a pancreatic stent may immediately resolve acute pancreatitis and shorten the waiting time for curative therapy. When using the fanning method during EUS-TS, ductal structures should be avoided to prevent pancreatic ductal leakage.
文摘In order to enhance grain sampling efficiency, in this work a truss type multi-rod grain sampling machine is designed and tested. The sampling machine primarily consists of truss support mechanism, main carriage mechanism, auxiliary carriage mechanism, sampling rods, and a PLC controller. The movement of the main carriage on the truss, the auxiliary carriage on the main carriage, and the vertical movement of the sampling rods on the auxiliary carriage are controlled through PLC programming. The sampling machine accurately controls the position of the sampling rods, enabling random sampling with six rods to ensure comprehensive and random sampling. Additionally, sampling experiments were conducted, and the results showed that the multi-rod grain sampling machine simultaneously samples with six rods, achieving a sampling frequency of 38 times per hour. The round trip time for the sampling rods is 33 seconds per cycle, and the sampling length direction reaches 18 m. This study provides valuable insights for the design of multi-rod grain sampling machines.
Funding: Supported by the Zimin Institute for Engineering Solutions Advancing Better Lives.
Abstract: Background Functional mapping, despite its proven efficiency, suffers from a “chicken or egg” problem: during training, poor spatial features lead to inadequate spectral alignment and vice versa. This often results in slow convergence, high computational costs, and learning failures, particularly when small datasets are used. Methods A novel method is presented for dense shape correspondence. It combines the spatial information transformed by neural networks with projections onto spectral maps, and overcomes the “chicken or egg” challenge by selectively sampling only points with high confidence in their alignment. These points then contribute to the alignment and spectral loss terms, boosting training and accelerating convergence by a factor of five. To ensure fully unsupervised learning, the Gromov–Hausdorff distance metric was used to select the points with the maximal alignment score, i.e., those with the highest confidence. Results The effectiveness of the proposed approach was demonstrated on several benchmark datasets, on which its results were superior to those of spectral and spatial-based methods. Conclusions The proposed method provides a promising new approach to dense shape correspondence. It addresses key challenges in the field and offers significant advantages over current methods, including faster convergence, improved accuracy, and reduced computational costs.
Funding: Financially supported by the National Natural Science Foundation of China (Grant Nos. 42161065 and 41461038).
Abstract: Building a spatial prediction model to understand the mechanisms and risks of forest fires is an important means of controlling them. Non-fire point data are important training data for constructing such a model, and their quality significantly affects the model's prediction performance. However, non-fire point data obtained with existing sampling methods are generally poorly representative. This study therefore proposes a non-fire point sampling method based on geographical similarity to improve the quality of non-fire point samples. The method rests on the idea that the less similar the geographical environment of a sample point is to that of a point where a fire has already occurred, the greater the confidence that it is a non-fire sample. Yunnan Province, China, which has a high frequency of forest fires, was used as the study area. We compared the prediction performance of traditional sampling methods and the proposed method using three commonly used forest fire risk prediction models: logistic regression (LR), support vector machine (SVM), and random forest (RF). The results show that the modeling and prediction accuracies of forest fire prediction models built with the proposed sampling method are significantly higher than those built with the traditional sampling method. Specifically, in 2010 the modeling and prediction accuracies improved by 19.1% and 32.8%, respectively, and in 2020 by 13.1% and 24.3%, respectively. We therefore believe that collecting non-fire point samples based on the principle of geographical similarity is an effective way to improve the quality of forest fire samples and thus enhance forest fire risk prediction.
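The sampling principle can be sketched as follows. Score each candidate location by its highest similarity to any recorded fire point, then keep the candidates with the lowest scores as non-fire samples. Similarity is computed here with one common formulation from the geographic-similarity literature: a Gaussian similarity per environmental variable, aggregated by the minimum across variables. The kernel, the bandwidths `sigmas`, and the variables themselves are assumptions, and the paper's definition may differ.

```python
import math

def geo_similarity(a, b, sigmas):
    """Minimum over variables of a Gaussian similarity between two environment vectors."""
    return min(math.exp(-((x - y) ** 2) / (2.0 * s * s)) for x, y, s in zip(a, b, sigmas))

def select_non_fire(candidates, fire_points, sigmas, k):
    """Return the k candidates least similar to every known fire point."""
    def max_sim(c):
        return max(geo_similarity(c, f, sigmas) for f in fire_points)
    return sorted(candidates, key=max_sim)[:k]
```

For instance, with a single fire point at (temperature 30, vegetation index 0.2) and bandwidths (5, 0.1), a candidate at (10, 0.8) is chosen ahead of candidates at (30, 0.2) or (29, 0.25), which resemble the fire environment.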
Funding: Funded by the National Science Foundation of China (62006068), the Hebei Natural Science Foundation (A2021402008), the Natural Science Foundation of Scientific Research Project of Higher Education in Hebei Province (ZD2020185, QN2020188), and the 333 Talent Supported Project of Hebei Province (C20221026).
Abstract: Imbalanced datasets are common in practical applications. Oversampling methods using fuzzy rules have been shown to improve the classification of imbalanced data by taking into account the relationships between data attributes. However, the creation of fuzzy rules typically depends on expert knowledge, which may be subjective and may not fully exploit the label information in the training data. To address this issue, a novel fuzzy rule oversampling approach based on the learning vector quantization (LVQ) algorithm is developed. In this method, the label information of the training data is used to determine the antecedent part of If-Then fuzzy rules by dynamically dividing attribute intervals with LVQ. Fuzzy rules are then generated and adjusted to calculate rule weights. Next, the number of new samples to be synthesized for each rule is computed, and minority-class samples are synthesized based on the newly generated fuzzy rules. Together, these steps form the LVQ-based fuzzy rule oversampling method. To evaluate its effectiveness, comparative experiments are conducted on 12 publicly available imbalanced datasets against five other sampling techniques, each combined with the support vector machine. The experimental results demonstrate that the proposed method significantly improves the classification algorithm across seven performance indicators, including gains of 2.15% to 12.34% in Accuracy, 6.11% to 27.06% in G-mean, and 4.69% to 18.78% in AUC. These results show that the proposed method improves the classification performance on imbalanced data more effectively.
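The interval-division step can be sketched with the basic LVQ1 rule. The nearest prototype moves toward a training sample of its own class and away from a sample of another class. After training, cut points for an attribute are taken as midpoints between neighbouring prototype coordinates. The midpoint rule, the decaying learning rate, and all names are assumptions for illustration; the paper's dynamic division procedure may differ.

```python
import math

def lvq1_train(data, labels, prototypes, proto_labels, lr=0.1, epochs=20):
    """LVQ1: pull the winning prototype toward same-class samples, push it away otherwise."""
    protos = [list(p) for p in prototypes]
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)   # linearly decaying learning rate
        for x, y in zip(data, labels):
            j = min(range(len(protos)), key=lambda i: math.dist(x, protos[i]))
            sign = 1.0 if proto_labels[j] == y else -1.0
            protos[j] = [p + sign * rate * (xi - p) for p, xi in zip(protos[j], x)]
    return protos

def interval_cuts(protos, attr):
    """Hypothetical cut points: midpoints between sorted prototype values of one attribute."""
    vals = sorted(p[attr] for p in protos)
    return [(a + b) / 2.0 for a, b in zip(vals, vals[1:])]

protos = lvq1_train([[0.0], [1.0], [9.0], [10.0]], [0, 0, 1, 1], [[3.0], [7.0]], [0, 1])
cuts = interval_cuts(protos, 0)
```

Here the two prototypes drift toward their class clusters, and the single cut falls between the classes. Each resulting interval could then serve as the antecedent of an If-Then rule.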