Many fields,such as neuroscience,are experiencing the vast prolife ration of cellular data,underscoring the need fo r organizing and interpreting large datasets.A popular approach partitions data into manageable subse...Many fields,such as neuroscience,are experiencing the vast prolife ration of cellular data,underscoring the need fo r organizing and interpreting large datasets.A popular approach partitions data into manageable subsets via hierarchical clustering,but objective methods to determine the appropriate classification granularity are missing.We recently introduced a technique to systematically identify when to stop subdividing clusters based on the fundamental principle that cells must differ more between than within clusters.Here we present the corresponding protocol to classify cellular datasets by combining datadriven unsupervised hierarchical clustering with statistical testing.These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values,including molecula r,physiological,and anatomical datasets.We demonstrate the protocol using cellular data from the Janelia MouseLight project to chara cterize morphological aspects of neurons.展开更多
With the rise of remote collaboration,the demand for advanced storage and collaboration tools has rapidly increased.However,traditional collaboration tools primarily rely on access control,leaving data stored on cloud...With the rise of remote collaboration,the demand for advanced storage and collaboration tools has rapidly increased.However,traditional collaboration tools primarily rely on access control,leaving data stored on cloud servers vulnerable due to insufficient encryption.This paper introduces a novel mechanism that encrypts data in‘bundle’units,designed to meet the dual requirements of efficiency and security for frequently updated collaborative data.Each bundle includes updated information,allowing only the updated portions to be reencrypted when changes occur.The encryption method proposed in this paper addresses the inefficiencies of traditional encryption modes,such as Cipher Block Chaining(CBC)and Counter(CTR),which require decrypting and re-encrypting the entire dataset whenever updates occur.The proposed method leverages update-specific information embedded within data bundles and metadata that maps the relationship between these bundles and the plaintext data.By utilizing this information,the method accurately identifies the modified portions and applies algorithms to selectively re-encrypt only those sections.This approach significantly enhances the efficiency of data updates while maintaining high performance,particularly in large-scale data environments.To validate this approach,we conducted experiments measuring execution time as both the size of the modified data and the total dataset size varied.Results show that the proposed method significantly outperforms CBC and CTR modes in execution speed,with greater performance gains as data size increases.Additionally,our security evaluation confirms that this method provides robust protection against both passive and active attacks.展开更多
A remarkable marine heatwave,known as the“Blob”,occurred in the Northeast Pacific Ocean from late 2013 to early 2016,which displayed strong warm anomalies extending from the surface to a depth of 300 m.This study em...A remarkable marine heatwave,known as the“Blob”,occurred in the Northeast Pacific Ocean from late 2013 to early 2016,which displayed strong warm anomalies extending from the surface to a depth of 300 m.This study employed two assimilation schemes based on the global Climate Forecast System of Nanjing University of Information Science(NUIST-CFS 1.0)to investigate the impact of ocean data assimilation on the seasonal prediction of this extreme marine heatwave.The sea surface temperature(SST)nudging scheme assimilates SST only,while the deterministic ensemble Kalman filter(EnKF)scheme assimilates observations from the surface to the deep ocean.The latter notably improves the forecasting skill for subsurface temperature anomalies,especially at the depth of 100-300 m(the lower layer),outperforming the SST nudging scheme.It excels in predicting both horizontal and vertical heat transport in the lower layer,contributing to improved forecasts of the lower-layer warming during the Blob.These improvements stem from the assimilation of subsurface observational data,which are important in predicting the upper-ocean conditions.The results suggest that assimilating ocean data with the EnKF scheme significantly enhances the accuracy in predicting subsurface temperature anomalies during the Blob and offers better understanding of its underlying mechanisms.展开更多
There is a growing body of clinical research on the utility of synthetic data derivatives,an emerging research tool in medicine.In nephrology,clinicians can use machine learning and artificial intelligence as powerful...There is a growing body of clinical research on the utility of synthetic data derivatives,an emerging research tool in medicine.In nephrology,clinicians can use machine learning and artificial intelligence as powerful aids in their clinical decision-making while also preserving patient privacy.This is especially important given the epidemiology of chronic kidney disease,renal oncology,and hypertension worldwide.However,there remains a need to create a framework for guidance regarding how to better utilize synthetic data as a practical application in this research.展开更多
In the face of data scarcity in the optimization of maintenance strategies for civil aircraft,traditional failure data-driven methods are encountering challenges owing to the increasing reliability of aircraft design....In the face of data scarcity in the optimization of maintenance strategies for civil aircraft,traditional failure data-driven methods are encountering challenges owing to the increasing reliability of aircraft design.This study addresses this issue by presenting a novel combined data fusion algorithm,which serves to enhance the accuracy and reliability of failure rate analysis for a specific aircraft model by integrating historical failure data from similar models as supplementary information.Through a comprehensive analysis of two different maintenance projects,this study illustrates the application process of the algorithm.Building upon the analysis results,this paper introduces the innovative equal integral value method as a replacement for the conventional equal interval method in the context of maintenance schedule optimization.The Monte Carlo simulation example validates that the equivalent essential value method surpasses the traditional method by over 20%in terms of inspection efficiency ratio.This discovery indicates that the equal critical value method not only upholds maintenance efficiency but also substantially decreases workload and maintenance costs.The findings of this study open up novel perspectives for airlines grappling with data scarcity,offer fresh strategies for the optimization of aviation maintenance practices,and chart a new course toward achieving more efficient and cost-effective maintenance schedule optimization through refined data analysis.展开更多
Air temperature is an important indicator to analyze climate change in mountainous areas.ERA5 reanalysis air temperature data are important products that were widely used to analyze temperature change in mountainous a...Air temperature is an important indicator to analyze climate change in mountainous areas.ERA5 reanalysis air temperature data are important products that were widely used to analyze temperature change in mountainous areas.However,the reliability of ERA5 reanalysis air temperature over the Qilian Mountains(QLM)is unclear.In this study,we evaluated the reliability of ERA5 monthly averaged reanalysis 2 m air temperature data using the observations at 17 meteorological stations in the QLM from 1979 to 2017.The results showed that:ERA5 reanalysis monthly averaged air temperature data have a good applicability in the QLM in general(R2=0.99).ERA5 reanalysis temperature data overestimated the observed temperature in the QLM in general.Root mean square error(RMSE)increases with the increasing of elevation range,showing that the reliability of ERA5 reanalysis temperature data is worse in higher elevation than that in lower altitude.ERA5 reanalysis temperature can capture observational warming rates well.All the smallest warming rates of observational temperature and ERA5 reanalysis temperature are found in winter,with the warming rates of 0.393°C/10a and 0.360°C/10a,respectively.This study will provide a reference for the application of ERA5 reanalysis monthly averaged air temperature data at different elevation ranges in the Qilian Mountains.展开更多
Semantic communication(SemCom)aims to achieve high-fidelity information delivery under low communication consumption by only guaranteeing semantic accuracy.Nevertheless,semantic communication still suffers from unexpe...Semantic communication(SemCom)aims to achieve high-fidelity information delivery under low communication consumption by only guaranteeing semantic accuracy.Nevertheless,semantic communication still suffers from unexpected channel volatility and thus developing a re-transmission mechanism(e.g.,hybrid automatic repeat request[HARQ])becomes indispensable.In that regard,instead of discarding previously transmitted information,the incremental knowledge-based HARQ(IK-HARQ)is deemed as a more effective mechanism that could sufficiently utilize the information semantics.However,considering the possible existence of semantic ambiguity in image transmission,a simple bit-level cyclic redundancy check(CRC)might compromise the performance of IK-HARQ.Therefore,there emerges a strong incentive to revolutionize the CRC mechanism,thus more effectively reaping the benefits of both SemCom and HARQ.In this paper,built on top of swin transformer-based joint source-channel coding(JSCC)and IK-HARQ,we propose a semantic image transmission framework SC-TDA-HARQ.In particular,different from the conventional CRC,we introduce a topological data analysis(TDA)-based error detection method,which capably digs out the inner topological and geometric information of images,to capture semantic information and determine the necessity for re-transmission.Extensive numerical results validate the effectiveness and efficiency of the proposed SC-TDA-HARQ framework,especially under the limited bandwidth condition,and manifest the superiority of TDA-based error detection method in image transmission.展开更多
Wireless sensor network deployment optimization is a classic NP-hard problem and a popular topic in academic research.However,the current research on wireless sensor network deployment problems uses overly simplistic ...Wireless sensor network deployment optimization is a classic NP-hard problem and a popular topic in academic research.However,the current research on wireless sensor network deployment problems uses overly simplistic models,and there is a significant gap between the research results and actual wireless sensor networks.Some scholars have now modeled data fusion networks to make them more suitable for practical applications.This paper will explore the deployment problem of a stochastic data fusion wireless sensor network(SDFWSN),a model that reflects the randomness of environmental monitoring and uses data fusion techniques widely used in actual sensor networks for information collection.The deployment problem of SDFWSN is modeled as a multi-objective optimization problem.The network life cycle,spatiotemporal coverage,detection rate,and false alarm rate of SDFWSN are used as optimization objectives to optimize the deployment of network nodes.This paper proposes an enhanced multi-objective mongoose optimization algorithm(EMODMOA)to solve the deployment problem of SDFWSN.First,to overcome the shortcomings of the DMOA algorithm,such as its low convergence and tendency to get stuck in a local optimum,an encircling and hunting strategy is introduced into the original algorithm to propose the EDMOA algorithm.The EDMOA algorithm is designed as the EMODMOA algorithm by selecting reference points using the K-Nearest Neighbor(KNN)algorithm.To verify the effectiveness of the proposed algorithm,the EMODMOA algorithm was tested at CEC 2020 and achieved good results.In the SDFWSN deployment problem,the algorithm was compared with the Non-dominated Sorting Genetic Algorithm II(NSGAII),Multiple Objective Particle Swarm Optimization(MOPSO),Multi-Objective Evolutionary Algorithm based on Decomposition(MOEA/D),and Multi-Objective Grey Wolf Optimizer(MOGWO).By comparing and analyzing the performance evaluation metrics and optimization results of the objective functions of the multi-objective algorithms,the algorithm outperforms the other algorithms in the SDFWSN deployment results.To better demonstrate the superiority of the algorithm,simulations of diverse test cases were also performed,and good results were obtained.展开更多
This study presents a machine learning-based method for predicting fragment velocity distribution in warhead fragmentation under explosive loading condition.The fragment resultant velocities are correlated with key de...This study presents a machine learning-based method for predicting fragment velocity distribution in warhead fragmentation under explosive loading condition.The fragment resultant velocities are correlated with key design parameters including casing dimensions and detonation positions.The paper details the finite element analysis for fragmentation,the characterizations of the dynamic hardening and fracture models,the generation of comprehensive datasets,and the training of the ANN model.The results show the influence of casing dimensions on fragment velocity distributions,with the tendencies indicating increased resultant velocity with reduced thickness,increased length and diameter.The model's predictive capability is demonstrated through the accurate predictions for both training and testing datasets,showing its potential for the real-time prediction of fragmentation performance.展开更多
The integration of image analysis through deep learning(DL)into rock classification represents a significant leap forward in geological research.While traditional methods remain invaluable for their expertise and hist...The integration of image analysis through deep learning(DL)into rock classification represents a significant leap forward in geological research.While traditional methods remain invaluable for their expertise and historical context,DL offers a powerful complement by enhancing the speed,objectivity,and precision of the classification process.This research explores the significance of image data augmentation techniques in optimizing the performance of convolutional neural networks(CNNs)for geological image analysis,particularly in the classification of igneous,metamorphic,and sedimentary rock types from rock thin section(RTS)images.This study primarily focuses on classic image augmentation techniques and evaluates their impact on model accuracy and precision.Results demonstrate that augmentation techniques like Equalize significantly enhance the model's classification capabilities,achieving an F1-Score of 0.9869 for igneous rocks,0.9884 for metamorphic rocks,and 0.9929 for sedimentary rocks,representing improvements compared to the baseline original results.Moreover,the weighted average F1-Score across all classes and techniques is 0.9886,indicating an enhancement.Conversely,methods like Distort lead to decreased accuracy and F1-Score,with an F1-Score of 0.949 for igneous rocks,0.954 for metamorphic rocks,and 0.9416 for sedimentary rocks,exacerbating the performance compared to the baseline.The study underscores the practicality of image data augmentation in geological image classification and advocates for the adoption of DL methods in this domain for automation and improved results.The findings of this study can benefit various fields,including remote sensing,mineral exploration,and environmental monitoring,by enhancing the accuracy of geological image analysis both for scientific research and industrial applications.展开更多
As gravity field, magnetic field, electric field and seismic wave field are all physical fields, their object function, reverse function and compound function are certainly infinite continuously differentiable functio...As gravity field, magnetic field, electric field and seismic wave field are all physical fields, their object function, reverse function and compound function are certainly infinite continuously differentiable functions which can be expanded into Taylor (Fourier) series within domain of definition and be further reduced into solving stochastic distribution function of series and statistic inference of optimal approximation. This is the basis of combined gravity-magnetic-electric-seismic inversion of stochastic modeling. It is an uncertainty modeling technology of combining gravity-magnetic-electric-seismic inversion built on the basis of separation of field and source gravity-magnetic difference-value (D-value) trend surface, taking distribution-independent fault system as its unit, depths of seismic and electric interfaces of interests as its corresponding bivariate compound reverse function of gravity-magnetic anomalies and using high order polynomial (high order trigonometric function) approximating to its series distribution. The difference from current dominant inversion techniques is that, first, it does not respectively create gravity-seismic, magnetic-seismic deterministic inversion model from theoretical model, but combines gravity-magnetic-electric-seismic stochastic inversion model from stochastic model; second, after the concept of equivalent geological body being introduced, using feature of independent variable of gravity-magnetic field functions, taking density and susceptibility related to gravity-magnetic function as default parameters of model, the deterministic model is established owing to better solution to the contradiction of difficulty in identifying strata and less test analytical data for density and susceptibility in newly explored area; third, under assumption of independent parent distribution, a real modeling by strata, the problem of difficult plane closure arising in profile modeling is avoided. This technology has richer and more detailed fault and strata information than sparse pattern seismic data in newly explored area, successfully inverses and plots structural map of Indosinian discontinuity in Hefei basin with combined gravity-magnetic-electric-seismic inversion. With development of high precision gravity-magnetic and overall geophysical technology, it is certain for introducing new methods of stochastic modeling and computational intelligence and promoting the development of combined gravity-magnetic-electric-seismic inversion to open a new substantial path.展开更多
In order to address the problems of the single encryption algorithm,such as low encryption efficiency and unreliable metadata for static data storage of big data platforms in the cloud computing environment,we propose...In order to address the problems of the single encryption algorithm,such as low encryption efficiency and unreliable metadata for static data storage of big data platforms in the cloud computing environment,we propose a Hadoop based big data secure storage scheme.Firstly,in order to disperse the NameNode service from a single server to multiple servers,we combine HDFS federation and HDFS high-availability mechanisms,and use the Zookeeper distributed coordination mechanism to coordinate each node to achieve dual-channel storage.Then,we improve the ECC encryption algorithm for the encryption of ordinary data,and adopt a homomorphic encryption algorithm to encrypt data that needs to be calculated.To accelerate the encryption,we adopt the dualthread encryption mode.Finally,the HDFS control module is designed to combine the encryption algorithm with the storage model.Experimental results show that the proposed solution solves the problem of a single point of failure of metadata,performs well in terms of metadata reliability,and can realize the fault tolerance of the server.The improved encryption algorithm integrates the dual-channel storage mode,and the encryption storage efficiency improves by 27.6% on average.展开更多
With the development of Industry 4.0 and big data technology,the Industrial Internet of Things(IIoT)is hampered by inherent issues such as privacy,security,and fault tolerance,which pose certain challenges to the rapi...With the development of Industry 4.0 and big data technology,the Industrial Internet of Things(IIoT)is hampered by inherent issues such as privacy,security,and fault tolerance,which pose certain challenges to the rapid development of IIoT.Blockchain technology has immutability,decentralization,and autonomy,which can greatly improve the inherent defects of the IIoT.In the traditional blockchain,data is stored in a Merkle tree.As data continues to grow,the scale of proofs used to validate it grows,threatening the efficiency,security,and reliability of blockchain-based IIoT.Accordingly,this paper first analyzes the inefficiency of the traditional blockchain structure in verifying the integrity and correctness of data.To solve this problem,a new Vector Commitment(VC)structure,Partition Vector Commitment(PVC),is proposed by improving the traditional VC structure.Secondly,this paper uses PVC instead of the Merkle tree to store big data generated by IIoT.PVC can improve the efficiency of traditional VC in the process of commitment and opening.Finally,this paper uses PVC to build a blockchain-based IIoT data security storage mechanism and carries out a comparative analysis of experiments.This mechanism can greatly reduce communication loss and maximize the rational use of storage space,which is of great significance for maintaining the security and stability of blockchain-based IIoT.展开更多
Time-series data provide important information in many fields,and their processing and analysis have been the focus of much research.However,detecting anomalies is very difficult due to data imbalance,temporal depende...Time-series data provide important information in many fields,and their processing and analysis have been the focus of much research.However,detecting anomalies is very difficult due to data imbalance,temporal dependence,and noise.Therefore,methodologies for data augmentation and conversion of time series data into images for analysis have been studied.This paper proposes a fault detection model that uses time series data augmentation and transformation to address the problems of data imbalance,temporal dependence,and robustness to noise.The method of data augmentation is set as the addition of noise.It involves adding Gaussian noise,with the noise level set to 0.002,to maximize the generalization performance of the model.In addition,we use the Markov Transition Field(MTF)method to effectively visualize the dynamic transitions of the data while converting the time series data into images.It enables the identification of patterns in time series data and assists in capturing the sequential dependencies of the data.For anomaly detection,the PatchCore model is applied to show excellent performance,and the detected anomaly areas are represented as heat maps.It allows for the detection of anomalies,and by applying an anomaly map to the original image,it is possible to capture the areas where anomalies occur.The performance evaluation shows that both F1-score and Accuracy are high when time series data is converted to images.Additionally,when processed as images rather than as time series data,there was a significant reduction in both the size of the data and the training time.The proposed method can provide an important springboard for research in the field of anomaly detection using time series data.Besides,it helps solve problems such as analyzing complex patterns in data lightweight.展开更多
Since the impoundment of Three Gorges Reservoir(TGR)in 2003,numerous slopes have experienced noticeable movement or destabilization owing to reservoir level changes and seasonal rainfall.One case is the Outang landsli...Since the impoundment of Three Gorges Reservoir(TGR)in 2003,numerous slopes have experienced noticeable movement or destabilization owing to reservoir level changes and seasonal rainfall.One case is the Outang landslide,a large-scale and active landslide,on the south bank of the Yangtze River.The latest monitoring data and site investigations available are analyzed to establish spatial and temporal landslide deformation characteristics.Data mining technology,including the two-step clustering and Apriori algorithm,is then used to identify the dominant triggers of landslide movement.In the data mining process,the two-step clustering method clusters the candidate triggers and displacement rate into several groups,and the Apriori algorithm generates correlation criteria for the cause-and-effect.The analysis considers multiple locations of the landslide and incorporates two types of time scales:longterm deformation on a monthly basis and short-term deformation on a daily basis.This analysis shows that the deformations of the Outang landslide are driven by both rainfall and reservoir water while its deformation varies spatiotemporally mainly due to the difference in local responses to hydrological factors.The data mining results reveal different dominant triggering factors depending on the monitoring frequency:the monthly and bi-monthly cumulative rainfall control the monthly deformation,and the 10-d cumulative rainfall and the 5-d cumulative drop of water level in the reservoir dominate the daily deformation of the landslide.It is concluded that the spatiotemporal deformation pattern and data mining rules associated with precipitation and reservoir water level have the potential to be broadly implemented for improving landslide prevention and control in the dam reservoirs and other landslideprone areas.展开更多
Capabilities to assimilate Geostationary Operational Environmental Satellite “R-series ”(GOES-R) Geostationary Lightning Mapper(GLM) flash extent density(FED) data within the operational Gridpoint Statistical Interp...Capabilities to assimilate Geostationary Operational Environmental Satellite “R-series ”(GOES-R) Geostationary Lightning Mapper(GLM) flash extent density(FED) data within the operational Gridpoint Statistical Interpolation ensemble Kalman filter(GSI-EnKF) framework were previously developed and tested with a mesoscale convective system(MCS) case. In this study, such capabilities are further developed to assimilate GOES GLM FED data within the GSI ensemble-variational(EnVar) hybrid data assimilation(DA) framework. The results of assimilating the GLM FED data using 3DVar, and pure En3DVar(PEn3DVar, using 100% ensemble covariance and no static covariance) are compared with those of EnKF/DfEnKF for a supercell storm case. The focus of this study is to validate the correctness and evaluate the performance of the new implementation rather than comparing the performance of FED DA among different DA schemes. Only the results of 3DVar and pEn3DVar are examined and compared with EnKF/DfEnKF. Assimilation of a single FED observation shows that the magnitude and horizontal extent of the analysis increments from PEn3DVar are generally larger than from EnKF, which is mainly caused by using different localization strategies in EnFK/DfEnKF and PEn3DVar as well as the integration limits of the graupel mass in the observation operator. Overall, the forecast performance of PEn3DVar is comparable to EnKF/DfEnKF, suggesting correct implementation.展开更多
When building a classification model,the scenario where the samples of one class are significantly more than those of the other class is called data imbalance.Data imbalance causes the trained classification model to ...When building a classification model,the scenario where the samples of one class are significantly more than those of the other class is called data imbalance.Data imbalance causes the trained classification model to be in favor of the majority class(usually defined as the negative class),which may do harm to the accuracy of the minority class(usually defined as the positive class),and then lead to poor overall performance of the model.A method called MSHR-FCSSVM for solving imbalanced data classification is proposed in this article,which is based on a new hybrid resampling approach(MSHR)and a new fine cost-sensitive support vector machine(CS-SVM)classifier(FCSSVM).The MSHR measures the separability of each negative sample through its Silhouette value calculated by Mahalanobis distance between samples,based on which,the so-called pseudo-negative samples are screened out to generate new positive samples(over-sampling step)through linear interpolation and are deleted finally(under-sampling step).This approach replaces pseudo-negative samples with generated new positive samples one by one to clear up the inter-class overlap on the borderline,without changing the overall scale of the dataset.The FCSSVM is an improved version of the traditional CS-SVM.It considers influences of both the imbalance of sample number and the class distribution on classification simultaneously,and through finely tuning the class cost weights by using the efficient optimization algorithm based on the physical phenomenon of rime-ice(RIME)algorithm with cross-validation accuracy as the fitness function to accurately adjust the classification borderline.To verify the effectiveness of the proposed method,a series of experiments are carried out based on 20 imbalanced datasets including both mildly and extremely imbalanced datasets.The experimental results show that the MSHR-FCSSVM method performs better than the methods for comparison in most cases,and both the MSHR and the FCSSVM played significant roles.展开更多
Identifying rare patterns for medical diagnosis is a challenging task due to heterogeneity and the volume of data.Data summarization can create a concise version of the original data that can be used for effective dia...Identifying rare patterns for medical diagnosis is a challenging task due to heterogeneity and the volume of data.Data summarization can create a concise version of the original data that can be used for effective diagnosis.In this paper,we propose an ensemble summarization method that combines clustering and sampling to create a summary of the original data to ensure the inclusion of rare patterns.To the best of our knowledge,there has been no such technique available to augment the performance of anomaly detection techniques and simultaneously increase the efficiency of medical diagnosis.The performance of popular anomaly detection algorithms increases significantly in terms of accuracy and computational complexity when the summaries are used.Therefore,the medical diagnosis becomes more effective,and our experimental results reflect that the combination of the proposed summarization scheme and all underlying algorithms used in this paper outperforms the most popular anomaly detection techniques.展开更多
Mg alloys possess an inherent plastic anisotropy owing to the selective activation of deformation mechanisms depending on the loading condition.This characteristic results in a diverse range of flow curves that vary w...Mg alloys possess an inherent plastic anisotropy owing to the selective activation of deformation mechanisms depending on the loading condition.This characteristic results in a diverse range of flow curves that vary with a deformation condition.This study proposes a novel approach for accurately predicting an anisotropic deformation behavior of wrought Mg alloys using machine learning(ML)with data augmentation.The developed model combines four key strategies from data science:learning the entire flow curves,generative adversarial networks(GAN),algorithm-driven hyperparameter tuning,and gated recurrent unit(GRU)architecture.The proposed model,namely GAN-aided GRU,was extensively evaluated for various predictive scenarios,such as interpolation,extrapolation,and a limited dataset size.The model exhibited significant predictability and improved generalizability for estimating the anisotropic compressive behavior of ZK60 Mg alloys under 11 annealing conditions and for three loading directions.The GAN-aided GRU results were superior to those of previous ML models and constitutive equations.The superior performance was attributed to hyperparameter optimization,GAN-based data augmentation,and the inherent predictivity of the GRU for extrapolation.As a first attempt to employ ML techniques other than artificial neural networks,this study proposes a novel perspective on predicting the anisotropic deformation behaviors of wrought Mg alloys.展开更多
基金supported in part by NIH grants R01NS39600,U01MH114829RF1MH128693(to GAA)。
文摘Many fields,such as neuroscience,are experiencing the vast prolife ration of cellular data,underscoring the need fo r organizing and interpreting large datasets.A popular approach partitions data into manageable subsets via hierarchical clustering,but objective methods to determine the appropriate classification granularity are missing.We recently introduced a technique to systematically identify when to stop subdividing clusters based on the fundamental principle that cells must differ more between than within clusters.Here we present the corresponding protocol to classify cellular datasets by combining datadriven unsupervised hierarchical clustering with statistical testing.These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values,including molecula r,physiological,and anatomical datasets.We demonstrate the protocol using cellular data from the Janelia MouseLight project to chara cterize morphological aspects of neurons.
基金supported by the Institute of Information&communications Technology Planning&Evaluation(IITP)grant funded by the Korea government(MSIT)(RS-2024-00399401,Development of Quantum-Safe Infrastructure Migration and Quantum Security Verification Technologies).
文摘With the rise of remote collaboration,the demand for advanced storage and collaboration tools has rapidly increased.However,traditional collaboration tools primarily rely on access control,leaving data stored on cloud servers vulnerable due to insufficient encryption.This paper introduces a novel mechanism that encrypts data in‘bundle’units,designed to meet the dual requirements of efficiency and security for frequently updated collaborative data.Each bundle includes updated information,allowing only the updated portions to be reencrypted when changes occur.The encryption method proposed in this paper addresses the inefficiencies of traditional encryption modes,such as Cipher Block Chaining(CBC)and Counter(CTR),which require decrypting and re-encrypting the entire dataset whenever updates occur.The proposed method leverages update-specific information embedded within data bundles and metadata that maps the relationship between these bundles and the plaintext data.By utilizing this information,the method accurately identifies the modified portions and applies algorithms to selectively re-encrypt only those sections.This approach significantly enhances the efficiency of data updates while maintaining high performance,particularly in large-scale data environments.To validate this approach,we conducted experiments measuring execution time as both the size of the modified data and the total dataset size varied.Results show that the proposed method significantly outperforms CBC and CTR modes in execution speed,with greater performance gains as data size increases.Additionally,our security evaluation confirms that this method provides robust protection against both passive and active attacks.
基金supported by the National Natural Science Foundation of China [grant number 42030605]the National Key R&D Program of China [grant number 2020YFA0608004]。
文摘A remarkable marine heatwave,known as the“Blob”,occurred in the Northeast Pacific Ocean from late 2013 to early 2016,which displayed strong warm anomalies extending from the surface to a depth of 300 m.This study employed two assimilation schemes based on the global Climate Forecast System of Nanjing University of Information Science(NUIST-CFS 1.0)to investigate the impact of ocean data assimilation on the seasonal prediction of this extreme marine heatwave.The sea surface temperature(SST)nudging scheme assimilates SST only,while the deterministic ensemble Kalman filter(EnKF)scheme assimilates observations from the surface to the deep ocean.The latter notably improves the forecasting skill for subsurface temperature anomalies,especially at the depth of 100-300 m(the lower layer),outperforming the SST nudging scheme.It excels in predicting both horizontal and vertical heat transport in the lower layer,contributing to improved forecasts of the lower-layer warming during the Blob.These improvements stem from the assimilation of subsurface observational data,which are important in predicting the upper-ocean conditions.The results suggest that assimilating ocean data with the EnKF scheme significantly enhances the accuracy in predicting subsurface temperature anomalies during the Blob and offers better understanding of its underlying mechanisms.
文摘There is a growing body of clinical research on the utility of synthetic data derivatives,an emerging research tool in medicine.In nephrology,clinicians can use machine learning and artificial intelligence as powerful aids in their clinical decision-making while also preserving patient privacy.This is especially important given the epidemiology of chronic kidney disease,renal oncology,and hypertension worldwide.However,there remains a need to create a framework for guidance regarding how to better utilize synthetic data as a practical application in this research.
文摘In the face of data scarcity in the optimization of maintenance strategies for civil aircraft,traditional failure data-driven methods are encountering challenges owing to the increasing reliability of aircraft design.This study addresses this issue by presenting a novel combined data fusion algorithm,which serves to enhance the accuracy and reliability of failure rate analysis for a specific aircraft model by integrating historical failure data from similar models as supplementary information.Through a comprehensive analysis of two different maintenance projects,this study illustrates the application process of the algorithm.Building upon the analysis results,this paper introduces the innovative equal integral value method as a replacement for the conventional equal interval method in the context of maintenance schedule optimization.The Monte Carlo simulation example validates that the equivalent essential value method surpasses the traditional method by over 20%in terms of inspection efficiency ratio.This discovery indicates that the equal critical value method not only upholds maintenance efficiency but also substantially decreases workload and maintenance costs.The findings of this study open up novel perspectives for airlines grappling with data scarcity,offer fresh strategies for the optimization of aviation maintenance practices,and chart a new course toward achieving more efficient and cost-effective maintenance schedule optimization through refined data analysis.
基金financially supported by the National Natural Science Foundation of China(No.41621001)。
文摘Air temperature is an important indicator to analyze climate change in mountainous areas.ERA5 reanalysis air temperature data are important products that were widely used to analyze temperature change in mountainous areas.However,the reliability of ERA5 reanalysis air temperature over the Qilian Mountains(QLM)is unclear.In this study,we evaluated the reliability of ERA5 monthly averaged reanalysis 2 m air temperature data using the observations at 17 meteorological stations in the QLM from 1979 to 2017.The results showed that:ERA5 reanalysis monthly averaged air temperature data have a good applicability in the QLM in general(R2=0.99).ERA5 reanalysis temperature data overestimated the observed temperature in the QLM in general.Root mean square error(RMSE)increases with the increasing of elevation range,showing that the reliability of ERA5 reanalysis temperature data is worse in higher elevation than that in lower altitude.ERA5 reanalysis temperature can capture observational warming rates well.All the smallest warming rates of observational temperature and ERA5 reanalysis temperature are found in winter,with the warming rates of 0.393°C/10a and 0.360°C/10a,respectively.This study will provide a reference for the application of ERA5 reanalysis monthly averaged air temperature data at different elevation ranges in the Qilian Mountains.
基金supported in part by the National Key Research and Development Program of China under Grant 2024YFE0200600in part by the National Natural Science Foundation of China under Grant 62071425+3 种基金in part by the Zhejiang Key Research and Development Plan under Grant 2022C01093in part by the Zhejiang Provincial Natural Science Foundation of China under Grant LR23F010005in part by the National Key Laboratory of Wireless Communications Foundation under Grant 2023KP01601in part by the Big Data and Intelligent Computing Key Lab of CQUPT under Grant BDIC-2023-B-001.
文摘Semantic communication(SemCom)aims to achieve high-fidelity information delivery under low communication consumption by only guaranteeing semantic accuracy.Nevertheless,semantic communication still suffers from unexpected channel volatility and thus developing a re-transmission mechanism(e.g.,hybrid automatic repeat request[HARQ])becomes indispensable.In that regard,instead of discarding previously transmitted information,the incremental knowledge-based HARQ(IK-HARQ)is deemed as a more effective mechanism that could sufficiently utilize the information semantics.However,considering the possible existence of semantic ambiguity in image transmission,a simple bit-level cyclic redundancy check(CRC)might compromise the performance of IK-HARQ.Therefore,there emerges a strong incentive to revolutionize the CRC mechanism,thus more effectively reaping the benefits of both SemCom and HARQ.In this paper,built on top of swin transformer-based joint source-channel coding(JSCC)and IK-HARQ,we propose a semantic image transmission framework SC-TDA-HARQ.In particular,different from the conventional CRC,we introduce a topological data analysis(TDA)-based error detection method,which capably digs out the inner topological and geometric information of images,to capture semantic information and determine the necessity for re-transmission.Extensive numerical results validate the effectiveness and efficiency of the proposed SC-TDA-HARQ framework,especially under the limited bandwidth condition,and manifest the superiority of TDA-based error detection method in image transmission.
基金supported by the National Natural Science Foundation of China under Grant Nos.U21A20464,62066005Innovation Project of Guangxi Graduate Education under Grant No.YCSW2024313.
文摘Wireless sensor network deployment optimization is a classic NP-hard problem and a popular topic in academic research.However,the current research on wireless sensor network deployment problems uses overly simplistic models,and there is a significant gap between the research results and actual wireless sensor networks.Some scholars have now modeled data fusion networks to make them more suitable for practical applications.This paper will explore the deployment problem of a stochastic data fusion wireless sensor network(SDFWSN),a model that reflects the randomness of environmental monitoring and uses data fusion techniques widely used in actual sensor networks for information collection.The deployment problem of SDFWSN is modeled as a multi-objective optimization problem.The network life cycle,spatiotemporal coverage,detection rate,and false alarm rate of SDFWSN are used as optimization objectives to optimize the deployment of network nodes.This paper proposes an enhanced multi-objective mongoose optimization algorithm(EMODMOA)to solve the deployment problem of SDFWSN.First,to overcome the shortcomings of the DMOA algorithm,such as its low convergence and tendency to get stuck in a local optimum,an encircling and hunting strategy is introduced into the original algorithm to propose the EDMOA algorithm.The EDMOA algorithm is designed as the EMODMOA algorithm by selecting reference points using the K-Nearest Neighbor(KNN)algorithm.To verify the effectiveness of the proposed algorithm,the EMODMOA algorithm was tested at CEC 2020 and achieved good results.In the SDFWSN deployment problem,the algorithm was compared with the Non-dominated Sorting Genetic Algorithm II(NSGAII),Multiple Objective Particle Swarm Optimization(MOPSO),Multi-Objective Evolutionary Algorithm based on Decomposition(MOEA/D),and Multi-Objective Grey Wolf Optimizer(MOGWO).By comparing and analyzing the performance evaluation metrics and optimization results of the objective functions of the multi-objective algorithms,the algorithm outperforms the other algorithms in the SDFWSN deployment results.To better demonstrate the superiority of the algorithm,simulations of diverse test cases were also performed,and good results were obtained.
基金supported by Poongsan-KAIST Future Research Center Projectthe fund support provided by the National Research Foundation of Korea(NRF)grant funded by the Korea government(MSIT)(Grant No.2023R1A2C2005661)。
文摘This study presents a machine learning-based method for predicting fragment velocity distribution in warhead fragmentation under explosive loading condition.The fragment resultant velocities are correlated with key design parameters including casing dimensions and detonation positions.The paper details the finite element analysis for fragmentation,the characterizations of the dynamic hardening and fracture models,the generation of comprehensive datasets,and the training of the ANN model.The results show the influence of casing dimensions on fragment velocity distributions,with the tendencies indicating increased resultant velocity with reduced thickness,increased length and diameter.The model's predictive capability is demonstrated through the accurate predictions for both training and testing datasets,showing its potential for the real-time prediction of fragmentation performance.
文摘The integration of image analysis through deep learning(DL)into rock classification represents a significant leap forward in geological research.While traditional methods remain invaluable for their expertise and historical context,DL offers a powerful complement by enhancing the speed,objectivity,and precision of the classification process.This research explores the significance of image data augmentation techniques in optimizing the performance of convolutional neural networks(CNNs)for geological image analysis,particularly in the classification of igneous,metamorphic,and sedimentary rock types from rock thin section(RTS)images.This study primarily focuses on classic image augmentation techniques and evaluates their impact on model accuracy and precision.Results demonstrate that augmentation techniques like Equalize significantly enhance the model's classification capabilities,achieving an F1-Score of 0.9869 for igneous rocks,0.9884 for metamorphic rocks,and 0.9929 for sedimentary rocks,representing improvements compared to the baseline original results.Moreover,the weighted average F1-Score across all classes and techniques is 0.9886,indicating an enhancement.Conversely,methods like Distort lead to decreased accuracy and F1-Score,with an F1-Score of 0.949 for igneous rocks,0.954 for metamorphic rocks,and 0.9416 for sedimentary rocks,exacerbating the performance compared to the baseline.The study underscores the practicality of image data augmentation in geological image classification and advocates for the adoption of DL methods in this domain for automation and improved results.The findings of this study can benefit various fields,including remote sensing,mineral exploration,and environmental monitoring,by enhancing the accuracy of geological image analysis both for scientific research and industrial applications.
文摘As gravity field, magnetic field, electric field and seismic wave field are all physical fields, their object function, reverse function and compound function are certainly infinite continuously differentiable functions which can be expanded into Taylor (Fourier) series within domain of definition and be further reduced into solving stochastic distribution function of series and statistic inference of optimal approximation. This is the basis of combined gravity-magnetic-electric-seismic inversion of stochastic modeling. It is an uncertainty modeling technology of combining gravity-magnetic-electric-seismic inversion built on the basis of separation of field and source gravity-magnetic difference-value (D-value) trend surface, taking distribution-independent fault system as its unit, depths of seismic and electric interfaces of interests as its corresponding bivariate compound reverse function of gravity-magnetic anomalies and using high order polynomial (high order trigonometric function) approximating to its series distribution. The difference from current dominant inversion techniques is that, first, it does not respectively create gravity-seismic, magnetic-seismic deterministic inversion model from theoretical model, but combines gravity-magnetic-electric-seismic stochastic inversion model from stochastic model; second, after the concept of equivalent geological body being introduced, using feature of independent variable of gravity-magnetic field functions, taking density and susceptibility related to gravity-magnetic function as default parameters of model, the deterministic model is established owing to better solution to the contradiction of difficulty in identifying strata and less test analytical data for density and susceptibility in newly explored area; third, under assumption of independent parent distribution, a real modeling by strata, the problem of difficult plane closure arising in profile modeling is avoided. This technology has richer and more detailed fault and strata information than sparse pattern seismic data in newly explored area, successfully inverses and plots structural map of Indosinian discontinuity in Hefei basin with combined gravity-magnetic-electric-seismic inversion. With development of high precision gravity-magnetic and overall geophysical technology, it is certain for introducing new methods of stochastic modeling and computational intelligence and promoting the development of combined gravity-magnetic-electric-seismic inversion to open a new substantial path.
文摘In order to address the problems of the single encryption algorithm,such as low encryption efficiency and unreliable metadata for static data storage of big data platforms in the cloud computing environment,we propose a Hadoop based big data secure storage scheme.Firstly,in order to disperse the NameNode service from a single server to multiple servers,we combine HDFS federation and HDFS high-availability mechanisms,and use the Zookeeper distributed coordination mechanism to coordinate each node to achieve dual-channel storage.Then,we improve the ECC encryption algorithm for the encryption of ordinary data,and adopt a homomorphic encryption algorithm to encrypt data that needs to be calculated.To accelerate the encryption,we adopt the dualthread encryption mode.Finally,the HDFS control module is designed to combine the encryption algorithm with the storage model.Experimental results show that the proposed solution solves the problem of a single point of failure of metadata,performs well in terms of metadata reliability,and can realize the fault tolerance of the server.The improved encryption algorithm integrates the dual-channel storage mode,and the encryption storage efficiency improves by 27.6% on average.
基金supported by China’s National Natural Science Foundation(Nos.62072249,62072056)This work is also funded by the National Science Foundation of Hunan Province(2020JJ2029).
文摘With the development of Industry 4.0 and big data technology,the Industrial Internet of Things(IIoT)is hampered by inherent issues such as privacy,security,and fault tolerance,which pose certain challenges to the rapid development of IIoT.Blockchain technology has immutability,decentralization,and autonomy,which can greatly improve the inherent defects of the IIoT.In the traditional blockchain,data is stored in a Merkle tree.As data continues to grow,the scale of proofs used to validate it grows,threatening the efficiency,security,and reliability of blockchain-based IIoT.Accordingly,this paper first analyzes the inefficiency of the traditional blockchain structure in verifying the integrity and correctness of data.To solve this problem,a new Vector Commitment(VC)structure,Partition Vector Commitment(PVC),is proposed by improving the traditional VC structure.Secondly,this paper uses PVC instead of the Merkle tree to store big data generated by IIoT.PVC can improve the efficiency of traditional VC in the process of commitment and opening.Finally,this paper uses PVC to build a blockchain-based IIoT data security storage mechanism and carries out a comparative analysis of experiments.This mechanism can greatly reduce communication loss and maximize the rational use of storage space,which is of great significance for maintaining the security and stability of blockchain-based IIoT.
基金This research was financially supported by the Ministry of Trade,Industry,and Energy(MOTIE),Korea,under the“Project for Research and Development with Middle Markets Enterprises and DNA(Data,Network,AI)Universities”(AI-based Safety Assessment and Management System for Concrete Structures)(ReferenceNumber P0024559)supervised by theKorea Institute for Advancement of Technology(KIAT).
文摘Time-series data provide important information in many fields,and their processing and analysis have been the focus of much research.However,detecting anomalies is very difficult due to data imbalance,temporal dependence,and noise.Therefore,methodologies for data augmentation and conversion of time series data into images for analysis have been studied.This paper proposes a fault detection model that uses time series data augmentation and transformation to address the problems of data imbalance,temporal dependence,and robustness to noise.The method of data augmentation is set as the addition of noise.It involves adding Gaussian noise,with the noise level set to 0.002,to maximize the generalization performance of the model.In addition,we use the Markov Transition Field(MTF)method to effectively visualize the dynamic transitions of the data while converting the time series data into images.It enables the identification of patterns in time series data and assists in capturing the sequential dependencies of the data.For anomaly detection,the PatchCore model is applied to show excellent performance,and the detected anomaly areas are represented as heat maps.It allows for the detection of anomalies,and by applying an anomaly map to the original image,it is possible to capture the areas where anomalies occur.The performance evaluation shows that both F1-score and Accuracy are high when time series data is converted to images.Additionally,when processed as images rather than as time series data,there was a significant reduction in both the size of the data and the training time.The proposed method can provide an important springboard for research in the field of anomaly detection using time series data.Besides,it helps solve problems such as analyzing complex patterns in data lightweight.
基金supported by the Natural Science Foundation of Shandong Province,China(Grant No.ZR2021QD032)。
文摘Since the impoundment of Three Gorges Reservoir(TGR)in 2003,numerous slopes have experienced noticeable movement or destabilization owing to reservoir level changes and seasonal rainfall.One case is the Outang landslide,a large-scale and active landslide,on the south bank of the Yangtze River.The latest monitoring data and site investigations available are analyzed to establish spatial and temporal landslide deformation characteristics.Data mining technology,including the two-step clustering and Apriori algorithm,is then used to identify the dominant triggers of landslide movement.In the data mining process,the two-step clustering method clusters the candidate triggers and displacement rate into several groups,and the Apriori algorithm generates correlation criteria for the cause-and-effect.The analysis considers multiple locations of the landslide and incorporates two types of time scales:longterm deformation on a monthly basis and short-term deformation on a daily basis.This analysis shows that the deformations of the Outang landslide are driven by both rainfall and reservoir water while its deformation varies spatiotemporally mainly due to the difference in local responses to hydrological factors.The data mining results reveal different dominant triggering factors depending on the monitoring frequency:the monthly and bi-monthly cumulative rainfall control the monthly deformation,and the 10-d cumulative rainfall and the 5-d cumulative drop of water level in the reservoir dominate the daily deformation of the landslide.It is concluded that the spatiotemporal deformation pattern and data mining rules associated with precipitation and reservoir water level have the potential to be broadly implemented for improving landslide prevention and control in the dam reservoirs and other landslideprone areas.
基金supported by NOAA JTTI award via Grant #NA21OAR4590165, NOAA GOESR Program funding via Grant #NA16OAR4320115provided by NOAA/Office of Oceanic and Atmospheric Research under NOAA-University of Oklahoma Cooperative Agreement #NA11OAR4320072, U.S. Department of Commercesupported by the National Oceanic and Atmospheric Administration (NOAA) of the U.S. Department of Commerce via Grant #NA18NWS4680063。
文摘Capabilities to assimilate Geostationary Operational Environmental Satellite “R-series ”(GOES-R) Geostationary Lightning Mapper(GLM) flash extent density(FED) data within the operational Gridpoint Statistical Interpolation ensemble Kalman filter(GSI-EnKF) framework were previously developed and tested with a mesoscale convective system(MCS) case. In this study, such capabilities are further developed to assimilate GOES GLM FED data within the GSI ensemble-variational(EnVar) hybrid data assimilation(DA) framework. The results of assimilating the GLM FED data using 3DVar, and pure En3DVar(PEn3DVar, using 100% ensemble covariance and no static covariance) are compared with those of EnKF/DfEnKF for a supercell storm case. The focus of this study is to validate the correctness and evaluate the performance of the new implementation rather than comparing the performance of FED DA among different DA schemes. Only the results of 3DVar and pEn3DVar are examined and compared with EnKF/DfEnKF. Assimilation of a single FED observation shows that the magnitude and horizontal extent of the analysis increments from PEn3DVar are generally larger than from EnKF, which is mainly caused by using different localization strategies in EnFK/DfEnKF and PEn3DVar as well as the integration limits of the graupel mass in the observation operator. Overall, the forecast performance of PEn3DVar is comparable to EnKF/DfEnKF, suggesting correct implementation.
基金supported by the Yunnan Major Scientific and Technological Projects(Grant No.202302AD080001)the National Natural Science Foundation,China(No.52065033).
文摘When building a classification model,the scenario where the samples of one class are significantly more than those of the other class is called data imbalance.Data imbalance causes the trained classification model to be in favor of the majority class(usually defined as the negative class),which may do harm to the accuracy of the minority class(usually defined as the positive class),and then lead to poor overall performance of the model.A method called MSHR-FCSSVM for solving imbalanced data classification is proposed in this article,which is based on a new hybrid resampling approach(MSHR)and a new fine cost-sensitive support vector machine(CS-SVM)classifier(FCSSVM).The MSHR measures the separability of each negative sample through its Silhouette value calculated by Mahalanobis distance between samples,based on which,the so-called pseudo-negative samples are screened out to generate new positive samples(over-sampling step)through linear interpolation and are deleted finally(under-sampling step).This approach replaces pseudo-negative samples with generated new positive samples one by one to clear up the inter-class overlap on the borderline,without changing the overall scale of the dataset.The FCSSVM is an improved version of the traditional CS-SVM.It considers influences of both the imbalance of sample number and the class distribution on classification simultaneously,and through finely tuning the class cost weights by using the efficient optimization algorithm based on the physical phenomenon of rime-ice(RIME)algorithm with cross-validation accuracy as the fitness function to accurately adjust the classification borderline.To verify the effectiveness of the proposed method,a series of experiments are carried out based on 20 imbalanced datasets including both mildly and extremely imbalanced datasets.The experimental results show that the MSHR-FCSSVM method performs better than the methods for comparison in most cases,and both the MSHR and the FCSSVM played significant roles.
文摘Identifying rare patterns for medical diagnosis is a challenging task due to heterogeneity and the volume of data.Data summarization can create a concise version of the original data that can be used for effective diagnosis.In this paper,we propose an ensemble summarization method that combines clustering and sampling to create a summary of the original data to ensure the inclusion of rare patterns.To the best of our knowledge,there has been no such technique available to augment the performance of anomaly detection techniques and simultaneously increase the efficiency of medical diagnosis.The performance of popular anomaly detection algorithms increases significantly in terms of accuracy and computational complexity when the summaries are used.Therefore,the medical diagnosis becomes more effective,and our experimental results reflect that the combination of the proposed summarization scheme and all underlying algorithms used in this paper outperforms the most popular anomaly detection techniques.
基金Korea Institute of Energy Technology Evaluation and Planning(KETEP)grant funded by the Korea government(Grant No.20214000000140,Graduate School of Convergence for Clean Energy Integrated Power Generation)Korea Basic Science Institute(National Research Facilities and Equipment Center)grant funded by the Ministry of Education(2021R1A6C101A449)the National Research Foundation of Korea grant funded by the Ministry of Science and ICT(2021R1A2C1095139),Republic of Korea。
文摘Mg alloys possess an inherent plastic anisotropy owing to the selective activation of deformation mechanisms depending on the loading condition.This characteristic results in a diverse range of flow curves that vary with a deformation condition.This study proposes a novel approach for accurately predicting an anisotropic deformation behavior of wrought Mg alloys using machine learning(ML)with data augmentation.The developed model combines four key strategies from data science:learning the entire flow curves,generative adversarial networks(GAN),algorithm-driven hyperparameter tuning,and gated recurrent unit(GRU)architecture.The proposed model,namely GAN-aided GRU,was extensively evaluated for various predictive scenarios,such as interpolation,extrapolation,and a limited dataset size.The model exhibited significant predictability and improved generalizability for estimating the anisotropic compressive behavior of ZK60 Mg alloys under 11 annealing conditions and for three loading directions.The GAN-aided GRU results were superior to those of previous ML models and constitutive equations.The superior performance was attributed to hyperparameter optimization,GAN-based data augmentation,and the inherent predictivity of the GRU for extrapolation.As a first attempt to employ ML techniques other than artificial neural networks,this study proposes a novel perspective on predicting the anisotropic deformation behaviors of wrought Mg alloys.