With the rapid development of the global economy, maritime transportation has become much more convenient due to large capacities and low freight. However, this means the sea lanes are becoming more and more crowded,l...With the rapid development of the global economy, maritime transportation has become much more convenient due to large capacities and low freight. However, this means the sea lanes are becoming more and more crowded,leading to high probabilities of marine accidents in complex maritime environments. According to relevant historical statistics, a large number of accidents have happened in water areas that lack high precision navigation data, which can be utilized to enhance navigation safety. The purpose of this work was to carry out ship route planning automatically, by mining historical big automatic identification system(AIS) data. It is well-known that experiential navigation information hidden in maritime big data could be automatically extracted using advanced data mining techniques;assisting in the generation of safe and reliable ship planning routes for complex maritime environments. In this paper, a novel method is proposed to construct a big data-driven framework for generating ship planning routes automatically, under varying navigation conditions. The method performs density-based spatial clustering of applications with noise first on a large number of ship trajectories to form different trajectory vector clusters. Then, it iteratively calculates its centerline in the trajectory vector cluster, and constructs the waterway network from the node-arc topology relationship among these centerlines. The generation of shipping route could be based on the waterway network and conducted by rasterizing the marine environment risks for the sea area not covered by the waterway network. Numerous experiments have been conducted on different AIS data sets in different water areas, and the experimental results have demonstrated the effectiveness of the framework of the ship route planning proposed in this paper.展开更多
This study introduces an innovative“Big Model”strategy to enhance Bridge Structural Health Monitoring(SHM)using a Convolutional Neural Network(CNN),time-frequency analysis,and fine element analysis.Leveraging ensemb...This study introduces an innovative“Big Model”strategy to enhance Bridge Structural Health Monitoring(SHM)using a Convolutional Neural Network(CNN),time-frequency analysis,and fine element analysis.Leveraging ensemble methods,collaborative learning,and distributed computing,the approach effectively manages the complexity and scale of large-scale bridge data.The CNN employs transfer learning,fine-tuning,and continuous monitoring to optimize models for adaptive and accurate structural health assessments,focusing on extracting meaningful features through time-frequency analysis.By integrating Finite Element Analysis,time-frequency analysis,and CNNs,the strategy provides a comprehensive understanding of bridge health.Utilizing diverse sensor data,sophisticated feature extraction,and advanced CNN architecture,the model is optimized through rigorous preprocessing and hyperparameter tuning.This approach significantly enhances the ability to make accurate predictions,monitor structural health,and support proactive maintenance practices,thereby ensuring the safety and longevity of critical infrastructure.展开更多
With its generality and practicality, the combination of partial charging curves and machine learning(ML) for battery capacity estimation has attracted widespread attention. However, a clear classification,fair compar...With its generality and practicality, the combination of partial charging curves and machine learning(ML) for battery capacity estimation has attracted widespread attention. However, a clear classification,fair comparison, and performance rationalization of these methods are lacking, due to the scattered existing studies. To address these issues, we develop 20 capacity estimation methods from three perspectives:charging sequence construction, input forms, and ML models. 22,582 charging curves are generated from 44 cells with different battery chemistry and operating conditions to validate the performance. Through comprehensive and unbiased comparison, the long short-term memory(LSTM) based neural network exhibits the best accuracy and robustness. Across all 6503 tested samples, the mean absolute percentage error(MAPE) for capacity estimation using LSTM is 0.61%, with a maximum error of only 3.94%. Even with the addition of 3 m V voltage noise or the extension of sampling intervals to 60 s, the average MAPE remains below 2%. Furthermore, the charging sequences are provided with physical explanations related to battery degradation to enhance confidence in their application. Recommendations for using other competitive methods are also presented. This work provides valuable insights and guidance for estimating battery capacity based on partial charging curves.展开更多
The reliable operation of high-speed wire rod finishing mills is crucial in the steel production enterprise.As complex system-level equipment,it is difficult for high-speed wire rod finishing mills to realize fault lo...The reliable operation of high-speed wire rod finishing mills is crucial in the steel production enterprise.As complex system-level equipment,it is difficult for high-speed wire rod finishing mills to realize fault location and real-time monitoring.To solve the above problems,an expert experience and data-driven-based hybrid fault diagnosis method for high-speed wire rod finishing mills is proposed in this paper.First,based on its mechanical structure,time and frequency domain analysis are improved in fault feature extraction.The approach of combining virtual value,peak value with kurtosis value index,is adopted in time domain analysis.Speed adjustment and side frequency analysis are proposed in frequency domain analysis to obtain accurate component characteristic frequency and its corresponding sideband.Then,according to time and frequency domain characteristics,fault location based on expert experience is proposed to get an accurate fault result.Finally,the proposed method is implemented in the equipment intelligent diagnosis system.By taking an equipment fault on site,for example,the effectiveness of the proposed method is illustrated in the system.展开更多
The complex sand-casting process combined with the interactions between process parameters makes it difficult to control the casting quality,resulting in a high scrap rate.A strategy based on a data-driven model was p...The complex sand-casting process combined with the interactions between process parameters makes it difficult to control the casting quality,resulting in a high scrap rate.A strategy based on a data-driven model was proposed to reduce casting defects and improve production efficiency,which includes the random forest(RF)classification model,the feature importance analysis,and the process parameters optimization with Monte Carlo simulation.The collected data includes four types of defects and corresponding process parameters were used to construct the RF model.Classification results show a recall rate above 90% for all categories.The Gini Index was used to assess the importance of the process parameters in the formation of various defects in the RF model.Finally,the classification model was applied to different production conditions for quality prediction.In the case of process parameters optimization for gas porosity defects,this model serves as an experimental process in the Monte Carlo method to estimate a better temperature distribution.The prediction model,when applied to the factory,greatly improved the efficiency of defect detection.Results show that the scrap rate decreased from 10.16% to 6.68%.展开更多
There are challenges to the reliability evaluation for insulated gate bipolar transistors(IGBT)on electric vehicles,such as junction temperature measurement,computational and storage resources.In this paper,a junction...There are challenges to the reliability evaluation for insulated gate bipolar transistors(IGBT)on electric vehicles,such as junction temperature measurement,computational and storage resources.In this paper,a junction temperature estimation approach based on neural network without additional cost is proposed and the lifetime calculation for IGBT using electric vehicle big data is performed.The direct current(DC)voltage,operation current,switching frequency,negative thermal coefficient thermistor(NTC)temperature and IGBT lifetime are inputs.And the junction temperature(T_(j))is output.With the rain flow counting method,the classified irregular temperatures are brought into the life model for the failure cycles.The fatigue accumulation method is then used to calculate the IGBT lifetime.To solve the limited computational and storage resources of electric vehicle controllers,the operation of IGBT lifetime calculation is running on a big data platform.The lifetime is then transmitted wirelessly to electric vehicles as input for neural network.Thus the junction temperature of IGBT under long-term operating conditions can be accurately estimated.A test platform of the motor controller combined with the vehicle big data server is built for the IGBT accelerated aging test.Subsequently,the IGBT lifetime predictions are derived from the junction temperature estimation by the neural network method and the thermal network method.The experiment shows that the lifetime prediction based on a neural network with big data demonstrates a higher accuracy than that of the thermal network,which improves the reliability evaluation of system.展开更多
In order to address the problems of the single encryption algorithm,such as low encryption efficiency and unreliable metadata for static data storage of big data platforms in the cloud computing environment,we propose...In order to address the problems of the single encryption algorithm,such as low encryption efficiency and unreliable metadata for static data storage of big data platforms in the cloud computing environment,we propose a Hadoop based big data secure storage scheme.Firstly,in order to disperse the NameNode service from a single server to multiple servers,we combine HDFS federation and HDFS high-availability mechanisms,and use the Zookeeper distributed coordination mechanism to coordinate each node to achieve dual-channel storage.Then,we improve the ECC encryption algorithm for the encryption of ordinary data,and adopt a homomorphic encryption algorithm to encrypt data that needs to be calculated.To accelerate the encryption,we adopt the dualthread encryption mode.Finally,the HDFS control module is designed to combine the encryption algorithm with the storage model.Experimental results show that the proposed solution solves the problem of a single point of failure of metadata,performs well in terms of metadata reliability,and can realize the fault tolerance of the server.The improved encryption algorithm integrates the dual-channel storage mode,and the encryption storage efficiency improves by 27.6% on average.展开更多
Big Bang nucleosynthesis(BBN)theory predicts the primordial abundances of the light elements^(2) H(referred to as deuterium,or D for short),^(3)He,^(4)He,and^(7) Li produced in the early universe.Among these,deuterium...Big Bang nucleosynthesis(BBN)theory predicts the primordial abundances of the light elements^(2) H(referred to as deuterium,or D for short),^(3)He,^(4)He,and^(7) Li produced in the early universe.Among these,deuterium,the first nuclide produced by BBN,is a key primordial material for subsequent reactions.To date,the uncertainty in predicted deuterium abundance(D/H)remains larger than the observational precision.In this study,the Monte Carlo simulation code PRIMAT was used to investigate the sensitivity of 11 important BBN reactions to deuterium abundance.We found that the reaction rate uncertainties of the four reactions d(d,n)^(3)He,d(d,p)t,d(p,γ)^(3)He,and p(n,γ)d had the largest influence on the calculated D/H uncertainty.Currently,the calculated D/H uncertainty cannot reach observational precision even with the recent LUNA precise d(p,γ)^(3) He rate.From the nuclear physics aspect,there is still room to largely reduce the reaction-rate uncertainties;hence,further measurements of the important reactions involved in BBN are still necessary.A photodisintegration experiment will be conducted at the Shanghai Laser Electron Gamma Source Facility to precisely study the deuterium production reaction of p(n,γ)d.展开更多
The shale gas development process is complex in terms of its flow mechanisms and the accuracy of the production forecasting is influenced by geological parameters and engineering parameters.Therefore,to quantitatively...The shale gas development process is complex in terms of its flow mechanisms and the accuracy of the production forecasting is influenced by geological parameters and engineering parameters.Therefore,to quantitatively evaluate the relative importance of model parameters on the production forecasting performance,sensitivity analysis of parameters is required.The parameters are ranked according to the sensitivity coefficients for the subsequent optimization scheme design.A data-driven global sensitivity analysis(GSA)method using convolutional neural networks(CNN)is proposed to identify the influencing parameters in shale gas production.The CNN is trained on a large dataset,validated against numerical simulations,and utilized as a surrogate model for efficient sensitivity analysis.Our approach integrates CNN with the Sobol'global sensitivity analysis method,presenting three key scenarios for sensitivity analysis:analysis of the production stage as a whole,analysis by fixed time intervals,and analysis by declining rate.The findings underscore the predominant influence of reservoir thickness and well length on shale gas production.Furthermore,the temporal sensitivity analysis reveals the dynamic shifts in parameter importance across the distinct production stages.展开更多
For unachievable tracking problems, where the system output cannot precisely track a given reference, achieving the best possible approximation for the reference trajectory becomes the objective. This study aims to in...For unachievable tracking problems, where the system output cannot precisely track a given reference, achieving the best possible approximation for the reference trajectory becomes the objective. This study aims to investigate solutions using the Ptype learning control scheme. Initially, we demonstrate the necessity of gradient information for achieving the best approximation.Subsequently, we propose an input-output-driven learning gain design to handle the imprecise gradients of a class of uncertain systems. However, it is discovered that the desired performance may not be attainable when faced with incomplete information.To address this issue, an extended iterative learning control scheme is introduced. In this scheme, the tracking errors are modified through output data sampling, which incorporates lowmemory footprints and offers flexibility in learning gain design.The input sequence is shown to converge towards the desired input, resulting in an output that is closest to the given reference in the least square sense. Numerical simulations are provided to validate the theoretical findings.展开更多
This study employs a data-driven methodology that embeds the principle of dimensional invariance into an artificial neural network to automatically identify dominant dimensionless quantities in the penetration of rod ...This study employs a data-driven methodology that embeds the principle of dimensional invariance into an artificial neural network to automatically identify dominant dimensionless quantities in the penetration of rod projectiles into semi-infinite metal targets from experimental measurements.The derived mathematical expressions of dimensionless quantities are simplified by the examination of the exponent matrix and coupling relationships between feature variables.As a physics-based dimension reduction methodology,this way reduces high-dimensional parameter spaces to descriptions involving only a few physically interpretable dimensionless quantities in penetrating cases.Then the relative importance of various dimensionless feature variables on the penetration efficiencies for four impacting conditions is evaluated through feature selection engineering.The results indicate that the selected critical dimensionless feature variables by this synergistic method,without referring to the complex theoretical equations and aiding in the detailed knowledge of penetration mechanics,are in accordance with those reported in the reference.Lastly,the determined dimensionless quantities can be efficiently applied to conduct semi-empirical analysis for the specific penetrating case,and the reliability of regression functions is validated.展开更多
Big data resources are characterized by large scale, wide sources, and strong dynamics. Existing access controlmechanisms based on manual policy formulation by security experts suffer from drawbacks such as low policy...Big data resources are characterized by large scale, wide sources, and strong dynamics. Existing access controlmechanisms based on manual policy formulation by security experts suffer from drawbacks such as low policymanagement efficiency and difficulty in accurately describing the access control policy. To overcome theseproblems, this paper proposes a big data access control mechanism based on a two-layer permission decisionstructure. This mechanism extends the attribute-based access control (ABAC) model. Business attributes areintroduced in the ABAC model as business constraints between entities. The proposed mechanism implementsa two-layer permission decision structure composed of the inherent attributes of access control entities and thebusiness attributes, which constitute the general permission decision algorithm based on logical calculation andthe business permission decision algorithm based on a bi-directional long short-term memory (BiLSTM) neuralnetwork, respectively. The general permission decision algorithm is used to implement accurate policy decisions,while the business permission decision algorithm implements fuzzy decisions based on the business constraints.The BiLSTM neural network is used to calculate the similarity of the business attributes to realize intelligent,adaptive, and efficient access control permission decisions. Through the two-layer permission decision structure,the complex and diverse big data access control management requirements can be satisfied by considering thesecurity and availability of resources. Experimental results show that the proposed mechanism is effective andreliable. In summary, it can efficiently support the secure sharing of big data resources.展开更多
Utilizing machine learning techniques for data-driven diagnosis of high temperature PEM fuel cells is beneficial and meaningful to the system durability. Nevertheless, ensuring the robustness of diagnosis remains a cr...Utilizing machine learning techniques for data-driven diagnosis of high temperature PEM fuel cells is beneficial and meaningful to the system durability. Nevertheless, ensuring the robustness of diagnosis remains a critical and challenging task in real application. To enhance the robustness of diagnosis and achieve a more thorough evaluation of diagnostic performance, a robust diagnostic procedure based on electrochemical impedance spectroscopy (EIS) and a new method for evaluation of the diagnosis robustness was proposed and investigated in this work. To improve the diagnosis robustness: (1) the degradation mechanism of different faults in the high temperature PEM fuel cell was first analyzed via the distribution of relaxation time of EIS to determine the equivalent circuit model (ECM) with better interpretability, simplicity and accuracy;(2) the feature extraction was implemented on the identified parameters of the ECM and extra attention was paid to distinguishing between the long-term normal degradation and other faults;(3) a Siamese Network was adopted to get features with higher robustness in a new embedding. The diagnosis was conducted using 6 classic classification algorithms—support vector machine (SVM), K-nearest neighbor (KNN), logistic regression (LR), decision tree (DT), random forest (RF), and Naive Bayes employing a dataset comprising a total of 1935 collected EIS. To evaluate the robustness of trained models: (1) different levels of errors were added to the features for performance evaluation;(2) a robustness coefficient (Roubust_C) was defined for a quantified and explicit evaluation of the diagnosis robustness. The diagnostic models employing the proposed feature extraction method can not only achieve the higher performance of around 100% but also higher robustness for diagnosis models. Despite the initial performance being similar, the KNN demonstrated a superior robustness after feature selection and re-embedding by triplet-loss method, which suggests the necessity of robustness evaluation for the machine learning models and the effectiveness of the defined robustness coefficient. This work hopes to give new insights to the robust diagnosis of high temperature PEM fuel cells and more comprehensive performance evaluation of the data-driven method for diagnostic application.展开更多
Genome-wide association mapping studies(GWAS)based on Big Data are a potential approach to improve marker-assisted selection in plant breeding.The number of available phenotypic and genomic data sets in which medium-s...Genome-wide association mapping studies(GWAS)based on Big Data are a potential approach to improve marker-assisted selection in plant breeding.The number of available phenotypic and genomic data sets in which medium-sized populations of several hundred individuals have been studied is rapidly increasing.Combining these data and using them in GWAS could increase both the power of QTL discovery and the accuracy of estimation of underlying genetic effects,but is hindered by data heterogeneity and lack of interoperability.In this study,we used genomic and phenotypic data sets,focusing on Central European winter wheat populations evaluated for heading date.We explored strategies for integrating these data and subsequently the resulting potential for GWAS.Establishing interoperability between data sets was greatly aided by some overlapping genotypes and a linear relationship between the different phenotyping protocols,resulting in high quality integrated phenotypic data.In this context,genomic prediction proved to be a suitable tool to study relevance of interactions between genotypes and experimental series,which was low in our case.Contrary to expectations,fewer associations between markers and traits were found in the larger combined data than in the individual experimental series.However,the predictive power based on the marker-trait associations of the integrated data set was higher across data sets.Therefore,the results show that the integration of medium-sized to Big Data is an approach to increase the power to detect QTL in GWAS.The results encourage further efforts to standardize and share data in the plant breeding community.展开更多
Lithium-ion batteries are the preferred green energy storage method and are equipped with intelligent battery management systems(BMSs)that efficiently manage the batteries.This not only ensures the safety performance ...Lithium-ion batteries are the preferred green energy storage method and are equipped with intelligent battery management systems(BMSs)that efficiently manage the batteries.This not only ensures the safety performance of the batteries but also significantly improves their efficiency and reduces their damage rate.Throughout their whole life cycle,lithium-ion batteries undergo aging and performance degradation due to diverse external environments and irregular degradation of internal materials.This degradation is reflected in the state of health(SOH)assessment.Therefore,this review offers the first comprehensive analysis of battery SOH estimation strategies across the entire lifecycle over the past five years,highlighting common research focuses rooted in data-driven methods.It delves into various dimensions such as dataset integration and preprocessing,health feature parameter extraction,and the construction of SOH estimation models.These approaches unearth hidden insights within data,addressing the inherent tension between computational complexity and estimation accuracy.To enha nce support for in-vehicle implementation,cloud computing,and the echelon technologies of battery recycling,remanufacturing,and reuse,as well as to offer insights into these technologies,a segmented management approach will be introduced in the future.This will encompass source domain data processing,multi-feature factor reconfiguration,hybrid drive modeling,parameter correction mechanisms,and fulltime health management.Based on the best SOH estimation outcomes,health strategies tailored to different stages can be devised in the future,leading to the establishment of a comprehensive SOH assessment framework.This will mitigate cross-domain distribution disparities and facilitate adaptation to a broader array of dynamic operation protocols.This article reviews the current research landscape from four perspectives and discusses the challenges that lie ahead.Researchers and practitioners can gain a comprehensive understanding of battery SOH estimation methods,offering valuable insights for the development of advanced battery management systems and embedded application research.展开更多
Aiming at the tracking problem of a class of discrete nonaffine nonlinear multi-input multi-output(MIMO) repetitive systems subjected to separable and nonseparable disturbances, a novel data-driven iterative learning ...Aiming at the tracking problem of a class of discrete nonaffine nonlinear multi-input multi-output(MIMO) repetitive systems subjected to separable and nonseparable disturbances, a novel data-driven iterative learning control(ILC) scheme based on the zeroing neural networks(ZNNs) is proposed. First, the equivalent dynamic linearization data model is obtained by means of dynamic linearization technology, which exists theoretically in the iteration domain. Then, the iterative extended state observer(IESO) is developed to estimate the disturbance and the coupling between systems, and the decoupled dynamic linearization model is obtained for the purpose of controller synthesis. To solve the zero-seeking tracking problem with inherent tolerance of noise,an ILC based on noise-tolerant modified ZNN is proposed. The strict assumptions imposed on the initialization conditions of each iteration in the existing ILC methods can be absolutely removed with our method. In addition, theoretical analysis indicates that the modified ZNN can converge to the exact solution of the zero-seeking tracking problem. Finally, a generalized example and an application-oriented example are presented to verify the effectiveness and superiority of the proposed process.展开更多
To reduce CO_(2) emissions in response to global climate change,shale reservoirs could be ideal candidates for long-term carbon geo-sequestration involving multi-scale transport processes.However,most current CO_(2) s...To reduce CO_(2) emissions in response to global climate change,shale reservoirs could be ideal candidates for long-term carbon geo-sequestration involving multi-scale transport processes.However,most current CO_(2) sequestration models do not adequately consider multiple transport mechanisms.Moreover,the evaluation of CO_(2) storage processes usually involves laborious and time-consuming numerical simulations unsuitable for practical prediction and decision-making.In this paper,an integrated model involving gas diffusion,adsorption,dissolution,slip flow,and Darcy flow is proposed to accurately characterize CO_(2) storage in depleted shale reservoirs,supporting the establishment of a training database.On this basis,a hybrid physics-informed data-driven neural network(HPDNN)is developed as a deep learning surrogate for prediction and inversion.By incorporating multiple sources of scientific knowledge,the HPDNN can be configured with limited simulation resources,significantly accelerating the forward and inversion processes.Furthermore,the HPDNN can more intelligently predict injection performance,precisely perform reservoir parameter inversion,and reasonably evaluate the CO_(2) storage capacity under complicated scenarios.The validation and test results demonstrate that the HPDNN can ensure high accuracy and strong robustness across an extensive applicability range when dealing with field data with multiple noise sources.This study has tremendous potential to replace traditional modeling tools for predicting and making decisions about CO_(2) storage projects in depleted shale reservoirs.展开更多
The development of technologies such as big data and blockchain has brought convenience to life,but at the same time,privacy and security issues are becoming more and more prominent.The K-anonymity algorithm is an eff...The development of technologies such as big data and blockchain has brought convenience to life,but at the same time,privacy and security issues are becoming more and more prominent.The K-anonymity algorithm is an effective and low computational complexity privacy-preserving algorithm that can safeguard users’privacy by anonymizing big data.However,the algorithm currently suffers from the problem of focusing only on improving user privacy while ignoring data availability.In addition,ignoring the impact of quasi-identified attributes on sensitive attributes causes the usability of the processed data on statistical analysis to be reduced.Based on this,we propose a new K-anonymity algorithm to solve the privacy security problem in the context of big data,while guaranteeing improved data usability.Specifically,we construct a new information loss function based on the information quantity theory.Considering that different quasi-identification attributes have different impacts on sensitive attributes,we set weights for each quasi-identification attribute when designing the information loss function.In addition,to reduce information loss,we improve K-anonymity in two ways.First,we make the loss of information smaller than in the original table while guaranteeing privacy based on common artificial intelligence algorithms,i.e.,greedy algorithm and 2-means clustering algorithm.In addition,we improve the 2-means clustering algorithm by designing a mean-center method to select the initial center of mass.Meanwhile,we design the K-anonymity algorithm of this scheme based on the constructed information loss function,the improved 2-means clustering algorithm,and the greedy algorithm,which reduces the information loss.Finally,we experimentally demonstrate the effectiveness of the algorithm in improving the effect of 2-means clustering and reducing information loss.展开更多
Due to the restricted satellite payloads in LEO mega-constellation networks(LMCNs),remote sensing image analysis,online learning and other big data services desirably need onboard distributed processing(OBDP).In exist...Due to the restricted satellite payloads in LEO mega-constellation networks(LMCNs),remote sensing image analysis,online learning and other big data services desirably need onboard distributed processing(OBDP).In existing technologies,the efficiency of big data applications(BDAs)in distributed systems hinges on the stable-state and low-latency links between worker nodes.However,LMCNs with high-dynamic nodes and long-distance links can not provide the above conditions,which makes the performance of OBDP hard to be intuitively measured.To bridge this gap,a multidimensional simulation platform is indispensable that can simulate the network environment of LMCNs and put BDAs in it for performance testing.Using STK's APIs and parallel computing framework,we achieve real-time simulation for thousands of satellite nodes,which are mapped as application nodes through software defined network(SDN)and container technologies.We elaborate the architecture and mechanism of the simulation platform,and take the Starlink and Hadoop as realistic examples for simulations.The results indicate that LMCNs have dynamic end-to-end latency which fluctuates periodically with the constellation movement.Compared to ground data center networks(GDCNs),LMCNs deteriorate the computing and storage job throughput,which can be alleviated by the utilization of erasure codes and data flow scheduling of worker nodes.展开更多
文摘With the rapid development of the global economy, maritime transportation has become much more convenient due to large capacities and low freight. However, this means the sea lanes are becoming more and more crowded,leading to high probabilities of marine accidents in complex maritime environments. According to relevant historical statistics, a large number of accidents have happened in water areas that lack high precision navigation data, which can be utilized to enhance navigation safety. The purpose of this work was to carry out ship route planning automatically, by mining historical big automatic identification system(AIS) data. It is well-known that experiential navigation information hidden in maritime big data could be automatically extracted using advanced data mining techniques;assisting in the generation of safe and reliable ship planning routes for complex maritime environments. In this paper, a novel method is proposed to construct a big data-driven framework for generating ship planning routes automatically, under varying navigation conditions. The method performs density-based spatial clustering of applications with noise first on a large number of ship trajectories to form different trajectory vector clusters. Then, it iteratively calculates its centerline in the trajectory vector cluster, and constructs the waterway network from the node-arc topology relationship among these centerlines. The generation of shipping route could be based on the waterway network and conducted by rasterizing the marine environment risks for the sea area not covered by the waterway network. Numerous experiments have been conducted on different AIS data sets in different water areas, and the experimental results have demonstrated the effectiveness of the framework of the ship route planning proposed in this paper.
文摘This study introduces an innovative“Big Model”strategy to enhance Bridge Structural Health Monitoring(SHM)using a Convolutional Neural Network(CNN),time-frequency analysis,and fine element analysis.Leveraging ensemble methods,collaborative learning,and distributed computing,the approach effectively manages the complexity and scale of large-scale bridge data.The CNN employs transfer learning,fine-tuning,and continuous monitoring to optimize models for adaptive and accurate structural health assessments,focusing on extracting meaningful features through time-frequency analysis.By integrating Finite Element Analysis,time-frequency analysis,and CNNs,the strategy provides a comprehensive understanding of bridge health.Utilizing diverse sensor data,sophisticated feature extraction,and advanced CNN architecture,the model is optimized through rigorous preprocessing and hyperparameter tuning.This approach significantly enhances the ability to make accurate predictions,monitor structural health,and support proactive maintenance practices,thereby ensuring the safety and longevity of critical infrastructure.
基金supported by the National Natural Science Foundation of China (52075420)the National Key Research and Development Program of China (2020YFB1708400)。
文摘With its generality and practicality, the combination of partial charging curves and machine learning(ML) for battery capacity estimation has attracted widespread attention. However, a clear classification,fair comparison, and performance rationalization of these methods are lacking, due to the scattered existing studies. To address these issues, we develop 20 capacity estimation methods from three perspectives:charging sequence construction, input forms, and ML models. 22,582 charging curves are generated from 44 cells with different battery chemistry and operating conditions to validate the performance. Through comprehensive and unbiased comparison, the long short-term memory(LSTM) based neural network exhibits the best accuracy and robustness. Across all 6503 tested samples, the mean absolute percentage error(MAPE) for capacity estimation using LSTM is 0.61%, with a maximum error of only 3.94%. Even with the addition of 3 m V voltage noise or the extension of sampling intervals to 60 s, the average MAPE remains below 2%. Furthermore, the charging sequences are provided with physical explanations related to battery degradation to enhance confidence in their application. Recommendations for using other competitive methods are also presented. This work provides valuable insights and guidance for estimating battery capacity based on partial charging curves.
基金the National Key Research and Development Program of China under Grant 2021YFB3301300the National Natural Science Foundation of China under Grant 62203213+1 种基金the Natural Science Foundation of Jiangsu Province under Grant BK20220332the Open Project Program of Fujian Provincial Key Laboratory of Intelligent Identification and Control of Complex Dynamic System under Grant 2022A0004.
文摘The reliable operation of high-speed wire rod finishing mills is crucial in the steel production enterprise.As complex system-level equipment,it is difficult for high-speed wire rod finishing mills to realize fault location and real-time monitoring.To solve the above problems,an expert experience and data-driven-based hybrid fault diagnosis method for high-speed wire rod finishing mills is proposed in this paper.First,based on its mechanical structure,time and frequency domain analysis are improved in fault feature extraction.The approach of combining virtual value,peak value with kurtosis value index,is adopted in time domain analysis.Speed adjustment and side frequency analysis are proposed in frequency domain analysis to obtain accurate component characteristic frequency and its corresponding sideband.Then,according to time and frequency domain characteristics,fault location based on expert experience is proposed to get an accurate fault result.Finally,the proposed method is implemented in the equipment intelligent diagnosis system.By taking an equipment fault on site,for example,the effectiveness of the proposed method is illustrated in the system.
基金financially supported by the National Key Research and Development Program of China(2022YFB3706800,2020YFB1710100)the National Natural Science Foundation of China(51821001,52090042,52074183)。
文摘The complex sand-casting process combined with the interactions between process parameters makes it difficult to control the casting quality,resulting in a high scrap rate.A strategy based on a data-driven model was proposed to reduce casting defects and improve production efficiency,which includes the random forest(RF)classification model,the feature importance analysis,and the process parameters optimization with Monte Carlo simulation.The collected data includes four types of defects and corresponding process parameters were used to construct the RF model.Classification results show a recall rate above 90% for all categories.The Gini Index was used to assess the importance of the process parameters in the formation of various defects in the RF model.Finally,the classification model was applied to different production conditions for quality prediction.In the case of process parameters optimization for gas porosity defects,this model serves as an experimental process in the Monte Carlo method to estimate a better temperature distribution.The prediction model,when applied to the factory,greatly improved the efficiency of defect detection.Results show that the scrap rate decreased from 10.16% to 6.68%.
文摘There are challenges to the reliability evaluation for insulated gate bipolar transistors(IGBT)on electric vehicles,such as junction temperature measurement,computational and storage resources.In this paper,a junction temperature estimation approach based on neural network without additional cost is proposed and the lifetime calculation for IGBT using electric vehicle big data is performed.The direct current(DC)voltage,operation current,switching frequency,negative thermal coefficient thermistor(NTC)temperature and IGBT lifetime are inputs.And the junction temperature(T_(j))is output.With the rain flow counting method,the classified irregular temperatures are brought into the life model for the failure cycles.The fatigue accumulation method is then used to calculate the IGBT lifetime.To solve the limited computational and storage resources of electric vehicle controllers,the operation of IGBT lifetime calculation is running on a big data platform.The lifetime is then transmitted wirelessly to electric vehicles as input for neural network.Thus the junction temperature of IGBT under long-term operating conditions can be accurately estimated.A test platform of the motor controller combined with the vehicle big data server is built for the IGBT accelerated aging test.Subsequently,the IGBT lifetime predictions are derived from the junction temperature estimation by the neural network method and the thermal network method.The experiment shows that the lifetime prediction based on a neural network with big data demonstrates a higher accuracy than that of the thermal network,which improves the reliability evaluation of system.
文摘In order to address the problems of the single encryption algorithm,such as low encryption efficiency and unreliable metadata for static data storage of big data platforms in the cloud computing environment,we propose a Hadoop based big data secure storage scheme.Firstly,in order to disperse the NameNode service from a single server to multiple servers,we combine HDFS federation and HDFS high-availability mechanisms,and use the Zookeeper distributed coordination mechanism to coordinate each node to achieve dual-channel storage.Then,we improve the ECC encryption algorithm for the encryption of ordinary data,and adopt a homomorphic encryption algorithm to encrypt data that needs to be calculated.To accelerate the encryption,we adopt the dualthread encryption mode.Finally,the HDFS control module is designed to combine the encryption algorithm with the storage model.Experimental results show that the proposed solution solves the problem of a single point of failure of metadata,performs well in terms of metadata reliability,and can realize the fault tolerance of the server.The improved encryption algorithm integrates the dual-channel storage mode,and the encryption storage efficiency improves by 27.6% on average.
基金supported by the National Key R&D Program of China(No.2022YFA1602401)by the National Natural Science Foundation of China(No.11825504)。
文摘Big Bang nucleosynthesis(BBN)theory predicts the primordial abundances of the light elements^(2) H(referred to as deuterium,or D for short),^(3)He,^(4)He,and^(7) Li produced in the early universe.Among these,deuterium,the first nuclide produced by BBN,is a key primordial material for subsequent reactions.To date,the uncertainty in predicted deuterium abundance(D/H)remains larger than the observational precision.In this study,the Monte Carlo simulation code PRIMAT was used to investigate the sensitivity of 11 important BBN reactions to deuterium abundance.We found that the reaction rate uncertainties of the four reactions d(d,n)^(3)He,d(d,p)t,d(p,γ)^(3)He,and p(n,γ)d had the largest influence on the calculated D/H uncertainty.Currently,the calculated D/H uncertainty cannot reach observational precision even with the recent LUNA precise d(p,γ)^(3) He rate.From the nuclear physics aspect,there is still room to largely reduce the reaction-rate uncertainties;hence,further measurements of the important reactions involved in BBN are still necessary.A photodisintegration experiment will be conducted at the Shanghai Laser Electron Gamma Source Facility to precisely study the deuterium production reaction of p(n,γ)d.
基金supported by the National Natural Science Foundation of China (Nos.52274048 and 52374017)Beijing Natural Science Foundation (No.3222037)the CNPC 14th five-year perspective fundamental research project (No.2021DJ2104)。
文摘The shale gas development process is complex in terms of its flow mechanisms and the accuracy of the production forecasting is influenced by geological parameters and engineering parameters.Therefore,to quantitatively evaluate the relative importance of model parameters on the production forecasting performance,sensitivity analysis of parameters is required.The parameters are ranked according to the sensitivity coefficients for the subsequent optimization scheme design.A data-driven global sensitivity analysis(GSA)method using convolutional neural networks(CNN)is proposed to identify the influencing parameters in shale gas production.The CNN is trained on a large dataset,validated against numerical simulations,and utilized as a surrogate model for efficient sensitivity analysis.Our approach integrates CNN with the Sobol'global sensitivity analysis method,presenting three key scenarios for sensitivity analysis:analysis of the production stage as a whole,analysis by fixed time intervals,and analysis by declining rate.The findings underscore the predominant influence of reservoir thickness and well length on shale gas production.Furthermore,the temporal sensitivity analysis reveals the dynamic shifts in parameter importance across the distinct production stages.
基金supported by the National Natural Science Foundation of China (62173333, 12271522)Beijing Natural Science Foundation (Z210002)the Research Fund of Renmin University of China (2021030187)。
文摘For unachievable tracking problems, where the system output cannot precisely track a given reference, achieving the best possible approximation for the reference trajectory becomes the objective. This study aims to investigate solutions using the Ptype learning control scheme. Initially, we demonstrate the necessity of gradient information for achieving the best approximation.Subsequently, we propose an input-output-driven learning gain design to handle the imprecise gradients of a class of uncertain systems. However, it is discovered that the desired performance may not be attainable when faced with incomplete information.To address this issue, an extended iterative learning control scheme is introduced. In this scheme, the tracking errors are modified through output data sampling, which incorporates lowmemory footprints and offers flexibility in learning gain design.The input sequence is shown to converge towards the desired input, resulting in an output that is closest to the given reference in the least square sense. Numerical simulations are provided to validate the theoretical findings.
基金supported by the National Natural Science Foundation of China(Grant Nos.12272257,12102292,12032006)the special fund for Science and Technology Innovation Teams of Shanxi Province(Nos.202204051002006).
文摘This study employs a data-driven methodology that embeds the principle of dimensional invariance into an artificial neural network to automatically identify dominant dimensionless quantities in the penetration of rod projectiles into semi-infinite metal targets from experimental measurements.The derived mathematical expressions of dimensionless quantities are simplified by the examination of the exponent matrix and coupling relationships between feature variables.As a physics-based dimension reduction methodology,this way reduces high-dimensional parameter spaces to descriptions involving only a few physically interpretable dimensionless quantities in penetrating cases.Then the relative importance of various dimensionless feature variables on the penetration efficiencies for four impacting conditions is evaluated through feature selection engineering.The results indicate that the selected critical dimensionless feature variables by this synergistic method,without referring to the complex theoretical equations and aiding in the detailed knowledge of penetration mechanics,are in accordance with those reported in the reference.Lastly,the determined dimensionless quantities can be efficiently applied to conduct semi-empirical analysis for the specific penetrating case,and the reliability of regression functions is validated.
基金Key Research and Development and Promotion Program of Henan Province(No.222102210069)Zhongyuan Science and Technology Innovation Leading Talent Project(224200510003)National Natural Science Foundation of China(No.62102449).
文摘Big data resources are characterized by large scale, wide sources, and strong dynamics. Existing access controlmechanisms based on manual policy formulation by security experts suffer from drawbacks such as low policymanagement efficiency and difficulty in accurately describing the access control policy. To overcome theseproblems, this paper proposes a big data access control mechanism based on a two-layer permission decisionstructure. This mechanism extends the attribute-based access control (ABAC) model. Business attributes areintroduced in the ABAC model as business constraints between entities. The proposed mechanism implementsa two-layer permission decision structure composed of the inherent attributes of access control entities and thebusiness attributes, which constitute the general permission decision algorithm based on logical calculation andthe business permission decision algorithm based on a bi-directional long short-term memory (BiLSTM) neuralnetwork, respectively. The general permission decision algorithm is used to implement accurate policy decisions,while the business permission decision algorithm implements fuzzy decisions based on the business constraints.The BiLSTM neural network is used to calculate the similarity of the business attributes to realize intelligent,adaptive, and efficient access control permission decisions. Through the two-layer permission decision structure,the complex and diverse big data access control management requirements can be satisfied by considering thesecurity and availability of resources. Experimental results show that the proposed mechanism is effective andreliable. In summary, it can efficiently support the secure sharing of big data resources.
基金supported by the Chinese Scholarship Council(Nos.202208320055 and 202108320111)the support from the energy department of Aalborg University was acknowledged.
文摘Utilizing machine learning techniques for data-driven diagnosis of high temperature PEM fuel cells is beneficial and meaningful to the system durability. Nevertheless, ensuring the robustness of diagnosis remains a critical and challenging task in real application. To enhance the robustness of diagnosis and achieve a more thorough evaluation of diagnostic performance, a robust diagnostic procedure based on electrochemical impedance spectroscopy (EIS) and a new method for evaluation of the diagnosis robustness was proposed and investigated in this work. To improve the diagnosis robustness: (1) the degradation mechanism of different faults in the high temperature PEM fuel cell was first analyzed via the distribution of relaxation time of EIS to determine the equivalent circuit model (ECM) with better interpretability, simplicity and accuracy;(2) the feature extraction was implemented on the identified parameters of the ECM and extra attention was paid to distinguishing between the long-term normal degradation and other faults;(3) a Siamese Network was adopted to get features with higher robustness in a new embedding. The diagnosis was conducted using 6 classic classification algorithms—support vector machine (SVM), K-nearest neighbor (KNN), logistic regression (LR), decision tree (DT), random forest (RF), and Naive Bayes employing a dataset comprising a total of 1935 collected EIS. To evaluate the robustness of trained models: (1) different levels of errors were added to the features for performance evaluation;(2) a robustness coefficient (Roubust_C) was defined for a quantified and explicit evaluation of the diagnosis robustness. The diagnostic models employing the proposed feature extraction method can not only achieve the higher performance of around 100% but also higher robustness for diagnosis models. Despite the initial performance being similar, the KNN demonstrated a superior robustness after feature selection and re-embedding by triplet-loss method, which suggests the necessity of robustness evaluation for the machine learning models and the effectiveness of the defined robustness coefficient. This work hopes to give new insights to the robust diagnosis of high temperature PEM fuel cells and more comprehensive performance evaluation of the data-driven method for diagnostic application.
基金funding within the Wheat BigData Project(German Federal Ministry of Food and Agriculture,FKZ2818408B18)。
文摘Genome-wide association mapping studies(GWAS)based on Big Data are a potential approach to improve marker-assisted selection in plant breeding.The number of available phenotypic and genomic data sets in which medium-sized populations of several hundred individuals have been studied is rapidly increasing.Combining these data and using them in GWAS could increase both the power of QTL discovery and the accuracy of estimation of underlying genetic effects,but is hindered by data heterogeneity and lack of interoperability.In this study,we used genomic and phenotypic data sets,focusing on Central European winter wheat populations evaluated for heading date.We explored strategies for integrating these data and subsequently the resulting potential for GWAS.Establishing interoperability between data sets was greatly aided by some overlapping genotypes and a linear relationship between the different phenotyping protocols,resulting in high quality integrated phenotypic data.In this context,genomic prediction proved to be a suitable tool to study relevance of interactions between genotypes and experimental series,which was low in our case.Contrary to expectations,fewer associations between markers and traits were found in the larger combined data than in the individual experimental series.However,the predictive power based on the marker-trait associations of the integrated data set was higher across data sets.Therefore,the results show that the integration of medium-sized to Big Data is an approach to increase the power to detect QTL in GWAS.The results encourage further efforts to standardize and share data in the plant breeding community.
基金supported by the National Natural Science Foundation of China (No.62173281,52377217,U23A20651)Sichuan Science and Technology Program (No.24NSFSC0024,23ZDYF0734,23NSFSC1436)+2 种基金Dazhou City School Cooperation Project (No.DZXQHZ006)Technopole Talent Summit Project (No.KJCRCFH08)Robert Gordon University。
文摘Lithium-ion batteries are the preferred green energy storage method and are equipped with intelligent battery management systems(BMSs)that efficiently manage the batteries.This not only ensures the safety performance of the batteries but also significantly improves their efficiency and reduces their damage rate.Throughout their whole life cycle,lithium-ion batteries undergo aging and performance degradation due to diverse external environments and irregular degradation of internal materials.This degradation is reflected in the state of health(SOH)assessment.Therefore,this review offers the first comprehensive analysis of battery SOH estimation strategies across the entire lifecycle over the past five years,highlighting common research focuses rooted in data-driven methods.It delves into various dimensions such as dataset integration and preprocessing,health feature parameter extraction,and the construction of SOH estimation models.These approaches unearth hidden insights within data,addressing the inherent tension between computational complexity and estimation accuracy.To enha nce support for in-vehicle implementation,cloud computing,and the echelon technologies of battery recycling,remanufacturing,and reuse,as well as to offer insights into these technologies,a segmented management approach will be introduced in the future.This will encompass source domain data processing,multi-feature factor reconfiguration,hybrid drive modeling,parameter correction mechanisms,and fulltime health management.Based on the best SOH estimation outcomes,health strategies tailored to different stages can be devised in the future,leading to the establishment of a comprehensive SOH assessment framework.This will mitigate cross-domain distribution disparities and facilitate adaptation to a broader array of dynamic operation protocols.This article reviews the current research landscape from four perspectives and discusses the challenges that lie ahead.Researchers and practitioners can gain a comprehensive understanding of battery SOH estimation methods,offering valuable insights for the development of advanced battery management systems and embedded application research.
基金supported by the National Natural Science Foundation of China(U21A20166)in part by the Science and Technology Development Foundation of Jilin Province (20230508095RC)+1 种基金in part by the Development and Reform Commission Foundation of Jilin Province (2023C034-3)in part by the Exploration Foundation of State Key Laboratory of Automotive Simulation and Control。
文摘Aiming at the tracking problem of a class of discrete nonaffine nonlinear multi-input multi-output(MIMO) repetitive systems subjected to separable and nonseparable disturbances, a novel data-driven iterative learning control(ILC) scheme based on the zeroing neural networks(ZNNs) is proposed. First, the equivalent dynamic linearization data model is obtained by means of dynamic linearization technology, which exists theoretically in the iteration domain. Then, the iterative extended state observer(IESO) is developed to estimate the disturbance and the coupling between systems, and the decoupled dynamic linearization model is obtained for the purpose of controller synthesis. To solve the zero-seeking tracking problem with inherent tolerance of noise,an ILC based on noise-tolerant modified ZNN is proposed. The strict assumptions imposed on the initialization conditions of each iteration in the existing ILC methods can be absolutely removed with our method. In addition, theoretical analysis indicates that the modified ZNN can converge to the exact solution of the zero-seeking tracking problem. Finally, a generalized example and an application-oriented example are presented to verify the effectiveness and superiority of the proposed process.
基金This work is funded by National Natural Science Foundation of China(Nos.42202292,42141011)the Program for Jilin University(JLU)Science and Technology Innovative Research Team(No.2019TD-35).The authors would also like to thank the reviewers and editors whose critical comments are very helpful in preparing this article.
文摘To reduce CO_(2) emissions in response to global climate change,shale reservoirs could be ideal candidates for long-term carbon geo-sequestration involving multi-scale transport processes.However,most current CO_(2) sequestration models do not adequately consider multiple transport mechanisms.Moreover,the evaluation of CO_(2) storage processes usually involves laborious and time-consuming numerical simulations unsuitable for practical prediction and decision-making.In this paper,an integrated model involving gas diffusion,adsorption,dissolution,slip flow,and Darcy flow is proposed to accurately characterize CO_(2) storage in depleted shale reservoirs,supporting the establishment of a training database.On this basis,a hybrid physics-informed data-driven neural network(HPDNN)is developed as a deep learning surrogate for prediction and inversion.By incorporating multiple sources of scientific knowledge,the HPDNN can be configured with limited simulation resources,significantly accelerating the forward and inversion processes.Furthermore,the HPDNN can more intelligently predict injection performance,precisely perform reservoir parameter inversion,and reasonably evaluate the CO_(2) storage capacity under complicated scenarios.The validation and test results demonstrate that the HPDNN can ensure high accuracy and strong robustness across an extensive applicability range when dealing with field data with multiple noise sources.This study has tremendous potential to replace traditional modeling tools for predicting and making decisions about CO_(2) storage projects in depleted shale reservoirs.
基金Foundation of National Natural Science Foundation of China(62202118)Scientific and Technological Research Projects from Guizhou Education Department([2023]003)+1 种基金Guizhou Provincial Department of Science and Technology Hundred Levels of Innovative Talents Project(GCC[2023]018)Top Technology Talent Project from Guizhou Education Department([2022]073).
文摘The development of technologies such as big data and blockchain has brought convenience to life,but at the same time,privacy and security issues are becoming more and more prominent.The K-anonymity algorithm is an effective and low computational complexity privacy-preserving algorithm that can safeguard users’privacy by anonymizing big data.However,the algorithm currently suffers from the problem of focusing only on improving user privacy while ignoring data availability.In addition,ignoring the impact of quasi-identified attributes on sensitive attributes causes the usability of the processed data on statistical analysis to be reduced.Based on this,we propose a new K-anonymity algorithm to solve the privacy security problem in the context of big data,while guaranteeing improved data usability.Specifically,we construct a new information loss function based on the information quantity theory.Considering that different quasi-identification attributes have different impacts on sensitive attributes,we set weights for each quasi-identification attribute when designing the information loss function.In addition,to reduce information loss,we improve K-anonymity in two ways.First,we make the loss of information smaller than in the original table while guaranteeing privacy based on common artificial intelligence algorithms,i.e.,greedy algorithm and 2-means clustering algorithm.In addition,we improve the 2-means clustering algorithm by designing a mean-center method to select the initial center of mass.Meanwhile,we design the K-anonymity algorithm of this scheme based on the constructed information loss function,the improved 2-means clustering algorithm,and the greedy algorithm,which reduces the information loss.Finally,we experimentally demonstrate the effectiveness of the algorithm in improving the effect of 2-means clustering and reducing information loss.
基金supported by National Natural Sciences Foundation of China(No.62271165,62027802,62201307)the Guangdong Basic and Applied Basic Research Foundation(No.2023A1515030297)+2 种基金the Shenzhen Science and Technology Program ZDSYS20210623091808025Stable Support Plan Program GXWD20231129102638002the Major Key Project of PCL(No.PCL2024A01)。
文摘Due to the restricted satellite payloads in LEO mega-constellation networks(LMCNs),remote sensing image analysis,online learning and other big data services desirably need onboard distributed processing(OBDP).In existing technologies,the efficiency of big data applications(BDAs)in distributed systems hinges on the stable-state and low-latency links between worker nodes.However,LMCNs with high-dynamic nodes and long-distance links can not provide the above conditions,which makes the performance of OBDP hard to be intuitively measured.To bridge this gap,a multidimensional simulation platform is indispensable that can simulate the network environment of LMCNs and put BDAs in it for performance testing.Using STK's APIs and parallel computing framework,we achieve real-time simulation for thousands of satellite nodes,which are mapped as application nodes through software defined network(SDN)and container technologies.We elaborate the architecture and mechanism of the simulation platform,and take the Starlink and Hadoop as realistic examples for simulations.The results indicate that LMCNs have dynamic end-to-end latency which fluctuates periodically with the constellation movement.Compared to ground data center networks(GDCNs),LMCNs deteriorate the computing and storage job throughput,which can be alleviated by the utilization of erasure codes and data flow scheduling of worker nodes.