Integrating machine learning and data mining is crucial for processing big data and extracting valuable insights to enhance decision-making. However, imbalanced target variables within big data present technical challenges that hinder the performance of supervised learning classifiers on key evaluation metrics, limiting their overall effectiveness. This study presents a comprehensive review of both common and recently developed Supervised Learning Classifiers (SLCs) and evaluates their performance in data-driven decision-making. The evaluation uses various metrics, with a particular focus on the harmonic mean score (F1 score) on an imbalanced real-world bank target marketing dataset. The findings indicate that grid-search random forest and random-search random forest excel in precision and area under the curve, while Extreme Gradient Boosting (XGBoost) outperforms other traditional classifiers in terms of F1 score. Employing oversampling methods to address the imbalanced data yields a significant performance improvement in XGBoost, which delivers superior results across all metrics, particularly when using the SMOTE variant known as the BorderlineSMOTE2 technique. The study identifies several key factors for effectively addressing the challenges of supervised learning with imbalanced datasets: selecting appropriate datasets for training and testing, choosing the right classifiers, employing effective techniques for processing and handling imbalanced data, and identifying suitable metrics for performance evaluation. These factors also include the use of effective exploratory data analysis in conjunction with visualisation techniques to yield insights conducive to data-driven decision-making.
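The oversampling family this abstract refers to rests on one core operation: synthesising new minority-class samples by interpolating between a minority sample and one of its minority-class neighbours. The sketch below illustrates that interpolation step only; it is not the paper's pipeline (which uses BorderlineSMOTE2, a variant that additionally restricts the seed samples to those near the class boundary), and the `minority` points are made-up illustrative data.

```python
import random

def smote_oversample(minority, n_new, k=3, seed=0):
    """Generate synthetic minority-class samples by interpolating between a
    randomly chosen sample and one of its k nearest minority neighbours --
    the core idea behind SMOTE-style oversampling."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist2(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic

# Hypothetical minority-class points in a 2-D feature space
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_oversample(minority, n_new=6)
```

Because every synthetic point lies on a segment between two existing minority points, the oversampled class stays inside its original region of the feature space, which is why this tends to help a classifier more than naive duplication.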
Inter-agency government information sharing (IAGIS) plays an important role in improving the service and efficiency of government agencies. Currently, there is still no effective and secure way for data-driven IAGIS to fulfill the dynamic demands of information sharing between government agencies. Motivated by blockchain and data mining, a data-driven framework for IAGIS is proposed in this paper. Firstly, blockchain is used as the core of the framework for monitoring and preventing the leakage and abuse of government information, in order to guarantee information security. Secondly, a four-layer architecture is designed to implement the proposed framework. Thirdly, the classical data mining algorithms PageRank and Apriori are applied to dynamically design smart contracts for information sharing, for the purpose of flexibly adjusting information sharing strategies according to the practical demands of government agencies for public management and public service. Finally, a case study is presented to illustrate the operation of the proposed framework.
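PageRank, one of the two classical algorithms the framework applies, can be sketched in a few lines of power iteration. The four-agency sharing graph below is a hypothetical example, not taken from the paper's case study.

```python
def pagerank(links, damping=0.85, iters=50):
    """Iterative PageRank over an adjacency dict {node: [outgoing links]}.
    Rank flows along edges; a damping term models random teleportation."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for v, outs in links.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for u in outs:
                    new[u] += share
            else:  # dangling node: spread its rank uniformly
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank

# Hypothetical information-sharing graph between four agencies
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
```

In an IAGIS setting the resulting scores could, for instance, prioritise which agencies' data items a smart contract exposes first; that use is an illustration, not the paper's specification.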
Although big data on water quality parameters is publicly available, virtual simulation has not yet been adequately adopted in environmental chemistry research. A digital twin differs from conventional geospatial modeling approaches and is particularly useful when a systematic laboratory or field experiment is not realistic (e.g., climate impact and water-related environmental catastrophes) or difficult to design and monitor in real time (e.g., pollutant and nutrient cycles in estuaries, soils, and sediments). Data-driven water research could realize early warning and disaster-readiness simulations for diverse environmental scenarios, including drinking water contamination.
Brain tissue is one of the softest parts of the human body, composed of white matter and grey matter. The mechanical behavior of brain tissue plays an essential role in regulating brain morphology and brain function. In addition, traumatic brain injury (TBI) and various brain diseases are also greatly influenced by the brain's mechanical properties. Whether white matter or grey matter, brain tissue contains multiscale structures composed of neurons, glial cells, fibers, blood vessels, etc., each with different mechanical properties. As such, brain tissue exhibits complex mechanical behavior, usually with strong nonlinearity, heterogeneity, and directional dependence. Building a constitutive law for multiscale brain tissue using traditional function-based approaches can be very challenging. Instead, this paper proposes a data-driven approach to establish the desired mechanical model of brain tissue. We focus on blood vessels with internal pressure embedded in a white or grey matter matrix material to demonstrate our approach. The matrix is described by an isotropic or anisotropic nonlinear elastic model. A representative unit cell (RUC) with blood vessels is built and used to generate stress-strain data under different internal blood pressures and various proportional displacement loading paths. The generated stress-strain data are then used to train a mechanical law using artificial neural networks to predict the macroscopic mechanical response of brain tissue under different internal pressures. Finally, the trained material model is implemented in finite element software to predict the mechanical behavior of a whole brain under intracranial pressure and distributed body forces. Compared with a direct numerical simulation that employs a reference material model, our proposed approach greatly reduces the computational cost and improves modeling efficiency. The predictions made by our trained model demonstrate sufficient accuracy. Specifically, we find that the level of internal blood pressure can greatly influence the stress distribution and determine the possible related damage behaviors.
Magnesium (Mg) is a promising alternative to lithium (Li) as an anode material in solid-state batteries due to its abundance and high theoretical volumetric capacity. However, sluggish Mg-ion conduction in the lattice of solid-state electrolytes (SSEs) is one of the key challenges that hamper the development of Mg-ion solid-state batteries. Though various Mg-ion SSEs have been reported in recent years, key insights are hard to derive from a single literature report. Moreover, the structure-performance relationships of Mg-ion SSEs need to be further unraveled to provide a more precise design guideline for SSEs. In this viewpoint article, we analyze the structural characteristics of the Mg-based SSEs with high ionic conductivity reported in the last four decades based upon data mining, and we provide big-data-derived insights into the challenges and opportunities in developing next-generation Mg-ion SSEs.
Long-term navigation ability based on consumer-level wearable inertial sensors plays an essential role in various emerging fields, for instance, smart healthcare, emergency rescue, and soldier positioning. The performance of existing long-term navigation algorithms is limited by the cumulative error of inertial sensors, disturbed local magnetic fields, and the complex motion modes of the pedestrian. This paper develops a robust data and physical model dual-driven trajectory estimation (DPDD-TE) framework, which can be applied to long-term navigation tasks. A Bi-directional Long Short-Term Memory (Bi-LSTM) based quasi-static magnetic field (QSMF) detection algorithm is developed to extract useful magnetic observations for heading calibration, and another Bi-LSTM is adopted for walking speed estimation by considering hybrid human motion information over a specific time period. In addition, a data and physical model dual-driven multi-source fusion model is proposed to integrate basic INS mechanization with multi-level constraints and observations to maintain accuracy over long-term navigation tasks, enhanced by a magnetic and trajectory features assisted loop detection algorithm. Real-world experiments indicate that the proposed DPDD-TE outperforms existing algorithms, with final estimated heading and positioning accuracy reaching 5° and less than 2 m, respectively, over a 30-min period.
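The intuition behind QSMF detection is that a heading-usable magnetic window is one where the field magnitude barely changes. The paper learns this with a Bi-LSTM; the sketch below is only a rule-based stand-in for the same decision (low variance of the magnetometer magnitude over a sliding window), with an invented window size, threshold, and sample data.

```python
import math
import statistics

def detect_qsmf(mag_samples, window=5, max_std=0.5):
    """Flag sliding windows of magnetometer magnitudes whose standard
    deviation stays below a threshold -- candidate quasi-static magnetic
    field (QSMF) segments that could be used for heading calibration."""
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in mag_samples]
    flags = []
    for i in range(len(magnitudes) - window + 1):
        win = magnitudes[i:i + window]
        flags.append(statistics.pstdev(win) <= max_std)
    return flags

stable = [(30.0, 10.0, 40.0)] * 5                            # undisturbed field
disturbed = [(30.0 + 5 * i, 10.0, 40.0) for i in range(5)]   # e.g., near steel
flags = detect_qsmf(stable + disturbed, window=5)
```

A learned detector replaces the fixed `max_std` threshold with a classifier that can also account for sensor noise characteristics and motion context, which is the advantage the Bi-LSTM formulation targets.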
Peanut allergy is a major cause of severe food-induced allergic reactions. Several foods, including cow's milk, hen's eggs, soy, wheat, peanuts, tree nuts (walnuts, hazelnuts, almonds, cashews, pecans and pistachios), fish and shellfish, are responsible for more than 90% of food allergies. Here, we provide promising insights using a large-scale data-driven analysis, comparing the mechanistic features and biological relevance of the different ingredients present in peanuts, tree nuts (walnuts, almonds, cashews, pecans and pistachios) and soybean. Additionally, we have analysed the chemical composition of peanuts in different processed forms: raw, boiled and dry-roasted. Using the data-driven approach, we are able to generate new hypotheses to explain why nuclear receptors such as the peroxisome proliferator-activated receptors (PPARs), their isoforms, and their interaction with dietary lipids may have a significant effect on allergic response. The results obtained from this study will direct future experimental and clinical studies to understand the role of dietary lipids and PPAR isoforms in exerting pro-inflammatory or anti-inflammatory functions on cells of the innate immunity and in influencing antigen presentation to the cells of the adaptive immunity.
With the increased availability of experimental measurements aimed at probing wind resources and wind turbine operations, machine learning (ML) models are poised to advance our understanding of the physics underpinning the interaction between the atmospheric boundary layer and wind turbine arrays, the generated wakes and their interactions, and wind energy harvesting. However, the majority of existing ML models for predicting wind turbine wakes merely recreate computational fluid dynamics (CFD) simulated data with analogous accuracy but reduced computational costs, thus providing surrogate models rather than enhanced data-enabled physics insights. Although ML-based surrogate models are useful for overcoming current limitations associated with the high computational costs of CFD models, using ML to unveil processes from experimental data or enhance modeling capabilities is deemed a potential research direction to pursue. In this letter, we discuss recent achievements in the realm of ML modeling of wind turbine wakes and operations, along with new promising research strategies.
Cable-stayed bridges have been widely used in high-speed railway infrastructure. The accurate determination of a cable's representative temperature is vital during the intricate processes of design, construction, and maintenance of cable-stayed bridges. However, the representative temperatures of stay cables are not specified in the existing design codes. To address this issue, this study investigates the distribution of cable temperature and determines its representative temperature. First, an experimental investigation, spanning a period of one year, was carried out near the bridge site to obtain temperature data. Statistical analysis of the measured data reveals that the temperature distribution is generally uniform along the cable cross-section, without a significant temperature gradient. Then, based on the limited data, the Monte Carlo, gradient boosted regression trees (GBRT), and univariate linear regression (ULR) methods are employed to predict the cable's representative temperature throughout its service life. These methods effectively overcome the limitations of insufficient monitoring data and accurately predict the representative temperature of the cables. However, each method has its own advantages and limitations in terms of applicability and accuracy. A comprehensive evaluation of the performance of these methods is conducted, and practical recommendations are provided for their application. The proposed methods and representative temperatures provide a good basis for the operation and maintenance of in-service long-span cable-stayed bridges.
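One way a Monte Carlo route can extrapolate a representative temperature from limited monitoring data is to fit a simple distribution to the measurements, draw many synthetic service-life samples, and report an upper quantile. The sketch below assumes a normal fit and a 98% quantile; both choices, and the temperature readings, are illustrative placeholders rather than the paper's calibrated procedure.

```python
import random
import statistics

def monte_carlo_rep_temp(measured, n_draws=100_000, quantile=0.98, seed=1):
    """Fit a normal distribution to measured cable temperatures, draw
    synthetic samples, and take an upper quantile of the simulated
    distribution as the representative temperature."""
    mu = statistics.mean(measured)
    sigma = statistics.stdev(measured)
    rng = random.Random(seed)
    draws = sorted(rng.gauss(mu, sigma) for _ in range(n_draws))
    return draws[int(quantile * n_draws)]

measured = [12.0, 18.5, 25.1, 31.0, 27.4, 20.2, 15.3, 9.8]  # hypothetical degC
rep_t = monte_carlo_rep_temp(measured)
```

The appeal of the simulation route over reading a quantile off the raw data is that one year of measurements rarely contains the extremes a decades-long service life will see; the fitted tail stands in for them.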
Urban functional areas (UFAs) are a core scientific issue affecting urban sustainability. The current knowledge gap mainly reflects the lack of multi-scale quantitative interpretation methods from the perspective of human-land interaction. In this paper, based on multi-source big data, including 250 m × 250 m resolution cell phone data, 1.81×10^5 Points of Interest (POI) records and administrative boundary data, we built a UFA identification method and demonstrated it empirically in Shenyang City, China. We argue that this method can effectively identify multi-scale, multi-type UFAs based on human activity and further reveal the spatial correlation between urban facilities and human activity. The empirical study suggests that the employment functional zones in Shenyang City are more concentrated in the central city than other single functional zones. There are more mixed functional areas in the central city, while the planned new industrial cities in Shenyang still need to develop comprehensive functions. UFAs exhibit scale effects and human-land interaction patterns. We suggest that city decision makers apply multi-source big data to measure urban functional services in a more refined manner from a supply-demand perspective.
Accurate prediction of monthly oil and gas production is essential for oil enterprises to make reasonable production plans, avoid blind investment and realize sustainable development. Traditional oil well production trend prediction methods are based on years of oil field production experience and expertise, and their application conditions are very demanding. With the rapid development of artificial intelligence technology, big data analysis methods are gradually being applied in various sub-fields of oil and gas reservoir development. Based on the data-driven artificial intelligence algorithm Gradient Boosting Decision Tree (GBDT), this paper predicts initial single-layer production by considering geological data, fluid PVT data and well data. The results show that the GBDT prediction model achieves high accuracy, significantly improves efficiency and has strong universal applicability. The GBDT method trained in this paper can predict production, which is helpful for well site optimization, perforation layer optimization and engineering parameter optimization, and has guiding significance for oilfield development.
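The principle behind GBDT is simple enough to show in miniature: start from a constant prediction, and at each stage fit a small tree to the residual of the current ensemble. The stdlib sketch below uses one-split stumps on a single made-up feature (real GBDT libraries use deeper trees and many features, and the paper's geological/PVT/well inputs are multivariate).

```python
def fit_stump(x, residual):
    """Best single-split regression stump minimising squared error."""
    best = None
    for s in sorted(set(x)):
        left = [r for xi, r in zip(x, residual) if xi <= s]
        right = [r for xi, r in zip(x, residual) if xi > s]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, s, lm, rm)
    _, s, lm, rm = best
    return lambda xi: lm if xi <= s else rm

def gbdt_fit(x, y, n_trees=50, lr=0.1):
    """Minimal gradient boosting for squared loss: each stage fits the
    residual of the ensemble built so far, scaled by a learning rate."""
    base = sum(y) / len(y)
    trees, pred = [], [base] * len(y)
    for _ in range(n_trees):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residual)
        trees.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + lr * sum(t(xi) for t in trees)

# Hypothetical single-feature data (e.g., a layer property -> initial production)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 2.1, 1.9, 6.0, 6.2, 5.9]
model = gbdt_fit(x, y)
```

After 50 boosting rounds the ensemble recovers the two production regimes in the toy data; the same residual-fitting loop is what scales up to the tabular reservoir features described in the abstract.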
Using Louisiana’s Interstate system, this paper aims to demonstrate how data can be used to evaluate freight movement reliability, economy, and safety of truck freight operations to improve decision-making. Data mainly from the National Performance Management Research Data Set (NPMRDS) and the Louisiana Crash Database were used to analyze the Truck Travel Time Reliability Index, commercial vehicle User Delay Costs, and commercial vehicle safety. The results indicate that while Louisiana’s Interstate system remained reliable over the years, some segments were found to be unreliable, though these annually amounted to less than 12% of the state’s Interstate mileage. The User Delay Costs incurred by commercial vehicles on these unreliable segments were, on average, 65.45% of the User Delay Costs of all vehicles on the Interstate highway system between 2016 and 2019, 53.10% between 2020 and 2021, and 70.36% in 2022, which are considerably high. These disproportionate ratios indicate the economic impact of the unreliability of the Interstate system on commercial vehicle operations. Additionally, though annual crash frequencies remained relatively constant, an increasing proportion of commercial vehicles are involved in crashes, with segments (mileposts) that have high crash frequencies appearing to correspond with locations of recurring congestion on the Interstate highway system. The study highlights the potential of using data to identify areas that need improvement in transportation systems to support better decision-making.
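The Truck Travel Time Reliability (TTTR) index behind such segment-level assessments is, in FHWA-style reporting, the ratio of the 95th-percentile truck travel time to the 50th-percentile (median) travel time on a segment; values near 1 mean a reliable segment. A minimal sketch, with hypothetical travel-time samples and a simple nearest-rank percentile convention:

```python
def percentile(values, p):
    """Nearest-rank percentile on a sorted copy (simple convention)."""
    s = sorted(values)
    idx = max(0, min(len(s) - 1, round(p * (len(s) - 1))))
    return s[idx]

def tttr_index(travel_times):
    """Truck Travel Time Reliability index: 95th-percentile travel time
    divided by the median travel time for a segment."""
    return percentile(travel_times, 0.95) / percentile(travel_times, 0.50)

# Hypothetical per-trip travel times (minutes) on two segments
reliable = [10.0, 10.2, 10.1, 10.3, 10.2, 10.1]
congested = [10.0, 10.5, 11.0, 18.0, 25.0, 10.2]
```

Production analyses compute this per segment and time period from NPMRDS probe data and then flag segments whose index exceeds a threshold; the threshold and aggregation rules are part of the reporting convention, not of the formula itself.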
With the advent of digital therapeutics (DTx), the development of software as a medical device (SaMD) for mobile and wearable devices has gained significant attention in recent years. Existing DTx evaluations, such as randomized clinical trials, mostly focus on verifying the effectiveness of DTx products. To acquire a deeper understanding of DTx engagement and behavioral adherence, beyond efficacy, a large amount of contextual and interaction data from mobile and wearable devices during field deployment is required for analysis. In this work, the overall flow of data-driven DTx analytics is reviewed to help researchers and practitioners explore DTx datasets, investigate contextual patterns associated with DTx usage, and establish the (causal) relationship between DTx engagement and behavioral adherence. This review of the key components of data-driven analytics suggests novel research directions in the analysis of mobile sensor and interaction datasets, which helps to iteratively improve the receptivity of existing DTx.
With the development of Industry 4.0 and big data technology, the Industrial Internet of Things (IIoT) is hampered by inherent issues such as privacy, security, and fault tolerance, which pose certain challenges to its rapid development. Blockchain technology offers immutability, decentralization, and autonomy, which can greatly mitigate the inherent defects of the IIoT. In a traditional blockchain, data is stored in a Merkle tree. As the data grows, the scale of the proofs used to validate it grows as well, threatening the efficiency, security, and reliability of blockchain-based IIoT. Accordingly, this paper first analyzes the inefficiency of the traditional blockchain structure in verifying the integrity and correctness of data. To solve this problem, a new Vector Commitment (VC) structure, Partition Vector Commitment (PVC), is proposed by improving the traditional VC structure. Secondly, this paper uses PVC instead of the Merkle tree to store big data generated by IIoT. PVC improves the efficiency of traditional VC in the processes of commitment and opening. Finally, this paper uses PVC to build a blockchain-based IIoT data security storage mechanism and carries out a comparative experimental analysis. This mechanism can greatly reduce communication loss and maximize the rational use of storage space, which is of great significance for maintaining the security and stability of blockchain-based IIoT.
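The inefficiency the abstract starts from is concrete: a Merkle membership proof consists of one sibling hash per tree level, so its size grows as log2 of the number of leaves. The sketch below builds a small Merkle tree over hypothetical IIoT records and produces such a proof; it illustrates the baseline being improved upon, not the paper's PVC construction.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root_and_proof(leaves, index):
    """Build a Merkle tree over leaf hashes and return (root, proof), where
    proof is the sibling-hash list needed to verify one leaf. Its length
    grows as log2(n) -- the proof-size growth motivating vector commitments."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:              # duplicate last node on odd levels
            level.append(level[-1])
        proof.append(level[index ^ 1])  # sibling at this level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof

def verify(root, leaf, index, proof):
    node = h(leaf)
    for sib in proof:
        node = h(node + sib) if index % 2 == 0 else h(sib + node)
        index //= 2
    return node == root

leaves = [f"iiot-record-{i}".encode() for i in range(8)]  # hypothetical data
root, proof = merkle_root_and_proof(leaves, index=3)
```

With 8 leaves the proof already holds 3 hashes; a vector commitment scheme instead targets openings whose size does not scale with the committed vector's length, which is the property PVC refines.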
In order to address the problems of single encryption algorithms, such as low encryption efficiency and unreliable metadata, for static data storage on big data platforms in the cloud computing environment, we propose a Hadoop-based big data secure storage scheme. Firstly, in order to disperse the NameNode service from a single server to multiple servers, we combine the HDFS federation and HDFS high-availability mechanisms, and use the ZooKeeper distributed coordination mechanism to coordinate each node to achieve dual-channel storage. Then, we improve the ECC encryption algorithm for the encryption of ordinary data, and adopt a homomorphic encryption algorithm to encrypt data on which computations must be performed. To accelerate the encryption, we adopt a dual-thread encryption mode. Finally, the HDFS control module is designed to combine the encryption algorithm with the storage model. Experimental results show that the proposed solution solves the problem of a single point of failure for metadata, performs well in terms of metadata reliability, and can realize server fault tolerance. The improved encryption algorithm, integrated with the dual-channel storage mode, improves encryption storage efficiency by 27.6% on average.
Time-series data provide important information in many fields, and their processing and analysis have been a focus of much research. However, detecting anomalies is very difficult due to data imbalance, temporal dependence, and noise. Therefore, methodologies for data augmentation and for converting time-series data into images for analysis have been studied. This paper proposes a fault detection model that uses time-series data augmentation and transformation to address the problems of data imbalance, temporal dependence, and robustness to noise. The chosen data augmentation method is the addition of noise: Gaussian noise, with the noise level set to 0.002, is added to maximize the generalization performance of the model. In addition, we use the Markov Transition Field (MTF) method to effectively visualize the dynamic transitions of the data while converting the time-series data into images. This enables the identification of patterns in time-series data and assists in capturing their sequential dependencies. For anomaly detection, the PatchCore model is applied and shows excellent performance, with the detected anomalous areas represented as heat maps; by applying an anomaly map to the original image, it is possible to capture the areas where anomalies occur. The performance evaluation shows that both the F1-score and accuracy are high when time-series data are converted to images. Additionally, when the data were processed as images rather than as time series, there was a significant reduction in both the size of the data and the training time. The proposed method can provide an important springboard for research in the field of anomaly detection using time-series data, and it helps address problems such as analyzing complex patterns in data in a lightweight manner.
Mg alloys possess an inherent plastic anisotropy owing to the selective activation of deformation mechanisms depending on the loading condition. This characteristic results in a diverse range of flow curves that vary with the deformation condition. This study proposes a novel approach for accurately predicting the anisotropic deformation behavior of wrought Mg alloys using machine learning (ML) with data augmentation. The developed model combines four key strategies from data science: learning the entire flow curves, generative adversarial networks (GAN), algorithm-driven hyperparameter tuning, and a gated recurrent unit (GRU) architecture. The proposed model, namely GAN-aided GRU, was extensively evaluated for various predictive scenarios, such as interpolation, extrapolation, and a limited dataset size. The model exhibited significant predictability and improved generalizability for estimating the anisotropic compressive behavior of ZK60 Mg alloys under 11 annealing conditions and for three loading directions. The GAN-aided GRU results were superior to those of previous ML models and constitutive equations. The superior performance was attributed to hyperparameter optimization, GAN-based data augmentation, and the inherent predictivity of the GRU for extrapolation. As a first attempt to employ ML techniques other than artificial neural networks, this study offers a novel perspective on predicting the anisotropic deformation behaviors of wrought Mg alloys.
Prediction and diagnosis of cardiovascular diseases (CVDs), based, among other things, on medical examinations and patient symptoms, are among the biggest challenges in medicine. About 17.9 million people die from CVDs annually, accounting for 31% of all deaths worldwide. With a timely prognosis and thorough consideration of the patient's medical history and lifestyle, it is possible to predict CVDs and take preventive measures to eliminate or control this life-threatening disease. In this study, we used various patient datasets from a major hospital in the United States as prognostic factors for CVD. The data were obtained by monitoring a total of 918 adult patients aged 28-77 years. We present a data mining modeling approach to analyze the performance, classification accuracy and number of clusters on cardiovascular disease prognostic datasets in unsupervised machine learning (ML) using the Orange data mining software. Various techniques are then used to classify the model parameters, such as k-nearest neighbors, support vector machine, random forest, artificial neural network (ANN), naïve Bayes, logistic regression, stochastic gradient descent (SGD), and AdaBoost. To determine the number of clusters, various unsupervised ML clustering methods were used, such as k-means, hierarchical, and density-based spatial clustering of applications with noise. The results showed that the best-performing models in terms of classification accuracy were SGD and ANN, both of which achieved a high score of 0.900 on the cardiovascular disease prognostic datasets. Based on the results of most clustering methods, such as k-means and hierarchical clustering, the cardiovascular disease prognostic datasets can be divided into two clusters. The prognostic accuracy for CVD depends on the accuracy of the proposed model in determining the diagnostic model: the more accurate the model, the better it can predict which patients are at risk for CVD.
There are challenges in the reliability evaluation of insulated gate bipolar transistors (IGBTs) in electric vehicles, such as junction temperature measurement and limited computational and storage resources. In this paper, a junction temperature estimation approach based on a neural network, requiring no additional hardware cost, is proposed, and the lifetime calculation for IGBTs using electric vehicle big data is performed. The direct current (DC) voltage, operating current, switching frequency, negative temperature coefficient (NTC) thermistor temperature and IGBT lifetime are the inputs, and the junction temperature (T_j) is the output. With the rainflow counting method, the classified irregular temperature swings are fed into the lifetime model to obtain the cycles to failure, and the fatigue accumulation method is then used to calculate the IGBT lifetime. To work around the limited computational and storage resources of electric vehicle controllers, the IGBT lifetime calculation runs on a big data platform, and the lifetime is transmitted wirelessly to electric vehicles as an input to the neural network. Thus the junction temperature of IGBTs under long-term operating conditions can be accurately estimated. A test platform combining the motor controller with the vehicle big data server is built for the IGBT accelerated aging test. Subsequently, IGBT lifetime predictions are derived from the junction temperature estimates of the neural network method and the thermal network method. The experiment shows that the lifetime prediction based on a neural network with big data achieves higher accuracy than that of the thermal network, which improves the reliability evaluation of the system.
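The fatigue-accumulation step named in this abstract is commonly linear (Palmgren-Miner) damage summation over the rainflow-counted junction-temperature swings, with cycles-to-failure from a Coffin-Manson style power law. The sketch below shows that bookkeeping; the power-law coefficients and the cycle histogram are illustrative placeholders, not fitted device data, and the rainflow counting itself is assumed already done.

```python
def cycles_to_failure(delta_t, a=3.0e14, n=5.0):
    """Coffin-Manson style power law N_f = a * dT^-n mapping a junction
    temperature swing (K) to cycles to failure. Coefficients are
    illustrative placeholders."""
    return a * delta_t ** -n

def consumed_life(counted_cycles):
    """Linear (Palmgren-Miner) damage accumulation over rainflow-counted
    temperature swings: failure is predicted when sum(n_i / N_f(dT_i))
    reaches 1."""
    return sum(n_i / cycles_to_failure(dt) for dt, n_i in counted_cycles)

# (dT_j in K, counted cycles) from a hypothetical mission profile
histogram = [(20.0, 1.0e5), (40.0, 2.0e4), (60.0, 5.0e3)]
damage = consumed_life(histogram)
remaining_profiles = (1.0 - damage) / damage  # profiles left before failure
```

The steep exponent is why accurate junction temperature estimation matters so much: a modest error in the large swings dominates the damage sum, which motivates the neural-network T_j estimator in the paper.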
Funding (supervised learning classifiers study): support from the Cyber Technology Institute (CTI) at the School of Computer Science and Informatics, De Montfort University, United Kingdom, along with financial assistance from Universiti Tun Hussein Onn Malaysia and the UTHM Publisher's office through publication fund E15216.
Fund: Supported by the Project of the Guangdong Science and Technology Department (2020B010166005), the Post-Doctoral Research Project (Z000158), the Ministry of Education Social Science Fund (22YJ630167), the fund project of the Department of Science and Technology of Guangdong Province (GDKTP2021032500), and the Guangdong Philosophy and Social Science fund (GD22YYJ15).
Abstract: Inter-agency government information sharing (IAGIS) plays an important role in improving the service and efficiency of government agencies. Currently, there is still no effective and secure way for data-driven IAGIS to fulfill the dynamic demands of information sharing between government agencies. Motivated by blockchain and data mining, a data-driven framework for IAGIS is proposed in this paper. Firstly, the blockchain is used as the core of the framework for monitoring and preventing leakage and abuse of government information, in order to guarantee information security. Secondly, a four-layer architecture is designed for implementing the proposed framework. Thirdly, the classical data mining algorithms PageRank and Apriori are applied to dynamically design smart contracts for information sharing, for the purpose of flexibly adjusting information-sharing strategies according to the practical demands of government agencies for public management and public service. Finally, a case study is presented to illustrate the operation of the proposed framework.
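As a hedged illustration of how PageRank could weight agencies in such a framework (the paper's actual smart-contract design is not reproduced here, and the three-agency graph is invented), a minimal power-iteration sketch:

```python
def pagerank(links, d=0.85, iters=100):
    """Minimal PageRank by power iteration. `links` maps each node to the
    nodes it points to (here: agencies it shares information with)."""
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - d) / n for v in nodes}
        for src, targets in links.items():
            if targets:
                share = d * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        # dangling nodes: spread their rank uniformly
        dangling = d * sum(rank[v] for v in nodes if not links.get(v))
        for v in nodes:
            new[v] += dangling / n
        rank = new
    return rank

# hypothetical sharing graph: A shares with B and C, B with C, C with A
r = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print(max(r, key=r.get))  # prints C
```

Agencies that receive information from many well-connected peers accumulate the highest rank, which could then drive sharing priorities in a smart contract.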
Abstract: Although big data on water quality parameters is publicly available, virtual simulation has not yet been adequately adopted in environmental chemistry research. A digital twin differs from conventional geospatial modeling approaches and is particularly useful when a systematic laboratory or field experiment is not realistic (e.g., climate impact and water-related environmental catastrophes) or difficult to design and monitor in real time (e.g., pollutant and nutrient cycles in estuaries, soils, and sediments). Data-driven water research could realize early-warning and disaster-readiness simulations for diverse environmental scenarios, including drinking water contamination.
Abstract: Brain tissue is one of the softest parts of the human body, composed of white matter and grey matter. The mechanical behavior of brain tissue plays an essential role in regulating brain morphology and brain function. In addition, traumatic brain injury (TBI) and various brain diseases are greatly influenced by the brain's mechanical properties. Whether white matter or grey matter, brain tissue contains multiscale structures composed of neurons, glial cells, fibers, blood vessels, etc., each with different mechanical properties. As such, brain tissue exhibits complex mechanical behavior, usually with strong nonlinearity, heterogeneity, and directional dependence. Building a constitutive law for multiscale brain tissue using traditional function-based approaches can be very challenging. Instead, this paper proposes a data-driven approach to establish the desired mechanical model of brain tissue. We focus on blood vessels with internal pressure embedded in a white or grey matter matrix material to demonstrate our approach. The matrix is described by an isotropic or anisotropic nonlinear elastic model. A representative unit cell (RUC) with blood vessels is built and used to generate stress-strain data under different internal blood pressures and various proportional displacement loading paths. The generated stress-strain data are then used to train a mechanical law using artificial neural networks to predict the macroscopic mechanical response of brain tissue under different internal pressures. Finally, the trained material model is implemented into finite element software to predict the mechanical behavior of a whole brain under intracranial pressure and distributed body forces. Compared with a direct numerical simulation that employs a reference material model, our approach greatly reduces the computational cost and improves modeling efficiency. The predictions made by our trained model demonstrate sufficient accuracy. Specifically, we find that the level of internal blood pressure can greatly influence stress distribution and determine the possible related damage behaviors.
Fund: Supported by the Ensemble Grant for Early Career Researchers 2022-2023 and the 2023 Ensemble Continuation Grant of Tohoku University, the Hirose Foundation, and the AIMR Fusion Research Grant; also supported by JSPS KAKENHI Nos. JP23K13599, JP23K13703, JP22H01803, JP18H05513, and JP23K13542. F.Y. and Q.W. acknowledge the China Scholarship Council (CSC) for supporting their studies in Japan.
Abstract: Magnesium (Mg) is a promising alternative to lithium (Li) as an anode material in solid-state batteries due to its abundance and high theoretical volumetric capacity. However, sluggish Mg-ion conduction in the lattice of solid-state electrolytes (SSEs) is one of the key challenges hampering the development of Mg-ion solid-state batteries. Though various Mg-ion SSEs have been reported in recent years, key insights are hard to derive from any single literature report. Besides, the structure-performance relationships of Mg-ion SSEs need to be further unraveled to provide a more precise design guideline for SSEs. In this viewpoint article, we analyze the structural characteristics of the Mg-based SSEs with high ionic conductivity reported over the last four decades based upon data mining, and we provide big-data-derived insights into the challenges and opportunities in developing next-generation Mg-ion SSEs.
Abstract: Long-term navigation based on consumer-level wearable inertial sensors plays an essential role in various emerging fields, for instance, smart healthcare, emergency rescue, and soldier positioning. The performance of existing long-term navigation algorithms is limited by the cumulative error of inertial sensors, disturbed local magnetic fields, and the complex motion modes of pedestrians. This paper develops a robust data and physical model dual-driven trajectory estimation (DPDD-TE) framework, which can be applied to long-term navigation tasks. A Bi-directional Long Short-Term Memory (Bi-LSTM) based quasi-static magnetic field (QSMF) detection algorithm is developed to extract useful magnetic observations for heading calibration, and another Bi-LSTM is adopted for walking speed estimation by considering hybrid human motion information over a specific time period. In addition, a data and physical model dual-driven multi-source fusion model is proposed to integrate basic INS mechanization with multi-level constraints and observations to maintain accuracy under long-term navigation tasks, enhanced by a magnetic and trajectory features assisted loop detection algorithm. Real-world experiments indicate that the proposed DPDD-TE outperforms existing algorithms, with final heading and positioning errors reaching 5° and less than 2 m, respectively, over a 30-min period.
Abstract: Peanut allergy is a major contributor to severe food-induced allergic reactions. Several foods, including cow's milk, hen's eggs, soy, wheat, peanuts, tree nuts (walnuts, hazelnuts, almonds, cashews, pecans and pistachios), fish, and shellfish, are responsible for more than 90% of food allergies. Here, we provide promising insights from a large-scale data-driven analysis comparing the mechanistic features and biological relevance of different ingredients present in peanuts, tree nuts (walnuts, almonds, cashews, pecans and pistachios), and soybean. Additionally, we have analysed the chemical compositions of peanuts in different processed forms: raw, boiled, and dry-roasted. Using the data-driven approach, we are able to generate new hypotheses to explain why nuclear receptors like the peroxisome proliferator-activated receptors (PPARs), their isoforms, and their interaction with dietary lipids may have a significant effect on allergic response. The results obtained from this study will direct future experimental and clinical studies to understand the role of dietary lipids and PPAR isoforms in exerting pro-inflammatory or anti-inflammatory functions on cells of the innate immunity and in influencing antigen presentation to cells of the adaptive immunity.
Fund: Supported by the National Science Foundation (NSF) CBET, Fluid Dynamics CAREER program (Grant No. 2046160), program manager Ron Joslin.
Abstract: With the increased availability of experimental measurements aiming to probe wind resources and wind turbine operations, machine learning (ML) models are poised to advance our understanding of the physics underpinning the interaction between the atmospheric boundary layer and wind turbine arrays, the generated wakes and their interactions, and wind energy harvesting. However, the majority of existing ML models for predicting wind turbine wakes merely recreate computational fluid dynamics (CFD) simulated data with analogous accuracy but reduced computational costs, thus providing surrogate models rather than enhanced data-enabled physics insights. Although ML-based surrogate models are useful for overcoming current limitations associated with the high computational costs of CFD models, using ML to unveil processes from experimental data or to enhance modeling capabilities is deemed a potential research direction to pursue. In this letter, we discuss recent achievements in the realm of ML modeling of wind turbine wakes and operations, along with promising new research strategies.
Fund: Project (2017G006-N) supported by the Science and Technology Research and Development Program of China Railway Corporation.
Abstract: Cable-stayed bridges have been widely used in high-speed railway infrastructure. The accurate determination of a cable's representative temperature is vital during the intricate processes of design, construction, and maintenance of cable-stayed bridges. However, the representative temperatures of stay cables are not specified in the existing design codes. To address this issue, this study investigates the distribution of cable temperature and determines its representative temperature. First, an experimental investigation spanning one year was carried out near the bridge site to obtain temperature data. Statistical analysis of the measured data reveals that the temperature distribution is generally uniform along the cable cross-section, without a significant temperature gradient. Then, based on the limited data, the Monte Carlo, gradient boosted regression trees (GBRT), and univariate linear regression (ULR) methods are employed to predict the cable's representative temperature throughout the service life. These methods effectively overcome the limitations of insufficient monitoring data and accurately predict the representative temperature of the cables. However, each method has its own advantages and limitations in terms of applicability and accuracy. A comprehensive evaluation of the performance of these methods is conducted, and practical recommendations are provided for their application. The proposed methods and representative temperatures provide a good basis for the operation and maintenance of in-service long-span cable-stayed bridges.
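Of the three prediction methods, the Monte Carlo approach is the simplest to sketch. The following is a toy illustration only: the normal distribution, its mean and spread, and the 95% quantile choice are assumptions for demonstration, not values or modeling decisions from the study.

```python
import random

def representative_temperature(mu, sigma, n_years=100, days=365, q=0.95, seed=0):
    """Monte Carlo sketch: sample daily mean cable temperatures from a fitted
    normal distribution over the service life and report the q-quantile as a
    candidate representative temperature."""
    rng = random.Random(seed)
    samples = [rng.gauss(mu, sigma) for _ in range(n_years * days)]
    samples.sort()
    return samples[int(q * len(samples)) - 1]

# assumed fitted parameters: annual mean 15 degC, spread 10 degC
t_rep = representative_temperature(mu=15.0, sigma=10.0)
print(round(t_rep, 1))  # roughly mu + 1.645 * sigma for a normal fit
```

With enough samples the empirical quantile converges to the analytical one, which is the appeal of the Monte Carlo route when monitoring data are limited but a distribution can be fitted.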
Fund: Under the auspices of the Natural Science Foundation of China (No. 41971166).
Abstract: Urban functional areas (UFAs) are a core scientific issue affecting urban sustainability. The current knowledge gap mainly reflects the lack of multi-scale quantitative interpretation methods from the perspective of human-land interaction. In this paper, based on multi-source big data, including 250 m × 250 m resolution cell phone data, 1.81×10^5 Points of Interest (POI) records, and administrative boundary data, we built a UFA identification method and demonstrated it empirically in Shenyang City, China. We argue that the method can effectively identify multi-scale, multi-type UFAs based on human activity and further reveal the spatial correlation between urban facilities and human activity. The empirical study suggests that employment functional zones in Shenyang City are more concentrated in the central city than other single functional zones. There are more mixed functional areas in the central city, while the planned industrial new cities need to develop comprehensive functions. UFAs exhibit scale effects and human-land interaction patterns. We suggest that city decision-makers apply multi-source big data to measure urban functional services in a more refined manner from a supply-demand perspective.
Abstract: Accurate prediction of monthly oil and gas production is essential for oil enterprises to make reasonable production plans, avoid blind investment, and realize sustainable development. Traditional oil well production trend prediction methods are based on years of oil field production experience and expertise, and their application conditions are very demanding. With the rapid development of artificial intelligence technology, big data analysis methods are gradually being applied in various sub-fields of oil and gas reservoir development. Based on the data-driven artificial intelligence algorithm Gradient Boosting Decision Tree (GBDT), this paper predicts initial single-layer production by considering geological data, fluid PVT data, and well data. The results show that the GBDT prediction model achieves high accuracy, significantly improves efficiency, and has strong universal applicability. The GBDT method trained in this paper can predict production, which is helpful for well site optimization, perforation layer optimization, and engineering parameter optimization, and has guiding significance for oilfield development.
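The GBDT idea (fit a small tree to the current residuals, add it with shrinkage, repeat) can be sketched from scratch with decision stumps. A real study would use a library implementation such as scikit-learn's `GradientBoostingRegressor`, and the feature/target values below are invented, not reservoir data.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-feature threshold split minimizing squared error on residual r."""
    best = None
    for j in range(x.shape[1]):
        for t in np.unique(x[:, j])[:-1]:
            left = x[:, j] <= t
            pred = np.where(left, r[left].mean(), r[~left].mean())
            err = float(((r - pred) ** 2).sum())
            if best is None or err < best[0]:
                best = (err, j, t, r[left].mean(), r[~left].mean())
    return best[1:]

def gbdt_fit_predict(x, y, x_test, n_trees=20, lr=0.3):
    """Minimal GBDT for regression: each stump fits the current residuals."""
    pred = np.full(len(y), y.mean())
    pred_test = np.full(len(x_test), y.mean())
    for _ in range(n_trees):
        j, t, vl, vr = fit_stump(x, y - pred)
        pred += lr * np.where(x[:, j] <= t, vl, vr)
        pred_test += lr * np.where(x_test[:, j] <= t, vl, vr)
    return pred_test

# toy "production" target driven by one hypothetical geological feature
x = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.0, 1.1, 0.9, 4.0, 4.2, 3.9])
preds = gbdt_fit_predict(x, y, np.array([[0.5], [4.5]]))
print(preds.round(1))
```

The two test points land near the means of the low- and high-production regimes, showing how the additive stumps recover the underlying step structure.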
Abstract: Using Louisiana’s Interstate system, this paper aims to demonstrate how data can be used to evaluate the freight movement reliability, economy, and safety of truck freight operations to improve decision-making. Data mainly from the National Performance Management Research Data Set (NPMRDS) and the Louisiana Crash Database were used to analyze the Truck Travel Time Reliability Index, commercial vehicle User Delay Costs, and commercial vehicle safety. The results indicate that while Louisiana’s Interstate system remained reliable over the years, some segments were found to be unreliable, which annually amounted to less than 12% of the state’s Interstate mileage. The User Delay Costs incurred by commercial vehicles on these unreliable segments were, on average, 65.45% of the User Delay Costs of all vehicles on the Interstate highway system between 2016 and 2019, 53.10% between 2020 and 2021, and 70.36% in 2022, which is considerably high. These disproportionate ratios indicate the economic impact of the unreliability of the Interstate system on commercial vehicle operations. Additionally, though annual crash frequencies remained relatively constant, an increasing proportion of commercial vehicles are involved in crashes, with segments (mileposts) that have high crash frequencies appearing to correspond with locations of recurring congestion on the Interstate highway system. The study highlights the potential of using data to identify areas that need improvement in transportation systems to support better decision-making.
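The Truck Travel Time Reliability index underlying such analyses is, per the FHWA PM3 measure, the ratio of the 95th-percentile to the 50th-percentile truck travel time for a segment (segments with an index of roughly 1.5 or above are commonly flagged as unreliable). A minimal sketch with a simplified percentile computation and assumed sample data:

```python
def tttr(travel_times):
    """Truck Travel Time Reliability index: 95th-percentile travel time
    divided by the 50th-percentile travel time for one segment. The FHWA
    rule's time-of-day binning and per-period maxima are omitted here."""
    ts = sorted(travel_times)
    p95 = ts[min(len(ts) - 1, int(0.95 * len(ts)))]
    p50 = ts[min(len(ts) - 1, int(0.50 * len(ts)))]
    return p95 / p50

# assumed minutes to traverse a segment over ten observation epochs
times = [10, 10, 11, 10, 12, 10, 11, 10, 10, 25]  # one congested epoch
print(round(tttr(times), 2))  # 2.5
```

A single heavily congested epoch pushes the 95th percentile, and hence the index, well above the unreliability threshold even though the median trip is unaffected.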
Fund: Supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Korea government (MSIT) (2020R1A4A1018774).
Abstract: With the advent of digital therapeutics (DTx), the development of software as a medical device (SaMD) for mobile and wearable devices has gained significant attention in recent years. Existing DTx evaluations, such as randomized clinical trials, mostly focus on verifying the effectiveness of DTx products. To acquire a deeper understanding of DTx engagement and behavioral adherence beyond efficacy, a large amount of contextual and interaction data from mobile and wearable devices during field deployment is required for analysis. In this work, the overall flow of data-driven DTx analytics is reviewed to help researchers and practitioners explore DTx datasets, investigate contextual patterns associated with DTx usage, and establish the (causal) relationship between DTx engagement and behavioral adherence. This review of the key components of data-driven analytics provides novel research directions for the analysis of mobile sensor and interaction datasets, which helps to iteratively improve the receptivity of existing DTx.
Fund: Supported by China’s National Natural Science Foundation (Nos. 62072249, 62072056). This work is also funded by the National Science Foundation of Hunan Province (2020JJ2029).
Abstract: With the development of Industry 4.0 and big data technology, the Industrial Internet of Things (IIoT) is hampered by inherent issues such as privacy, security, and fault tolerance, which pose certain challenges to its rapid development. Blockchain technology offers immutability, decentralization, and autonomy, which can greatly mitigate the inherent defects of the IIoT. In a traditional blockchain, data are stored in a Merkle tree. As data continue to grow, the scale of the proofs used to validate them grows as well, threatening the efficiency, security, and reliability of blockchain-based IIoT. Accordingly, this paper first analyzes the inefficiency of the traditional blockchain structure in verifying the integrity and correctness of data. To solve this problem, a new vector commitment (VC) structure, Partition Vector Commitment (PVC), is proposed by improving the traditional VC structure. Secondly, this paper uses PVC instead of the Merkle tree to store big data generated by IIoT. PVC improves the efficiency of traditional VC in the processes of commitment and opening. Finally, this paper uses PVC to build a blockchain-based IIoT data security storage mechanism and carries out comparative experiments. This mechanism can greatly reduce communication loss and maximize the rational use of storage space, which is of great significance for maintaining the security and stability of blockchain-based IIoT.
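The scaling behavior PVC targets can be seen in the baseline it improves on: in a plain Merkle tree, a membership proof needs one sibling hash per tree level, i.e. O(log2 n) hashes. The sketch below builds a standard Merkle root over hypothetical records (the PVC construction itself is not reproduced here):

```python
import hashlib
import math

def h(b):
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    """Merkle root over byte-string leaves (duplicate the last node when a
    level has an odd number of hashes)."""
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

data = [f"record-{i}".encode() for i in range(1000)]  # hypothetical IIoT records
root = merkle_root(data)
# a membership proof carries one sibling hash per level: O(log2 n) hashes
proof_hashes = math.ceil(math.log2(len(data)))
print(len(root), proof_hashes)  # 32 10
```

For 1000 leaves a proof already needs 10 sibling hashes, and the count keeps growing with the data volume, which is the verification cost that vector-commitment schemes aim to reduce.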
Abstract: To address the problems of single encryption algorithms for static data storage on big data platforms in cloud computing environments, such as low encryption efficiency and unreliable metadata, we propose a Hadoop-based big data secure storage scheme. Firstly, to disperse the NameNode service from a single server to multiple servers, we combine the HDFS federation and HDFS high-availability mechanisms and use the ZooKeeper distributed coordination mechanism to coordinate each node to achieve dual-channel storage. Then, we improve the ECC encryption algorithm for encrypting ordinary data and adopt a homomorphic encryption algorithm to encrypt data that needs to be computed. To accelerate encryption, we adopt a dual-thread encryption mode. Finally, an HDFS control module is designed to combine the encryption algorithms with the storage model. Experimental results show that the proposed solution solves the single point of failure of metadata, performs well in terms of metadata reliability, and can realize server fault tolerance. The improved encryption algorithm, integrated with the dual-channel storage mode, improves encryption storage efficiency by 27.6% on average.
Fund: This research was financially supported by the Ministry of Trade, Industry, and Energy (MOTIE), Korea, under the “Project for Research and Development with Middle Markets Enterprises and DNA (Data, Network, AI) Universities” (AI-based Safety Assessment and Management System for Concrete Structures) (Reference Number P0024559), supervised by the Korea Institute for Advancement of Technology (KIAT).
Abstract: Time-series data provide important information in many fields, and their processing and analysis have been the focus of much research. However, detecting anomalies is very difficult due to data imbalance, temporal dependence, and noise. Therefore, methodologies for data augmentation and for converting time-series data into images for analysis have been studied. This paper proposes a fault detection model that uses time-series data augmentation and transformation to address the problems of data imbalance, temporal dependence, and robustness to noise. The data augmentation method is the addition of noise: Gaussian noise, with the noise level set to 0.002, is added to maximize the generalization performance of the model. In addition, we use the Markov Transition Field (MTF) method to effectively visualize the dynamic transitions of the data while converting the time-series data into images. This enables the identification of patterns in time-series data and assists in capturing their sequential dependencies. For anomaly detection, the PatchCore model is applied and shows excellent performance, with the detected anomaly areas represented as heat maps. By applying an anomaly map to the original image, it is possible to capture the areas where anomalies occur. The performance evaluation shows that both F1-score and accuracy are high when time-series data are converted to images. Additionally, when processed as images rather than as raw time series, there was a significant reduction in both the size of the data and the training time. The proposed method can provide an important springboard for research in the field of anomaly detection using time-series data, and it helps to analyze complex patterns in data in a lightweight manner.
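A minimal NumPy sketch of the MTF conversion (quantile-bin the series, estimate the first-order transition matrix, expand it to an image); the bin count and the sine-wave input are illustrative assumptions, and a library such as `pyts` would normally be used:

```python
import numpy as np

def markov_transition_field(x, n_bins=4):
    """Minimal Markov Transition Field: quantile-bin the series, estimate the
    first-order transition matrix W from consecutive samples, then expand it
    to an image M[i, j] = W[bin(x_i), bin(x_j)]."""
    x = np.asarray(x, dtype=float)
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    q = np.searchsorted(edges, x, side="right")       # bin index per sample
    W = np.zeros((n_bins, n_bins))
    for a, b in zip(q[:-1], q[1:]):
        W[a, b] += 1
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1)  # row-normalize
    return W[np.ix_(q, q)]                            # (len(x), len(x)) image

x = np.sin(np.linspace(0, 4 * np.pi, 50))             # toy periodic signal
M = markov_transition_field(x, n_bins=4)
print(M.shape)  # (50, 50)
```

Every pixel holds a transition probability, so the resulting image encodes the sequential dependencies of the series in a form an image-based detector like PatchCore can consume.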
Fund: Supported by a Korea Institute of Energy Technology Evaluation and Planning (KETEP) grant funded by the Korea government (Grant No. 20214000000140, Graduate School of Convergence for Clean Energy Integrated Power Generation), a Korea Basic Science Institute (National Research Facilities and Equipment Center) grant funded by the Ministry of Education (2021R1A6C101A449), and a National Research Foundation of Korea grant funded by the Ministry of Science and ICT (2021R1A2C1095139), Republic of Korea.
Abstract: Mg alloys possess an inherent plastic anisotropy owing to the selective activation of deformation mechanisms depending on the loading condition. This characteristic results in a diverse range of flow curves that vary with the deformation condition. This study proposes a novel approach for accurately predicting the anisotropic deformation behavior of wrought Mg alloys using machine learning (ML) with data augmentation. The developed model combines four key strategies from data science: learning the entire flow curves, generative adversarial networks (GAN), algorithm-driven hyperparameter tuning, and a gated recurrent unit (GRU) architecture. The proposed model, namely GAN-aided GRU, was extensively evaluated for various predictive scenarios, such as interpolation, extrapolation, and a limited dataset size. The model exhibited significant predictability and improved generalizability for estimating the anisotropic compressive behavior of ZK60 Mg alloys under 11 annealing conditions and for three loading directions. The GAN-aided GRU results were superior to those of previous ML models and constitutive equations. The superior performance was attributed to hyperparameter optimization, GAN-based data augmentation, and the inherent predictivity of the GRU for extrapolation. As a first attempt to employ ML techniques other than artificial neural networks, this study proposes a novel perspective on predicting the anisotropic deformation behaviors of wrought Mg alloys.
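The GRU at the core of the model can be illustrated by a single-cell forward pass in NumPy. The weights below are random placeholders, the input/state sizes are arbitrary, and the GAN augmentation and hyperparameter-tuning stages of the actual study are not reproduced.

```python
import numpy as np

def gru_step(x, h, p):
    """One GRU cell step: update gate z, reset gate r, candidate state h~.
    `p` holds weight matrices Wz, Uz, Wr, Ur, Wh, Uh (biases omitted)."""
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sig(p["Wz"] @ x + p["Uz"] @ h)            # how much to update
    r = sig(p["Wr"] @ x + p["Ur"] @ h)            # how much past to reset
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h))
    return (1 - z) * h + z * h_tilde              # gated blend of old and new

rng = np.random.default_rng(0)
dim_x, dim_h = 3, 8                               # e.g. strain/condition inputs
p = {k: 0.1 * rng.standard_normal((dim_h, dim_x if k[0] == "W" else dim_h))
     for k in ["Wz", "Uz", "Wr", "Ur", "Wh", "Uh"]}
h = np.zeros(dim_h)
for x in rng.standard_normal((20, dim_x)):        # a 20-step input sequence
    h = gru_step(x, h, p)
print(h.shape)  # (8,)
```

The gating lets the hidden state carry information across the whole flow curve while selectively overwriting it, which is what makes recurrent cells a natural fit for sequence-valued stress-strain prediction.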
Abstract: Prediction and diagnosis of cardiovascular diseases (CVDs), based, among other things, on medical examinations and patient symptoms, are among the biggest challenges in medicine. About 17.9 million people die from CVDs annually, accounting for 31% of all deaths worldwide. With a timely prognosis and thorough consideration of the patient's medical history and lifestyle, it is possible to predict CVDs and take preventive measures to eliminate or control this life-threatening disease. In this study, we used patient datasets from a major hospital in the United States as prognostic factors for CVD. The data were obtained by monitoring a total of 918 adult patients aged 28-77 years. We present a data mining modeling approach to analyze performance, classification accuracy, and the number of clusters on cardiovascular disease prognostic datasets using the Orange data mining software. Various techniques are used to classify the model parameters, such as k-nearest neighbors, support vector machine, random forest, artificial neural network (ANN), naïve Bayes, logistic regression, stochastic gradient descent (SGD), and AdaBoost. To determine the number of clusters, several unsupervised machine learning (ML) clustering methods were used, such as k-means, hierarchical, and density-based spatial clustering of applications with noise (DBSCAN). The results showed that the best model performance and classification accuracy were obtained with SGD and ANN, both of which achieved a high score of 0.900 on the cardiovascular disease prognostic datasets. Based on the results of most clustering methods, such as k-means and hierarchical clustering, the datasets can be divided into two clusters. The prognostic accuracy for CVD depends on the accuracy of the proposed model in determining the diagnostic model: the more accurate the model, the better it can predict which patients are at risk for CVD.
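The two-cluster structure the clustering methods converge on can be sketched with plain Lloyd-iteration k-means. The sketch uses synthetic two-feature data standing in for patient measurements, not the hospital dataset; in practice one would use Orange or scikit-learn.

```python
import numpy as np

def kmeans(X, k=2, iters=50, seed=0):
    """Plain Lloyd's k-means: assign each point to its nearest centroid,
    then recompute each centroid as the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]   # init from data points
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        C = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, C

# two well-separated synthetic "patient" groups in a 2-feature space
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(4, 0.5, (30, 2))])
labels, C = kmeans(X, k=2)
print(sorted(np.bincount(labels).tolist()))  # [30, 30]
```

When the groups are well separated, Lloyd's iterations recover the generating partition exactly, which mirrors the study's finding that most clustering methods agree on two clusters.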
Abstract: There are challenges in reliability evaluation for insulated gate bipolar transistors (IGBTs) on electric vehicles, such as junction temperature measurement and limited computational and storage resources. In this paper, a junction temperature estimation approach based on a neural network, without additional cost, is proposed, and the lifetime calculation for IGBTs using electric vehicle big data is performed. The inputs are the direct current (DC) voltage, operating current, switching frequency, negative temperature coefficient (NTC) thermistor temperature, and IGBT lifetime; the output is the junction temperature T_j. With the rainflow counting method, the classified irregular temperature cycles are fed into the lifetime model to obtain the failure cycles. The fatigue accumulation method is then used to calculate the IGBT lifetime. To cope with the limited computational and storage resources of electric vehicle controllers, the IGBT lifetime calculation runs on a big data platform. The lifetime is then transmitted wirelessly to electric vehicles as an input to the neural network. Thus, the junction temperature of IGBTs under long-term operating conditions can be accurately estimated. A test platform combining the motor controller with the vehicle big data server is built for the IGBT accelerated aging test. Subsequently, IGBT lifetime predictions are derived from the junction temperature estimated by the neural network method and by the thermal network method. The experiment shows that the lifetime prediction based on a neural network with big data demonstrates higher accuracy than that of the thermal network, which improves the reliability evaluation of the system.
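Once rainflow counting has classified the temperature swings into (ΔT, count) pairs, the fatigue-accumulation step can be sketched in a few lines. The Coffin-Manson-style life model and its parameters A and b below are placeholders for illustration, not the paper's fitted values, and the counted cycles are invented.

```python
def cycles_to_failure(delta_T, A=1e12, b=5.0):
    """Assumed Coffin-Manson-type life model: N_f = A * dT^(-b).
    A and b are placeholder parameters, not values from the paper."""
    return A * delta_T ** (-b)

def miner_damage(cycles):
    """Palmgren-Miner linear accumulation: sum n_i / N_f_i per swing class;
    the device is predicted to fail when the total damage reaches 1."""
    return sum(n / cycles_to_failure(dT) for dT, n in cycles)

# (dT in K, counted occurrences) as a rainflow counter would report them
counted = [(20.0, 5000), (40.0, 800), (60.0, 50)]
damage_per_trip = miner_damage(counted)
print(damage_per_trip < 1.0, round(1.0 / damage_per_trip))  # True 7
```

The steep exponent means the few large temperature swings dominate the damage, which is why accurate junction temperature estimation matters so much for the lifetime calculation.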