Spatio-temporal heterogeneous data is the data basis for decision-making in many fields, and checking its accuracy provides data support for those decisions. Because spatio-temporal heterogeneous data are random and complex and exhibit both global and local correlation in the temporal and spatial dimensions, traditional detection methods cannot guarantee both detection speed and accuracy. This article therefore proposes a method for detecting the accuracy of spatio-temporal heterogeneous data by fusing graph convolutional and temporal convolutional networks. First, a geographic weighting function is introduced and improved to quantify the degree of association between nodes and to calculate weighted adjacency values, which simplifies the complex topology. Second, spatio-temporal convolutional units are designed based on graph convolutional neural networks and temporal convolutional networks to improve detection speed and accuracy. Finally, the proposed method is compared with three methods, ARIMA, T-GCN, and STGCN, in real scenarios to verify its effectiveness in terms of detection speed, detection accuracy, and stability. The experimental results show that the method achieves the smallest RMSE, MAE, and MAPE under both simple and complex connectivity: 13.82/12.08, 2.77/2.41, and 16.70/14.73, respectively. It also has the shortest detection time, 672.31/887.36, respectively. In addition, the evaluation results are consistent across different processing time periods and in complex topology environments, which indicates that the method has the highest detection accuracy and good research value and application prospects.
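The three error measures reported in this abstract (RMSE, MAE, and MAPE) can be illustrated with a minimal sketch. The function name `regression_errors` and the sample values are hypothetical and are not taken from the paper; the sketch only uses the standard textbook definitions and assumes that no actual value is zero, since MAPE is undefined in that case.

```python
import math


def regression_errors(actual, predicted):
    """Return (RMSE, MAE, MAPE in percent) for two equal-length sequences.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # MAPE divides by the actual value, so zeros are rejected here.
        raise ValueError("MAPE is undefined for zero actual values")

    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # root mean squared error
    mae = sum(abs(e) for e in errors) / n             # mean absolute error
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return rmse, mae, mape


# Made-up values, used only to exercise the function.
rmse, mae, mape = regression_errors([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
```

RMSE penalizes large individual errors more strongly than MAE, while MAPE expresses the error relative to the magnitude of the actual value, which is presumably why abstracts of this kind report all three together.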
A significant challenge in intelligent transportation systems (ITS) is predicting traffic flow. Recent advancements in deep neural networks have enabled the development of models that represent traffic flow accurately. However, accurately predicting traffic flow at the individual road level is extremely difficult due to the complex interplay of spatial and temporal factors. This paper proposes a technique for predicting short-term traffic flow data using an architecture that combines convolutional bidirectional long short-term memory (Conv-BiLSTM) with attention mechanisms. Prior studies neglected data on factors such as holidays, weather conditions, and vehicle types, which are interconnected and significantly affect forecast accuracy. In addition, this research incorporates recurring monthly periodic pattern data, which significantly enhances forecast accuracy. The experimental findings demonstrate a performance improvement of 21.68% when the vehicle type feature is incorporated.
The power Internet of Things (IoT) is a significant trend in technology and a requirement for national strategic development. With the deepening digital transformation of the power grid, China's power system has initially built a power IoT architecture comprising perception, network, and platform application layers. However, owing to the structural complexity of the power system, the construction of the power IoT continues to face problems such as complex access management of massive heterogeneous equipment, diverse IoT protocol access methods, high concurrency of network communications, and weak data security protection. To address these issues, this study optimizes the existing architecture of the power IoT and designs an integrated management framework for the access of multi-source heterogeneous data in the power IoT, comprising cloud, pipe, edge, and terminal parts. It further reviews and analyzes the key technologies involved in the power IoT, such as the unified management of the physical model, high concurrent access, multi-protocol access, multi-source heterogeneous data storage management, and data security control, to provide a more flexible, efficient, secure, and easy-to-use solution for multi-source heterogeneous data access in the power IoT.
Predicting traffic flow is a crucial component of an intelligent transportation system, yet precisely monitoring and predicting traffic flow remains a challenging endeavor. Existing methods for predicting traffic flow do not incorporate various external factors or consider the spatiotemporal correlation between spatially adjacent nodes, resulting in the loss of essential information and lower forecast performance. In addition, the availability of spatiotemporal data is limited. This research offers alternative spatiotemporal data with three specific features as input: vehicle type (5 types), holidays (3 types), and weather (10 conditions). The proposed model combines the capability of convolutional (CNN) layers to extract valuable information and learn the internal representation of time-series data that can be interpreted as an image with the efficiency of long short-term memory (LSTM) layers in identifying short-term and long-term dependencies. Our approach can use the heterogeneous spatiotemporal correlation features of the traffic flow dataset to deliver better traffic flow prediction performance than existing deep learning models. The research findings show that adding spatiotemporal feature data increases forecast performance: weather by 25.85%, vehicle type by 23.70%, and holiday by 14.02%.
Identification of security risk factors for small reservoirs is the basis for implementation of early warning systems. The manner of identification of the factors for small reservoirs is of practical significance when data are incomplete. The existing grey relational models have some disadvantages in measuring the correlation between categorical data sequences. To this end, this paper introduces a new grey relational model to analyze heterogeneous data. In this study, a set of security risk factors for small reservoirs was first constructed based on theoretical analysis, and heterogeneous data of these factors were recorded as sequences. The sequences were regarded as random variables, and the information entropy and conditional entropy between sequences were measured to analyze the relational degree between risk factors. Then, a new grey relational analysis model for heterogeneous data was constructed, and a comprehensive security risk factor identification method was developed. A case study of small reservoirs in Guangxi Zhuang Autonomous Region in China shows that the model constructed in this study is applicable to security risk factor identification for small reservoirs with heterogeneous and sparse data.
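The entropy quantities named in this abstract can be sketched for categorical sequences as follows. This is only an illustration of the standard Shannon entropy H(X) and conditional entropy H(X | Y) = H(X, Y) - H(Y) estimated from observed frequencies; the function names and sample sequences are hypothetical, and the paper's grey relational degree itself is not reconstructed here.

```python
import math
from collections import Counter


def entropy(seq):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def conditional_entropy(x, y):
    """H(X | Y) = H(X, Y) - H(Y), estimated from paired observations."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    joint = list(zip(x, y))  # each pair is one joint observation
    return entropy(joint) - entropy(y)


# Made-up categorical sequences for two hypothetical risk factors.
x = ["high", "high", "low", "low"]
y_dependent = ["wet", "wet", "dry", "dry"]    # fully determines x
y_independent = ["wet", "dry", "wet", "dry"]  # tells nothing about x

h_x = entropy(x)
h_given_dependent = conditional_entropy(x, y_dependent)
h_given_independent = conditional_entropy(x, y_independent)
```

The smaller H(X | Y) is relative to H(X), the more one sequence tells us about the other, which is the intuition behind using these quantities to measure association between categorical sequences.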
To construct mediators for data integration systems that integrate structured and semi-structured data, and to facilitate the reformulation and decomposition of the query, the presented system uses the XML processing language (XPL) for the mediator. With XPL, it is easy to construct mediators for data integration based on XML, and it can accelerate the work in the mediator.
Federated learning (FL) is a machine learning paradigm for data silos and privacy protection, which aims to organize multiple clients for training global machine learning models without exposing data to all parties. However, when dealing with non-independently identically distributed (non-IID) client data, FL cannot obtain more satisfactory results than centrally trained machine learning and even fails to match the accuracy of the local model obtained by client training alone. To analyze and address the above issues, we survey the state-of-the-art methods in the literature related to FL on non-IID data. On this basis, a motivation-based taxonomy is proposed, which classifies these methods into two categories: heterogeneity reducing strategies and adaptability enhancing strategies. Moreover, the core ideas and main challenges of these methods are analyzed. Finally, we envision several promising research directions that have not been thoroughly studied, in the hope of promoting research in related fields.
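As a rough illustration of how a global model can be built without pooling raw data, the sketch below shows a sample-size-weighted average of client parameters in the style of the well-known FedAvg aggregation step. The abstract does not name a specific algorithm, so this choice, the function name `federated_average`, and the toy parameter vectors are all assumptions made for illustration.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts.

    client_weights: list of equal-length lists of floats (one per client).
    client_sizes:   number of local training samples held by each client.
    Hypothetical helper; only parameters, never raw data, are combined.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must share the same parameter shape")
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical clients; the second holds three times as much data.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

When client data are non-IID, the locally trained parameters can diverge strongly from one another, and a plain weighted mean of this kind may then land far from any client's optimum, which is the difficulty the surveyed methods set out to address.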
Data-driven methods are widely considered for fault diagnosis in complex systems. However, in practice, the between-class imbalance due to limited faulty samples may deteriorate their classification performance. To address this issue, synthetic minority methods for enhancing data have proven effective in many applications. Generative adversarial networks (GANs), capable of automatic feature extraction, can also be adopted for augmenting the faulty samples. However, the monitoring data of a complex system may include not only continuous signals but also discrete/categorical signals. Since current GAN methods still face challenges in handling such heterogeneous monitoring data, a Mixed Dual Discriminator GAN (denoted M-D2GAN) is proposed in this work. To make the expanded fault samples better aligned with the real situation and to improve the accuracy and robustness of the fault diagnosis model, different types of variables, including floating-point, integer, categorical, and hierarchical, are generated in different ways. To account for the class imbalance problem, the GAN model is modified by adding a normal class discriminator. A practical case study concerning the braking system of a high-speed train is carried out to verify the effectiveness of the proposed framework. Compared to the classic GAN, the proposed framework achieves better results with respect to F-measure and G-mean metrics.
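The two metrics mentioned at the end of this abstract are commonly used for imbalanced classification. The sketch below computes them from binary confusion-matrix counts, assuming the usual definitions: F-measure as the weighted harmonic mean of precision and recall, and G-mean as the geometric mean of sensitivity and specificity. The function name and the counts are hypothetical, and the paper may use a multi-class variant that is not shown here.

```python
import math


def f_measure_and_g_mean(tp, fp, fn, tn, beta=1.0):
    """Return (F-measure, G-mean) from binary confusion-matrix counts.

    The faulty class is treated as the positive class. Hypothetical helper.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate

    denom = beta ** 2 * precision + recall
    f_measure = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    g_mean = math.sqrt(recall * specificity)
    return f_measure, g_mean


# Made-up counts: 10 faulty samples and 90 normal samples.
f1, g = f_measure_and_g_mean(tp=8, fp=2, fn=2, tn=88)
```

Unlike plain accuracy, both metrics drop sharply when the minority (faulty) class is poorly recognized, even if the majority class is classified almost perfectly.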
With the rapid advancements in edge computing and artificial intelligence, federated learning (FL) has gained momentum as a promising approach to collaborative data utilization across organizations and devices, while ensuring data privacy and information security. In order to further harness the energy efficiency of wireless networks, an integrated sensing, communication and computation (ISCC) framework has been proposed, which is anticipated to be a key enabler in the era of 6G networks. Although the advantages of pushing intelligence to edge devices are multi-fold, some challenges arise when incorporating FL into wireless networks under the umbrella of ISCC. This paper provides a comprehensive survey of FL, with special emphasis on the design and optimization of ISCC. We commence by introducing the background and fundamentals of FL and the ISCC framework. Subsequently, the aforementioned challenges are highlighted and the state of the art in potential solutions is reviewed. Finally, design guidelines are provided for the incorporation of FL and ISCC. Overall, this paper aims to contribute to the understanding of FL in the context of wireless networks, with a focus on the ISCC framework, and provide insights into addressing the challenges and optimizing the design for the integration of FL into future 6G networks.
Aerodynamic surrogate modeling mostly relies only on integrated loads data obtained from simulation or experiment, while neglecting and wasting the valuable distributed physical information on the surface. To make full use of both integrated and distributed loads, a modeling paradigm called heterogeneous data-driven aerodynamic modeling is presented. The essential concept is to incorporate the physical information of distributed loads as additional constraints within the end-to-end aerodynamic modeling. For heterogeneous data, a novel and easily applicable physical feature embedding modeling framework is designed. This framework extracts low-dimensional physical features from the pressure distribution and then effectively enhances the modeling of the integrated loads via feature embedding. The proposed framework can be coupled with multiple feature extraction methods, and its good generalization capability over different airfoils is verified through a transonic case. Compared with traditional direct modeling, the proposed framework can reduce testing errors by almost 50%. Given the same prediction accuracy, it can save more than half of the training samples. Furthermore, the visualization analysis reveals a significant correlation between the discovered low-dimensional physical features and the heterogeneous aerodynamic loads, which supports the interpretability and credibility of the superior performance offered by the proposed deep learning framework.
Structural change in panel data is a widespread phenomenon. This paper proposes a fluctuation test to detect a structural change at an unknown date in heterogeneous panel data models with or without common correlated effects. The asymptotic properties of the fluctuation statistics in the two cases are developed under the null and local alternative hypotheses. Furthermore, the consistency of the change point estimator is proven. Monte Carlo simulation shows that the fluctuation test can control the probability of type I error in most cases, and the empirical power is high for small and moderate sample sizes. An application of the procedure to a real data set is presented.
Federated learning (FL) allows data owners to train neural networks together without sharing local data, allowing the industrial Internet of Things (IIoT) to share a variety of data. However, traditional FL frameworks suffer from data heterogeneity and outdated models. To address these issues, this paper proposes a dual-blockchain based multi-layer grouping federated learning (BMFL) architecture. BMFL divides the participant groups based on the training tasks, then realizes the model training by combining synchronous and asynchronous FL through the multi-layer grouping structure, and uses the model blockchain to record the characteristic tags of the global model, allowing group-manners to extract the model based on the feature requirements and solving the problem of data heterogeneity. In addition, to protect the privacy of the model gradient parameters and manage the key, the global model is stored in ciphertext, and the chameleon hash algorithm is used to modify and manage the encrypted key on the key blockchain while keeping the block header hash unchanged. Finally, we evaluate the performance of BMFL on different public datasets and verify the practicality of the scheme with real fault datasets. The experimental results show that the proposed BMFL exhibits more stable and accurate convergence behavior than the classic FL algorithm, and the key revocation overhead time is reasonable.
Alpha-synuclein plays an important role in Parkinson's disease (PD). Current studies of alpha-synuclein mainly concentrate on the gene level. However, study at the protein level has been found to be of special significance. Meanwhile, there is free information on the Internet, such as databases and algorithms of protein-protein interactions (PPIs). In this paper, a novel method is proposed that integrates distributed heterogeneous data sources and algorithms to predict PPIs for alpha-synuclein in silico. The PPIs generated by the method take advantage of various experimental data and indicate new information about PPIs for alpha-synuclein. At the end of this paper, the result illustrates that the method is practical. It is hoped that the prediction result obtained by this method can provide guidance for biological experiments on PPIs for alpha-synuclein to reveal possible mechanisms of PD.
The drastic growth of coastal observation sensors results in copious data that provide weather information. The intricacies in sensor-generated big data are heterogeneity and interpretation, driving high-end Information Retrieval (IR) systems. The Semantic Web (SW) can solve this issue by integrating data into a single platform for information exchange and knowledge retrieval. This paper focuses on exploiting the SW-based system to provide interoperability through ontologies by combining the data concepts with ontology classes. This paper presents a 4-phase weather data model: data processing, ontology creation, SW processing, and query engine. The developed Oceanographic Weather Ontology helps to enhance data analysis, discovery, IR, and decision making. In addition, the paper evaluates the developed ontology against other state-of-the-art ontologies. The proposed ontology's quality has improved by 39.28% in terms of completeness, and structural complexity has decreased by 45.29%, 11% and 37.7% in Precision and Accuracy. Ocean data from the Indian meteorological satellite INSAT-3D are used as a typical example for testing the proposed model. The experimental result shows the effectiveness of the proposed data model and its advantages in machine understanding and IR.
OpenStreetMap (OSM) has a large number of volunteers. There is a hypothesis that volunteers with different cultural backgrounds may have different editing behaviors when contributing to OSM, which may be strongly related to data quality and data reliability on OSM. Regarding the heterogeneity and reliability of OSM data, previous research usually focuses on the geometric accuracy, spatial location accuracy and semantic integrity of OSM data, while few researchers have analyzed these problems from the perspective of editing behavior. Based on the relationship between mapping motivation and editing behavior, the dispersion of editing trajectory and a clockwise direction index are proposed in the paper to explore whether the volunteers are sufficiently motivated and knowledgeable. In the experiments, the historical OSM data of four countries suggested that developed countries have lower trajectory dispersion. A lower degree of trajectory dispersion reflects higher concentration and professionalism of volunteers. A high degree of drawing direction consistency shows that volunteers who mapped French data were natives with local knowledge. From this point of view, this paper verifies that volunteer editing behavior is an effective means of analyzing data quality heterogeneity and data reliability.
Industrial Internet of Things (IoT) connecting society and industrial systems represents a tremendous and promising paradigm shift. With IoT, multimodal and heterogeneous data from industrial devices can be easily collected and further analyzed to discover potential knowledge about device maintenance and health. IoT data-based fault diagnosis for industrial devices is very helpful to the sustainability and applicability of an IoT ecosystem. But how to efficiently use and fuse this multimodal heterogeneous data to realize intelligent fault diagnosis is still a challenge. In this paper, a novel Deep Multimodal Learning and Fusion (DMLF) based fault diagnosis method is proposed for addressing heterogeneous data from IoT environments where industrial devices coexist. First, a DMLF model is designed by combining a Convolution Neural Network (CNN) and a Stacked Denoising Autoencoder (SDAE) to capture more comprehensive fault knowledge and extract features from different modal data. Second, these multimodal features are seamlessly integrated at a fusion layer, and the resulting fused features are further used to train a classifier for recognizing potential faults. Third, a two-stage training algorithm is proposed by combining supervised pre-training and fine-tuning to simplify the training process for deep structure models. A series of experiments are conducted over multimodal heterogeneous data from a gear device to verify the proposed fault diagnosis method. The experimental results show that the method outperforms the benchmark methods in fault diagnosis accuracy.
Detailed information on the spatio-temporal changes of cropland soil organic carbon (SOC) can significantly contribute to the improvement of soil fertility and the mitigation of climate change. Nonetheless, information and knowledge on the national-scale spatio-temporal changes and the corresponding uncertainties of SOC in Chinese upland soils remain limited. The CENTURY model was used to estimate the SOC storage and its changes in Chinese uplands from 1980 to 2010. With the Monte Carlo method, the uncertainties of CENTURY-modelled SOC dynamics associated with the spatially heterogeneous model inputs were quantified. Results revealed that the SOC storage in Chinese uplands increased from 3.03 (1.59 to 4.78) Pg C in 1980 to 3.40 (2.39 to 4.62) Pg C in 2010. The increment of SOC storage during this period was 370 Tg C, with an uncertainty interval of -440 to 1110 Tg C. The regional disparities of SOC changes reached a significant level, with considerable SOC accumulation in the Huang-Huai-Hai Plain of China and SOC loss in northeastern China. The SOC loss from Meadow soils, Black soils and Chernozems was most severe, whilst SOC accumulation in Fluvo-aquic soils, Cinnamon soils and Purplish soils was most significant. In modelling large-scale SOC dynamics, the initial soil properties were major sources of uncertainty. Hence, more detailed information concerning the soil properties must be collected. The SOC stock of Chinese uplands in 2010 was still relatively low, indicating that recommended agricultural management practices, in conjunction with effective economic and policy incentives to farmers for soil fertility improvement, are indispensable for future carbon sequestration in these regions.
With the growing awareness of data privacy, federated learning (FL) has gained increasing attention in recent years as a major paradigm for training models with privacy protection in mind, which allows building models in a collaborative but private way without exchanging data. However, most FL clients are currently unimodal. With the rise of edge computing, various types of sensors and wearable devices generate a large amount of data from different modalities, which has inspired research efforts in multimodal federated learning (MMFL). In this survey, we explore the area of MMFL to address the fundamental challenges of FL on multimodal data. First, we analyse the key motivations for MMFL. Second, the currently proposed MMFL methods are technically classified according to the modality distributions and modality annotations in MMFL. Then, we discuss the datasets and application scenarios of MMFL. Finally, we highlight the limitations and challenges of MMFL and provide insights and methods for future research.
The virtual test platform is a vital tool for ship simulation and testing. However, the numerical pool ship virtual test platform is a complex system that comprises multiple heterogeneous data types, such as relational data, files, text, images, and animations. The analysis, evaluation, and decision-making processes heavily depend on data, which continue to increase in size and complexity. As a result, there is an increasing need for a distributed database system to manage these data. In this paper, we propose a Key-Value database based on a distributed system that can operate on any type of data, regardless of its size or type. This database architecture supports class column storage and load balancing and optimizes the efficiency of I/O bandwidth and CPU resource utilization. Moreover, it is specifically designed to handle the storage and access of large files. Additionally, we propose a multimodal data fusion mechanism that can connect various descriptions of the same substance, enabling the fusion and retrieval of heterogeneous multimodal data to facilitate data analysis. Our approach focuses on indexing and storage, and we compare our solution with Redis, MongoDB, and MySQL through experiments. We demonstrate the performance, scalability, and reliability of our proposed database system while also analysing its architecture's defects and providing optimization solutions and future research directions. In conclusion, our database system provides an efficient and reliable solution for the data management of the virtual test platform of numerical pool ships.
A heterogeneous wireless sensor network comprises a number of inexpensive, energy-constrained wireless sensor nodes which collect data from the sensing environment and transmit them toward the improved cluster head in a coordinated way. Employing clustering techniques in such networks can achieve balanced energy consumption of member nodes and prolong the network lifetime. In classical clustering techniques, clustering and in-cluster data routing are usually treated as independent operations. Although considering these two issues separately simplifies the system design, it often yields a non-optimal lifetime expectancy for wireless sensor networks. This paper proposes an integral framework that integrates these two correlated items into an interactive whole. To that end, we formulate the clustering problem using nonlinear programming. The evolution process of clustering is provided in simulations. Results show that our joint-design proposal reaches a near-optimal match between member nodes and cluster heads.
Funding (spatio-temporal heterogeneous data accuracy detection study): Supported by the National Natural Science Foundation of China under Grant 42172161; by the Heilongjiang Provincial Natural Science Foundation of China under Grant LH2020F003; by the Heilongjiang Provincial Department of Education Project of China under Grant UNPYSCT-2020144; by the Innovation Guidance Fund of Heilongjiang Province of China under Grant 15071202202; and by the Science and Technology Bureau Project of Qinhuangdao Province of China under Grant 202101A226.
Funding: Supported by the National Key Research and Development Program of China (grant number 2019YFE0123600).
Abstract: The power Internet of Things (IoT) is a significant trend in technology and a requirement for national strategic development. With the deepening digital transformation of the power grid, China's power system has initially built a power IoT architecture comprising a perception layer, a network layer, and a platform application layer. However, owing to the structural complexity of the power system, the construction of the power IoT continues to face problems such as complex access management of massive heterogeneous equipment, diverse IoT protocol access methods, high concurrency of network communications, and weak data security protection. To address these issues, this study optimizes the existing architecture of the power IoT and designs an integrated management framework for the access of multi-source heterogeneous data in the power IoT, comprising cloud, pipe, edge, and terminal parts. It further reviews and analyzes the key technologies involved in the power IoT, such as the unified management of the physical model, high concurrent access, multi-protocol access, multi-source heterogeneous data storage management, and data security control, to provide a more flexible, efficient, secure, and easy-to-use solution for multi-source heterogeneous data access in the power IoT.
Funding: Supported by Universitas Muhammadiyah Yogyakarta, Indonesia and Asia University, Taiwan.
Abstract: Predicting traffic flow is a crucial component of an intelligent transportation system. Precisely monitoring and predicting traffic flow remains a challenging endeavor. Existing methods for predicting traffic flow do not incorporate various external factors or consider the spatiotemporal correlation between spatially adjacent nodes, resulting in the loss of essential information and lower forecast performance. In addition, the availability of spatiotemporal data is limited. This research offers alternative spatiotemporal data with three specific features as input: vehicle type (5 types), holidays (3 types), and weather (10 conditions). The proposed model combines the capability of convolutional (CNN) layers to extract valuable information and learn the internal representation of time-series data that can be interpreted as an image with the efficiency of long short-term memory (LSTM) layers in identifying short-term and long-term dependencies. The approach can use the heterogeneous spatiotemporal correlation features of the traffic flow dataset to deliver better traffic flow prediction performance than existing deep learning models. The research findings show that adding spatiotemporal feature data increases forecast performance: weather by 25.85%, vehicle type by 23.70%, and holiday by 14.02%.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 71401052), the National Social Science Foundation of China (Grant No. 17BGL156), and the Key Project of the National Social Science Foundation of China (Grant No. 14AZD024).
Abstract: Identification of security risk factors for small reservoirs is the basis for implementation of early warning systems. How these factors are identified is of practical significance when data are incomplete. Existing grey relational models have some disadvantages in measuring the correlation between categorical data sequences. To this end, this paper introduces a new grey relational model to analyze heterogeneous data. In this study, a set of security risk factors for small reservoirs was first constructed based on theoretical analysis, and heterogeneous data of these factors were recorded as sequences. The sequences were regarded as random variables, and the information entropy and conditional entropy between sequences were measured to analyze the relational degree between risk factors. Then, a new grey relational analysis model for heterogeneous data was constructed, and a comprehensive security risk factor identification method was developed. A case study of small reservoirs in Guangxi Zhuang Autonomous Region in China shows that the model constructed in this study is applicable to security risk factor identification for small reservoirs with heterogeneous and sparse data.
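The entropy measurements mentioned in the abstract above can be sketched for categorical sequences as follows. This shows only the standard Shannon definitions, H(X) and H(Y | X) = H(X, Y) - H(X). The paper's grey relational degree itself is not given in the abstract and is not reproduced here. The factor names and values are hypothetical.

```python
import math
from collections import Counter

def entropy(seq):
    """Shannon entropy (in bits) of a sequence of hashable observations."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())

def conditional_entropy(y, x):
    """H(Y | X) = H(X, Y) - H(X) for two equally long sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return entropy(list(zip(x, y))) - entropy(x)

# Hypothetical categorical records for three risk factors.
dam_type = ["earth", "earth", "stone", "stone"]
seepage = ["high", "high", "low", "low"]      # fully determined by dam_type
inspection = ["yes", "no", "yes", "no"]       # unrelated to dam_type

h_seepage_given_dam = conditional_entropy(seepage, dam_type)
h_inspection_given_dam = conditional_entropy(inspection, dam_type)
```

A conditional entropy near zero means one factor largely determines the other, while a value close to the unconditional entropy suggests little association. How these quantities are mapped to a relational degree is a modelling choice that the abstract leaves open.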
Abstract: To construct mediators for data integration systems that integrate structured and semi-structured data, and to facilitate the reformulation and decomposition of queries, the presented system uses the XML processing language (XPL) for the mediator. XPL makes it easy to construct mediators for XML-based data integration and can accelerate the work done in the mediator.
Abstract: Federated learning (FL) is a machine learning paradigm for data silos and privacy protection, which aims to organize multiple clients to train global machine learning models without exposing data to all parties. However, when dealing with non-independent and identically distributed (non-IID) client data, FL cannot obtain more satisfactory results than centrally trained machine learning and even fails to match the accuracy of the local model obtained by client training alone. To analyze and address these issues, we survey the state-of-the-art methods in the literature related to FL on non-IID data. On this basis, a motivation-based taxonomy is proposed, which classifies these methods into two categories: heterogeneity-reducing strategies and adaptability-enhancing strategies. Moreover, the core ideas and main challenges of these methods are analyzed. Finally, we envision several promising research directions that have not been thoroughly studied, in the hope of promoting research in related fields.
Abstract: Data-driven methods are widely considered for fault diagnosis in complex systems. In practice, however, the between-class imbalance due to limited faulty samples may degrade their classification performance. To address this issue, synthetic minority methods for enhancing data have proven effective in many applications. Generative adversarial networks (GANs), capable of automatic feature extraction, can also be adopted for augmenting the faulty samples. However, the monitoring data of a complex system may include not only continuous signals but also discrete/categorical signals. Since current GAN methods still face challenges in handling such heterogeneous monitoring data, a Mixed Dual Discriminator GAN (denoted M-D2GAN) is proposed in this work. To make the expanded fault samples more consistent with the real situation and to improve the accuracy and robustness of the fault diagnosis model, different types of variables are generated in different ways, including floating-point, integer, categorical, and hierarchical. To address the class imbalance problem effectively, the GAN model is modified by adding a normal-class discriminator. A practical case study concerning the braking system of a high-speed train is carried out to verify the effectiveness of the proposed framework. Compared to the classic GAN, the proposed framework achieves better results with respect to the F-measure and G-mean metrics.
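The two evaluation metrics named in the abstract above are common for imbalanced classification. A minimal sketch, assuming the usual binary definitions (F-measure as the harmonic mean of precision and recall, G-mean as the geometric mean of sensitivity and specificity) and treating the fault class as positive; the confusion-matrix counts are hypothetical and not taken from the paper.

```python
import math

def f_measure_and_g_mean(tp, fp, fn, tn):
    """Return (F1, G-mean) from binary confusion-matrix counts.

    F1 is the harmonic mean of precision and recall; G-mean is
    sqrt(sensitivity * specificity). Undefined ratios default to 0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)
    return f1, g_mean

# Hypothetical counts: 10 faulty samples, 90 normal samples.
f1, g_mean = f_measure_and_g_mean(tp=8, fp=10, fn=2, tn=80)
```

Both metrics penalize a classifier that ignores the minority fault class, which is why they are preferred over plain accuracy when faulty samples are scarce.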
Abstract: With the rapid advancements in edge computing and artificial intelligence, federated learning (FL) has gained momentum as a promising approach to collaborative data utilization across organizations and devices, while ensuring data privacy and information security. To further harness the energy efficiency of wireless networks, an integrated sensing, communication and computation (ISCC) framework has been proposed, which is anticipated to be a key enabler in the era of 6G networks. Although the advantages of pushing intelligence to edge devices are manifold, some challenges arise when incorporating FL into wireless networks under the umbrella of ISCC. This paper provides a comprehensive survey of FL, with special emphasis on the design and optimization of ISCC. We begin by introducing the background and fundamentals of FL and the ISCC framework. Subsequently, these challenges are highlighted and the state of the art in potential solutions is reviewed. Finally, design guidelines are provided for the incorporation of FL and ISCC. Overall, this paper aims to contribute to the understanding of FL in the context of wireless networks, with a focus on the ISCC framework, and to provide insights into addressing the challenges and optimizing the design for the integration of FL into future 6G networks.
Funding: Supported by the National Natural Science Foundation of China (Nos. 92152301, 12072282).
Abstract: Aerodynamic surrogate modeling mostly relies only on integrated loads data obtained from simulation or experiment, neglecting and wasting the valuable distributed physical information on the surface. To make full use of both integrated and distributed loads, a modeling paradigm called heterogeneous data-driven aerodynamic modeling is presented. The essential concept is to incorporate the physical information of distributed loads as additional constraints within end-to-end aerodynamic modeling. For heterogeneous data, a novel and easily applicable physical feature embedding modeling framework is designed. This framework extracts low-dimensional physical features from the pressure distribution and then effectively enhances the modeling of the integrated loads via feature embedding. The proposed framework can be coupled with multiple feature extraction methods, and its generalization capability across different airfoils is verified through a transonic case. Compared with traditional direct modeling, the proposed framework can reduce testing errors by almost 50%. For the same prediction accuracy, it can save more than half of the training samples. Furthermore, the visualization analysis reveals a significant correlation between the discovered low-dimensional physical features and the heterogeneous aerodynamic loads, which supports the interpretability and credibility of the superior performance offered by the proposed deep learning framework.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11801438, 12161072 and 12171388, the Natural Science Basic Research Plan in Shaanxi Province of China under Grant No. 2023-JC-YB-058, and the Innovation Capability Support Program of Shaanxi under Grant No. 2020PT-023.
Abstract: Structural change in panel data is a widespread phenomenon. This paper proposes a fluctuation test to detect a structural change at an unknown date in heterogeneous panel data models with or without common correlated effects. The asymptotic properties of the fluctuation statistics in the two cases are developed under the null and local alternative hypotheses. Furthermore, the consistency of the change point estimator is proven. Monte Carlo simulation shows that the fluctuation test can control the probability of type I error in most cases, and that the empirical power is high for small and moderate sample sizes. An application of the procedure to a real data set is presented.
Funding: Supported in part by the Natural Science Basic Research Program of Shaanxi under Grant No. 2022JM-346.
Abstract: Federated learning (FL) allows data owners to train neural networks together without sharing local data, allowing the industrial Internet of Things (IIoT) to share a variety of data. However, traditional FL frameworks suffer from data heterogeneity and outdated models. To address these issues, this paper proposes a dual-blockchain-based multi-layer grouping federated learning (BMFL) architecture. BMFL divides the participant groups based on the training tasks, then realizes model training by combining synchronous and asynchronous FL through the multi-layer grouping structure, and uses the model blockchain to record the characteristic tags of the global model, allowing group-manners to extract the model based on the feature requirements and solving the problem of data heterogeneity. In addition, to protect the privacy of the model gradient parameters and manage the key, the global model is stored in ciphertext, and the chameleon hash algorithm is used to modify and manage the encrypted key on the key blockchain while keeping the block header hash unchanged. Finally, we evaluate the performance of BMFL on different public datasets and verify the practicality of the scheme with real fault datasets. The experimental results show that the proposed BMFL exhibits more stable and accurate convergence behavior than the classic FL algorithm, and that the key revocation overhead time is reasonable.
Funding: Supported by the National Basic Research Program of China (Grant No. 2006CB500702), the Shanghai Leading Academic Discipline Project (Grant No. J50103), and Shanghai University Systems Biology Research Funding (Grant No. SBR08001).
Abstract: Alpha-synuclein plays an important role in Parkinson's disease (PD). Current studies of alpha-synuclein concentrate mainly on the gene level. However, study at the protein level has been found to have special significance. Meanwhile, there is free information on the Internet, such as databases and algorithms of protein-protein interactions (PPIs). In this paper, a novel method is proposed that integrates distributed heterogeneous data sources and algorithms to predict PPIs for alpha-synuclein in silico. The PPIs generated by the method take advantage of various experimental data and indicate new information about PPIs for alpha-synuclein. At the end of this paper, the result illustrates that the method is practical. It is hoped that the prediction result obtained by this method can provide guidance for biological experiments on PPIs for alpha-synuclein to reveal possible mechanisms of PD.
Funding: This work is financially supported by the Ministry of Earth Science (MoES), Government of India (Grant No. MoES/36/OOIS/Extra/45/2015), URL: https://www.moes.gov.in.
Abstract: The drastic growth of coastal observation sensors results in copious data that provide weather information. The intricacies of sensor-generated big data are heterogeneity and interpretation, which drive the need for high-end Information Retrieval (IR) systems. The Semantic Web (SW) can solve this issue by integrating data into a single platform for information exchange and knowledge retrieval. This paper focuses on exploiting the SW-based system to provide interoperability through ontologies by combining the data concepts with ontology classes. It presents a 4-phase weather data model: data processing, ontology creation, SW processing, and query engine. The developed Oceanographic Weather Ontology helps to enhance data analysis, discovery, IR, and decision making. The paper also evaluates the developed ontology against other state-of-the-art ontologies. The proposed ontology's quality has improved by 39.28% in terms of completeness, and structural complexity has decreased by 45.29%, 11% and 37.7% in Precision and Accuracy. Ocean data from the Indian meteorological satellite INSAT-3D is used as a typical example for testing the proposed model. The experimental result shows the effectiveness of the proposed data model and its advantages in machine understanding and IR.
Funding: National Natural Science Foundation of China (No. 41771484).
Abstract: OpenStreetMap (OSM) has a large number of volunteers. One hypothesis is that volunteers with different cultural backgrounds may have different editing behaviors when contributing to OSM, and that this may be strongly related to data quality and data reliability on OSM. Regarding the heterogeneity and reliability of OSM data, previous research usually focuses on the geometric accuracy, spatial location accuracy, and semantic integrity of OSM data, while few researchers have analyzed these problems from the perspective of editing behavior. Based on the relationship between mapping motivation and editing behavior, this paper proposes the dispersion of editing trajectory and a clockwise direction index to explore whether the volunteers are sufficiently motivated and knowledgeable. In the experiments, the historical OSM data of four countries suggested that developed countries have lower trajectory dispersion. A lower degree of trajectory dispersion reflects higher concentration and professionalism of volunteers. A high degree of drawing direction consistency shows that the volunteers who mapped French data were natives with local knowledge. From this point of view, this paper verifies that analyzing volunteer editing behavior is an effective way to assess data quality heterogeneity and data reliability.
Funding: Supported in part by the National Key Research and Development Program of China (No. 2018YFB1003700) and in part by the National Natural Science Foundation of China (No. 61836001).
Abstract: The Industrial Internet of Things (IoT), connecting society and industrial systems, represents a tremendous and promising paradigm shift. With IoT, multimodal and heterogeneous data from industrial devices can be easily collected and further analyzed to discover the underlying knowledge related to device maintenance and health. IoT data-based fault diagnosis for industrial devices is very helpful to the sustainability and applicability of an IoT ecosystem. However, how to efficiently use and fuse this multimodal heterogeneous data to realize intelligent fault diagnosis is still a challenge. In this paper, a novel Deep Multimodal Learning and Fusion (DMLF) based fault diagnosis method is proposed for handling heterogeneous data from IoT environments where industrial devices coexist. First, a DMLF model is designed by combining a Convolutional Neural Network (CNN) and a Stacked Denoising Autoencoder (SDAE) to capture more comprehensive fault knowledge and extract features from data of different modalities. Second, these multimodal features are seamlessly integrated at a fusion layer, and the resulting fused features are further used to train a classifier for recognizing potential faults. Third, a two-stage training algorithm is proposed that combines supervised pre-training and fine-tuning to simplify the training process for deep structure models. A series of experiments are conducted on multimodal heterogeneous data from a gear device to verify the proposed fault diagnosis method. The experimental results show that the method outperforms the benchmark methods in fault diagnosis accuracy.
Funding: Under the auspices of the National Key Research and Development Program of China (No. 2017YFA0603002), the National Natural Science Foundation of China (No. 31800358, 31700369), the Jiangsu Agricultural Science and Technology Innovation Fund (No. CX(19)3099), and the Foundation of Jiangsu Vocational College of Agriculture and Forestry (No. 2019kj014).
Abstract: Detailed information on the spatio-temporal changes of cropland soil organic carbon (SOC) can significantly contribute to improving soil fertility and mitigating climate change. Nonetheless, information and knowledge on the national-scale spatio-temporal changes and the corresponding uncertainties of SOC in Chinese upland soils remain limited. The CENTURY model was used to estimate the SOC storages and their changes in Chinese uplands from 1980 to 2010. With the Monte Carlo method, the uncertainties of CENTURY-modelled SOC dynamics associated with the spatially heterogeneous model inputs were quantified. Results revealed that the SOC storage in Chinese uplands increased from 3.03 (1.59 to 4.78) Pg C in 1980 to 3.40 (2.39 to 4.62) Pg C in 2010. The increment of SOC storage during this period was 370 Tg C, with an uncertainty interval of –440 to 1110 Tg C. The regional disparities of SOC changes reached a significant level, with considerable SOC accumulation in the Huang-Huai-Hai Plain of China and SOC loss in northeastern China. SOC loss was most severe in Meadow soils, Black soils and Chernozems, whilst SOC accumulation was most significant in Fluvo-aquic soils, Cinnamon soils and Purplish soils. In modelling large-scale SOC dynamics, the initial soil properties were major sources of uncertainty. Hence, more detailed information concerning the soil properties must be collected. The SOC stock of Chinese uplands in 2010 was still relatively low, indicating that recommended agricultural management practices, in conjunction with effective economic and policy incentives to farmers for soil fertility improvement, are indispensable for future carbon sequestration in these regions.
Funding: Supported by the National Natural Science Foundation of China (No. 62036006), the Fundamental Research Funds for the Central Universities, China, and the Innovation Fund of Xidian University, China.
Abstract: With the growing awareness of data privacy, federated learning (FL) has gained increasing attention in recent years as a major paradigm for training models with privacy protection in mind, which allows building models in a collaborative but private way without exchanging data. However, most FL clients are currently unimodal. With the rise of edge computing, various types of sensors and wearable devices generate a large amount of data from different modalities, which has inspired research efforts in multimodal federated learning (MMFL). In this survey, we explore the area of MMFL to address the fundamental challenges of FL on multimodal data. First, we analyse the key motivations for MMFL. Second, the currently proposed MMFL methods are technically classified according to the modality distributions and modality annotations in MMFL. Then, we discuss the datasets and application scenarios of MMFL. Finally, we highlight the limitations and challenges of MMFL and provide insights and methods for future research.
Abstract: The virtual test platform is a vital tool for ship simulation and testing. However, the numerical pool ship virtual test platform is a complex system that comprises multiple heterogeneous data types, such as relational data, files, text, images, and animations. The analysis, evaluation, and decision-making processes depend heavily on data, which continue to increase in size and complexity. As a result, there is an increasing need for a distributed database system to manage these data. In this paper, we propose a Key-Value database based on a distributed system that can operate on any type of data, regardless of its size or type. This database architecture supports class column storage and load balancing and optimizes the efficiency of I/O bandwidth and CPU resource utilization. Moreover, it is specifically designed to handle the storage and access of large files. Additionally, we propose a multimodal data fusion mechanism that can connect various descriptions of the same substance, enabling the fusion and retrieval of heterogeneous multimodal data to facilitate data analysis. Our approach focuses on indexing and storage, and we compare our solution with Redis, MongoDB, and MySQL through experiments. We demonstrate the performance, scalability, and reliability of our proposed database system while also analysing its architecture's defects and providing optimization solutions and future research directions. In conclusion, our database system provides an efficient and reliable solution for the data management of the virtual test platform of numerical pool ships.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61304131 and 61402147), a grant from the China Scholarship Council (No. 201608130174), the Natural Science Foundation of Hebei Province (Nos. F2016402054 and F2014402075), the Scientific Research Plan Projects of Hebei Education Department (Nos. BJ2014019, ZD2015087 and QN2015046), and the Research Program of Talent Cultivation Project in Hebei Province (No. A2016002023).
Abstract: A heterogeneous wireless sensor network comprises a number of inexpensive, energy-constrained wireless sensor nodes that collect data from the sensing environment and transmit them toward the improved cluster head in a coordinated way. Employing clustering techniques in such networks can balance the energy consumption of member nodes and prolong the network lifetime. In classical clustering techniques, clustering and in-cluster data routing are usually treated as independent operations. Although considering these two issues separately simplifies the system design, it often yields a non-optimal lifetime expectancy for wireless sensor networks. This paper proposes an integral framework that combines these two correlated items into an interactive whole. To that end, we formulate the clustering problem using nonlinear programming. The evolution process of clustering is shown in simulations. Results show that our joint-design proposal reaches a near-optimal match between member nodes and cluster heads.