Due to the restricted satellite payloads in LEO mega-constellation networks (LMCNs), remote sensing image analysis, online learning, and other big data services need onboard distributed processing (OBDP). In existing technologies, the efficiency of big data applications (BDAs) in distributed systems hinges on stable, low-latency links between worker nodes. However, LMCNs, with their highly dynamic nodes and long-distance links, cannot provide these conditions, which makes the performance of OBDP hard to measure intuitively. To bridge this gap, a multidimensional simulation platform is indispensable, one that can simulate the network environment of LMCNs and run BDAs in it for performance testing. Using STK's APIs and a parallel computing framework, we achieve real-time simulation of thousands of satellite nodes, which are mapped to application nodes through software-defined networking (SDN) and container technologies. We elaborate on the architecture and mechanism of the simulation platform, and take Starlink and Hadoop as realistic examples for simulations. The results indicate that LMCNs have dynamic end-to-end latency that fluctuates periodically with the constellation movement. Compared to ground data center networks (GDCNs), LMCNs degrade computing and storage job throughput, which can be alleviated by the use of erasure codes and data flow scheduling of worker nodes.
Multimodal sentiment analysis utilizes multimodal data such as text, facial expressions, and voice to detect people’s attitudes. With the advent of distributed data collection and annotation, such multimodal data can be easily obtained and shared. However, due to professional discrepancies among annotators and lax quality control, noisy labels might be introduced. Recent research suggests that deep neural networks (DNNs) overfit noisy labels, which leads to poor performance. To address this challenging problem, we present a Multimodal Robust Meta Learning framework (MRML) for multimodal sentiment analysis that resists noisy labels and correlates distinct modalities simultaneously. Specifically, we propose a two-layer fusion net to deeply fuse different modalities and improve the quality of the multimodal data features for label correction and network training. In addition, a multiple meta-learner (label corrector) strategy is proposed to enhance the label correction approach and prevent models from overfitting to noisy labels. We conducted experiments on three popular multimodal datasets to verify the superiority of our method by comparing it with four baselines.
When using healthcare data, it is crucial to weigh the advantages of data privacy against the possible drawbacks. Data from several sources must be combined for use in many data mining applications. A medical practitioner may use the results of association rule mining performed on this aggregated data to better personalize patient care and implement preventive measures. Historically, numerous heuristics (e.g., greedy search) and metaheuristics-based techniques (e.g., evolutionary algorithms) have been created for positive association rules in privacy-preserving data mining (PPDM). When it comes to connecting seemingly unrelated diseases and drugs, negative association rules may be more informative than their positive counterparts. It is well known that negative association rule mining produces a large number of uninteresting rules, making this a difficult problem to tackle. In this research, we offer an adaptive method for negative association rule mining in vertically partitioned healthcare datasets that respects users’ privacy. The applied approach dynamically determines the transactions to be interrupted for information hiding, as opposed to predefining them. This study introduces a novel method for addressing the problem of negative association rules in healthcare data mining, one that is based on the Tabu-genetic optimization paradigm. Tabu search is advantageous because it removes a huge number of unnecessary rules and item sets. Experiments using benchmark healthcare datasets show that the discussed scheme outperforms state-of-the-art solutions in terms of decreasing side effects and data distortions, as measured by the indicator of hiding failure.
There are two key issues in distributed intrusion detection systems: maintaining the load balance of the system and protecting data integrity. To address these issues, this paper proposes a new distributed intrusion detection model for big data based on nondestructive partitioning and balanced allocation. A data allocation strategy based on capacity and workload is introduced to achieve local load balance, and a dynamic load adjustment strategy is adopted to maintain the global load balance of the cluster. Moreover, data integrity is protected by using session reassembly and session partitioning. The simulation results show that the new model offers good load balance, a higher detection rate, and higher detection efficiency.
On 21 May 2021 (UTC), an Mw 7.4 earthquake jolted the east Bayan Har block in the Tibetan Plateau. The earthquake received widespread attention because it is the largest event in the Tibetan Plateau and its surroundings since the 2008 Wenchuan earthquake, and especially because of its proximity to the seismic gaps on the east Kunlun fault. Here we use satellite interferometric synthetic aperture radar (InSAR) data and subpixel offset observations along the range directions to characterize the coseismic deformation of the earthquake. Range offset displacements depict clear surface ruptures with a total length of ~170 km, involving two possible activated fault segments in the earthquake. Coseismic modeling results indicate that the earthquake was dominated by left-lateral strike-slip motions of up to 7 m within the top 12 km of the crust. The well-resolved slip variations are characterized by five major slip patches along strike and a shallow slip deficit of 64%, suggesting a young seismogenic structure. Spatial-temporal changes of the postseismic deformation are mapped from early 6-day and 24-day InSAR observations, and are well explained by time-dependent afterslip models. Analysis of Global Navigation Satellite System (GNSS) velocity profiles and strain rates suggests that the eastward extrusion of the plateau is diffusely distributed across the east Bayan Har block, but exhibits significant lateral heterogeneities, as evidenced by magnetotelluric observations. The block-wide distributed deformation of the east Bayan Har block, along with the significant co- and post-seismic stress loadings from the Madoi earthquake, implies high seismic risks along regional faults, especially the Tuosuo Lake and Maqên-Maqu segments of the Kunlun fault, which are known as seismic gaps.
Glacier disasters occur frequently in alpine regions around the world, but current conventional geological disaster measurement technology cannot be directly used for glacier disaster measurement. Hence, in this study, a distributed multi-sensor measurement system for glacier deformation was established by integrating piezoelectric sensing, coded sensing, attitude sensing technology, and wireless communication technology. The traditional Modbus protocol was optimized to solve the problem of data identification confusion among different acquisition nodes. The performance of the measurement system was analyzed and evaluated through indoor wireless transmission, adaptive performance analysis, an error measurement experiment, and a landslide simulation experiment. Using unmanned aerial vehicle technology, the reliability and effectiveness of the measurement system were verified at the site of the Galongla glacier in southeastern Tibet, China. The results show that the mean absolute percentage errors were only 1.13% and 2.09% for displacement and temperature, respectively. The distributed real-time glacier deformation measurement system provides a new means for assessing the development process of glacier disasters and for disaster prevention and mitigation.
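The error metric quoted above, mean absolute percentage error (MAPE), can be sketched as follows. This is a minimal illustration of the standard definition, not the authors' evaluation code; the function name and the sample readings are hypothetical.

```python
def mean_absolute_percentage_error(reference, measured):
    """Return the MAPE in percent; every reference value must be non-zero."""
    if not reference or len(reference) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the relative deviation of each measurement from its reference.
    total = sum(abs((r - m) / r) for r, m in zip(reference, measured))
    return 100.0 * total / len(reference)


# Hypothetical displacement readings (mm): reference gauge vs. sensor node.
reference_mm = [100.0, 200.0]
sensor_mm = [99.0, 202.0]
mape = mean_absolute_percentage_error(reference_mm, sensor_mm)
```

Because each deviation is divided by its reference value, the metric is scale-free, which is why displacement and temperature errors can be reported side by side.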
Operation control of power systems has become challenging with the increase in the scale and complexity of power distribution systems and extensive access to renewable energy. Therefore, improving the capability for data-driven operation management, intelligent analysis, and mining is urgently required. To investigate and explore similar regularities of the historical operating section of the power distribution system and assist the power grid in obtaining high-value historical operation, maintenance experience, and knowledge by rule and line, a neural information retrieval model with an attention mechanism is proposed based on graph data computing technology. Based on the processing flow of the operating data of the power distribution system, a technical framework of neural information retrieval is established. Combined with the natural graph characteristics of the power distribution system, a unified graph data structure and a data fusion method of data access, data complement, and multi-source data are constructed. Further, a graph node feature-embedding representation learning algorithm and a neural information retrieval algorithm model are constructed. The neural information retrieval algorithm model is trained and tested using the generated graph node feature representation vector set. The model is verified on the operating section of the power distribution system of a provincial grid area. The results show that the proposed method demonstrates high accuracy in the similarity matching of historical operation characteristics and effectively supports intelligent fault diagnosis and elimination in power distribution systems.
As an emerging hot technology, smart grids (SGs) are being employed in many fields, such as smart homes and smart cities. Moreover, the application of artificial intelligence (AI) in SGs has promoted the development of the power industry. However, as users’ demands for electricity increase, traditional centralized power trading is unable to meet user demands well, and an increasing number of small distributed generators are being employed in trading activities. This not only leads to numerous security risks for the trading data but also has a negative impact on the cost of power generation, electrical security, and other aspects. Accordingly, this study proposes a distributed power trading scheme based on blockchain and AI. To protect the legitimate rights and interests of consumers and producers, credibility is used as an indicator to restrict untrustworthy behavior. Simultaneously, the reliability and communication capabilities of nodes are considered in block verification to improve the transaction confirmation efficiency, and a weighted communication tree construction algorithm is designed to achieve superior data forwarding. Finally, AI sensors are set up in power equipment to monitor electricity generation and transmission and to alert users when security hazards such as thunderstorms or typhoons occur. The experimental results show that the proposed scheme can not only improve trading security but also reduce system communication delays.
Traditional distribution network planning relies on the professional knowledge of planners, especially when analyzing the correlations between the problems existing in the network and the crucial influencing factors. The inherent laws reflected by the historical data of the distribution network are ignored, which affects the objectivity of the planning scheme. In this study, to improve the efficiency and accuracy of distribution network planning, the characteristics of distribution network data were extracted using a data-mining technique, and correlation knowledge of existing problems in the network was obtained. A data-mining model based on correlation rules was established. The inputs of the model were the electrical characteristic indices screened using the gray correlation method. The Apriori algorithm was used to extract correlation knowledge from the operational data of the distribution network and obtain strong correlation rules. Degree of promotion and chi-square tests were used to verify the rationality of the strong correlation rules output by the model. In this study, the correlation between heavy load or overload problems of distribution network feeders in different regions and related characteristic indices was determined, and the confidence of the correlation rules was obtained. These results can provide an effective basis for the formulation of a distribution network planning scheme.
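The rule-quality measures mentioned above can be illustrated with a short sketch. It assumes that "degree of promotion" refers to the measure usually called lift, and it computes support, confidence, and lift for a single candidate rule rather than running the full Apriori search. The feeder records and item names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    joint = support(transactions, antecedent | consequent)
    confidence = joint / support(transactions, antecedent)
    # Lift > 1 means the antecedent makes the consequent more likely than chance.
    lift = confidence / support(transactions, consequent)
    return joint, confidence, lift


# Hypothetical feeder records: each set lists the indices flagged on one feeder.
feeders = [
    {"long_line", "heavy_load"},
    {"long_line", "heavy_load"},
    {"long_line"},
    {"short_line"},
]
rule_support, rule_confidence, rule_lift = rule_metrics(
    feeders, {"long_line"}, {"heavy_load"}
)
```

A rule is typically kept as "strong" only when its support and confidence exceed chosen thresholds; the lift check then filters out rules whose confidence is high merely because the consequent is common.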
Deep neural networks are gaining importance and popularity in applications and services. Due to the enormous number of learnable parameters and the size of the datasets, the training of neural networks is computationally costly. Parallel and distributed computation-based strategies are used to accelerate this training process. Generative Adversarial Networks (GANs) are a recent technological achievement in deep learning. These generative models are computationally expensive because a GAN consists of two neural networks and trains on enormous datasets. Typically, a GAN is trained on a single server. Conventional deep learning accelerator designs are challenged by the unique properties of GANs, such as the enormous computation stages with non-traditional convolution layers. This work addresses the issue of distributing GANs so that they can train on datasets distributed over many Tensor Processing Units (TPUs). Distributed training accelerates the learning process and decreases computation time. In this paper, the GAN is accelerated using a distributed multi-core TPU in a distributed data-parallel synchronous model. For adequate acceleration of the GAN, the data-parallel Stochastic Gradient Descent (SGD) model is implemented on a multi-core TPU using distributed TensorFlow with mixed precision, bfloat16, and XLA (Accelerated Linear Algebra). The study was conducted on the MNIST dataset for batch sizes varying from 64 to 512 for 30 epochs in distributed SGD on TPU v3 with a 128×128 systolic array. An extensive batch technique is implemented in bfloat16 to decrease the storage cost and speed up floating-point computations. The accelerated learning curve for the generator and discriminator networks is obtained. The training time was reduced by 79% by varying the batch size from 64 to 512 on the multi-core TPU.
Distribution networks are important public infrastructure necessary for people’s livelihoods. However, extreme natural disasters, such as earthquakes, typhoons, and mudslides, severely threaten the safe and stable operation of distribution networks and the power supplies needed for daily life. Therefore, considering the requirements for distribution network disaster prevention and mitigation, there is an urgent need for in-depth research on risk assessment methods for distribution networks under extreme natural disaster conditions. This paper accesses multisource data, presents data quality improvement methods for distribution networks, and conducts active fault diagnosis and disaster damage analysis and evaluation using data-driven theory. Furthermore, the paper realizes real-time, accurate access to distribution network disaster information. In a case study, the proposed approach performs an accurate and rapid assessment of cross-sectional risk, and the minimal average annual outage time in the ring network can be reduced to 3 h/a. The approach proposed in this paper can provide technical support for further improving the ability of distribution networks to cope with extreme natural disasters.
Big Data (BD), which is simply a collection of a huge amount of data, has been utilized extensively in several fields such as financial dealing, industry, business, and medicine. However, processing a massive amount of data is highly complicated and time-consuming. Thus, to design a Distribution Preserving Framework for BD, a novel methodology is proposed that utilizes Manhattan Distance (MD)-centered Partition Around Medoid (MD-PAM) along with a Conjugate Gradient Artificial Neural Network (CG-ANN) and proceeds through several steps to reduce the complications of BD. First, in the pre-processing phase, data repetition is mitigated using the map-reduce function; subsequently, missing data are handled by substituting or ignoring the missing values. After that, the data are transformed into a normalized form. Next, to enhance the classification performance, the dimensionality of the data is reduced by employing Gaussian Kernel (GK)-Fisher Discriminant Analysis (GK-FDA). Afterwards, the processed data are transformed into a structured format and submitted to the partitioning phase. In the partitioning phase, the data are partitioned and grouped into clusters using MD-PAM. Lastly, the data are classified in the classification phase by employing CG-ANN so that the needed data can be easily retrieved by the user. To compare the outcomes of CG-ANN with those of prevailing methodologies, the openly accessible NSL-KDD datasets are utilized. The experimental results show that the proposed CG-ANN achieves efficient results with a reduced computation cost. The proposed work outperforms existing systems in terms of accuracy, sensitivity, and specificity.
Adaptive packet scheduling can efficiently enhance the performance of multipath data transmission. However, realizing precise packet scheduling is challenging due to the highly dynamic and unpredictable nature of network link states. To this end, this paper proposes a distributed asynchronous deep reinforcement learning framework to improve the dynamics and prediction of adaptive packet scheduling. Our framework contains two parts: local asynchronous packet scheduling and a distributed cooperative control center. In local asynchronous packet scheduling, an asynchronous prioritized replay double deep Q-learning packet scheduling algorithm is proposed for dynamic adaptive packet scheduling learning, which uses a prioritized replay double deep Q-learning network (P-DDQN) to perform the fitting analysis. In the distributed cooperative control center, a distributed scheduling learning and neural fitting acceleration algorithm is proposed to adaptively update the neural network parameters of P-DDQN for more precise packet scheduling. Experimental results show that our solution performs better than the Random weight algorithm and the Round-Robin algorithm in throughput and loss ratio. Further, our solution is 1.32 times and 1.54 times better than the Random weight algorithm and the Round-Robin algorithm, respectively, in the stability of multipath data transmission.
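The double deep Q-learning idea that P-DDQN builds on can be sketched as a target computation. This is the textbook double Q-learning target only, written under the assumption that each action corresponds to choosing one transmission path; the prioritized replay buffer and the asynchronous, distributed parameter updates described in the abstract are not shown, and all names and numbers are hypothetical.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Learning target for one transition under double Q-learning.

    next_q_online: Q-values of the next state from the online network.
    next_q_target: Q-values of the next state from the target network.
    """
    if done:
        return reward
    # The online network selects the action ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... and the target network evaluates it, which reduces overestimation.
    return reward + gamma * next_q_target[best_action]


# Hypothetical two-path example: the online network prefers path 1.
target = double_dqn_target(
    reward=1.0,
    next_q_online=[0.2, 0.9],
    next_q_target=[0.5, 0.3],
    gamma=0.5,
)
```

Separating action selection from action evaluation is the design choice that distinguishes double Q-learning from the plain Q-learning target, which would take the maximum of the target network's values directly.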
Missing values are one of the main causes of dirty data. Without high-quality data, there can be no reliable analysis results or precise decision-making. Therefore, the data warehouse needs to integrate high-quality data consistently. In the power system, the electricity consumption data of some large users cannot be collected normally, resulting in missing data, which affects the calculation of power supply and eventually leads to a large error in the daily power line loss rate. To address the problem of missing electricity consumption data, this study proposes a data interpolation method for distribution power networks based on the group method of data handling (GMDH) and applies it to the analysis of actually collected electricity data. First, the dependent and independent variables are defined from the original data, and the upper and lower limits of the missing values are determined according to prior knowledge or existing data information. All missing data are randomly interpolated within the upper and lower limits. Then, the GMDH network is established to obtain the optimal complexity model, which is used to predict the missing data and replace the previously imputed electricity consumption data. Finally, this process is repeated iteratively until the missing values no longer change. Under a relatively small noise level (α=0.25), the proposed approach achieves a maximum error of no more than 0.605%. Experimental findings demonstrate the efficacy and feasibility of the proposed approach, which realizes the transformation from incomplete data to complete data. In addition, the proposed data interpolation approach provides a strong basis for electricity theft diagnosis and metering fault analysis in electricity enterprises.
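The iterative loop described above (random initialization within bounds, model fitting, re-prediction until the imputed values stop changing) can be sketched as follows. As a loudly stated simplification, a one-variable least-squares line stands in for the GMDH network, which is not implemented here; the function names, the tolerance, and the sample data are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x (xs must not all be equal)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def iterative_impute(xs, ys, lower, upper, tol=1e-9, max_iter=200, seed=0):
    """Fill None entries of ys by repeated fit-and-predict until values stabilize."""
    rng = random.Random(seed)
    missing = [i for i, y in enumerate(ys) if y is None]
    # Step 1: random initial guesses inside the known bounds.
    filled = [rng.uniform(lower, upper) if y is None else y for y in ys]
    for _ in range(max_iter):
        # Step 2: fit the stand-in model on the currently completed data.
        intercept, slope = fit_line(xs, filled)
        largest_change = 0.0
        for i in missing:
            # Step 3: replace the previous imputation with the new prediction.
            new_value = min(upper, max(lower, intercept + slope * xs[i]))
            largest_change = max(largest_change, abs(new_value - filled[i]))
            filled[i] = new_value
        # Step 4: stop once the imputed values no longer change.
        if largest_change < tol:
            break
    return filled


# Hypothetical daily readings with one missing value that should lie on y = 2x.
completed = iterative_impute([1, 2, 3, 4, 5], [2.0, 4.0, None, 8.0, 10.0], 0.0, 12.0)
```

The bounds serve two purposes in this sketch: they constrain the random starting point and they clip every later prediction, so an imputed value can never leave the range given by prior knowledge.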
This paper presents a data fusion method for distributed multi-sensor systems, including the processing of GPS and INS sensor data. First, a residual χ²-test strategy with the corresponding algorithm is designed. Then, a method for calculating the coefficient matrices of the information sharing principle is derived. Finally, the federated Kalman filter is used to combine these independent, parallel, real-time data. A pseudolite (PL) simulation example is given.
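A residual χ² test of the kind named above can be sketched for a single scalar measurement. This is the textbook form (squared filter residual normalized by its predicted variance, compared with a χ² threshold), not the paper's algorithm, whose details the abstract does not give. The function name and the sample numbers are hypothetical; the default threshold 3.841 is the 95% point of the χ² distribution with one degree of freedom.

```python
def residual_chi_square_test(measurement, predicted, residual_variance, threshold=3.841):
    """Return (is_consistent, statistic) for one scalar measurement."""
    residual = measurement - predicted
    # Normalized squared residual; follows a chi-square law with 1 dof if the sensor is healthy.
    statistic = residual ** 2 / residual_variance
    return statistic <= threshold, statistic


# Hypothetical range readings (m) against a filter prediction of 10.0 m.
accepted, stat_ok = residual_chi_square_test(10.5, 10.0, 0.25)
rejected, stat_bad = residual_chi_square_test(12.0, 10.0, 0.25)
```

In a federated filter, a local sensor whose statistic exceeds the threshold would typically be excluded from the fusion step for that epoch so that a faulty measurement does not contaminate the master estimate.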
This paper describes the architecture of a global distributed storage system for data grids. It focuses on management and on the capacity to support the maximum number of users and resources on the Internet, as well as on performance and other issues.
This paper presents a data fusion method with distributed sequence detection based on hypothesis testing theory, including the data fusion algorithm of sequence detection based on the least error probability rule, the decision rule, the calculation formula for the detection times, and the simulation results of system performance.
Aiming at the shortcomings of the intrusion detection systems (IDSs) used in commercial and research fields, we propose the MA-IDS system, a distributed intrusion detection system based on data mining. In this model, a misuse intrusion detection system (MIDS) and an anomaly intrusion detection system (AIDS) are combined. Data mining is applied to raise detection performance, and a distributed mechanism is employed to increase scalability and efficiency. Host- and network-based mining algorithms employ an improved Bayesian decision theorem that suits real security environments to minimize the risks incurred by false decisions. We describe the overall architecture of the MA-IDS system and discuss specific design and implementation issues.
HT-7 is the first superconducting tokamak device for fusion research in China. Many experiments have been carried out on the machine since 1994, and many satisfactory results have been achieved in fusion research on the HT-7 tokamak [1]. With the development of fusion research, remote control of experiments is becoming more and more important for improving experimental efficiency and expanding research results. This paper describes an Internet-based Remote Control System (RCS) for the HT-7 distributed data acquisition system (HT7DAS), which combines the Browser/Server and Client/Server models. By means of the RCS, authorized users all over the world can control and configure HT7DAS remotely. The RCS is designed to improve the flexibility, openness, reliability, and efficiency of HT7DAS. In the paper, the whole design process, along with the implementation of the system and some key items, is discussed in detail. The system was operated successfully during the HT-7 experiments in the 2002 campaign period.
With the reform of the rural network enterprise system, the transfer of property rights in rural power enterprises is accelerating. The evaluation of the operation and development status of rural power enterprises is directly related to their future development and investment direction. At present, the evaluation of the production and operation of rural network enterprises and of the development status of the power network relies only on the experience of the evaluation personnel, who set the reference indices and form the evaluation results through manual scoring. Because the evaluation results are strongly subjective, their practical guiding significance is weak. Therefore, a distributed data mining method, which has been applied in many fields such as food science, economics, and the chemical industry, was proposed for evaluating the status of rural power enterprises. The distributed mathematical model was established using principal component analysis (PCA) and regression analysis. By screening various technical indicators and determining their relevance, the reference value of the evaluation results was improved. Using the Statistical Package for the Social Sciences (SPSS) data analysis software, the operation status of rural network enterprises was evaluated, and the rationality, effectiveness, and economy of the evaluation were verified through comparison with current evaluation results and calculation examples based on actual grid operation data.
Funding (LMCN onboard distributed processing paper above): Supported by the National Natural Sciences Foundation of China (Nos. 62271165, 62027802, 62201307), the Guangdong Basic and Applied Basic Research Foundation (No. 2023A1515030297), the Shenzhen Science and Technology Program (ZDSYS20210623091808025), the Stable Support Plan Program (GXWD20231129102638002), and the Major Key Project of PCL (No. PCL2024A01).
Funding (multimodal sentiment analysis paper above): Supported by STI 2030-Major Projects (2021ZD0200400), the National Natural Science Foundation of China (62276233 and 62072405), and the Key Research Project of Zhejiang Province (2023C01048).
Abstract: When using healthcare data, it is crucial to weigh the advantages of data privacy against the possible drawbacks. Data from several sources must be combined for use in many data mining applications. Medical practitioners may use the results of association rule mining performed on this aggregated data to better personalize patient care and implement preventive measures. Historically, numerous heuristic (e.g., greedy search) and metaheuristic (e.g., evolutionary algorithm) techniques have been created for positive association rules in privacy-preserving data mining (PPDM). When it comes to connecting seemingly unrelated diseases and drugs, negative association rules may be more informative than their positive counterparts. It is well known that negative association rule mining produces a large number of uninteresting rules, making this a difficult problem to tackle. In this research, we offer an adaptive method for negative association rule mining in vertically partitioned healthcare datasets that respects users’ privacy. The applied approach dynamically determines the transactions to be interrupted for information hiding, as opposed to predefining them. This study introduces a novel method for addressing the problem of negative association rules in healthcare data mining, one that is based on the Tabu-genetic optimization paradigm. Tabu search is advantageous since it removes a huge number of unnecessary rules and item sets. Experiments using benchmark healthcare datasets show that the discussed scheme outperforms state-of-the-art solutions in terms of decreasing side effects and data distortions, as measured by the indicator of hiding failure.
Abstract: There are two key issues in distributed intrusion detection systems: maintaining the load balance of the system and protecting data integrity. To address these issues, this paper proposes a new distributed intrusion detection model for big data based on nondestructive partitioning and balanced allocation. A data allocation strategy based on capacity and workload is introduced to achieve local load balance, and a dynamic load adjustment strategy is adopted to maintain the global load balance of the cluster. Moreover, data integrity is protected by session reassembly and session partitioning. The simulation results show that the new model offers good load balance, a higher detection rate, and higher detection efficiency.
Funding: Supported by the Natural Science Foundation of Jiangsu Province (Grant No. SBK2020043202) and by the Key Laboratory of Geospace Environment and Geodesy, Ministry of Education, Wuhan University (No. 19-01-08).
Abstract: On 21 May 2021 (UTC), an Mw 7.4 earthquake jolted the east Bayan Har block in the Tibetan Plateau. The earthquake received widespread attention because it is the largest event in the Tibetan Plateau and its surroundings since the 2008 Wenchuan earthquake, and especially because of its proximity to the seismic gaps on the east Kunlun fault. Here we use satellite interferometric synthetic aperture radar data and subpixel offset observations along the range directions to characterize the coseismic deformation of the earthquake. Range offset displacements depict clear surface ruptures with a total length of ~170 km involving two possible activated fault segments in the earthquake. Coseismic modeling results indicate that the earthquake was dominated by left-lateral strike-slip motions of up to 7 m within the top 12 km of the crust. The well-resolved slip variations are characterized by five major slip patches along strike and a 64% shallow slip deficit, suggesting a young seismogenic structure. Spatial-temporal changes of the postseismic deformation are mapped from early 6-day and 24-day InSAR observations, and are well explained by time-dependent afterslip models. Analysis of Global Navigation Satellite System (GNSS) velocity profiles and strain rates suggests that the eastward extrusion of the plateau is diffusely distributed across the east Bayan Har block, but exhibits significant lateral heterogeneities, as evidenced by magnetotelluric observations. The block-wide distributed deformation of the east Bayan Har block, along with the significant co- and post-seismic stress loadings from the Madoi earthquake, implies high seismic risks along regional faults, especially the Tuosuo Lake and Maqên-Maqu segments of the Kunlun fault that are known as seismic gaps.
Funding: Funded by the National Key R&D Program of China (Nos. 2022YFC3003403 and 2018YFC1505203), the Key Research and Development Program of Tibet Autonomous Region (XZ202301ZY0039G), the Natural Science Foundation of Hebei Province (No. F2021201031), and the Geological Survey Project of China Geological Survey (No. DD20221747).
Abstract: Glacier disasters occur frequently in alpine regions around the world, but current conventional geological disaster measurement technology cannot be used directly for glacier disaster measurement. Hence, in this study, a distributed multi-sensor measurement system for glacier deformation was established by integrating piezoelectric sensing, coded sensing, attitude sensing, and wireless communication technologies. The traditional Modbus protocol was optimized to solve the problem of confused data identification among different acquisition nodes. The performance of the measurement system was analyzed and evaluated through indoor wireless transmission, adaptive performance analysis, an error measurement experiment, and a landslide simulation experiment. Using unmanned aerial vehicle technology, the reliability and effectiveness of the measurement system were verified on site at the Galongla glacier in southeastern Tibet, China. The results show that the mean absolute percentage errors were only 1.13% and 2.09% for displacement and temperature, respectively. The distributed real-time glacier deformation measurement system provides a new means for assessing the development of glacier disasters and for disaster prevention and mitigation.
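The mean absolute percentage error reported above is a standard metric. A minimal sketch of how it is computed follows; the reference and sensor readings are made-up illustration values, not data from the study, and the function name is hypothetical.

```python
def mean_absolute_percentage_error(reference, measured):
    """Return the MAPE in percent; reference values must be non-zero."""
    if not reference or len(reference) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs((r - m) / r) for r, m in zip(reference, measured)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical displacement readings in millimetres (illustration only).
reference_mm = [10.0, 20.0, 40.0]
sensor_mm = [10.1, 19.8, 40.4]

mape = mean_absolute_percentage_error(reference_mm, sensor_mm)
print(f"MAPE = {mape:.2f}%")
```

Each reading in this toy series is off by 1% of its reference value, so the metric evaluates to about 1%.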
Funding: Supported by the National Key R&D Program of China (2020YFB0905900).
Abstract: Operation control of power systems has become challenging with the increase in the scale and complexity of power distribution systems and the extensive access of renewable energy. Therefore, improving the capability for data-driven operation management, intelligent analysis, and mining is urgently required. To investigate and explore similar regularities in the historical operating sections of the power distribution system, and to help the power grid systematically obtain high-value historical operation and maintenance experience and knowledge, a neural information retrieval model with an attention mechanism is proposed based on graph data computing technology. Based on the processing flow of the operating data of the power distribution system, a technical framework of neural information retrieval is established. Combined with the natural graph characteristics of the power distribution system, a unified graph data structure and a data fusion method covering data access, data complement, and multi-source data are constructed. Further, a graph node feature-embedding representation learning algorithm and a neural information retrieval algorithm model are constructed. The neural information retrieval algorithm model is trained and tested using the generated graph node feature representation vector set. The model is verified on the operating section of the power distribution system of a provincial grid area. The results show that the proposed method demonstrates high accuracy in the similarity matching of historical operation characteristics and effectively supports intelligent fault diagnosis and elimination in power distribution systems.
Funding: Supported by the National Natural Science Foundation of China with Grants 61771289 and 61832012, the Natural Science Foundation of Shandong Province with Grants ZR2021QF050 and ZR2021MF075, Shandong Natural Science Foundation Major Basic Research with Grant ZR2019ZD10, Shandong Key Research and Development Program with Grant 2019GGX1050, Shandong Major Agricultural Application Technology Innovation Project with Grant SD2019NJ007, and National Natural Science Foundation of Shandong Province with Grant ZR2022MF304.
Abstract: As an emerging technology, smart grids (SGs) are being employed in many fields, such as smart homes and smart cities. Moreover, the application of artificial intelligence (AI) in SGs has promoted the development of the power industry. However, as users’ demands for electricity increase, traditional centralized power trading cannot adequately meet user demands, and an increasing number of small distributed generators are being employed in trading activities. This not only leads to numerous security risks for the trading data but also has a negative impact on the cost of power generation, electrical security, and other aspects. Accordingly, this study proposes a distributed power trading scheme based on blockchain and AI. To protect the legitimate rights and interests of consumers and producers, credibility is used as an indicator to restrict untrustworthy behavior. In addition, the reliability and communication capabilities of nodes are considered in block verification to improve the transaction confirmation efficiency, and a weighted communication tree construction algorithm is designed to achieve superior data forwarding. Finally, AI sensors are set up in power equipment to monitor electricity generation and transmission and to alert users when security hazards such as thunderstorms or typhoons occur. The experimental results show that the proposed scheme can not only improve trading security but also reduce system communication delays.
Funding: Supported by the Science and Technology Project of China Southern Power Grid (GZHKJXM20210043-080041KK52210002).
Abstract: Traditional distribution network planning relies on the professional knowledge of planners, especially when analyzing the correlations between the problems existing in the network and the crucial influencing factors. The inherent laws reflected by the historical data of the distribution network are ignored, which affects the objectivity of the planning scheme. In this study, to improve the efficiency and accuracy of distribution network planning, the characteristics of distribution network data were extracted using a data-mining technique, and correlation knowledge of existing problems in the network was obtained. A data-mining model based on correlation rules was established. The inputs of the model were the electrical characteristic indices screened using the gray correlation method. The Apriori algorithm was used to extract correlation knowledge from the operational data of the distribution network and obtain strong correlation rules. The degree of promotion (lift) and chi-square tests were used to verify the rationality of the strong correlation rules output by the model. The correlation between heavy-load or overload problems of distribution network feeders in different regions and the related characteristic indices was determined, and the confidence of the correlation rules was obtained. These results can provide an effective basis for the formulation of a distribution network planning scheme.
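To make the rule-mining step concrete, here is a small sketch of level-wise Apriori search followed by confidence and lift calculation. It assumes that "degree of promotion" corresponds to the standard lift measure; the feeder records and index names (`long_line`, `high_peak`, `overload`) are invented for illustration and do not come from the study.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search: only frequent k-itemsets are extended."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items]
    level = [s for s in level if support(s, transactions) >= min_support]
    frequent = {}
    while level:
        for s in level:
            frequent[s] = support(s, transactions)
        size = len(level[0]) + 1
        # Candidates are unions of two frequent itemsets that grow by one item.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        level = [c for c in candidates if support(c, transactions) >= min_support]
    return frequent


def association_rules(frequent, min_confidence):
    """Return {(antecedent, consequent): (confidence, lift)} for strong rules."""
    found = {}
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                consequent = itemset - antecedent
                confidence = supp / frequent[antecedent]
                lift = confidence / frequent[consequent]
                if confidence >= min_confidence:
                    found[(antecedent, consequent)] = (confidence, lift)
    return found


# Hypothetical feeder records: each set lists the indices flagged for a feeder.
records = [
    {"long_line", "high_peak", "overload"},
    {"long_line", "high_peak", "overload"},
    {"high_peak", "overload"},
    {"long_line"},
    {"high_peak"},
]
frequent = frequent_itemsets(records, min_support=0.4)
strong_rules = association_rules(frequent, min_confidence=0.9)
for (lhs, rhs), (conf, lift) in strong_rules.items():
    print(sorted(lhs), "->", sorted(rhs), f"confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 indicates that the antecedent and consequent occur together more often than independence would predict, which is why lift is commonly used to filter out rules that have high confidence only because the consequent is frequent.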
Abstract: Deep neural networks are gaining importance and popularity in applications and services. Due to the enormous number of learnable parameters and the size of the datasets, the training of neural networks is computationally costly. Parallel and distributed computation-based strategies are used to accelerate this training process. Generative Adversarial Networks (GANs) are a recent technological achievement in deep learning. These generative models are computationally expensive because a GAN consists of two neural networks and trains on enormous datasets. Typically, a GAN is trained on a single server. Conventional deep learning accelerator designs are challenged by the unique properties of GANs, such as the enormous computation stages with non-traditional convolution layers. This work addresses the issue of distributing GANs so that they can train on datasets distributed over many Tensor Processing Units (TPUs). Distributed training accelerates the learning process and decreases computation time. In this paper, the GAN is accelerated using a distributed multi-core TPU in a distributed data-parallel synchronous model. For adequate acceleration of the GAN, the data-parallel SGD (Stochastic Gradient Descent) model is implemented on a multi-core TPU using distributed TensorFlow with mixed precision, bfloat16, and XLA (Accelerated Linear Algebra). The study was conducted on the MNIST dataset for batch sizes varying from 64 to 512 for 30 epochs with distributed SGD on TPU v3 with a 128×128 systolic array. An extensive batch technique is implemented in bfloat16 to decrease the storage cost and speed up floating-point computations. The accelerated learning curves for the generator and discriminator networks are obtained. The training time was reduced by 79% by varying the batch size from 64 to 512 on the multi-core TPU.
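The core idea of synchronous data-parallel SGD can be sketched without any accelerator: each replica computes a gradient on its own shard, the gradients are averaged (the role an all-reduce plays on real hardware), and every replica applies the same update. The sketch below uses a one-parameter linear model in plain Python as a stand-in; it is not a GAN, uses no TPU or TensorFlow API, and the data values are invented.

```python
def shard_gradient(w, shard):
    """Gradient of the mean of 0.5 * (w * x - y)**2 over one data shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def synchronous_step(w, shards, learning_rate):
    """One synchronous data-parallel update over equally sized shards."""
    # On real hardware each replica would compute its gradient in parallel.
    gradients = [shard_gradient(w, shard) for shard in shards]
    # Averaging the per-replica gradients mimics an all-reduce mean.
    mean_gradient = sum(gradients) / len(gradients)
    return w - learning_rate * mean_gradient


# Hypothetical data following y = 2x, split across two replicas.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
w_parallel = synchronous_step(0.0, shards, learning_rate=0.1)

# Reference: the same update computed on the full batch by a single worker.
full_batch = [pair for shard in shards for pair in shard]
w_single = 0.0 - 0.1 * shard_gradient(0.0, full_batch)

print(w_parallel, w_single)
```

With equally sized shards the averaged gradient equals the full-batch gradient, so the distributed step matches the single-worker step; this equivalence is what lets the global batch size grow with the number of replicas.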
Abstract: Distribution networks are important public infrastructure necessary for people’s livelihoods. However, extreme natural disasters, such as earthquakes, typhoons, and mudslides, severely threaten the safe and stable operation of distribution networks and the power supplies needed for daily life. Therefore, considering the requirements for distribution network disaster prevention and mitigation, there is an urgent need for in-depth research on risk assessment methods for distribution networks under extreme natural disaster conditions. This paper accesses multisource data, presents data quality improvement methods for distribution networks, and conducts active fault diagnosis and disaster damage analysis and evaluation using data-driven theory. Furthermore, the paper realizes real-time, accurate access to distribution network disaster information. A case study shows that the proposed approach performs an accurate and rapid assessment of cross-sectional risk and that the minimal average annual outage time can be reduced to 3 h/a in the ring network. The approach proposed in this paper can provide technical support for further improving the ability of distribution networks to cope with extreme natural disasters.
Abstract: Big Data (BD), which is simply a collection of a huge amount of data, has been used extensively in fields such as finance, industry, business, and medicine. However, processing a massive amount of data is highly complicated and time-consuming. Thus, to design a Distribution Preserving Framework for BD, a novel methodology is proposed that uses Manhattan Distance (MD)-centered Partition Around Medoid (MD-PAM) along with a Conjugate Gradient Artificial Neural Network (CG-ANN) and proceeds through several steps to reduce the complications of BD. First, in the pre-processing phase, repeated data are removed using the map-reduce function; the missing data are then handled by substituting or ignoring the missing values. After that, the data are transformed into a normalized form. Next, to enhance classification performance, the dimensionality of the data is reduced by Gaussian Kernel (GK)-Fisher Discriminant Analysis (GK-FDA). Afterwards, the processed data are converted into a structured format and submitted to the partitioning phase. In the partitioning phase, MD-PAM partitions the data and groups them into clusters. Lastly, CG-ANN classifies the data in the classification phase so that the user can easily retrieve the needed data. The openly accessible NSL-KDD dataset is used to compare the outcomes of CG-ANN with those of prevailing methodologies. The experimental results show that the proposed CG-ANN achieves efficient results at a reduced computation cost and outperforms the existing systems in accuracy, sensitivity, and specificity.
Funding: The National Key Research and Development Program of China under Grant No. 2018YFE0206800; the National Natural Science Foundation of Beijing, China, under Grant No. 4212010; and the National Natural Science Foundation of China under Grant No. 61971028.
Abstract: Adaptive packet scheduling can efficiently enhance the performance of multipath data transmission. However, realizing precise packet scheduling is challenging because network link states are highly dynamic and unpredictable. To this end, this paper proposes a distributed asynchronous deep reinforcement learning framework to improve the responsiveness and prediction of adaptive packet scheduling. Our framework contains two parts: local asynchronous packet scheduling and a distributed cooperative control center. In local asynchronous packet scheduling, an asynchronous prioritized replay double deep Q-learning packet scheduling algorithm is proposed for dynamic adaptive packet scheduling learning, which uses a prioritized replay double deep Q-learning network (P-DDQN) for the fitting analysis. In the distributed cooperative control center, a distributed scheduling learning and neural fitting acceleration algorithm is proposed to adaptively update the neural network parameters of P-DDQN for more precise packet scheduling. Experimental results show that our solution performs better than the Random weight algorithm and the Round-Robin algorithm in throughput and loss ratio. Further, our solution is 1.32 times and 1.54 times better than the Random weight algorithm and the Round-Robin algorithm, respectively, in the stability of multipath data transmission.
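The two generic ingredients named in P-DDQN, double Q-learning targets and prioritized replay, can be sketched in a few lines. This is the textbook form of both techniques rather than the paper's algorithm; the mapping of actions to transmission paths, the Q-values, and the function names are assumptions made for illustration.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online net picks the action, the target net scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


def replay_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Prioritized replay: sampling probability proportional to (|TD error| + eps)**alpha."""
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


# Hypothetical Q-values for three candidate paths in the next state.
q_online = [0.2, 0.9, 0.5]   # online network prefers path 1
q_target = [0.4, 0.3, 0.8]   # target network's estimates for the same paths

target = double_dqn_target(1.0, q_online, q_target, gamma=0.5)
probabilities = replay_probabilities([1.0, 3.0], alpha=1.0, eps=0.0)
print(target, probabilities)
```

Plain Q-learning would bootstrap from the target network's own maximum (0.8 here); decoupling action selection from action evaluation is what reduces the overestimation bias.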
Funding: This research was funded by the National Natural Science Foundation of China (Grant No. 42250410321).
Abstract: Missing values are one of the main causes of dirty data. Without high-quality data, there can be no reliable analysis results or precise decision-making. Therefore, the data warehouse needs to integrate high-quality data consistently. In the power system, the electricity consumption data of some large users cannot be collected normally, resulting in missing data, which affects the calculation of power supply and eventually leads to a large error in the daily power line loss rate. To address the problem of missing electricity consumption data, this study proposes a data interpolation method for distribution power networks based on the group method of data handling (GMDH) and applies it to the analysis of actually collected electricity data. First, the dependent and independent variables are defined from the original data, and the upper and lower limits of the missing values are determined according to prior knowledge or existing data information. All missing data are randomly interpolated within the upper and lower limits. Then, the GMDH network is established to obtain the optimal complexity model, which is used to predict the missing data and replace the previously imputed electricity consumption data. Finally, this process is repeated until the missing values no longer change. Under a relatively small noise level (α=0.25), the proposed approach achieves a maximum error of no more than 0.605%. Experimental findings demonstrate the efficacy and feasibility of the proposed approach, which realizes the transformation from incomplete data to complete data. This data interpolation approach also provides a strong basis for electricity theft diagnosis and metering fault analysis in electricity enterprises.
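The iterative procedure described above (random initial fill within bounds, fit a model, replace the imputed values with predictions, repeat until they stop changing) can be sketched as follows. To stay self-contained the sketch swaps the GMDH network for a one-variable least-squares line, which is a deliberate simplification; the data, bounds, and function names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def iterative_impute(xs, ys, lower, upper, tol=1e-9, max_iter=1000, seed=0):
    """Fill None entries of ys by repeated fit-and-predict until the values settle."""
    rng = random.Random(seed)
    missing = [i for i, y in enumerate(ys) if y is None]
    # Step 1: random initial values inside the prior bounds.
    filled = [rng.uniform(lower, upper) if y is None else y for y in ys]
    for _ in range(max_iter):
        # Step 2: fit the stand-in model on the currently completed data.
        slope, intercept = fit_line(xs, filled)
        # Step 3: replace the last imputed values with clipped predictions.
        updates = [min(upper, max(lower, slope * xs[i] + intercept)) for i in missing]
        change = max((abs(filled[i] - v) for i, v in zip(missing, updates)), default=0.0)
        for i, v in zip(missing, updates):
            filled[i] = v
        # Step 4: stop once the imputed values no longer change.
        if change < tol:
            break
    return filled


# Hypothetical load readings following y = 2x + 1, with one reading missing.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
load = [3.0, 5.0, None, 9.0, 11.0]
completed = iterative_impute(hours, load, lower=0.0, upper=20.0)
print(completed[2])
```

Because the observed points lie exactly on a line in this toy case, the imputed value settles at the point on that line; with a richer model such as GMDH the same loop applies, only the fit-and-predict step changes.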
Abstract: This paper presents a data fusion method for a distributed multi-sensor system that includes GPS and INS sensor data processing. First, a residual χ²-test strategy with the corresponding algorithm is designed. Then a method for calculating the coefficient matrices of the information sharing principle is derived. Finally, the federated Kalman filter is used to combine these independent, parallel, real-time data. A pseudolite (PL) simulation example is given.
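A residual χ² test for a Kalman filter compares the normalized squared measurement residual with a chi-square threshold and rejects measurements that exceed it. The scalar sketch below shows the standard form of that check; it is not the paper's specific algorithm, and the state, variances, and measurements are invented values.

```python
# 99% quantile of the chi-square distribution with 1 degree of freedom.
CHI2_1DOF_99 = 6.635


def residual_chi2_test(z, x_pred, p_pred, h, r_var, threshold=CHI2_1DOF_99):
    """Return (statistic, accepted) for a scalar measurement z.

    x_pred, p_pred: predicted state and its variance
    h: measurement coefficient (z = h * x + noise)
    r_var: measurement noise variance
    """
    residual = z - h * x_pred
    residual_variance = h * p_pred * h + r_var
    statistic = residual ** 2 / residual_variance
    return statistic, statistic <= threshold


# Hypothetical predicted position of 100 m with variance 4, sensor variance 5.
ok_stat, ok_accepted = residual_chi2_test(103.0, 100.0, 4.0, 1.0, 5.0)
bad_stat, bad_accepted = residual_chi2_test(112.0, 100.0, 4.0, 1.0, 5.0)
print(ok_stat, ok_accepted, bad_stat, bad_accepted)
```

In a federated filter, a local sensor whose residual fails the test would typically be excluded from the fusion step for that epoch, so that a faulty sensor does not contaminate the combined estimate.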
Abstract: This paper describes the architecture of a global distributed storage system for the data grid. It focuses on the management of, and the capability to serve, the maximum number of users and resources on the Internet, as well as on performance and other issues.
Abstract: This paper presents a data fusion method with distributed sequence detection based on hypothesis testing theory, including the data fusion algorithm for sequence detection based on the least error probability rule, the decision rule, the formula for calculating the number of detections, and the simulation results of system performance.
Abstract: Aiming at the shortcomings of intrusion detection systems (IDSs) used in commercial and research fields, we propose the MA-IDS system, a distributed intrusion detection system based on data mining. In this model, a misuse intrusion detection system (MIDS) and an anomaly intrusion detection system (AIDS) are combined. Data mining is applied to raise detection performance, and a distributed mechanism is employed to increase scalability and efficiency. Host- and network-based mining algorithms employ an improved Bayesian decision theorem that suits real security environments to minimize the risks incurred by false decisions. We describe the overall architecture of the MA-IDS system and discuss specific design and implementation issues.
Funding: Project supported by the Mega-science Engineering Project of the Chinese Academy of Sciences.
Abstract: HT-7 is the first superconducting tokamak device for fusion research in China. Many experiments have been carried out on the machine since 1994, and many satisfactory results have been achieved in fusion research on the HT-7 tokamak [1]. With the development of fusion research, remote control of experiments is becoming more and more important for improving experimental efficiency and expanding research results. This paper describes an Internet-based RCS (Remote Control System) for the HT-7 distributed data acquisition system (HT7DAS), which combines the Browser/Server and Client/Server models. By means of the RCS, authorized users all over the world can control and configure HT7DAS remotely. The RCS is designed to improve the flexibility, openness, reliability, and efficiency of HT7DAS. The whole design process, the implementation of the system, and some key items are discussed in detail. The system operated successfully during the HT-7 experiments in the 2002 campaign period.
Funding: Supported by funding (2017RAXXJ075) from the Harbin Applied Technology Research and Development Project.
Abstract: With the reform of the rural network enterprise system, the transfer of property rights in rural power enterprises is accelerating. The evaluation of the operation and development status of rural power enterprises is directly related to their future development and investment direction. At present, the evaluation of the production and operation of rural network enterprises and of the development status of the power network relies only on the experience of the evaluators, who set reference indices and form the evaluation results through manual scoring. Because the evaluation results are strongly subjective, their practical guiding significance is weak. Therefore, a distributed data mining method, which has been applied in many fields such as food science, economics, and the chemical industry, is proposed for evaluating the status of rural power enterprises. The distributed mathematical model was established using principal component analysis (PCA) and regression analysis. By screening various technical indicators and determining their relevance, the reference value of the evaluation results was improved. Using the Statistical Package for the Social Sciences (SPSS) data analysis software, the operation status of rural network enterprises was evaluated, and the rationality, effectiveness, and economy of the evaluation were verified through comparison with current evaluation results and calculation examples based on actual grid operation data.