The increasing dependence on data highlights the need for a detailed understanding of its behavior, encompassing the challenges involved in processing and evaluating it. However, current research lacks a comprehensive structure for measuring the worth of data elements, hindering effective navigation of the changing digital environment. This paper aims to fill this research gap by introducing the innovative concept of “data components.” It proposes a graph-theoretic representation model that presents a clear mathematical definition and demonstrates the superiority of data components over traditional processing methods. Additionally, the paper introduces an information measurement model that provides a way to calculate the information entropy of data components and establish their increased informational value. The paper also assesses the value of information, suggesting a pricing mechanism based on its significance. In conclusion, this paper establishes a robust framework for understanding and quantifying the value of implicit information in data, laying the groundwork for future research and practical applications.
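The entropy calculation this abstract refers to can be illustrated with plain Shannon entropy over a component's value distribution. This is a minimal sketch assuming a data component reduces to a multiset of values; `component_entropy` is an illustrative name, not the paper's API.

```python
from collections import Counter
from math import log2

def component_entropy(values):
    """Shannon entropy (in bits) of the value distribution in a data component."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# A component with uniformly distributed values carries maximal entropy;
# a skewed component carries less.
uniform = ["a", "b", "c", "d"]
skewed = ["a", "a", "a", "b"]
print(component_entropy(uniform))  # 2.0
print(component_entropy(skewed))   # ~0.811
```

A component whose values are all identical has zero entropy, matching the intuition that it adds no informational value.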
Big data resources are characterized by large scale, wide sources, and strong dynamics. Existing access control mechanisms based on manual policy formulation by security experts suffer from drawbacks such as low policy management efficiency and difficulty in accurately describing the access control policy. To overcome these problems, this paper proposes a big data access control mechanism based on a two-layer permission decision structure. This mechanism extends the attribute-based access control (ABAC) model. Business attributes are introduced in the ABAC model as business constraints between entities. The proposed mechanism implements a two-layer permission decision structure composed of the inherent attributes of access control entities and the business attributes, which constitute the general permission decision algorithm based on logical calculation and the business permission decision algorithm based on a bi-directional long short-term memory (BiLSTM) neural network, respectively. The general permission decision algorithm is used to implement accurate policy decisions, while the business permission decision algorithm implements fuzzy decisions based on the business constraints. The BiLSTM neural network is used to calculate the similarity of the business attributes to realize intelligent, adaptive, and efficient access control permission decisions. Through the two-layer permission decision structure, the complex and diverse big data access control management requirements can be satisfied while considering both the security and the availability of resources. Experimental results show that the proposed mechanism is effective and reliable. In summary, it can efficiently support the secure sharing of big data resources.
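The two-layer decision described above can be sketched as a strict attribute match followed by a fuzzy similarity check. The paper computes business-attribute similarity with a BiLSTM; a Jaccard token overlap stands in here purely for illustration, and all names and the threshold are hypothetical.

```python
def general_decision(subject_attrs, policy_attrs):
    """Layer 1: exact logical match of inherent attributes against the policy."""
    return all(subject_attrs.get(k) == v for k, v in policy_attrs.items())

def business_decision(subject_tokens, policy_tokens, threshold=0.5):
    """Layer 2: fuzzy decision on business attributes. The paper scores
    similarity with a BiLSTM; a Jaccard overlap stands in for illustration."""
    a, b = set(subject_tokens), set(policy_tokens)
    sim = len(a & b) / len(a | b) if a | b else 1.0
    return sim >= threshold

def authorize(subject_attrs, subject_tokens, policy_attrs, policy_tokens):
    """Grant access only when both layers agree."""
    return general_decision(subject_attrs, policy_attrs) and \
        business_decision(subject_tokens, policy_tokens)

print(authorize({"role": "analyst"}, ["finance", "report"],
                {"role": "analyst"}, ["finance", "audit", "report"]))  # True
```

The strict layer gives an exact yes/no on inherent attributes, while the fuzzy layer tolerates partial matches of business constraints, mirroring the accurate-versus-fuzzy split the abstract describes.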
With the rapid development of information technology, IoT devices play a huge role in physiological health data detection. The exponential growth of medical data requires us to reasonably allocate storage space between cloud servers and edge nodes. The storage capacity of edge nodes close to users is limited, so hotspot data should be stored in edge nodes as much as possible to ensure response timeliness and access hit rate. However, the current scheme cannot guarantee that every sub-message of a complete data item stored by an edge node meets the requirements of hot data. How to detect and delete redundant data in edge nodes while protecting user privacy and dynamic data integrity has therefore become a challenging problem. Our paper proposes a redundant data detection method that meets privacy protection requirements. By scanning the ciphertext, it determines whether each sub-message of the data in the edge node meets the requirements of hot data. It has the same effect as a zero-knowledge proof and does not reveal user privacy. In addition, for redundant sub-data that do not meet the requirements of hot data, our paper proposes a redundant data deletion scheme that preserves dynamic data integrity. We use Content Extraction Signature (CES) to generate a signature over the remaining hot data after the redundant data are deleted. The feasibility of the scheme is demonstrated through security and efficiency analyses.
Missing values are one of the main factors that cause dirty data. Without high-quality data, there will be no reliable analysis results and precise decision-making. Therefore, the data warehouse needs to integrate high-quality data consistently. In the power system, the electricity consumption data of some large users cannot be normally collected, resulting in missing data, which affects the calculation of power supply and eventually leads to a large error in the daily power line loss rate. For the problem of missing electricity consumption data, this study proposes a group method of data handling (GMDH) based data interpolation method for distribution power networks and applies it to the analysis of actually collected electricity data. First, the dependent and independent variables are defined from the original data, and the upper and lower limits of missing values are determined according to prior knowledge or existing data information. All missing data are randomly interpolated within the upper and lower limits. Then, the GMDH network is established to obtain the optimal-complexity model, which is used to predict the missing data and replace the last imputed electricity consumption data. Finally, this process is implemented iteratively until the missing values do not change. Under a relatively small noise level (α = 0.25), the proposed approach achieves a maximum error of no more than 0.605%. Experimental findings demonstrate the efficacy and feasibility of the proposed approach, which realizes the transformation from incomplete data to complete data. This data interpolation approach also provides a strong basis for the electricity theft diagnosis and metering fault analysis of electricity enterprises.
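The iterative loop described above (random initialization within known bounds, refit, re-predict until the missing values stop changing) can be sketched with an ordinary least-squares line standing in for the GMDH optimal-complexity model; all names and data are illustrative, not from the study.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares y = a*x + b; a stand-in for the GMDH
    optimal-complexity model selected by the paper."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys)) / sum(
        (xi - mx) ** 2 for xi in xs)
    return a, my - a * mx

def iterative_impute(x, y, lo, hi, max_iter=200, tol=1e-9):
    """Randomly initialise missing values (None) within [lo, hi], then
    repeatedly refit on all data and re-predict until they stop changing."""
    rng = random.Random(0)
    missing = [i for i, v in enumerate(y) if v is None]
    filled = [rng.uniform(lo, hi) if v is None else v for v in y]
    for _ in range(max_iter):
        a, b = fit_line(x, filled)
        new = {i: a * x[i] + b for i in missing}
        delta = max(abs(new[i] - filled[i]) for i in missing)
        for i in missing:
            filled[i] = new[i]
        if delta < tol:            # missing values no longer change
            break
    return filled

x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.1, None, 8.0, None, 12.1]   # roughly y = 2x, with two gaps
imputed = iterative_impute(x, y, lo=0.0, hi=15.0)
```

At the fixed point the imputed values lie exactly on the model fitted to the known data, so the loop converges to a self-consistent completion regardless of the random start.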
Cloud computing has emerged as a viable alternative to traditional computing infrastructures, offering various benefits. However, the adoption of cloud storage poses significant risks to data secrecy and integrity. This article presents an effective mechanism to preserve the secrecy and integrity of data stored on the public cloud by leveraging blockchain technology, smart contracts, and cryptographic primitives. The proposed approach utilizes a Solidity-based smart contract as an auditor for maintaining and verifying the integrity of outsourced data. To preserve data secrecy, symmetric encryption systems are employed to encrypt user data before outsourcing it. An extensive performance analysis is conducted to illustrate the efficiency of the proposed mechanism. Additionally, a rigorous assessment is conducted to ensure that the developed smart contract is free from vulnerabilities and to measure its associated running costs. The security analysis of the proposed system confirms that our approach can securely maintain the confidentiality and integrity of cloud storage, even in the presence of malicious entities. The proposed mechanism contributes to enhancing data security in cloud computing environments and can be used as a foundation for developing more secure cloud storage systems.
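The encrypt-before-outsourcing and auditor-verified-integrity flow can be sketched off-chain. A hash-based keystream stands in for a real symmetric cipher, and an HMAC tag plays the role of the integrity record a Solidity auditor contract would keep; all names are illustrative, not the article's implementation.

```python
import hashlib
import hmac

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a keystream by hashing key || nonce || counter (toy CTR mode)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying the same call again decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))

def integrity_tag(key: bytes, ciphertext: bytes) -> str:
    """Tag the auditor records at upload time and rechecks on retrieval."""
    return hmac.new(key, ciphertext, hashlib.sha256).hexdigest()

key, nonce = b"k" * 32, b"n" * 12
record = b"user file contents"
ct = xor_encrypt(key, nonce, record)      # ciphertext outsourced to the cloud
tag = integrity_tag(key, ct)              # tag kept by the auditor
assert hmac.compare_digest(tag, integrity_tag(key, ct))   # integrity holds
assert xor_encrypt(key, nonce, ct) == record              # decryption round-trip
```

A tampered ciphertext no longer matches the stored tag, which is exactly the condition the auditor checks before accepting retrieved data.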
Time-series data provide important information in many fields, and their processing and analysis have been the focus of much research. However, detecting anomalies is very difficult due to data imbalance, temporal dependence, and noise. Therefore, methodologies for data augmentation and conversion of time-series data into images for analysis have been studied. This paper proposes a fault detection model that uses time-series data augmentation and transformation to address the problems of data imbalance, temporal dependence, and robustness to noise. The data augmentation method is the addition of noise: Gaussian noise, with the noise level set to 0.002, is added to maximize the generalization performance of the model. In addition, we use the Markov Transition Field (MTF) method to effectively visualize the dynamic transitions of the data while converting the time-series data into images. It enables the identification of patterns in time-series data and assists in capturing their sequential dependencies. For anomaly detection, the PatchCore model is applied and shows excellent performance, and the detected anomaly areas are represented as heat maps. By applying an anomaly map to the original image, it is possible to capture the areas where anomalies occur. The performance evaluation shows that both the F1-score and accuracy are high when time-series data are converted to images. Additionally, when processed as images rather than as time series, there was a significant reduction in both the size of the data and the training time. The proposed method can provide an important springboard for research in the field of anomaly detection using time-series data and helps address problems such as the lightweight analysis of complex patterns in data.
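The two preprocessing steps named above, Gaussian-noise augmentation at level 0.002 and the Markov Transition Field conversion, can be sketched as follows; function names and the bin count are illustrative.

```python
import random

def add_gaussian_noise(series, sigma=0.002, seed=0):
    """Augmentation used in the paper: add Gaussian noise at level 0.002."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in series]

def markov_transition_field(series, n_bins=4):
    """Markov Transition Field: quantile-bin the series, estimate the
    first-order bin transition matrix W, then spread W over all point pairs."""
    n = len(series)
    order = sorted(range(n), key=lambda i: series[i])
    bins = [0] * n
    for rank, i in enumerate(order):          # rank-based quantile binning
        bins[i] = min(rank * n_bins // n, n_bins - 1)
    W = [[0.0] * n_bins for _ in range(n_bins)]
    for a, b in zip(bins, bins[1:]):
        W[a][b] += 1.0
    for row in W:                              # row-normalise transitions
        total = sum(row)
        if total:
            for j in range(n_bins):
                row[j] /= total
    # image pixel (i, j) holds the transition probability bin(x_i) -> bin(x_j)
    return [[W[bins[i]][bins[j]] for j in range(n)] for i in range(n)]

mtf = markov_transition_field([1, 2, 3, 4, 1, 2, 3, 4])
```

The resulting n-by-n matrix is the image handed to the detector; a strictly repeating ramp yields deterministic transitions, so its MTF contains only 0 and 1 entries.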
Overgeneralisation may happen because most studies on data publishing for multiple sensitive attributes (SAs) have not considered personalised privacy requirements. Furthermore, sensitive information disclosure may also be caused by these personalised requirements. To address the matter, this article develops a personalised data publishing method for multiple SAs. According to the requirements of individuals, the new method partitions SA values into two categories, private values and public values, and breaks the association between them for privacy guarantees. The private values are anonymised, while the public values are released without this process. An algorithm is designed to achieve the privacy mode, where the selectivity is determined by the sensitive value frequency and undesirable objects. The experimental results show that the proposed method provides more information utility than previous methods. The theoretical analyses and experiments also indicate that privacy can be guaranteed even if the public values are known to an adversary. The overgeneralisation and privacy breaches caused by personalised requirements can be avoided by the new method.
With recent technological developments, massive vehicular ad hoc networks (VANETs) have been established, enabling numerous vehicles and their respective Road Side Unit (RSU) components to communicate with one another. The best way to enhance traffic flow for vehicles and traffic management departments is to share the data they receive. However, VANET systems still lack adequate protection. An effective and safe method of outsourcing is suggested, which reduces computation costs by achieving data security using a homomorphic mapping based on the conjugate operation of matrices. This research proposes a VANET-based data outsourcing system to fix these issues. To keep data outsourcing secure, the suggested model takes cryptographic models into account. Fog nodes keep the generated keys for the purpose of vehicle authentication. For controlling and overseeing the outsourced data while preserving privacy, the suggested approach relies on a Trusted Certified Auditor (TCA). Using the secret key, the TCA can identify the genuine identity of VANET nodes when harmful messages are detected. The proposed model develops a TCA-based unique static vehicle labeling system using cryptography (TCA-USVLC) for secure data outsourcing and privacy preservation in VANETs. The proposed model calculates the trust of vehicles in 16 ms for an average of 180 vehicles and achieves 98.6% accuracy for data encryption to provide security. It achieved 98.5% accuracy in data outsourcing and 98.6% accuracy in privacy preservation in fog-enabled VANETs. Elliptic curve cryptography models can be applied in the future for better encryption and decryption rates with lightweight cryptographic operations.
Guest Editors: Prof. Ling Tian, University of Electronic Science and Technology of China, lingtian@uestc.edu.cn; Prof. Jian-Hua Tao, Tsinghua University, jhtao@tsinghua.edu.cn; Dr. Bin Zhou, National University of Defense Technology, binzhou@nudt.edu.cn. Since the concept of “Big Data” was first introduced in Nature in 2008, it has been widely applied in fields such as business, healthcare, national defense, education, transportation, and security. With the maturity of artificial intelligence technology, big data analysis techniques tailored to various fields have made significant progress, but they still face many challenges in terms of data quality, algorithms, and computing power.
Capabilities to assimilate Geostationary Operational Environmental Satellite “R-series” (GOES-R) Geostationary Lightning Mapper (GLM) flash extent density (FED) data within the operational Gridpoint Statistical Interpolation ensemble Kalman filter (GSI-EnKF) framework were previously developed and tested with a mesoscale convective system (MCS) case. In this study, such capabilities are further developed to assimilate GOES GLM FED data within the GSI ensemble-variational (EnVar) hybrid data assimilation (DA) framework. The results of assimilating the GLM FED data using 3DVar and pure En3DVar (PEn3DVar, using 100% ensemble covariance and no static covariance) are compared with those of EnKF/DfEnKF for a supercell storm case. The focus of this study is to validate the correctness and evaluate the performance of the new implementation rather than to compare the performance of FED DA among different DA schemes. Only the results of 3DVar and PEn3DVar are examined and compared with EnKF/DfEnKF. Assimilation of a single FED observation shows that the magnitude and horizontal extent of the analysis increments from PEn3DVar are generally larger than those from EnKF, which is mainly caused by the different localization strategies used in EnKF/DfEnKF and PEn3DVar as well as the integration limits of the graupel mass in the observation operator. Overall, the forecast performance of PEn3DVar is comparable to that of EnKF/DfEnKF, suggesting correct implementation.
In source detection in the Tianlai project, accurately locating the interferometric fringe in visibility data strongly influences downstream tasks such as physical parameter estimation and weak source exploration. Considering that traditional locating methods are time-consuming and supervised methods require a great quantity of expensive labeled data, in this paper we first investigate the characteristics of interferometric fringes in simulated and real scenarios separately, and integrate an almost parameter-free unsupervised clustering method with a seed-filling or eraser algorithm to propose a hierarchical plug-and-play method that improves location accuracy. We then apply our method to locate the interferometric fringes of single and multiple sources in simulation data, and next to real data taken from the Tianlai radio telescope array. Finally, we compare it with state-of-the-art unsupervised methods. These results show that our method is robust in different scenarios and can effectively improve location measurement accuracy.
Research data infrastructures form the cornerstone in both cyber and physical spaces, driving the progression of the data-intensive scientific research paradigm. This opinion paper presents an overview of global research data infrastructure, drawing insights from national roadmaps and strategic documents related to research data infrastructure. It emphasizes the pivotal role of research data infrastructures by delineating four new missions aimed at positioning them at the core of the current scientific research and communication ecosystem. The four new missions of research data infrastructures are: (1) as a pioneer, to transcend disciplinary borders and address complex, cutting-edge scientific and social challenges with problem- and data-oriented insights; (2) as an architect, to establish a digital, intelligent, flexible research and knowledge services environment; (3) as a platform, to foster high-end academic communication; (4) as a coordinator, to balance scientific openness with ethical needs.
With the development of Industry 4.0 and big data technology, the Industrial Internet of Things (IIoT) is hampered by inherent issues such as privacy, security, and fault tolerance, which pose certain challenges to its rapid development. Blockchain technology has immutability, decentralization, and autonomy, which can greatly mitigate the inherent defects of the IIoT. In the traditional blockchain, data is stored in a Merkle tree. As data continues to grow, the scale of the proofs used to validate it grows as well, threatening the efficiency, security, and reliability of blockchain-based IIoT. Accordingly, this paper first analyzes the inefficiency of the traditional blockchain structure in verifying the integrity and correctness of data. To solve this problem, a new Vector Commitment (VC) structure, Partition Vector Commitment (PVC), is proposed by improving the traditional VC structure. Secondly, this paper uses PVC instead of the Merkle tree to store big data generated by IIoT. PVC can improve the efficiency of traditional VC in the processes of commitment and opening. Finally, this paper uses PVC to build a blockchain-based IIoT data security storage mechanism and carries out a comparative experimental analysis. This mechanism can greatly reduce communication loss and maximize the rational use of storage space, which is of great significance for maintaining the security and stability of blockchain-based IIoT.
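The proof-growth problem motivating PVC can be demonstrated with a toy Merkle tree: an inclusion proof carries one sibling hash per level, so its size grows logarithmically with the number of leaves, whereas a vector commitment targets constant-size openings. This is an illustrative sketch, not the paper's PVC construction.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root_and_proof(leaves, index):
    """Return the Merkle root and the inclusion proof for leaves[index]."""
    level = [h(x) for x in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])       # duplicate last node on odd levels
        proof.append(level[index ^ 1])    # sibling at this level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof

def verify(root, leaf, index, proof):
    """Recompute the path from a leaf to the root using the proof."""
    node = h(leaf)
    for sib in proof:
        node = h(node + sib) if index % 2 == 0 else h(sib + node)
        index //= 2
    return node == root

leaves = [f"record-{i}".encode() for i in range(8)]
root, proof = merkle_root_and_proof(leaves, 5)
print(len(proof))                          # 3 == log2(8): proof grows with data
print(verify(root, leaves[5], 5, proof))   # True
```

Doubling the number of IIoT records adds one more sibling hash to every proof, which is precisely the growth PVC is designed to avoid.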
The sixth-generation (6G) network is a multi-network interconnection and multi-scenario coexistence network, in which multiple network domains break their original fixed boundaries to form connections and convergence. With the optimization objective of maximizing network utility while ensuring performance-centric weighted fairness among flows, this paper designs a reinforcement learning-based cloud-edge autonomous multi-domain data center network architecture that achieves single-domain autonomy and multi-domain collaboration. Because the utilities of different flows conflict, the bandwidth fairness allocation problem for various types of flows is formulated by considering different defined reward functions. Regarding the tradeoff between fairness and utility, this paper designs the corresponding reward functions for the cases where the flows undergo abrupt or smooth changes. In addition, to accommodate the Quality of Service (QoS) requirements of multiple types of flows, this paper proposes a multi-domain autonomous routing algorithm called LSTM+MADDPG. By introducing a Long Short-Term Memory (LSTM) layer in the actor and critic networks, more information about temporal continuity is added, further enhancing the adaptive ability in the dynamic network environment. The LSTM+MADDPG algorithm is compared with the latest reinforcement learning algorithms through experiments on real network topologies and traffic traces, and the experimental results show that LSTM+MADDPG improves the delay convergence speed by 14.6% and delays the start moment of packet loss by 18.2% compared with other algorithms.
Depth estimation is an important task in computer vision. Collecting data at scale for monocular depth estimation is challenging, as this task requires simultaneously capturing RGB images and depth information. Therefore, data augmentation is crucial for this task. Existing data augmentation methods often employ pixel-wise transformations, which may inadvertently disrupt edge features. In this paper, we propose a data augmentation method for monocular depth estimation, which we refer to as the Perpendicular-Cutdepth method. This method involves cutting real-world depth maps along perpendicular directions and pasting them onto input images, thereby diversifying the data without compromising edge features. To validate the effectiveness of the algorithm, we compared it with current mainstream data augmentation algorithms on an existing convolutional neural network (CNN). Additionally, to verify the algorithm's applicability to Transformer networks, we designed a Transformer-based encoder-decoder network structure to assess the generalization of our proposed algorithm. Experimental results demonstrate that, in the field of monocular depth estimation, our proposed Perpendicular-Cutdepth method outperforms traditional data augmentation methods. On the indoor dataset NYU, our method increases accuracy from 0.900 to 0.907 and reduces the error rate from 0.357 to 0.351. On the outdoor dataset KITTI, our method improves accuracy from 0.9638 to 0.9642 and decreases the error rate from 0.060 to 0.0598.
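The cut-and-paste idea behind Perpendicular-Cutdepth can be sketched on plain 2-D arrays: a randomly chosen vertical strip of the ground-truth depth map replaces the corresponding columns of the input, leaving everything outside the strip untouched. An illustrative sketch under an assumed data layout, not the authors' implementation.

```python
import random

def perpendicular_cutdepth(image, depth, seed=0):
    """Cut a vertical (perpendicular) strip from the ground-truth depth map
    and paste it onto the input image; columns outside the strip are left
    intact, so vertical edge features survive the augmentation."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    x0 = rng.randrange(w)                 # strip start column
    x1 = rng.randrange(x0 + 1, w + 1)     # strip end column (exclusive)
    out = [row[:] for row in image]       # copy so the input stays unchanged
    for y in range(h):
        for x in range(x0, x1):
            out[y][x] = depth[y][x]
    return out, (x0, x1)

image = [[0] * 6 for _ in range(4)]       # toy input image
depth = [[9] * 6 for _ in range(4)]       # toy ground-truth depth map
augmented, (x0, x1) = perpendicular_cutdepth(image, depth)
```

Because the replacement spans entire columns, horizontal positions of vertical edges outside the strip are never disturbed, which is the edge-preservation argument made in the abstract.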
Traditional IoT systems suffer from high equipment management costs and difficulty in trustworthy data sharing caused by centralization. Blockchain provides a feasible research direction for solving these problems. The main challenge at this stage is to integrate the blockchain with resource-constrained IoT devices and ensure that the data of the IoT system is credible. We provide a general framework for intelligent IoT data acquisition and sharing in an untrusted environment based on the blockchain, where gateways become Oracles. A distributed Oracle network based on a Byzantine Fault Tolerant algorithm is used to provide trusted data for the blockchain, making intelligent IoT data trustworthy. An aggregation contract is deployed to collect data from the various Oracles and share the credible data with all on-chain users. We also propose a gateway data aggregation scheme based on the REST API event publishing/subscribing mechanism, which uses SQL to achieve flexible data aggregation. The experimental results show that the proposed scheme can alleviate the limited performance of IoT equipment, make data reliable, and meet the diverse data needs on the chain.
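The aggregation contract's fault-tolerant combination of Oracle reports can be sketched with a median rule, which discards up to f Byzantine values among n >= 3f + 1 reports; the function and quorum parameter are illustrative, not the paper's contract interface.

```python
from statistics import median

def aggregate_reports(reports, min_quorum):
    """Aggregation-contract logic sketched off-chain: refuse to answer
    without a quorum, then take the median so that up to f faulty values
    among n >= 3f + 1 reports cannot move the result arbitrarily."""
    if len(reports) < min_quorum:
        raise ValueError("not enough oracle reports for a quorum")
    return median(reports)

honest = [21.0, 21.2, 20.9]      # readings from honest gateway Oracles
byzantine = [999.0]              # one faulty or malicious gateway
reading = aggregate_reports(honest + byzantine, min_quorum=4)
```

A single wildly wrong report leaves the aggregated reading within the range of the honest values, which is what makes the on-chain data credible despite untrusted gateways.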
Mg alloys possess an inherent plastic anisotropy owing to the selective activation of deformation mechanisms depending on the loading condition. This characteristic results in a diverse range of flow curves that vary with the deformation condition. This study proposes a novel approach for accurately predicting the anisotropic deformation behavior of wrought Mg alloys using machine learning (ML) with data augmentation. The developed model combines four key strategies from data science: learning the entire flow curves, generative adversarial networks (GAN), algorithm-driven hyperparameter tuning, and a gated recurrent unit (GRU) architecture. The proposed model, namely GAN-aided GRU, was extensively evaluated for various predictive scenarios, such as interpolation, extrapolation, and a limited dataset size. The model exhibited significant predictability and improved generalizability in estimating the anisotropic compressive behavior of ZK60 Mg alloys under 11 annealing conditions and three loading directions. The GAN-aided GRU results were superior to those of previous ML models and constitutive equations. The superior performance was attributed to hyperparameter optimization, GAN-based data augmentation, and the inherent predictivity of the GRU for extrapolation. As a first attempt to employ ML techniques other than artificial neural networks, this study proposes a novel perspective on predicting the anisotropic deformation behaviors of wrought Mg alloys.
Data stream clustering is integral to contemporary big data applications. However, addressing the ongoing influx of data streams efficiently and accurately remains a primary challenge in current research. This paper aims to elevate the efficiency and precision of data stream clustering. Leveraging the TEDA (Typicality and Eccentricity Data Analysis) algorithm as a foundation, we introduce improvements by integrating a nearest neighbor search algorithm to enhance both the efficiency and accuracy of the algorithm. The original TEDA algorithm, grounded in the concept of typicality and eccentricity data analytics, is an evolving and recursive method that requires no prior knowledge. While the algorithm autonomously creates and merges clusters as new data arrive, its efficiency is significantly hindered by the need to traverse all existing clusters upon the arrival of further data. This work presents the NS-TEDA (Neighbor Search Based Typicality and Eccentricity Data Analysis) algorithm, which incorporates a KD-Tree (K-Dimensional Tree) integrated with a Scapegoat Tree. This ensures that new data points interact solely with clusters in very close proximity. It significantly enhances algorithm efficiency while preventing a single data point from joining too many clusters and mitigating, to some extent, the merging of clusters with high overlap. We apply the NS-TEDA algorithm to several well-known datasets, comparing its performance with other data stream clustering algorithms and the original TEDA algorithm. The results demonstrate that the proposed algorithm achieves higher accuracy, and its runtime exhibits almost linear dependence on the volume of data, making it more suitable for large-scale data stream analysis research.
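The recursive statistics at the heart of TEDA can be sketched for a 1-D stream: eccentricity grows for samples far from the running mean, and typicality is its complement. The paper's KD-Tree/Scapegoat-Tree neighbor search is omitted here; this is only the base recursion, with illustrative names.

```python
class TEDA:
    """Recursive typicality/eccentricity tracker for a 1-D data stream.
    Mean and mean-square are updated incrementally, so no history is kept."""
    def __init__(self):
        self.k = 0
        self.mean = 0.0
        self.msq = 0.0                    # running mean of squared samples

    def update(self, x):
        """Consume one sample; return (eccentricity, typicality)."""
        self.k += 1
        self.mean += (x - self.mean) / self.k
        self.msq += (x * x - self.msq) / self.k
        var = self.msq - self.mean ** 2
        if self.k == 1 or var <= 0:
            return 1.0, 0.0               # degenerate: no spread observed yet
        ecc = 1.0 / self.k + (self.mean - x) ** 2 / (self.k * var)
        return ecc, 1.0 - ecc

stream = TEDA()
eccs = [stream.update(v)[0] for v in [10.0, 10.1, 9.9, 10.0, 25.0]]
```

The far outlier at the end receives an eccentricity close to 1, while samples near the running mean score low, which is the signal TEDA uses to decide when a point belongs to an existing cluster.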
Data is regarded as a valuable asset, and sharing data is a prerequisite for fully exploiting its value. However, the current medical data sharing scheme lacks a fair incentive mechanism, and the authenticity of data cannot be guaranteed, resulting in low enthusiasm among participants. A fair and trusted medical data trading scheme based on smart contracts is proposed, which aims to encourage participants to be honest and improve their enthusiasm for participation. The scheme uses zero-knowledge range proofs for trusted verification, verifies the authenticity of the patient's data and the specific attributes of the data before the transaction, and realizes privacy protection. At the same time, the game-theoretic pricing strategy selects the best revenue strategy for all parties involved and realizes the fairness and incentive of the transaction price. The smart contract is used to complete the verification and game bargaining process, and the blockchain is used as a distributed ledger to record the medical data transaction process to prevent data tampering and transaction denial. Finally, by deploying smart contracts on the Ethereum test network and conducting experiments and theoretical calculations, it is proved that the transaction scheme achieves trusted verification and fair bargaining while ensuring privacy protection in a decentralized environment. The experimental results show that the model improves the credibility and fairness of medical data transactions, maximizes social benefits, and encourages more patients and medical institutions to participate in the circulation of medical data, more fully tapping the potential value of medical data.
Funding: supported by the EU H2020 Research and Innovation Program under the Marie Sklodowska-Curie Grant Agreement (Project DEEP, Grant number 101109045); the National Key R&D Program of China (Grant number 2018YFB1800804); the National Natural Science Foundation of China (Nos. 61925105 and 62171257); the Tsinghua University-China Mobile Communications Group Co., Ltd. Joint Institute; and the Fundamental Research Funds for the Central Universities, China (No. FRF-NP-20-03).
Funding: Key Research and Development and Promotion Program of Henan Province (No. 222102210069); Zhongyuan Science and Technology Innovation Leading Talent Project (No. 224200510003); National Natural Science Foundation of China (No. 62102449).
Abstract: Big data resources are characterized by large scale, wide sources, and strong dynamics. Existing access control mechanisms based on manual policy formulation by security experts suffer from drawbacks such as low policy management efficiency and difficulty in accurately describing the access control policy. To overcome these problems, this paper proposes a big data access control mechanism based on a two-layer permission decision structure. This mechanism extends the attribute-based access control (ABAC) model. Business attributes are introduced in the ABAC model as business constraints between entities. The proposed mechanism implements a two-layer permission decision structure composed of the inherent attributes of access control entities and the business attributes, which constitute the general permission decision algorithm based on logical calculation and the business permission decision algorithm based on a bi-directional long short-term memory (BiLSTM) neural network, respectively. The general permission decision algorithm is used to implement accurate policy decisions, while the business permission decision algorithm implements fuzzy decisions based on the business constraints. The BiLSTM neural network is used to calculate the similarity of the business attributes to realize intelligent, adaptive, and efficient access control permission decisions. Through the two-layer permission decision structure, the complex and diverse big data access control management requirements can be satisfied by considering the security and availability of resources. Experimental results show that the proposed mechanism is effective and reliable. In summary, it can efficiently support the secure sharing of big data resources.
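The two-layer structure described above can be sketched as follows. This is an illustrative simplification, not the paper's implementation: layer 1 performs the exact logical match on inherent attributes, and layer 2 makes the fuzzy business decision, with plain cosine similarity standing in for the BiLSTM-computed similarity; all attribute names, vectors, and the 0.8 threshold are hypothetical:

```python
def general_decision(subject_attrs, policy_attrs):
    """Layer 1: exact logical match on the entities' inherent attributes."""
    return all(subject_attrs.get(k) == v for k, v in policy_attrs.items())

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def business_decision(request_vec, policy_vec, threshold=0.8):
    """Layer 2: fuzzy decision on business attributes; cosine similarity
    stands in here for the paper's BiLSTM-computed similarity."""
    return cosine(request_vec, policy_vec) >= threshold

def permit(subject_attrs, policy_attrs, request_vec, policy_vec):
    """Access is granted only when both layers agree."""
    return (general_decision(subject_attrs, policy_attrs)
            and business_decision(request_vec, policy_vec))

allowed = permit({"role": "analyst", "dept": "risk"}, {"role": "analyst"},
                 [1.0, 0.0, 1.0], [1.0, 0.0, 1.0])
```

A request failing either the exact attribute match or the similarity threshold is denied, mirroring the "accurate policy decision plus fuzzy business decision" split.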
Funding: Sponsored by the National Natural Science Foundation of China (Nos. 62172353, 62302114, U20B2046, and 62172115); the Innovation Fund Program of the Engineering Research Center for Integration and Application of Digital Learning Technology of Ministry of Education (Nos. 1331007 and 1311022); the Natural Science Foundation of the Jiangsu Higher Education Institutions (Grant No. 17KJB520044); and the Six Talent Peaks Project in Jiangsu Province (No. XYDXX-108).
Abstract: With the rapid development of information technology, IoT devices play a huge role in physiological health data detection. The exponential growth of medical data requires us to reasonably allocate storage space for cloud servers and edge nodes. The storage capacity of edge nodes close to users is limited, so hotspot data should be stored in edge nodes as much as possible to ensure response timeliness and access hit rate. However, the current scheme cannot guarantee that every sub-message in a complete data item stored by an edge node meets the requirements of hot data. How to detect and delete redundant data in edge nodes while protecting user privacy and dynamic data integrity has become a challenging problem. Our paper proposes a redundant data detection method that meets privacy protection requirements. By scanning the ciphertext, it determines whether each sub-message of the data in the edge node meets the requirements of hot data. It has the same effect as a zero-knowledge proof and does not reveal user privacy. In addition, for redundant sub-data that does not meet the requirements of hot data, our paper proposes a redundant data deletion scheme that preserves dynamic data integrity. We use Content Extraction Signature (CES) to generate the signature of the remaining hot data after the redundant data is deleted. The feasibility of the scheme is proved through security analysis and efficiency analysis.
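The core bookkeeping step, deciding per sub-message whether it is hot and pruning the cold remainder, can be sketched as below. This is only an illustration of the data flow: it splits a blob into sub-messages and applies a plain access-frequency threshold, whereas the paper performs this check over ciphertext with zero-knowledge-like guarantees; the 4-byte chunking, counts, and threshold are all hypothetical:

```python
def split_submessages(data: bytes, size: int):
    """Split a stored data item into fixed-size sub-messages."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def prune_cold(submessages, access_counts, hot_threshold):
    """Keep sub-messages whose access frequency marks them as hot; the
    cold remainder is the redundant data the edge node would delete
    (and the surviving hot part would then be re-signed via CES)."""
    hot, cold = [], []
    for sub, hits in zip(submessages, access_counts):
        (hot if hits >= hot_threshold else cold).append(sub)
    return hot, cold

subs = split_submessages(b"AAAABBBBCCCCDDDD", 4)
hot, cold = prune_cold(subs, [12, 1, 9, 0], hot_threshold=5)
```

Only `hot` remains on the edge node; in the paper's scheme the same decision is reached without exposing the plaintext counts shown here.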
Funding: This research was funded by the National Natural Science Foundation of China (Grant No. 42250410321).
Abstract: Missing values are one of the main factors that cause dirty data. Without high-quality data, there will be no reliable analysis results and precise decision-making. Therefore, the data warehouse needs to integrate high-quality data consistently. In the power system, the electricity consumption data of some large users cannot be normally collected, resulting in missing data, which affects the calculation of power supply and eventually leads to a large error in the daily power line loss rate. For the problem of missing electricity consumption data, this study proposes a group method of data handling (GMDH) based data interpolation method for distribution power networks and applies it to the analysis of actually collected electricity data. First, the dependent and independent variables are defined from the original data, and the upper and lower limits of missing values are determined according to prior knowledge or existing data information. All missing data are randomly interpolated within the upper and lower limits. Then, the GMDH network is established to obtain the optimal complexity model, which is used to predict the missing data to replace the last imputed electricity consumption data. Finally, this process is iterated until the missing values no longer change. Under a relatively small noise level (α = 0.25), the proposed approach achieves a maximum error of no more than 0.605%. Experimental findings demonstrate the efficacy and feasibility of the proposed approach, which realizes the transformation from incomplete data to complete data. This data interpolation approach also provides a strong basis for electricity theft diagnosis and metering fault analysis in electricity enterprises.
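The iterate-until-stable imputation loop described above can be sketched in a few lines. As a loud simplification, a one-variable ordinary least squares fit stands in for the GMDH optimal-complexity model; the bounds, seed, and toy series are hypothetical, and only the loop structure (random initialization within limits, refit, re-predict, repeat until unchanged) mirrors the abstract:

```python
import random

def fit_predict(x, y, x_query):
    """Ordinary least squares y = a*x + b, standing in for the GMDH
    optimal-complexity model (an illustrative simplification)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    b = my - a * mx
    return [a * xq + b for xq in x_query]

def impute(x_obs, y_obs, x_missing, lo, hi, max_iter=50, tol=1e-6):
    rng = random.Random(0)
    # Step 1: random interpolation within the prior upper/lower limits.
    y_missing = [rng.uniform(lo, hi) for _ in x_missing]
    for _ in range(max_iter):
        # Step 2: refit on all data and re-predict the missing entries.
        y_new = fit_predict(x_obs + x_missing, y_obs + y_missing, x_missing)
        y_new = [min(max(v, lo), hi) for v in y_new]  # respect the limits
        # Step 3: stop once the imputed values no longer change.
        if max(abs(a - b) for a, b in zip(y_new, y_missing)) < tol:
            break
        y_missing = y_new
    return y_missing

filled = impute([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [6], lo=0, hi=20)
```

On the perfectly linear toy data the loop contracts to the consistent value y = 12 at x = 6 regardless of the random start, which is exactly the "iterate until the missing values do not change" behavior.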
Abstract: Cloud computing has emerged as a viable alternative to traditional computing infrastructures, offering various benefits. However, the adoption of cloud storage poses significant risks to data secrecy and integrity. This article presents an effective mechanism to preserve the secrecy and integrity of data stored on the public cloud by leveraging blockchain technology, smart contracts, and cryptographic primitives. The proposed approach utilizes a Solidity-based smart contract as an auditor for maintaining and verifying the integrity of outsourced data. To preserve data secrecy, symmetric encryption systems are employed to encrypt user data before outsourcing it. An extensive performance analysis is conducted to illustrate the efficiency of the proposed mechanism. Additionally, a rigorous assessment is conducted to ensure that the developed smart contract is free from vulnerabilities and to measure its associated running costs. The security analysis of the proposed system confirms that our approach can securely maintain the confidentiality and integrity of cloud storage, even in the presence of malicious entities. The proposed mechanism contributes to enhancing data security in cloud computing environments and can be used as a foundation for developing more secure cloud storage systems.
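The integrity half of such a scheme reduces to comparing a stored blob against a fingerprint recorded at upload time. The sketch below, under stated assumptions, shows only that comparison: SHA-256 plays the role of the digest the auditing smart contract would hold on-chain, and the "ciphertext" is a placeholder for data already encrypted with a symmetric cipher such as AES (the encryption step itself is not implemented here):

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Digest of the (already encrypted) blob; in the paper's design this
    value would be held by the auditing smart contract, not locally."""
    return hashlib.sha256(data).hexdigest()

def audit(stored_blob: bytes, onchain_digest: str) -> bool:
    """Recompute the digest of what the cloud returns and compare."""
    return fingerprint(stored_blob) == onchain_digest

blob = b"ciphertext produced by a symmetric cipher such as AES"
digest = fingerprint(blob)             # recorded on-chain at upload time
print(audit(blob, digest))             # True: untampered copy passes
print(audit(blob + b"x", digest))      # False: any modification is caught
```

Because only the ciphertext and its digest ever leave the client, secrecy and integrity are handled by separate, composable mechanisms.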
Funding: This research was financially supported by the Ministry of Trade, Industry, and Energy (MOTIE), Korea, under the "Project for Research and Development with Middle Markets Enterprises and DNA (Data, Network, AI) Universities" (AI-based Safety Assessment and Management System for Concrete Structures) (Reference Number P0024559), supervised by the Korea Institute for Advancement of Technology (KIAT).
Abstract: Time-series data provide important information in many fields, and their processing and analysis have been the focus of much research. However, detecting anomalies is very difficult due to data imbalance, temporal dependence, and noise. Therefore, methodologies for data augmentation and for converting time-series data into images for analysis have been studied. This paper proposes a fault detection model that uses time-series data augmentation and transformation to address the problems of data imbalance, temporal dependence, and robustness to noise. The data augmentation method is the addition of noise: Gaussian noise, with the noise level set to 0.002, is added to maximize the generalization performance of the model. In addition, we use the Markov Transition Field (MTF) method to effectively visualize the dynamic transitions of the data while converting the time-series data into images. This enables the identification of patterns in time-series data and assists in capturing their sequential dependencies. For anomaly detection, the PatchCore model is applied and shows excellent performance, and the detected anomalous areas are represented as heat maps. By applying an anomaly map to the original image, it is possible to capture the areas where anomalies occur. The performance evaluation shows that both F1-score and accuracy are high when time-series data are converted to images. Additionally, when processed as images rather than as time series, both the size of the data and the training time were significantly reduced. The proposed method can provide an important springboard for research in anomaly detection using time-series data and helps keep the analysis of complex patterns in data lightweight.
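The two preprocessing steps named above, Gaussian-noise augmentation at level 0.002 and the MTF conversion, can both be sketched compactly. This is a minimal reimplementation for illustration, not the paper's code: the MTF here uses simple equal-width binning and an empirical transition matrix, and the toy series is hypothetical:

```python
import random

def augment(series, noise_level=0.002, seed=0):
    """Gaussian-noise augmentation at the paper's stated level of 0.002."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, noise_level) for x in series]

def markov_transition_field(series, n_bins=4):
    """Minimal MTF sketch: quantize the series into bins, estimate the
    Markov transition matrix W from consecutive pairs, then map every
    index pair (i, j) to W[q_i][q_j], yielding an image-like matrix."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0
    q = [min(int((x - lo) / width), n_bins - 1) for x in series]
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(q, q[1:]):
        counts[a][b] += 1
    W = [[c / (sum(row) or 1) for c in row] for row in counts]
    return [[W[qi][qj] for qj in q] for qi in q]

noisy = augment([0.0] * 100)                        # jittered copy of a signal
mtf = markov_transition_field([0, 1, 2, 3, 0, 1, 2, 3])
```

The resulting n-by-n matrix encodes how likely each pair of time points is under the series' own transition dynamics, which is what makes recurring patterns visible once it is rendered as an image.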
Funding: Doctoral research start-up fund of Guangxi Normal University; Guangzhou Research Institute of Communication University of China Common Construction Project, Sunflower - the Aging Intelligent Community; Guangxi project for improving middle-aged/young teachers' ability, Grant/Award Number: 2020KY020323.
Abstract: Overgeneralisation may happen because most studies on data publishing for multiple sensitive attributes (SAs) have not considered the personalised privacy requirement. Furthermore, sensitive information disclosure may also be caused by these personalised requirements. To address the matter, this article develops a personalised data publishing method for multiple SAs. According to the requirements of individuals, the new method partitions SA values into two categories, private values and public values, and breaks the association between them for privacy guarantees. The private values are anonymised, while the public values are released without this process. An algorithm is designed to achieve the privacy mode, where the selectivity is determined by the sensitive value frequency and undesirable objects. The experimental results show that the proposed method provides more information utility than previous methods. The theoretical analyses and experiments also indicate that privacy can be guaranteed even if the public values are known to an adversary. The overgeneralisation and privacy breaches caused by personalised requirements can be avoided by the new method.
Abstract: With recent technological developments, massive vehicular ad hoc networks (VANETs) have been established, enabling numerous vehicles and their respective Road Side Unit (RSU) components to communicate with one another. The best way to enhance traffic flow for vehicles and traffic management departments is to share the data they receive. However, VANET systems need more protection. An effective and safe method of outsourcing is suggested, which reduces computation costs by achieving data security using a homomorphic mapping based on the conjugate operation of matrices. This research proposes a VANET-based data outsourcing system to fix these issues. To keep data outsourcing secure, the suggested model takes cryptography models into account. Fog nodes keep the generated keys for the purpose of vehicle authentication. For controlling and overseeing the outsourced data while preserving privacy, the suggested approach considers a Trusted Certified Auditor (TCA). Using the secret key, the TCA can identify the genuine identity of VANETs when harmful messages are detected. The proposed model develops a TCA-based unique static vehicle labeling system using cryptography (TCA-USVLC) for secure data outsourcing and privacy preservation in VANETs. The proposed model calculates the trust of vehicles in 16 ms for an average of 180 vehicles and achieves 98.6% accuracy for data encryption to provide security. The proposed model achieved 98.5% accuracy in data outsourcing and 98.6% accuracy in privacy preservation in fog-enabled VANETs. Elliptic curve cryptography models can be applied in the future for better encryption and decryption rates with lightweight cryptographic operations.
Abstract: Guest Editors: Prof. Ling Tian (University of Electronic Science and Technology of China, lingtian@uestc.edu.cn), Prof. Jian-Hua Tao (Tsinghua University, jhtao@tsinghua.edu.cn), and Dr. Bin Zhou (National University of Defense Technology, binzhou@nudt.edu.cn). Since the concept of "Big Data" was first introduced in Nature in 2008, it has been widely applied in fields such as business, healthcare, national defense, education, transportation, and security. With the maturity of artificial intelligence technology, big data analysis techniques tailored to various fields have made significant progress, but still face many challenges in terms of data quality, algorithms, and computing power.
Funding: Supported by NOAA JTTI award via Grant #NA21OAR4590165 and NOAA GOES-R Program funding via Grant #NA16OAR4320115; provided by NOAA/Office of Oceanic and Atmospheric Research under NOAA-University of Oklahoma Cooperative Agreement #NA11OAR4320072, U.S. Department of Commerce; also supported by the National Oceanic and Atmospheric Administration (NOAA) of the U.S. Department of Commerce via Grant #NA18NWS4680063.
Abstract: Capabilities to assimilate Geostationary Operational Environmental Satellite R-series (GOES-R) Geostationary Lightning Mapper (GLM) flash extent density (FED) data within the operational Gridpoint Statistical Interpolation ensemble Kalman filter (GSI-EnKF) framework were previously developed and tested with a mesoscale convective system (MCS) case. In this study, such capabilities are further developed to assimilate GOES GLM FED data within the GSI ensemble-variational (EnVar) hybrid data assimilation (DA) framework. The results of assimilating the GLM FED data using 3DVar and pure En3DVar (PEn3DVar, using 100% ensemble covariance and no static covariance) are compared with those of EnKF/DfEnKF for a supercell storm case. The focus of this study is to validate the correctness and evaluate the performance of the new implementation rather than comparing the performance of FED DA among different DA schemes. Only the results of 3DVar and PEn3DVar are examined and compared with EnKF/DfEnKF. Assimilation of a single FED observation shows that the magnitude and horizontal extent of the analysis increments from PEn3DVar are generally larger than those from EnKF, which is mainly caused by the different localization strategies used in EnKF/DfEnKF and PEn3DVar, as well as the integration limits of the graupel mass in the observation operator. Overall, the forecast performance of PEn3DVar is comparable to EnKF/DfEnKF, suggesting correct implementation.
Funding: Supported by the National Natural Science Foundation of China (NSFC, Grant Nos. 42172323 and 12371454).
Abstract: In source detection in the Tianlai project, accurately locating the interferometric fringe in visibility data drastically influences downstream tasks such as physical parameter estimation and weak source exploration. Considering that traditional locating methods are time-consuming and supervised methods require a great quantity of expensive labeled data, in this paper we first investigate the characteristics of interferometric fringes in simulated and real scenarios separately, and integrate an almost parameter-free unsupervised clustering method with a seed-filling or eraser algorithm to propose a hierarchical, plug-and-play method that improves location accuracy. We then apply our method to locate the interferometric fringes of single and multiple sources in simulation data. Next, we apply our method to real data taken from the Tianlai radio telescope array. Finally, we compare it with state-of-the-art unsupervised methods. The results show that our method is robust in different scenarios and can effectively improve location measurement accuracy.
Funding: The National Social Science Fund of China (Grant No. 22CTQ031); Special Project on Library Capacity Building of the Chinese Academy of Sciences (Grant No. E2290431).
Abstract: Research data infrastructures form the cornerstone in both cyber and physical spaces, driving the progression of the data-intensive scientific research paradigm. This opinion paper presents an overview of global research data infrastructure, drawing insights from national roadmaps and strategic documents related to research data infrastructure. It emphasizes the pivotal role of research data infrastructures by delineating four new missions aimed at positioning them at the core of the current scientific research and communication ecosystem. The four new missions of research data infrastructures are: (1) as a pioneer, to transcend disciplinary borders and address complex, cutting-edge scientific and social challenges with problem- and data-oriented insights; (2) as an architect, to establish a digital, intelligent, flexible research and knowledge services environment; (3) as a platform, to foster high-end academic communication; (4) as a coordinator, to balance scientific openness with ethical needs.
Funding: Supported by China's National Natural Science Foundation (Nos. 62072249, 62072056). This work is also funded by the National Science Foundation of Hunan Province (2020JJ2029).
Abstract: With the development of Industry 4.0 and big data technology, the Industrial Internet of Things (IIoT) is hampered by inherent issues such as privacy, security, and fault tolerance, which pose certain challenges to its rapid development. Blockchain technology has immutability, decentralization, and autonomy, which can greatly remedy the inherent defects of the IIoT. In a traditional blockchain, data is stored in a Merkle tree. As data continues to grow, the scale of the proofs used to validate it grows, threatening the efficiency, security, and reliability of blockchain-based IIoT. Accordingly, this paper first analyzes the inefficiency of the traditional blockchain structure in verifying the integrity and correctness of data. To solve this problem, a new Vector Commitment (VC) structure, Partition Vector Commitment (PVC), is proposed by improving the traditional VC structure. Secondly, this paper uses PVC instead of the Merkle tree to store big data generated by IIoT. PVC can improve the efficiency of traditional VC in the processes of commitment and opening. Finally, this paper uses PVC to build a blockchain-based IIoT data security storage mechanism and carries out a comparative experimental analysis. This mechanism can greatly reduce communication loss and maximize the rational use of storage space, which is of great significance for maintaining the security and stability of blockchain-based IIoT.
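The Merkle-tree inefficiency the paper starts from is easy to demonstrate concretely: proving membership of one leaf requires a sibling path whose length grows as log2(n), which is the per-record overhead a vector commitment aims to shrink. The sketch below builds a minimal Merkle tree with SHA-256 and counts the proof size for 16 hypothetical IIoT records; it illustrates the baseline only, not the PVC construction itself:

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def merkle_root_and_proof(leaves, index):
    """Return the Merkle root and the sibling path proving leaves[index]."""
    level = [h(l) for l in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last node on odd levels
            level.append(level[-1])
        proof.append(level[index ^ 1])     # the sibling at this level
        index //= 2
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0], proof

def verify(leaf, index, proof, root):
    """Recompute the path from the leaf up and compare with the root."""
    node = h(leaf)
    for sib in proof:
        node = h(node + sib) if index % 2 == 0 else h(sib + node)
        index //= 2
    return node == root

leaves = [f"iiot-record-{i}".encode() for i in range(16)]
root, proof = merkle_root_and_proof(leaves, 5)
print(len(proof))  # 4 siblings for 16 leaves: proofs grow as log2(n)
```

Doubling the number of records adds another sibling hash to every proof; a vector commitment with constant-size openings avoids exactly this growth.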
Abstract: The 6th generation mobile network (6G) is a network with multi-network interconnection and multi-scenario coexistence, where multiple network domains break the original fixed boundaries to form connections and convergence. With the optimization objective of maximizing network utility while ensuring performance-centric weighted fairness among flows, this paper designs a reinforcement learning-based cloud-edge autonomous multi-domain data center network architecture that achieves single-domain autonomy and multi-domain collaboration. Due to the conflict between the utilities of different flows, the bandwidth fairness allocation problem for various types of flows is formulated by considering differently defined reward functions. Regarding the tradeoff between fairness and utility, this paper designs corresponding reward functions for the cases where the flows undergo abrupt changes and where they change smoothly. In addition, to accommodate the Quality of Service (QoS) requirements of multiple types of flows, this paper proposes a multi-domain autonomous routing algorithm called LSTM+MADDPG. By introducing a Long Short-Term Memory (LSTM) layer in the actor and critic networks, more information about temporal continuity is added, further enhancing adaptation to changes in the dynamic network environment. The LSTM+MADDPG algorithm is compared with the latest reinforcement learning algorithms through experiments on a real network topology and traffic traces, and the results show that LSTM+MADDPG improves the delay convergence speed by 14.6% and delays the onset of packet loss by 18.2% compared with other algorithms.
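The "performance-centric weighted fairness" objective can be made concrete with a classic weighted max-min allocation, shown below as an illustrative baseline rather than the paper's learned policy: flows whose demand falls below their weighted share are fully satisfied, and the leftover capacity is redistributed among the rest by weight (all capacities, demands, and weights are hypothetical):

```python
def weighted_fair_share(capacity, demands, weights):
    """Weighted max-min fairness sketch: fully satisfy flows whose demand
    is below their weighted share, then redistribute the leftover."""
    alloc = {f: 0.0 for f in demands}
    active = set(demands)
    cap = capacity
    while active:
        total_w = sum(weights[f] for f in active)
        share = {f: cap * weights[f] / total_w for f in active}
        done = {f for f in active if demands[f] <= share[f]}
        if not done:                 # no flow is fully satisfiable: split by weight
            alloc.update(share)
            break
        for f in done:
            alloc[f] = demands[f]    # satisfy it and free the remainder
            cap -= demands[f]
        active -= done
    return alloc

alloc = weighted_fair_share(10.0, {"a": 2.0, "b": 8.0, "c": 8.0},
                            {"a": 1.0, "b": 1.0, "c": 2.0})
```

A reward function built on such an allocation rewards the agent for tracking each flow's weighted share, which is the fairness-versus-utility tradeoff the abstract describes.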
基金the Grant of Program for Scientific ResearchInnovation Team in Colleges and Universities of Anhui Province(2022AH010095)The Grant ofScientific Research and Talent Development Foundation of the Hefei University(No.21-22RC15)+2 种基金The Key Research Plan of Anhui Province(No.2022k07020011)The Grant of Anhui Provincial940 CMC,2024,vol.79,no.1Natural Science Foundation,No.2308085MF213The Open Fund of Information Materials andIntelligent Sensing Laboratory of Anhui Province IMIS202205,as well as the AI General ComputingPlatform of Hefei University.
Abstract: Depth estimation is an important task in computer vision. Collecting data at scale for monocular depth estimation is challenging, as this task requires simultaneously capturing RGB images and depth information. Therefore, data augmentation is crucial for this task. Existing data augmentation methods often employ pixel-wise transformations, which may inadvertently disrupt edge features. In this paper, we propose a data augmentation method for monocular depth estimation, which we refer to as the Perpendicular-Cutdepth method. This method involves cutting real-world depth maps along perpendicular directions and pasting them onto input images, thereby diversifying the data without compromising edge features. To validate the effectiveness of the algorithm, we compared it with the current mainstream data augmentation algorithms using an existing convolutional neural network (CNN). Additionally, to verify the algorithm's applicability to Transformer networks, we designed an encoder-decoder network structure based on a Transformer to assess the generalization of our proposed algorithm. Experimental results demonstrate that, in the field of monocular depth estimation, our proposed Perpendicular-Cutdepth method outperforms traditional data augmentation methods. On the indoor dataset NYU, our method increases accuracy from 0.900 to 0.907 and reduces the error rate from 0.357 to 0.351. On the outdoor dataset KITTI, our method improves accuracy from 0.9638 to 0.9642 and decreases the error rate from 0.060 to 0.0598.
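The cut-and-paste operation itself is simple to sketch. The toy below is an illustrative reimplementation of the idea only (the paper's exact strip selection and blending are not specified here): a vertical, i.e. perpendicular, strip of the depth map replaces the corresponding columns of the input image, leaving the pixels outside the strip, and the horizontal edge structure they carry, untouched. Images are plain H x W nested lists for clarity:

```python
def perpendicular_cutdepth(image, depth, x0, x1):
    """Paste the vertical strip [x0, x1) of a depth map onto the image.
    Only the chosen columns change, so edge features elsewhere survive."""
    out = [row[:] for row in image]        # copy; the input stays intact
    for y in range(len(out)):
        for x in range(x0, x1):
            out[y][x] = depth[y][x]
    return out

img = [[0] * 6 for _ in range(4)]          # hypothetical 4x6 "RGB" image
dep = [[9] * 6 for _ in range(4)]          # hypothetical aligned depth map
aug = perpendicular_cutdepth(img, dep, 2, 4)
```

In a real pipeline the strip position and width would be randomized per sample, which is what diversifies the training data.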
Funding: Supported by the open research fund of the Key Lab of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications), Ministry of Education (No. JZNY202114), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX210734).
Abstract: Traditional IoT systems suffer from high equipment management costs and difficulty in trustworthy data sharing caused by centralization. Blockchain provides a feasible research direction to solve these problems. The main challenge at this stage is to integrate the blockchain with resource-constrained IoT devices and ensure that the data of the IoT system is credible. We provide a general framework for intelligent IoT data acquisition and sharing in an untrusted environment based on the blockchain, where gateways become Oracles. A distributed Oracle network based on a Byzantine Fault Tolerant algorithm is used to provide trusted data to the blockchain, making intelligent IoT data trustworthy. An aggregation contract is deployed to collect data from various Oracles and share the credible data with all on-chain users. We also propose a gateway data aggregation scheme based on the REST API event publishing/subscribing mechanism, which uses SQL to achieve flexible data aggregation. The experimental results show that the proposed scheme can alleviate the problem of the limited performance of IoT equipment, make data reliable, and meet the diverse data needs on the chain.
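The SQL-based gateway aggregation step can be sketched with the standard library's SQLite binding. This is an illustration of the idea, not the paper's gateway code: a gateway acting as an Oracle buffers device events in a local table and answers flexible aggregation queries with plain SQL before pushing the result on-chain (the table schema and device names are hypothetical):

```python
import sqlite3

# A gateway (Oracle) buffers device events locally, then answers
# aggregation queries with plain SQL before publishing on-chain.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (device TEXT, metric TEXT, value REAL)")
con.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("dev-1", "temp", 21.0), ("dev-1", "temp", 23.0), ("dev-2", "temp", 25.0)],
)
row = con.execute(
    "SELECT AVG(value), COUNT(*) FROM readings WHERE metric = ?", ("temp",)
).fetchone()
print(row)  # (23.0, 3)
```

Because the aggregation is expressed in SQL rather than hard-coded, the same gateway can serve arbitrary on-chain data needs by changing only the query.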
Funding: Korea Institute of Energy Technology Evaluation and Planning (KETEP) grant funded by the Korea government (Grant No. 20214000000140, Graduate School of Convergence for Clean Energy Integrated Power Generation); Korea Basic Science Institute (National Research Facilities and Equipment Center) grant funded by the Ministry of Education (2021R1A6C101A449); the National Research Foundation of Korea grant funded by the Ministry of Science and ICT (2021R1A2C1095139), Republic of Korea.
Abstract: Mg alloys possess an inherent plastic anisotropy owing to the selective activation of deformation mechanisms depending on the loading condition. This characteristic results in a diverse range of flow curves that vary with the deformation condition. This study proposes a novel approach for accurately predicting the anisotropic deformation behavior of wrought Mg alloys using machine learning (ML) with data augmentation. The developed model combines four key strategies from data science: learning the entire flow curves, generative adversarial networks (GAN), algorithm-driven hyperparameter tuning, and a gated recurrent unit (GRU) architecture. The proposed model, namely GAN-aided GRU, was extensively evaluated for various predictive scenarios, such as interpolation, extrapolation, and a limited dataset size. The model exhibited significant predictability and improved generalizability for estimating the anisotropic compressive behavior of ZK60 Mg alloys under 11 annealing conditions and for three loading directions. The GAN-aided GRU results were superior to those of previous ML models and constitutive equations. The superior performance was attributed to hyperparameter optimization, GAN-based data augmentation, and the inherent predictivity of the GRU for extrapolation. As a first attempt to employ ML techniques other than artificial neural networks, this study proposes a novel perspective on predicting the anisotropic deformation behaviors of wrought Mg alloys.
基金This research was funded by the National Natural Science Foundation of China(Grant No.72001190)by the Ministry of Education’s Humanities and Social Science Project via the China Ministry of Education(Grant No.20YJC630173)by Zhejiang A&F University(Grant No.2022LFR062).
Abstract: Data stream clustering is integral to contemporary big data applications. However, addressing the ongoing influx of data streams efficiently and accurately remains a primary challenge in current research. This paper aims to elevate the efficiency and precision of data stream clustering. Leveraging the TEDA (Typicality and Eccentricity Data Analysis) algorithm as a foundation, we introduce improvements by integrating a nearest neighbor search algorithm to enhance both the efficiency and accuracy of the algorithm. The original TEDA algorithm, grounded in the concept of typicality and eccentricity data analytics, is an evolving and recursive method that requires no prior knowledge. While the algorithm autonomously creates and merges clusters as new data arrive, its efficiency is significantly hindered by the need to traverse all existing clusters upon the arrival of further data. This work presents the NS-TEDA (Neighbor Search Based Typicality and Eccentricity Data Analysis) algorithm, which incorporates a KD-Tree (K-Dimensional Tree) integrated with a Scapegoat Tree. This ensures that new data points interact solely with clusters in very close proximity upon arrival, which significantly enhances algorithm efficiency while preventing a single data point from joining too many clusters and mitigating, to some extent, the merging of clusters with high overlap. We apply the NS-TEDA algorithm to several well-known datasets, comparing its performance with other data stream clustering algorithms and the original TEDA algorithm. The results demonstrate that the proposed algorithm achieves higher accuracy, and its runtime exhibits almost linear dependence on the volume of data, making it more suitable for large-scale data stream analysis research.
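The recursive core of TEDA, updating a running mean and variance with each sample and scoring the sample's eccentricity against them, can be sketched in one dimension. This follows the commonly published recursive TEDA update formulas as an illustration (the single-cluster, 1-D setting and the toy stream are simplifications; the paper's multi-cluster bookkeeping and KD-Tree search are not shown):

```python
class TEDAStream:
    """Recursive 1-D TEDA sketch: a running mean/variance assign each new
    sample an eccentricity; a high score flags the sample as atypical."""

    def __init__(self):
        self.k, self.mean, self.var = 0, 0.0, 0.0

    def eccentricity(self, x):
        self.k += 1
        k = self.k
        self.mean += (x - self.mean) / k                 # recursive mean
        if k > 1:                                        # recursive variance
            self.var = (k - 1) / k * self.var + (x - self.mean) ** 2 / (k - 1)
        if self.var == 0:
            return 1.0 / k
        return 1.0 / k + (self.mean - x) ** 2 / (k * self.var)

stream = TEDAStream()
for x in [10.0, 10.2, 9.9, 10.1, 10.0]:
    stream.eccentricity(x)          # in-cluster samples: low eccentricity
score = stream.eccentricity(30.0)   # far outside the cluster: high score
```

Everything is updated in O(1) per sample with no stored history, which is what makes TEDA suitable for unbounded streams; NS-TEDA's contribution is limiting how many such per-cluster updates each arrival must touch.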
基金This research was funded by the Natural Science Foundation of Hebei Province(F2021201052)。
Abstract: Data is regarded as a valuable asset, and sharing data is a prerequisite for fully exploiting its value. However, the current medical data sharing scheme lacks a fair incentive mechanism, and the authenticity of data cannot be guaranteed, resulting in low enthusiasm among participants. A fair and trusted medical data trading scheme based on smart contracts is proposed, which aims to encourage participants to be honest and improve their enthusiasm for participation. The scheme uses zero-knowledge range proofs for trusted verification, verifying the authenticity of the patient's data and specific attributes of the data before the transaction, and realizing privacy protection. At the same time, the game pricing strategy selects the best revenue strategy for all parties involved and realizes fairness and incentives in the transaction price. A smart contract is used to complete the verification and game bargaining process, and the blockchain is used as a distributed ledger to record the medical data transaction process to prevent data tampering and transaction denial. Finally, by deploying smart contracts on the Ethereum test network and conducting experiments and theoretical calculations, it is proved that the transaction scheme achieves trusted verification and fair bargaining while ensuring privacy protection in a decentralized environment. The experimental results show that the model improves the credibility and fairness of medical data transactions, maximizes social benefits, encourages more patients and medical institutions to participate in the circulation of medical data, and more fully taps the potential value of medical data.
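The fair-pricing idea behind such game bargaining can be illustrated with the textbook Nash bargaining solution, offered here only as a stand-in for the paper's (unspecified) game pricing strategy: choose the price that maximizes the product of both parties' surpluses, which for a seller cost and buyer valuation lands at the midpoint, splitting the surplus evenly (the cost and valuation figures are hypothetical):

```python
def nash_bargain_price(seller_cost, buyer_value, steps=1000):
    """Nash bargaining sketch: grid-search the price maximizing the
    product of the seller's and buyer's surpluses over [cost, value]."""
    best_p, best_u = seller_cost, -1.0
    for i in range(steps + 1):
        p = seller_cost + (buyer_value - seller_cost) * i / steps
        u = (p - seller_cost) * (buyer_value - p)   # product of surpluses
        if u > best_u:
            best_p, best_u = p, u
    return best_p

price = nash_bargain_price(10.0, 30.0)
print(price)  # 20.0: the midpoint splits the surplus evenly
```

In an on-chain realization, a smart contract would evaluate such a rule deterministically from both parties' committed valuations, so neither side can skew the agreed price.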