Getting insight into the spatiotemporal distribution patterns of knowledge innovation is receiving increasing attention from policymakers and economic research organizations.Many studies use bibliometric data to analy...Getting insight into the spatiotemporal distribution patterns of knowledge innovation is receiving increasing attention from policymakers and economic research organizations.Many studies use bibliometric data to analyze the popularity of certain research topics,well-adopted methodologies,influential authors,and the interrelationships among research disciplines.However,the visual exploration of the patterns of research topics with an emphasis on their spatial and temporal distribution remains challenging.This study combined a Space-Time Cube(STC)and a 3D glyph to represent the complex multivariate bibliographic data.We further implemented a visual design by developing an interactive interface.The effectiveness,understandability,and engagement of ST-Map are evaluated by seven experts in geovisualization.The results suggest that it is promising to use three-dimensional visualization to show the overview and on-demand details on a single screen.展开更多
Ratoon rice,which refers to a second harvest of rice obtained from the regenerated tillers originating from the stubble of the first harvested crop,plays an important role in both food security and agroecology while r...Ratoon rice,which refers to a second harvest of rice obtained from the regenerated tillers originating from the stubble of the first harvested crop,plays an important role in both food security and agroecology while requiring minimal agricultural inputs.However,accurately identifying ratoon rice crops is challenging due to the similarity of its spectral features with other rice cropping systems(e.g.,double rice).Moreover,images with a high spatiotemporal resolution are essential since ratoon rice is generally cultivated in fragmented croplands within regions that frequently exhibit cloudy and rainy weather.In this study,taking Qichun County in Hubei Province,China as an example,we developed a new phenology-based ratoon rice vegetation index(PRVI)for the purpose of ratoon rice mapping at a 30 m spatial resolution using a robust time series generated from Harmonized Landsat and Sentinel-2(HLS)images.The PRVI that incorporated the red,near-infrared,and shortwave infrared 1 bands was developed based on the analysis of spectro-phenological separability and feature selection.Based on actual field samples,the performance of the PRVI for ratoon rice mapping was carefully evaluated by comparing it to several vegetation indices,including normalized difference vegetation index(NDVI),enhanced vegetation index(EVI)and land surface water index(LSWI).The results suggested that the PRVI could sufficiently capture the specific characteristics of ratoon rice,leading to a favorable separability between ratoon rice and other land cover types.Furthermore,the PRVI showed the best performance for identifying ratoon rice in the phenological phases characterized by grain filling and harvesting to tillering of the ratoon crop(GHS-TS2),indicating that only several images are required to obtain an accurate ratoon rice map.Finally,the PRVI performed better than NDVI,EVI,LSWI and their combination at the GHS-TS2 stages,with producer's accuracy and user's accuracy of 92.22 and 89.30%,respectively.These results demonstrate that the proposed PRVI based on HLS data can effectively identify ratoon rice in fragmented croplands at crucial phenological stages,which is promising for identifying the earliest timing of ratoon rice planting and can provide a fundamental dataset for crop management activities.展开更多
Big data resources are characterized by large scale, wide sources, and strong dynamics. Existing access controlmechanisms based on manual policy formulation by security experts suffer from drawbacks such as low policy...Big data resources are characterized by large scale, wide sources, and strong dynamics. Existing access controlmechanisms based on manual policy formulation by security experts suffer from drawbacks such as low policymanagement efficiency and difficulty in accurately describing the access control policy. To overcome theseproblems, this paper proposes a big data access control mechanism based on a two-layer permission decisionstructure. This mechanism extends the attribute-based access control (ABAC) model. Business attributes areintroduced in the ABAC model as business constraints between entities. The proposed mechanism implementsa two-layer permission decision structure composed of the inherent attributes of access control entities and thebusiness attributes, which constitute the general permission decision algorithm based on logical calculation andthe business permission decision algorithm based on a bi-directional long short-term memory (BiLSTM) neuralnetwork, respectively. The general permission decision algorithm is used to implement accurate policy decisions,while the business permission decision algorithm implements fuzzy decisions based on the business constraints.The BiLSTM neural network is used to calculate the similarity of the business attributes to realize intelligent,adaptive, and efficient access control permission decisions. Through the two-layer permission decision structure,the complex and diverse big data access control management requirements can be satisfied by considering thesecurity and availability of resources. Experimental results show that the proposed mechanism is effective andreliable. In summary, it can efficiently support the secure sharing of big data resources.展开更多
With the rapid development of information technology,IoT devices play a huge role in physiological health data detection.The exponential growth of medical data requires us to reasonably allocate storage space for clou...With the rapid development of information technology,IoT devices play a huge role in physiological health data detection.The exponential growth of medical data requires us to reasonably allocate storage space for cloud servers and edge nodes.The storage capacity of edge nodes close to users is limited.We should store hotspot data in edge nodes as much as possible,so as to ensure response timeliness and access hit rate;However,the current scheme cannot guarantee that every sub-message in a complete data stored by the edge node meets the requirements of hot data;How to complete the detection and deletion of redundant data in edge nodes under the premise of protecting user privacy and data dynamic integrity has become a challenging problem.Our paper proposes a redundant data detection method that meets the privacy protection requirements.By scanning the cipher text,it is determined whether each sub-message of the data in the edge node meets the requirements of the hot data.It has the same effect as zero-knowledge proof,and it will not reveal the privacy of users.In addition,for redundant sub-data that does not meet the requirements of hot data,our paper proposes a redundant data deletion scheme that meets the dynamic integrity of the data.We use Content Extraction Signature(CES)to generate the remaining hot data signature after the redundant data is deleted.The feasibility of the scheme is proved through safety analysis and efficiency analysis.展开更多
A significant obstacle in intelligent transportation systems(ITS)is the capacity to predict traffic flow.Recent advancements in deep neural networks have enabled the development of models to represent traffic flow acc...A significant obstacle in intelligent transportation systems(ITS)is the capacity to predict traffic flow.Recent advancements in deep neural networks have enabled the development of models to represent traffic flow accurately.However,accurately predicting traffic flow at the individual road level is extremely difficult due to the complex interplay of spatial and temporal factors.This paper proposes a technique for predicting short-term traffic flow data using an architecture that utilizes convolutional bidirectional long short-term memory(Conv-BiLSTM)with attention mechanisms.Prior studies neglected to include data pertaining to factors such as holidays,weather conditions,and vehicle types,which are interconnected and significantly impact the accuracy of forecast outcomes.In addition,this research incorporates recurring monthly periodic pattern data that significantly enhances the accuracy of forecast outcomes.The experimental findings demonstrate a performance improvement of 21.68%when incorporating the vehicle type feature.展开更多
Genome-wide association mapping studies(GWAS)based on Big Data are a potential approach to improve marker-assisted selection in plant breeding.The number of available phenotypic and genomic data sets in which medium-s...Genome-wide association mapping studies(GWAS)based on Big Data are a potential approach to improve marker-assisted selection in plant breeding.The number of available phenotypic and genomic data sets in which medium-sized populations of several hundred individuals have been studied is rapidly increasing.Combining these data and using them in GWAS could increase both the power of QTL discovery and the accuracy of estimation of underlying genetic effects,but is hindered by data heterogeneity and lack of interoperability.In this study,we used genomic and phenotypic data sets,focusing on Central European winter wheat populations evaluated for heading date.We explored strategies for integrating these data and subsequently the resulting potential for GWAS.Establishing interoperability between data sets was greatly aided by some overlapping genotypes and a linear relationship between the different phenotyping protocols,resulting in high quality integrated phenotypic data.In this context,genomic prediction proved to be a suitable tool to study relevance of interactions between genotypes and experimental series,which was low in our case.Contrary to expectations,fewer associations between markers and traits were found in the larger combined data than in the individual experimental series.However,the predictive power based on the marker-trait associations of the integrated data set was higher across data sets.Therefore,the results show that the integration of medium-sized to Big Data is an approach to increase the power to detect QTL in GWAS.The results encourage further efforts to standardize and share data in the plant breeding community.展开更多
In order to address the problems of the single encryption algorithm,such as low encryption efficiency and unreliable metadata for static data storage of big data platforms in the cloud computing environment,we propose...In order to address the problems of the single encryption algorithm,such as low encryption efficiency and unreliable metadata for static data storage of big data platforms in the cloud computing environment,we propose a Hadoop based big data secure storage scheme.Firstly,in order to disperse the NameNode service from a single server to multiple servers,we combine HDFS federation and HDFS high-availability mechanisms,and use the Zookeeper distributed coordination mechanism to coordinate each node to achieve dual-channel storage.Then,we improve the ECC encryption algorithm for the encryption of ordinary data,and adopt a homomorphic encryption algorithm to encrypt data that needs to be calculated.To accelerate the encryption,we adopt the dualthread encryption mode.Finally,the HDFS control module is designed to combine the encryption algorithm with the storage model.Experimental results show that the proposed solution solves the problem of a single point of failure of metadata,performs well in terms of metadata reliability,and can realize the fault tolerance of the server.The improved encryption algorithm integrates the dual-channel storage mode,and the encryption storage efficiency improves by 27.6% on average.展开更多
The overgeneralisation may happen because most studies on data publishing for multiple sensitive attributes(SAs)have not considered the personalised privacy requirement.Furthermore,sensitive information disclosure may...The overgeneralisation may happen because most studies on data publishing for multiple sensitive attributes(SAs)have not considered the personalised privacy requirement.Furthermore,sensitive information disclosure may also be caused by these personalised requirements.To address the matter,this article develops a personalised data publishing method for multiple SAs.According to the requirements of individuals,the new method partitions SAs values into two categories:private values and public values,and breaks the association between them for privacy guarantees.For the private values,this paper takes the process of anonymisation,while the public values are released without this process.An algorithm is designed to achieve the privacy mode,where the selectivity is determined by the sensitive value frequency and undesirable objects.The experimental results show that the proposed method can provide more information utility when compared with previous methods.The theoretic analyses and experiments also indicate that the privacy can be guaranteed even though the public values are known to an adversary.The overgeneralisation and privacy breach caused by the personalised requirement can be avoided by the new method.展开更多
With the recent technological developments,massive vehicular ad hoc networks(VANETs)have been established,enabling numerous vehicles and their respective Road Side Unit(RSU)components to communicate with oneanother.Th...With the recent technological developments,massive vehicular ad hoc networks(VANETs)have been established,enabling numerous vehicles and their respective Road Side Unit(RSU)components to communicate with oneanother.The best way to enhance traffic flow for vehicles and traffic management departments is to share thedata they receive.There needs to be more protection for the VANET systems.An effective and safe methodof outsourcing is suggested,which reduces computation costs by achieving data security using a homomorphicmapping based on the conjugate operation of matrices.This research proposes a VANET-based data outsourcingsystem to fix the issues.To keep data outsourcing secure,the suggested model takes cryptography models intoaccount.Fog will keep the generated keys for the purpose of vehicle authentication.For controlling and overseeingthe outsourced data while preserving privacy,the suggested approach considers the Trusted Certified Auditor(TCA).Using the secret key,TCA can identify the genuine identity of VANETs when harmful messages aredetected.The proposed model develops a TCA-based unique static vehicle labeling system using cryptography(TCA-USVLC)for secure data outsourcing and privacy preservation in VANETs.The proposed model calculatesthe trust of vehicles in 16 ms for an average of 180 vehicles and achieves 98.6%accuracy for data encryption toprovide security.The proposedmodel achieved 98.5%accuracy in data outsourcing and 98.6%accuracy in privacypreservation in fog-enabled VANETs.Elliptical curve cryptography models can be applied in the future for betterencryption and decryption rates with lightweight cryptography operations.展开更多
Large-scale wireless sensor networks(WSNs)play a critical role in monitoring dangerous scenarios and responding to medical emergencies.However,the inherent instability and error-prone nature of wireless links present ...Large-scale wireless sensor networks(WSNs)play a critical role in monitoring dangerous scenarios and responding to medical emergencies.However,the inherent instability and error-prone nature of wireless links present significant challenges,necessitating efficient data collection and reliable transmission services.This paper addresses the limitations of existing data transmission and recovery protocols by proposing a systematic end-to-end design tailored for medical event-driven cluster-based large-scale WSNs.The primary goal is to enhance the reliability of data collection and transmission services,ensuring a comprehensive and practical approach.Our approach focuses on refining the hop-count-based routing scheme to achieve fairness in forwarding reliability.Additionally,it emphasizes reliable data collection within clusters and establishes robust data transmission over multiple hops.These systematic improvements are designed to optimize the overall performance of the WSN in real-world scenarios.Simulation results of the proposed protocol validate its exceptional performance compared to other prominent data transmission schemes.The evaluation spans varying sensor densities,wireless channel conditions,and packet transmission rates,showcasing the protocol’s superiority in ensuring reliable and efficient data transfer.Our systematic end-to-end design successfully addresses the challenges posed by the instability of wireless links in large-scaleWSNs.By prioritizing fairness,reliability,and efficiency,the proposed protocol demonstrates its efficacy in enhancing data collection and transmission services,thereby offering a valuable contribution to the field of medical event-drivenWSNs.展开更多
Microsoft Excel is essential for the End-User Approach (EUA), offering versatility in data organization, analysis, and visualization, as well as widespread accessibility. It fosters collaboration and informed decision...Microsoft Excel is essential for the End-User Approach (EUA), offering versatility in data organization, analysis, and visualization, as well as widespread accessibility. It fosters collaboration and informed decision-making across diverse domains. Conversely, Python is indispensable for professional programming due to its versatility, readability, extensive libraries, and robust community support. It enables efficient development, advanced data analysis, data mining, and automation, catering to diverse industries and applications. However, one primary issue when using Microsoft Excel with Python libraries is compatibility and interoperability. While Excel is a widely used tool for data storage and analysis, it may not seamlessly integrate with Python libraries, leading to challenges in reading and writing data, especially in complex or large datasets. Additionally, manipulating Excel files with Python may not always preserve formatting or formulas accurately, potentially affecting data integrity. Moreover, dependency on Excel’s graphical user interface (GUI) for automation can limit scalability and reproducibility compared to Python’s scripting capabilities. This paper covers the integration solution of empowering non-programmers to leverage Python’s capabilities within the familiar Excel environment. This enables users to perform advanced data analysis and automation tasks without requiring extensive programming knowledge. Based on Soliciting feedback from non-programmers who have tested the integration solution, the case study shows how the solution evaluates the ease of implementation, performance, and compatibility of Python with Excel versions.展开更多
Guest Editors Prof.Ling Tian Prof.Jian-Hua Tao University of Electronic Science and Technology of China Tsinghua University lingtian@uestc.edu.cn jhtao@tsinghua.edu.cn Dr.Bin Zhou National University of Defense Techno...Guest Editors Prof.Ling Tian Prof.Jian-Hua Tao University of Electronic Science and Technology of China Tsinghua University lingtian@uestc.edu.cn jhtao@tsinghua.edu.cn Dr.Bin Zhou National University of Defense Technology binzhou@nudt.edu.cn, Since the concept of “Big Data” was first introduced in Nature in 2008, it has been widely applied in fields, such as business, healthcare, national defense, education, transportation, and security. With the maturity of artificial intelligence technology, big data analysis techniques tailored to various fields have made significant progress, but still face many challenges in terms of data quality, algorithms, and computing power.展开更多
Capabilities to assimilate Geostationary Operational Environmental Satellite “R-series ”(GOES-R) Geostationary Lightning Mapper(GLM) flash extent density(FED) data within the operational Gridpoint Statistical Interp...Capabilities to assimilate Geostationary Operational Environmental Satellite “R-series ”(GOES-R) Geostationary Lightning Mapper(GLM) flash extent density(FED) data within the operational Gridpoint Statistical Interpolation ensemble Kalman filter(GSI-EnKF) framework were previously developed and tested with a mesoscale convective system(MCS) case. In this study, such capabilities are further developed to assimilate GOES GLM FED data within the GSI ensemble-variational(EnVar) hybrid data assimilation(DA) framework. The results of assimilating the GLM FED data using 3DVar, and pure En3DVar(PEn3DVar, using 100% ensemble covariance and no static covariance) are compared with those of EnKF/DfEnKF for a supercell storm case. The focus of this study is to validate the correctness and evaluate the performance of the new implementation rather than comparing the performance of FED DA among different DA schemes. Only the results of 3DVar and pEn3DVar are examined and compared with EnKF/DfEnKF. Assimilation of a single FED observation shows that the magnitude and horizontal extent of the analysis increments from PEn3DVar are generally larger than from EnKF, which is mainly caused by using different localization strategies in EnFK/DfEnKF and PEn3DVar as well as the integration limits of the graupel mass in the observation operator. Overall, the forecast performance of PEn3DVar is comparable to EnKF/DfEnKF, suggesting correct implementation.展开更多
Achieving a balance between accuracy and efficiency in target detection applications is an important research topic.To detect abnormal targets on power transmission lines at the power edge,this paper proposes an effec...Achieving a balance between accuracy and efficiency in target detection applications is an important research topic.To detect abnormal targets on power transmission lines at the power edge,this paper proposes an effective method for reducing the data bit width of the network for floating-point quantization.By performing exponent prealignment and mantissa shifting operations,this method avoids the frequent alignment operations of standard floating-point data,thereby further reducing the exponent and mantissa bit width input into the training process.This enables training low-data-bit width models with low hardware-resource consumption while maintaining accuracy.Experimental tests were conducted on a dataset of real-world images of abnormal targets on transmission lines.The results indicate that while maintaining accuracy at a basic level,the proposed method can significantly reduce the data bit width compared with single-precision data.This suggests that the proposed method has a marked ability to enhance the real-time detection of abnormal targets in transmission circuits.Furthermore,a qualitative analysis indicated that the proposed quantization method is particularly suitable for hardware architectures that integrate storage and computation and exhibit good transferability.展开更多
Although big data is publicly available on water quality parameters,virtual simulation has not yet been adequately adapted in environmental chemistry research.Digital twin is different from conventional geospatial mod...Although big data is publicly available on water quality parameters,virtual simulation has not yet been adequately adapted in environmental chemistry research.Digital twin is different from conventional geospatial modeling approaches and is particularly useful when systematic laboratory/field experiment is not realistic(e.g.,climate impact and water-related environmental catastrophe)or difficult to design and monitor in a real time(e.g.,pollutant and nutrient cycles in estuaries,soils,and sediments).Data-driven water research could realize early warning and disaster readiness simulations for diverse environmental scenarios,including drinking water contamination.展开更多
Depth estimation is an important task in computer vision.Collecting data at scale for monocular depth estimation is challenging,as this task requires simultaneously capturing RGB images and depth information.Therefore...Depth estimation is an important task in computer vision.Collecting data at scale for monocular depth estimation is challenging,as this task requires simultaneously capturing RGB images and depth information.Therefore,data augmentation is crucial for this task.Existing data augmentationmethods often employ pixel-wise transformations,whichmay inadvertently disrupt edge features.In this paper,we propose a data augmentationmethod formonocular depth estimation,which we refer to as the Perpendicular-Cutdepth method.This method involves cutting realworld depth maps along perpendicular directions and pasting them onto input images,thereby diversifying the data without compromising edge features.To validate the effectiveness of the algorithm,we compared it with existing convolutional neural network(CNN)against the current mainstream data augmentation algorithms.Additionally,to verify the algorithm’s applicability to Transformer networks,we designed an encoder-decoder network structure based on Transformer to assess the generalization of our proposed algorithm.Experimental results demonstrate that,in the field of monocular depth estimation,our proposed method,Perpendicular-Cutdepth,outperforms traditional data augmentationmethods.On the indoor dataset NYU,our method increases accuracy from0.900 to 0.907 and reduces the error rate from0.357 to 0.351.On the outdoor dataset KITTI,our method improves accuracy from 0.9638 to 0.9642 and decreases the error rate from 0.060 to 0.0598.展开更多
The 6th generation mobile networks(6G)network is a kind of multi-network interconnection and multi-scenario coexistence network,where multiple network domains break the original fixed boundaries to form connections an...The 6th generation mobile networks(6G)network is a kind of multi-network interconnection and multi-scenario coexistence network,where multiple network domains break the original fixed boundaries to form connections and convergence.In this paper,with the optimization objective of maximizing network utility while ensuring flows performance-centric weighted fairness,this paper designs a reinforcement learning-based cloud-edge autonomous multi-domain data center network architecture that achieves single-domain autonomy and multi-domain collaboration.Due to the conflict between the utility of different flows,the bandwidth fairness allocation problem for various types of flows is formulated by considering different defined reward functions.Regarding the tradeoff between fairness and utility,this paper deals with the corresponding reward functions for the cases where the flows undergo abrupt changes and smooth changes in the flows.In addition,to accommodate the Quality of Service(QoS)requirements for multiple types of flows,this paper proposes a multi-domain autonomous routing algorithm called LSTM+MADDPG.Introducing a Long Short-Term Memory(LSTM)layer in the actor and critic networks,more information about temporal continuity is added,further enhancing the adaptive ability changes in the dynamic network environment.The LSTM+MADDPG algorithm is compared with the latest reinforcement learning algorithm by conducting experiments on real network topology and traffic traces,and the experimental results show that LSTM+MADDPG improves the delay convergence speed by 14.6%and delays the start moment of packet loss by 18.2%compared with other algorithms.展开更多
Data stream clustering is integral to contemporary big data applications.However,addressing the ongoing influx of data streams efficiently and accurately remains a primary challenge in current research.This paper aims...Data stream clustering is integral to contemporary big data applications.However,addressing the ongoing influx of data streams efficiently and accurately remains a primary challenge in current research.This paper aims to elevate the efficiency and precision of data stream clustering,leveraging the TEDA(Typicality and Eccentricity Data Analysis)algorithm as a foundation,we introduce improvements by integrating a nearest neighbor search algorithm to enhance both the efficiency and accuracy of the algorithm.The original TEDA algorithm,grounded in the concept of“Typicality and Eccentricity Data Analytics”,represents an evolving and recursive method that requires no prior knowledge.While the algorithm autonomously creates and merges clusters as new data arrives,its efficiency is significantly hindered by the need to traverse all existing clusters upon the arrival of further data.This work presents the NS-TEDA(Neighbor Search Based Typicality and Eccentricity Data Analysis)algorithm by incorporating a KD-Tree(K-Dimensional Tree)algorithm integrated with the Scapegoat Tree.Upon arrival,this ensures that new data points interact solely with clusters in very close proximity.This significantly enhances algorithm efficiency while preventing a single data point from joining too many clusters and mitigating the merging of clusters with high overlap to some extent.We apply the NS-TEDA algorithm to several well-known datasets,comparing its performance with other data stream clustering algorithms and the original TEDA algorithm.The results demonstrate that the proposed algorithm achieves higher accuracy,and its runtime exhibits almost linear dependence on the volume of data,making it more suitable for large-scale data stream analysis research.展开更多
El Nino-Southern Oscillation(ENSO),the leading mode of global interannual variability,usually intensifies the Hadley Circulation(HC),and meanwhile constrains its meridional extension,leading to an equatorward movement...El Nino-Southern Oscillation(ENSO),the leading mode of global interannual variability,usually intensifies the Hadley Circulation(HC),and meanwhile constrains its meridional extension,leading to an equatorward movement of the jet system.Previous studies have investigated the response of HC to ENSO events using different reanalysis datasets and evaluated their capability in capturing the main features of ENSO-associated HC anomalies.However,these studies mainly focused on the global HC,represented by a zonal-mean mass stream function(MSF).Comparatively fewer studies have evaluated HC responses from a regional perspective,partly due to the prerequisite of the Stokes MSF,which prevents us from integrating a regional HC.In this study,we adopt a recently developed technique to construct the three-dimensional structure of HC and evaluate the capability of eight state-of-the-art reanalyses in reproducing the regional HC response to ENSO events.Results show that all eight reanalyses reproduce the spatial structure of HC responses well,with an intensified HC around the central-eastern Pacific but weakened circulations around the Indo-Pacific warm pool and tropical Atlantic.The spatial correlation coefficient of the three-dimensional HC anomalies among the different datasets is always larger than 0.93.However,these datasets may not capture the amplitudes of the HC responses well.This uncertainty is especially large for ENSO-associated equatorially asymmetric HC anomalies,with the maximum amplitude in Climate Forecast System Reanalysis(CFSR)being about 2.7 times the minimum value in the Twentieth Century Reanalysis(20CR).One should be careful when using reanalysis data to evaluate the intensity of ENSO-associated HC anomalies.展开更多
文摘Getting insight into the spatiotemporal distribution patterns of knowledge innovation is receiving increasing attention from policymakers and economic research organizations.Many studies use bibliometric data to analyze the popularity of certain research topics,well-adopted methodologies,influential authors,and the interrelationships among research disciplines.However,the visual exploration of the patterns of research topics with an emphasis on their spatial and temporal distribution remains challenging.This study combined a Space-Time Cube(STC)and a 3D glyph to represent the complex multivariate bibliographic data.We further implemented a visual design by developing an interactive interface.The effectiveness,understandability,and engagement of ST-Map are evaluated by seven experts in geovisualization.The results suggest that it is promising to use three-dimensional visualization to show the overview and on-demand details on a single screen.
基金supported by the National Natural Science Foundation of China(42271360 and 42271399)the Young Elite Scientists Sponsorship Program by China Association for Science and Technology(CAST)(2020QNRC001)the Fundamental Research Funds for the Central Universities,China(2662021JC013,CCNU22QN018)。
文摘Ratoon rice,which refers to a second harvest of rice obtained from the regenerated tillers originating from the stubble of the first harvested crop,plays an important role in both food security and agroecology while requiring minimal agricultural inputs.However,accurately identifying ratoon rice crops is challenging due to the similarity of its spectral features with other rice cropping systems(e.g.,double rice).Moreover,images with a high spatiotemporal resolution are essential since ratoon rice is generally cultivated in fragmented croplands within regions that frequently exhibit cloudy and rainy weather.In this study,taking Qichun County in Hubei Province,China as an example,we developed a new phenology-based ratoon rice vegetation index(PRVI)for the purpose of ratoon rice mapping at a 30 m spatial resolution using a robust time series generated from Harmonized Landsat and Sentinel-2(HLS)images.The PRVI that incorporated the red,near-infrared,and shortwave infrared 1 bands was developed based on the analysis of spectro-phenological separability and feature selection.Based on actual field samples,the performance of the PRVI for ratoon rice mapping was carefully evaluated by comparing it to several vegetation indices,including normalized difference vegetation index(NDVI),enhanced vegetation index(EVI)and land surface water index(LSWI).The results suggested that the PRVI could sufficiently capture the specific characteristics of ratoon rice,leading to a favorable separability between ratoon rice and other land cover types.Furthermore,the PRVI showed the best performance for identifying ratoon rice in the phenological phases characterized by grain filling and harvesting to tillering of the ratoon crop(GHS-TS2),indicating that only several images are required to obtain an accurate ratoon rice map.Finally,the PRVI performed better than NDVI,EVI,LSWI and their combination at the GHS-TS2 stages,with producer's accuracy and user's accuracy of 92.22 and 89.30%,respectively.These results demonstrate that the proposed PRVI based on HLS data can effectively identify ratoon rice in fragmented croplands at crucial phenological stages,which is promising for identifying the earliest timing of ratoon rice planting and can provide a fundamental dataset for crop management activities.
基金Key Research and Development and Promotion Program of Henan Province(No.222102210069)Zhongyuan Science and Technology Innovation Leading Talent Project(224200510003)National Natural Science Foundation of China(No.62102449).
文摘Big data resources are characterized by large scale, wide sources, and strong dynamics. Existing access controlmechanisms based on manual policy formulation by security experts suffer from drawbacks such as low policymanagement efficiency and difficulty in accurately describing the access control policy. To overcome theseproblems, this paper proposes a big data access control mechanism based on a two-layer permission decisionstructure. This mechanism extends the attribute-based access control (ABAC) model. Business attributes areintroduced in the ABAC model as business constraints between entities. The proposed mechanism implementsa two-layer permission decision structure composed of the inherent attributes of access control entities and thebusiness attributes, which constitute the general permission decision algorithm based on logical calculation andthe business permission decision algorithm based on a bi-directional long short-term memory (BiLSTM) neuralnetwork, respectively. The general permission decision algorithm is used to implement accurate policy decisions,while the business permission decision algorithm implements fuzzy decisions based on the business constraints.The BiLSTM neural network is used to calculate the similarity of the business attributes to realize intelligent,adaptive, and efficient access control permission decisions. Through the two-layer permission decision structure,the complex and diverse big data access control management requirements can be satisfied by considering thesecurity and availability of resources. Experimental results show that the proposed mechanism is effective andreliable. In summary, it can efficiently support the secure sharing of big data resources.
基金sponsored by the National Natural Science Foundation of China under grant number No. 62172353, No. 62302114, No. U20B2046 and No. 62172115Innovation Fund Program of the Engineering Research Center for Integration and Application of Digital Learning Technology of Ministry of Education No.1331007 and No. 1311022+1 种基金Natural Science Foundation of the Jiangsu Higher Education Institutions Grant No. 17KJB520044Six Talent Peaks Project in Jiangsu Province No.XYDXX-108
文摘With the rapid development of information technology,IoT devices play a huge role in physiological health data detection.The exponential growth of medical data requires us to reasonably allocate storage space for cloud servers and edge nodes.The storage capacity of edge nodes close to users is limited.We should store hotspot data in edge nodes as much as possible,so as to ensure response timeliness and access hit rate;However,the current scheme cannot guarantee that every sub-message in a complete data stored by the edge node meets the requirements of hot data;How to complete the detection and deletion of redundant data in edge nodes under the premise of protecting user privacy and data dynamic integrity has become a challenging problem.Our paper proposes a redundant data detection method that meets the privacy protection requirements.By scanning the cipher text,it is determined whether each sub-message of the data in the edge node meets the requirements of the hot data.It has the same effect as zero-knowledge proof,and it will not reveal the privacy of users.In addition,for redundant sub-data that does not meet the requirements of hot data,our paper proposes a redundant data deletion scheme that meets the dynamic integrity of the data.We use Content Extraction Signature(CES)to generate the remaining hot data signature after the redundant data is deleted.The feasibility of the scheme is proved through safety analysis and efficiency analysis.
文摘A significant obstacle in intelligent transportation systems(ITS)is the capacity to predict traffic flow.Recent advancements in deep neural networks have enabled the development of models to represent traffic flow accurately.However,accurately predicting traffic flow at the individual road level is extremely difficult due to the complex interplay of spatial and temporal factors.This paper proposes a technique for predicting short-term traffic flow data using an architecture that utilizes convolutional bidirectional long short-term memory(Conv-BiLSTM)with attention mechanisms.Prior studies neglected to include data pertaining to factors such as holidays,weather conditions,and vehicle types,which are interconnected and significantly impact the accuracy of forecast outcomes.In addition,this research incorporates recurring monthly periodic pattern data that significantly enhances the accuracy of forecast outcomes.The experimental findings demonstrate a performance improvement of 21.68%when incorporating the vehicle type feature.
基金funding within the Wheat BigData Project(German Federal Ministry of Food and Agriculture,FKZ2818408B18)。
文摘Genome-wide association mapping studies(GWAS)based on Big Data are a potential approach to improve marker-assisted selection in plant breeding.The number of available phenotypic and genomic data sets in which medium-sized populations of several hundred individuals have been studied is rapidly increasing.Combining these data and using them in GWAS could increase both the power of QTL discovery and the accuracy of estimation of underlying genetic effects,but is hindered by data heterogeneity and lack of interoperability.In this study,we used genomic and phenotypic data sets,focusing on Central European winter wheat populations evaluated for heading date.We explored strategies for integrating these data and subsequently the resulting potential for GWAS.Establishing interoperability between data sets was greatly aided by some overlapping genotypes and a linear relationship between the different phenotyping protocols,resulting in high quality integrated phenotypic data.In this context,genomic prediction proved to be a suitable tool to study relevance of interactions between genotypes and experimental series,which was low in our case.Contrary to expectations,fewer associations between markers and traits were found in the larger combined data than in the individual experimental series.However,the predictive power based on the marker-trait associations of the integrated data set was higher across data sets.Therefore,the results show that the integration of medium-sized to Big Data is an approach to increase the power to detect QTL in GWAS.The results encourage further efforts to standardize and share data in the plant breeding community.
文摘In order to address the problems of the single encryption algorithm,such as low encryption efficiency and unreliable metadata for static data storage of big data platforms in the cloud computing environment,we propose a Hadoop based big data secure storage scheme.Firstly,in order to disperse the NameNode service from a single server to multiple servers,we combine HDFS federation and HDFS high-availability mechanisms,and use the Zookeeper distributed coordination mechanism to coordinate each node to achieve dual-channel storage.Then,we improve the ECC encryption algorithm for the encryption of ordinary data,and adopt a homomorphic encryption algorithm to encrypt data that needs to be calculated.To accelerate the encryption,we adopt the dualthread encryption mode.Finally,the HDFS control module is designed to combine the encryption algorithm with the storage model.Experimental results show that the proposed solution solves the problem of a single point of failure of metadata,performs well in terms of metadata reliability,and can realize the fault tolerance of the server.The improved encryption algorithm integrates the dual-channel storage mode,and the encryption storage efficiency improves by 27.6% on average.
基金Doctoral research start-up fund of Guangxi Normal UniversityGuangzhou Research Institute of Communication University of China Common Construction Project,Sunflower-the Aging Intelligent CommunityGuangxi project of improving Middle-aged/Young teachers'ability,Grant/Award Number:2020KY020323。
文摘The overgeneralisation may happen because most studies on data publishing for multiple sensitive attributes(SAs)have not considered the personalised privacy requirement.Furthermore,sensitive information disclosure may also be caused by these personalised requirements.To address the matter,this article develops a personalised data publishing method for multiple SAs.According to the requirements of individuals,the new method partitions SAs values into two categories:private values and public values,and breaks the association between them for privacy guarantees.For the private values,this paper takes the process of anonymisation,while the public values are released without this process.An algorithm is designed to achieve the privacy mode,where the selectivity is determined by the sensitive value frequency and undesirable objects.The experimental results show that the proposed method can provide more information utility when compared with previous methods.The theoretic analyses and experiments also indicate that the privacy can be guaranteed even though the public values are known to an adversary.The overgeneralisation and privacy breach caused by the personalised requirement can be avoided by the new method.
文摘With the recent technological developments,massive vehicular ad hoc networks(VANETs)have been established,enabling numerous vehicles and their respective Road Side Unit(RSU)components to communicate with oneanother.The best way to enhance traffic flow for vehicles and traffic management departments is to share thedata they receive.There needs to be more protection for the VANET systems.An effective and safe methodof outsourcing is suggested,which reduces computation costs by achieving data security using a homomorphicmapping based on the conjugate operation of matrices.This research proposes a VANET-based data outsourcingsystem to fix the issues.To keep data outsourcing secure,the suggested model takes cryptography models intoaccount.Fog will keep the generated keys for the purpose of vehicle authentication.For controlling and overseeingthe outsourced data while preserving privacy,the suggested approach considers the Trusted Certified Auditor(TCA).Using the secret key,TCA can identify the genuine identity of VANETs when harmful messages aredetected.The proposed model develops a TCA-based unique static vehicle labeling system using cryptography(TCA-USVLC)for secure data outsourcing and privacy preservation in VANETs.The proposed model calculatesthe trust of vehicles in 16 ms for an average of 180 vehicles and achieves 98.6%accuracy for data encryption toprovide security.The proposedmodel achieved 98.5%accuracy in data outsourcing and 98.6%accuracy in privacypreservation in fog-enabled VANETs.Elliptical curve cryptography models can be applied in the future for betterencryption and decryption rates with lightweight cryptography operations.
文摘Large-scale wireless sensor networks(WSNs)play a critical role in monitoring dangerous scenarios and responding to medical emergencies.However,the inherent instability and error-prone nature of wireless links present significant challenges,necessitating efficient data collection and reliable transmission services.This paper addresses the limitations of existing data transmission and recovery protocols by proposing a systematic end-to-end design tailored for medical event-driven cluster-based large-scale WSNs.The primary goal is to enhance the reliability of data collection and transmission services,ensuring a comprehensive and practical approach.Our approach focuses on refining the hop-count-based routing scheme to achieve fairness in forwarding reliability.Additionally,it emphasizes reliable data collection within clusters and establishes robust data transmission over multiple hops.These systematic improvements are designed to optimize the overall performance of the WSN in real-world scenarios.Simulation results of the proposed protocol validate its exceptional performance compared to other prominent data transmission schemes.The evaluation spans varying sensor densities,wireless channel conditions,and packet transmission rates,showcasing the protocol’s superiority in ensuring reliable and efficient data transfer.Our systematic end-to-end design successfully addresses the challenges posed by the instability of wireless links in large-scaleWSNs.By prioritizing fairness,reliability,and efficiency,the proposed protocol demonstrates its efficacy in enhancing data collection and transmission services,thereby offering a valuable contribution to the field of medical event-drivenWSNs.
文摘Microsoft Excel is essential for the End-User Approach (EUA), offering versatility in data organization, analysis, and visualization, as well as widespread accessibility. It fosters collaboration and informed decision-making across diverse domains. Conversely, Python is indispensable for professional programming due to its versatility, readability, extensive libraries, and robust community support. It enables efficient development, advanced data analysis, data mining, and automation, catering to diverse industries and applications. However, one primary issue when using Microsoft Excel with Python libraries is compatibility and interoperability. While Excel is a widely used tool for data storage and analysis, it may not seamlessly integrate with Python libraries, leading to challenges in reading and writing data, especially in complex or large datasets. Additionally, manipulating Excel files with Python may not always preserve formatting or formulas accurately, potentially affecting data integrity. Moreover, dependency on Excel’s graphical user interface (GUI) for automation can limit scalability and reproducibility compared to Python’s scripting capabilities. This paper covers the integration solution of empowering non-programmers to leverage Python’s capabilities within the familiar Excel environment. This enables users to perform advanced data analysis and automation tasks without requiring extensive programming knowledge. Based on Soliciting feedback from non-programmers who have tested the integration solution, the case study shows how the solution evaluates the ease of implementation, performance, and compatibility of Python with Excel versions.
文摘Guest Editors Prof.Ling Tian Prof.Jian-Hua Tao University of Electronic Science and Technology of China Tsinghua University lingtian@uestc.edu.cn jhtao@tsinghua.edu.cn Dr.Bin Zhou National University of Defense Technology binzhou@nudt.edu.cn, Since the concept of “Big Data” was first introduced in Nature in 2008, it has been widely applied in fields, such as business, healthcare, national defense, education, transportation, and security. With the maturity of artificial intelligence technology, big data analysis techniques tailored to various fields have made significant progress, but still face many challenges in terms of data quality, algorithms, and computing power.
基金supported by NOAA JTTI award via Grant #NA21OAR4590165, NOAA GOESR Program funding via Grant #NA16OAR4320115provided by NOAA/Office of Oceanic and Atmospheric Research under NOAA-University of Oklahoma Cooperative Agreement #NA11OAR4320072, U.S. Department of Commercesupported by the National Oceanic and Atmospheric Administration (NOAA) of the U.S. Department of Commerce via Grant #NA18NWS4680063。
文摘Capabilities to assimilate Geostationary Operational Environmental Satellite “R-series ”(GOES-R) Geostationary Lightning Mapper(GLM) flash extent density(FED) data within the operational Gridpoint Statistical Interpolation ensemble Kalman filter(GSI-EnKF) framework were previously developed and tested with a mesoscale convective system(MCS) case. In this study, such capabilities are further developed to assimilate GOES GLM FED data within the GSI ensemble-variational(EnVar) hybrid data assimilation(DA) framework. The results of assimilating the GLM FED data using 3DVar, and pure En3DVar(PEn3DVar, using 100% ensemble covariance and no static covariance) are compared with those of EnKF/DfEnKF for a supercell storm case. The focus of this study is to validate the correctness and evaluate the performance of the new implementation rather than comparing the performance of FED DA among different DA schemes. Only the results of 3DVar and pEn3DVar are examined and compared with EnKF/DfEnKF. Assimilation of a single FED observation shows that the magnitude and horizontal extent of the analysis increments from PEn3DVar are generally larger than from EnKF, which is mainly caused by using different localization strategies in EnFK/DfEnKF and PEn3DVar as well as the integration limits of the graupel mass in the observation operator. Overall, the forecast performance of PEn3DVar is comparable to EnKF/DfEnKF, suggesting correct implementation.
基金supported by State Grid Corporation Basic Foresight Project(5700-202255308A-2-0-QZ).
文摘Achieving a balance between accuracy and efficiency in target detection applications is an important research topic.To detect abnormal targets on power transmission lines at the power edge,this paper proposes an effective method for reducing the data bit width of the network for floating-point quantization.By performing exponent prealignment and mantissa shifting operations,this method avoids the frequent alignment operations of standard floating-point data,thereby further reducing the exponent and mantissa bit width input into the training process.This enables training low-data-bit width models with low hardware-resource consumption while maintaining accuracy.Experimental tests were conducted on a dataset of real-world images of abnormal targets on transmission lines.The results indicate that while maintaining accuracy at a basic level,the proposed method can significantly reduce the data bit width compared with single-precision data.This suggests that the proposed method has a marked ability to enhance the real-time detection of abnormal targets in transmission circuits.Furthermore,a qualitative analysis indicated that the proposed quantization method is particularly suitable for hardware architectures that integrate storage and computation and exhibit good transferability.
文摘Although big data is publicly available on water quality parameters,virtual simulation has not yet been adequately adapted in environmental chemistry research.Digital twin is different from conventional geospatial modeling approaches and is particularly useful when systematic laboratory/field experiment is not realistic(e.g.,climate impact and water-related environmental catastrophe)or difficult to design and monitor in a real time(e.g.,pollutant and nutrient cycles in estuaries,soils,and sediments).Data-driven water research could realize early warning and disaster readiness simulations for diverse environmental scenarios,including drinking water contamination.
基金the Grant of Program for Scientific ResearchInnovation Team in Colleges and Universities of Anhui Province(2022AH010095)The Grant ofScientific Research and Talent Development Foundation of the Hefei University(No.21-22RC15)+2 种基金The Key Research Plan of Anhui Province(No.2022k07020011)The Grant of Anhui Provincial940 CMC,2024,vol.79,no.1Natural Science Foundation,No.2308085MF213The Open Fund of Information Materials andIntelligent Sensing Laboratory of Anhui Province IMIS202205,as well as the AI General ComputingPlatform of Hefei University.
文摘Depth estimation is an important task in computer vision.Collecting data at scale for monocular depth estimation is challenging,as this task requires simultaneously capturing RGB images and depth information.Therefore,data augmentation is crucial for this task.Existing data augmentationmethods often employ pixel-wise transformations,whichmay inadvertently disrupt edge features.In this paper,we propose a data augmentationmethod formonocular depth estimation,which we refer to as the Perpendicular-Cutdepth method.This method involves cutting realworld depth maps along perpendicular directions and pasting them onto input images,thereby diversifying the data without compromising edge features.To validate the effectiveness of the algorithm,we compared it with existing convolutional neural network(CNN)against the current mainstream data augmentation algorithms.Additionally,to verify the algorithm’s applicability to Transformer networks,we designed an encoder-decoder network structure based on Transformer to assess the generalization of our proposed algorithm.Experimental results demonstrate that,in the field of monocular depth estimation,our proposed method,Perpendicular-Cutdepth,outperforms traditional data augmentationmethods.On the indoor dataset NYU,our method increases accuracy from0.900 to 0.907 and reduces the error rate from0.357 to 0.351.On the outdoor dataset KITTI,our method improves accuracy from 0.9638 to 0.9642 and decreases the error rate from 0.060 to 0.0598.
文摘The 6th generation mobile networks(6G)network is a kind of multi-network interconnection and multi-scenario coexistence network,where multiple network domains break the original fixed boundaries to form connections and convergence.In this paper,with the optimization objective of maximizing network utility while ensuring flows performance-centric weighted fairness,this paper designs a reinforcement learning-based cloud-edge autonomous multi-domain data center network architecture that achieves single-domain autonomy and multi-domain collaboration.Due to the conflict between the utility of different flows,the bandwidth fairness allocation problem for various types of flows is formulated by considering different defined reward functions.Regarding the tradeoff between fairness and utility,this paper deals with the corresponding reward functions for the cases where the flows undergo abrupt changes and smooth changes in the flows.In addition,to accommodate the Quality of Service(QoS)requirements for multiple types of flows,this paper proposes a multi-domain autonomous routing algorithm called LSTM+MADDPG.Introducing a Long Short-Term Memory(LSTM)layer in the actor and critic networks,more information about temporal continuity is added,further enhancing the adaptive ability changes in the dynamic network environment.The LSTM+MADDPG algorithm is compared with the latest reinforcement learning algorithm by conducting experiments on real network topology and traffic traces,and the experimental results show that LSTM+MADDPG improves the delay convergence speed by 14.6%and delays the start moment of packet loss by 18.2%compared with other algorithms.
基金This research was funded by the National Natural Science Foundation of China(Grant No.72001190)by the Ministry of Education’s Humanities and Social Science Project via the China Ministry of Education(Grant No.20YJC630173)by Zhejiang A&F University(Grant No.2022LFR062).
文摘Data stream clustering is integral to contemporary big data applications.However,addressing the ongoing influx of data streams efficiently and accurately remains a primary challenge in current research.This paper aims to elevate the efficiency and precision of data stream clustering,leveraging the TEDA(Typicality and Eccentricity Data Analysis)algorithm as a foundation,we introduce improvements by integrating a nearest neighbor search algorithm to enhance both the efficiency and accuracy of the algorithm.The original TEDA algorithm,grounded in the concept of“Typicality and Eccentricity Data Analytics”,represents an evolving and recursive method that requires no prior knowledge.While the algorithm autonomously creates and merges clusters as new data arrives,its efficiency is significantly hindered by the need to traverse all existing clusters upon the arrival of further data.This work presents the NS-TEDA(Neighbor Search Based Typicality and Eccentricity Data Analysis)algorithm by incorporating a KD-Tree(K-Dimensional Tree)algorithm integrated with the Scapegoat Tree.Upon arrival,this ensures that new data points interact solely with clusters in very close proximity.This significantly enhances algorithm efficiency while preventing a single data point from joining too many clusters and mitigating the merging of clusters with high overlap to some extent.We apply the NS-TEDA algorithm to several well-known datasets,comparing its performance with other data stream clustering algorithms and the original TEDA algorithm.The results demonstrate that the proposed algorithm achieves higher accuracy,and its runtime exhibits almost linear dependence on the volume of data,making it more suitable for large-scale data stream analysis research.
基金supported by the National Key Research and Development Program of China(Grant No.2018YFA0605703)the National Natural Science Foundation of China(Grant Nos.42176243,41976193 and 41676190)supported by National Natural Science Foundation of China(Grant No.41975079)。
文摘El Nino-Southern Oscillation(ENSO),the leading mode of global interannual variability,usually intensifies the Hadley Circulation(HC),and meanwhile constrains its meridional extension,leading to an equatorward movement of the jet system.Previous studies have investigated the response of HC to ENSO events using different reanalysis datasets and evaluated their capability in capturing the main features of ENSO-associated HC anomalies.However,these studies mainly focused on the global HC,represented by a zonal-mean mass stream function(MSF).Comparatively fewer studies have evaluated HC responses from a regional perspective,partly due to the prerequisite of the Stokes MSF,which prevents us from integrating a regional HC.In this study,we adopt a recently developed technique to construct the three-dimensional structure of HC and evaluate the capability of eight state-of-the-art reanalyses in reproducing the regional HC response to ENSO events.Results show that all eight reanalyses reproduce the spatial structure of HC responses well,with an intensified HC around the central-eastern Pacific but weakened circulations around the Indo-Pacific warm pool and tropical Atlantic.The spatial correlation coefficient of the three-dimensional HC anomalies among the different datasets is always larger than 0.93.However,these datasets may not capture the amplitudes of the HC responses well.This uncertainty is especially large for ENSO-associated equatorially asymmetric HC anomalies,with the maximum amplitude in Climate Forecast System Reanalysis(CFSR)being about 2.7 times the minimum value in the Twentieth Century Reanalysis(20CR).One should be careful when using reanalysis data to evaluate the intensity of ENSO-associated HC anomalies.