Background: A task assigned to space exploration satellites involves detecting the physical environment within a certain space. However, space detection data are complex and abstract, and are not conducive to researchers' visual perception of the evolution and interaction of events in the space environment. Methods: A time-series dynamic data sampling method for large-scale space was proposed to sample detection data in space and time, and the corresponding relationships between data location features and other attribute features were established. A tone-mapping method based on statistical histogram equalization was proposed and applied to the final attribute feature data. The visualization process was optimized for rendering by merging materials, reducing the number of patches, and performing other operations. Results: Sampling, feature extraction, and uniform visualization were achieved for detection data of complex types, long time spans, and uneven spatial distributions. Real-time visualization of large-scale spatial structures on augmented reality devices, particularly low-performance devices, was also investigated. Conclusions: The proposed visualization system can reconstruct the three-dimensional structure of a large-scale space, express the structure and changes of the space environment in augmented reality, and assist in intuitively discovering space environment events and evolutionary rules.
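The histogram-equalization tone-mapping step can be illustrated with a minimal NumPy sketch; the attribute name `flux` and the bin count are illustrative assumptions, not details from the paper:

```python
import numpy as np

def equalize_tone(values, bins=256):
    """Map scalar attribute values to [0, 1] via histogram equalization,
    so the color scale spends equal range on equally populated bins."""
    hist, edges = np.histogram(values, bins=bins)
    cdf = np.cumsum(hist).astype(float)
    cdf /= cdf[-1]                      # normalized cumulative distribution
    # interpolate each value's position in the CDF -> equalized tone
    return np.interp(values, edges[1:], cdf)

# toy usage: a highly skewed detection attribute (e.g., particle flux)
flux = np.random.lognormal(mean=0.0, sigma=2.0, size=10_000)
tones = equalize_tone(flux)            # feed into a colormap for rendering
```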
Ocean temperature is an important physical variable in marine ecosystems, and ocean temperature prediction is an important research objective in ocean-related fields. One commonly used approach to ocean temperature prediction is data-driven, but research on this approach is mostly limited to the sea surface, with few studies on the prediction of internal ocean temperature. Existing graph neural network-based methods usually use predefined graphs or learned static graphs, which cannot capture the dynamic associations among data. In this study, we propose a novel dynamic spatiotemporal graph neural network (DSTGN) to predict three-dimensional ocean temperature (3D-OT), which combines static graph learning and dynamic graph learning to automatically mine two unknown dependencies between sequences from the original 3D-OT data without prior knowledge. Temporal and spatial dependencies in the time series are then captured using temporal and graph convolutions. We integrate dynamic graph learning, static graph learning, graph convolution, and temporal convolution into an end-to-end framework for 3D-OT prediction using time-series grid data. We conducted prediction experiments on high-resolution 3D-OT from the Copernicus global ocean physics reanalysis, with data covering the vertical variation of temperature from the sea surface to 1000 m below it. We compared five mainstream models commonly used for ocean temperature prediction, and the results show that our method achieves the best predictions at all prediction scales.
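A common way to realize the combined static/dynamic graph learning the abstract describes is to parameterize one adjacency from learned node embeddings and another from the current input window. The PyTorch sketch below is a hedged illustration of that pattern, not the authors' exact DSTGN layer:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphLearner(nn.Module):
    """Learn a static adjacency from node embeddings and a dynamic one
    from the current input window (illustrative, not the paper's layer)."""
    def __init__(self, n_nodes, emb_dim, in_steps):
        super().__init__()
        self.e1 = nn.Parameter(torch.randn(n_nodes, emb_dim))  # static node embeddings
        self.e2 = nn.Parameter(torch.randn(n_nodes, emb_dim))
        self.proj = nn.Linear(in_steps, emb_dim)               # input window -> embedding

    def forward(self, x):                # x: (batch, n_nodes, in_steps)
        a_static = F.softmax(F.relu(self.e1 @ self.e2.t()), dim=-1)
        h = torch.tanh(self.proj(x))     # (batch, n_nodes, emb_dim)
        a_dynamic = F.softmax(F.relu(h @ h.transpose(1, 2)), dim=-1)
        return a_static, a_dynamic       # both row-normalized adjacencies

x = torch.randn(8, 100, 12)              # 8 samples, 100 grid nodes, 12 time steps
a_s, a_d = GraphLearner(100, 16, 12)(x)  # feed into graph convolutions downstream
```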
Networks are fundamental to our modern world and appear throughout science and society. Access to massive amounts of data presents a unique opportunity to the research community. As networks grow in size, their complexity increases, and our ability to analyze them with the current state of the art is at severe risk of failing to keep pace. This paper therefore initiates a discussion on graph signal processing for large-scale data analysis. We first provide a comprehensive overview of the core ideas in graph signal processing (GSP) and their connection to conventional digital signal processing (DSP). We then summarize recent developments in basic GSP tools, including methods for graph filtering, graph learning, graph signals, the graph Fourier transform (GFT), spectra, and graph frequencies. Graph filtering is a basic task that isolates the contributions of individual frequencies and therefore enables the removal of noise. We then consider a graph filter as a model that helps extend the application of GSP methods to large datasets. To show the suitability and effectiveness of this approach, we first created a noisy graph signal and then applied the filter to it. After several rounds of simulation, the filtered signal appears smoother and closer to the original noise-free distance-based signal. Through this example application, we demonstrate that graph filtering is efficient for big data analytics.
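The noisy-signal experiment sketched in the abstract can be reproduced in miniature with a spectral low-pass filter; a minimal NumPy sketch, with the ring graph and the filter response chosen purely for illustration:

```python
import numpy as np

# toy graph: a ring of n nodes, carrying a smooth signal plus noise
n = 64
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A                 # combinatorial Laplacian
lam, U = np.linalg.eigh(L)                     # graph Fourier basis

signal = np.cos(2 * np.pi * np.arange(n) / n)  # smooth "distance-based" signal
noisy = signal + 0.3 * np.random.randn(n)

s_hat = U.T @ noisy                            # graph Fourier transform
h = 1.0 / (1.0 + 5.0 * lam)                    # low-pass spectral response
filtered = U @ (h * s_hat)                     # inverse GFT of the filtered spectrum

print(np.linalg.norm(noisy - signal), np.linalg.norm(filtered - signal))
```

The filtered error is visibly smaller than the noisy one, which is exactly the smoothing effect the paper reports.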
Major interactions are known to trigger star formation in galaxies and alter their color. We study major interactions in filaments and sheets using SDSS data to understand the influence of large-scale environments on galaxy interactions. We identify the galaxies in filaments and sheets using the local dimension and also find the major pairs residing in these environments. The star formation rate (SFR) and color of the interacting galaxies as a function of pair separation are analyzed separately in filaments and sheets. The analysis is repeated for three volume-limited samples covering different magnitude ranges. The major pairs residing in filaments show a significantly higher SFR and bluer color than those residing in sheets up to a projected pair separation of ~50 kpc. We observe a complete reversal of this behavior for both the SFR and color of galaxy pairs with a projected separation larger than 50 kpc. Some earlier studies report that galaxy pairs align with the filament axis. Such alignment inside filaments indicates anisotropic accretion that may cause these differences. We do not observe these trends in the brighter galaxy samples. The pairs in filaments and sheets from the brighter galaxy samples trace relatively denser regions in these environments. The absence of these trends in the brighter samples may be explained by the dominant effect of the local density over the effects of the large-scale environment.
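The core measurement here, SFR as a function of projected pair separation split by environment, reduces to a binned comparison; a hedged NumPy sketch with invented column names (`r_p`, `sfr`, `env`) standing in for the pair catalog:

```python
import numpy as np

def median_sfr_profile(r_p, sfr, edges):
    """Median SFR in bins of projected pair separation r_p (kpc)."""
    idx = np.digitize(r_p, edges) - 1
    return np.array([np.median(sfr[idx == k]) if np.any(idx == k) else np.nan
                     for k in range(len(edges) - 1)])

edges = np.arange(0, 201, 25)              # 25 kpc bins out to 200 kpc
# r_p, sfr, env would come from the pair catalog; simulated here
rng = np.random.default_rng(0)
r_p = rng.uniform(5, 200, 5000)
env = rng.choice(["filament", "sheet"], 5000)
sfr = rng.lognormal(0.0, 0.5, 5000)

for e in ("filament", "sheet"):
    m = env == e
    print(e, np.round(median_sfr_profile(r_p[m], sfr[m], edges), 2))
```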
The wide application of intelligent terminals in microgrids has fueled a surge in data volume in recent years. In real-world scenarios, microgrids must store large amounts of data efficiently while also withstanding malicious cyberattacks. To meet the high hardware resource requirements and address the vulnerability to network attacks and poor reliability of traditional centralized data storage schemes, this paper proposes a secure storage management method for microgrid data that considers node trust and a directed acyclic graph (DAG) consensus mechanism. First, the microgrid data storage model is designed based on edge computing technology. The blockchain, deployed on the edge computing server and combined with cloud storage, ensures reliable data storage in the microgrid. Second, a blockchain consensus algorithm based on the DAG data structure is proposed to improve data storage timeliness and avoid the disadvantages of traditional blockchain topologies, such as long chain construction time and low consensus efficiency. Finally, considering the differing tolerance of candidate chain-building nodes to network attacks, a hash update mechanism for the block header with node trust identification is proposed to ensure data storage security. Experimental results from the microgrid data storage platform show that the proposed method achieves a private key update time of less than 5 milliseconds. When the number of blockchain nodes is less than 25, blockchain construction takes no more than 80 minutes, and the data throughput is close to 300 kbps. Compared with traditional chain-topology-based consensus methods that do not consider node trust, the proposed method stores data more efficiently and resists network attacks better.
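To make the DAG-consensus idea concrete, here is a toy sketch of a DAG ledger in which each new block approves earlier tips chosen by node trust; the trust scores and the two-parent rule are illustrative assumptions, not the paper's protocol:

```python
import hashlib, itertools

class DagLedger:
    """Toy DAG ledger: each block approves up to two existing tips,
    preferring tips published by higher-trust nodes (illustrative only)."""
    def __init__(self):
        self.blocks = {}   # block_id -> (payload, parents, node)
        self.tips = set()
        self.trust = {}    # node -> trust score in [0, 1]

    def add_block(self, payload, node):
        parents = sorted(self.tips,
                         key=lambda b: -self.trust[self.blocks[b][2]])[:2]
        raw = payload + "".join(parents) + node
        block_id = hashlib.sha256(raw.encode()).hexdigest()[:12]
        self.blocks[block_id] = (payload, parents, node)
        self.tips -= set(parents)      # approved tips are no longer tips
        self.tips.add(block_id)
        return block_id

ledger = DagLedger()
ledger.trust = {"n1": 0.9, "n2": 0.4, "n3": 0.7}
for i, node in zip(range(6), itertools.cycle(["n1", "n2", "n3"])):
    ledger.add_block(f"meter-reading-{i}", node)
print(len(ledger.tips), "tip(s) after 6 blocks")
```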
Bitcoin is widely used as the most classic electronic currency for various electronic services such as exchanges, gambling, and marketplaces, as well as scams such as high-yield investment projects. Identifying the services operated by a Bitcoin address can help determine the risk level of that address and build an alert model accordingly. Feature engineering can also be used to flesh out labeled addresses and to analyze the current state of Bitcoin on a small scale. In this paper, we address the problem of identifying multiple classes of Bitcoin services, and, for the poor classification of individual addresses that lack significant features, we propose a Bitcoin address identification scheme based on joint multi-model prediction using the mapping relationship between addresses and entities. The innovations of the method are to (1) extract as many valuable features as possible when an address is given, to facilitate the multi-class service identification task; and (2) unlike the usual supervised-model approach, propose a joint prediction scheme for multiple learners based on address-entity mapping relationships. Specifically, after obtaining the overall features, the address classification and entity clustering tasks are performed separately, and the results are subjected to graph-based maximization consensus. The final result stays close to the individual address classification results while satisfying, as far as possible, the constraint that addresses belonging to the same entity behave similarly. By testing and evaluating over 26,000 Bitcoin addresses, our feature extraction method captures more useful features, and the combined multi-learner model achieves an accuracy of 77.4%, exceeding the baseline classifier.
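One simple way to combine per-address predictions with an entity constraint is majority voting within each clustered entity; a hedged sketch in which the voting rule is an illustration, not the paper's exact graph-based maximization consensus:

```python
from collections import Counter

def entity_consensus(pred, entity_of):
    """Relabel each address with the majority class of its entity.
    pred: {address: predicted_service}; entity_of: {address: entity_id}."""
    votes = {}
    for addr, cls in pred.items():
        votes.setdefault(entity_of[addr], Counter())[cls] += 1
    return {addr: votes[entity_of[addr]].most_common(1)[0][0] for addr in pred}

pred = {"a1": "exchange", "a2": "gambling", "a3": "exchange", "a4": "scam"}
entity_of = {"a1": "e1", "a2": "e1", "a3": "e1", "a4": "e2"}
print(entity_consensus(pred, entity_of))   # a2 flips to "exchange"
```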
Quality management is a constant and significant concern in enterprises. Effectively determining correct solutions to comprehensive problems helps avoid increased backtesting costs. This study proposes an intelligent quality control method for manufacturing processes based on a human–cyber–physical (HCP) knowledge graph. The method is systematic and encompasses the following elements: data management and classification based on HCP ternary data, HCP ontology construction, knowledge extraction for constructing an HCP knowledge graph, and comprehensive application of quality control based on HCP knowledge. The proposed method implements case retrieval, automatic analysis, and assisted decision making based on an HCP knowledge graph, enabling quality monitoring, inspection, diagnosis, and maintenance strategies for quality control. In practical applications, the proposed modular and hierarchical HCP ontology exhibits significant superiority in the shareability and reusability of the acquired knowledge. Moreover, the HCP knowledge graph deeply integrates the provided HCP data and effectively supports comprehensive decision making. The proposed method was implemented in cases involving an automotive production line and a gear manufacturing process, and its effectiveness was verified by the deployed application system. Furthermore, the method can be extended to other manufacturing process quality control tasks.
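A knowledge graph of this kind is, at its core, a store of (subject, relation, object) triples that case-retrieval queries walk; a minimal sketch with invented relation names (`exhibits`, `caused_by`, `fixed_by`), purely illustrative of the pattern:

```python
import networkx as nx

kg = nx.MultiDiGraph()
triples = [
    ("gear#42", "exhibits", "tooth_profile_error"),
    ("tooth_profile_error", "caused_by", "hob_wear"),
    ("hob_wear", "fixed_by", "replace_hob"),
    ("tooth_profile_error", "caused_by", "fixture_misalignment"),
    ("fixture_misalignment", "fixed_by", "recalibrate_fixture"),
]
for s, r, o in triples:
    kg.add_edge(s, o, relation=r)

def retrieve_fixes(defect):
    """Case retrieval: defect -> candidate causes -> known remedies."""
    for _, cause, d in kg.out_edges(defect, data=True):
        if d["relation"] == "caused_by":
            for _, fix, d2 in kg.out_edges(cause, data=True):
                if d2["relation"] == "fixed_by":
                    yield cause, fix

print(list(retrieve_fixes("tooth_profile_error")))
```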
Processing large-scale 3-D gravity data is an important topic in geophysics. Many existing inversion methods lack the capacity to process massive data and to be applied in practice. This study applies GPU parallel processing technology to the focusing inversion method, aiming to improve inversion accuracy while speeding up computation and reducing memory consumption, thus obtaining fast and reliable inversion results for large, complex models. In this paper, equivalent storage of the geometric trellis is used to compute the sensitivity matrix, and the inversion is based on GPU parallel computing technology. The parallel computing program, optimized by reducing data transfer, access restrictions, and instruction restrictions as well as by latency hiding, greatly reduces memory usage and speeds up computation, making fast inversion of large models possible. Comparing the computing speed of the traditional single-threaded CPU method with CUDA-based GPU parallel technology verifies the excellent acceleration performance of GPU parallel computing, which suggests how theoretical inversion methods restricted by computing speed and computer memory can be applied in practice. The model test verifies that the focusing inversion method can overcome the problems of a severe skin effect and ambiguity of geological body boundaries. Moreover, increasing the number of model cells and inversion data can more clearly depict the boundary position of the anomalous body and delineate its specific shape.
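The speed- and memory-critical kernel in such schemes is usually the sensitivity-matrix product. The sketch below shows the generic pattern of moving that product to the GPU with CuPy; it is a stand-in for the paper's CUDA code, and the array sizes are toy values:

```python
import numpy as np
import cupy as cp   # GPU drop-in for most of the NumPy API

# toy sizes: n_data observations, n_cells model cells
n_data, n_cells = 2048, 8192
G_cpu = np.random.rand(n_data, n_cells).astype(np.float32)   # sensitivity matrix
m_cpu = np.random.rand(n_cells).astype(np.float32)           # current model

G = cp.asarray(G_cpu)                        # one host->device transfer, then reuse
m = cp.asarray(m_cpu)
d_obs = cp.zeros(n_data, dtype=cp.float32)   # observed data (toy: zeros)

d_pred = G @ m                  # forward response, computed on the GPU
grad = G.T @ (d_pred - d_obs)   # gradient of the L2 data misfit
print(float(cp.linalg.norm(grad)))   # bring a scalar back to the host
```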
Social media data have created a paradigm shift in assessing situational awareness during natural disasters and emergencies such as wildfires, hurricanes, and tropical storms. Twitter, as an emerging data source, is an effective and innovative digital platform for observing trends from the perspective of social media users who are direct or indirect witnesses of a calamitous event. This paper aims to collect and analyze Twitter data related to the recent wildfire in California and to perform a trend analysis by classifying firsthand and credible information from Twitter users. This work investigates tweets on the recent California wildfire and classifies them by witness type: 1) direct witnesses and 2) indirect witnesses. The collected and analyzed information can be useful for law enforcement agencies and humanitarian organizations for communication and verification of situational awareness during wildfire hazards. Trend analysis is an aggregated approach that includes sentiment analysis and topic modeling performed through domain-expert manual annotation and machine learning. Trend analysis ultimately builds a fine-grained analysis to assess evacuation routes and provide valuable information to firsthand emergency responders.
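A natural baseline for the direct/indirect witness split is a bag-of-words classifier over annotated tweets; a minimal scikit-learn sketch, with the tiny inline dataset invented for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "Flames across the road, we are evacuating now",         # direct
    "Smoke everywhere outside my window, ash falling",       # direct
    "Praying for everyone affected by the California fire",  # indirect
    "News says the wildfire doubled in size overnight",      # indirect
]
labels = ["direct", "direct", "indirect", "indirect"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(tweets, labels)
print(clf.predict(["I can see the fire from my backyard"]))
```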
With the growing number of large-scale data sets, the affinity propagation clustering algorithm, whose computation requires building a full similarity matrix, incurs huge storage and computation costs. This paper therefore proposes an improved affinity propagation clustering algorithm. First, subtractive clustering is added, using the density values of the data points to obtain initial cluster points. Then, the similarity distances between the initial cluster points are calculated and, borrowing the idea of semi-supervised clustering, pairwise constraint information is added to construct a sparse similarity matrix. Finally, AP clustering is conducted on the cluster representative points until a suitable cluster division is obtained. Experimental results show that the algorithm greatly reduces computation and the storage of the similarity matrix, and outperforms the original algorithm in clustering effect and processing speed.
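The two-stage idea, picking dense representatives first and running affinity propagation only on them, can be sketched as follows (NumPy plus scikit-learn; the density radius and representative count are illustrative choices, not the paper's settings):

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=2000, centers=5, random_state=0)

# subtractive-clustering style density: Gaussian-weighted neighbor count
r_a = 1.5
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
density = np.exp(-d2 / (r_a / 2) ** 2).sum(axis=1)

reps = X[np.argsort(-density)[:100]]        # 100 densest points as representatives
ap = AffinityPropagation(random_state=0).fit(reps)
print("clusters found on representatives:", len(ap.cluster_centers_))
```

Running AP on 100 representatives instead of 2000 points shrinks the similarity matrix from 2000x2000 to 100x100, which is exactly the storage saving the abstract claims.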
This paper proposes a graph-regularized Lp-smooth non-negative matrix factorization (GSNMF) method by incorporating graph regularization and an Lp smoothing constraint, which considers the intrinsic geometric information of a data set and produces smooth and stable solutions. The main contributions are as follows: first, graph regularization is added to NMF to discover hidden semantics while respecting the intrinsic geometric structure of the data set. Second, the Lp smoothing constraint is incorporated into NMF to combine the merits of isotropic (L2-norm) and anisotropic (L1-norm) diffusion smoothing, producing a smooth and more accurate solution to the optimization problem. Finally, the update rules and a proof of convergence of GSNMF are given. Experiments on several data sets show that the proposed method outperforms related state-of-the-art methods.
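For orientation, the multiplicative updates of plain graph-regularized NMF (the GNMF of Cai et al., on which GSNMF builds) look like the NumPy sketch below; the Lp smoothing term is omitted here, so this is not the paper's exact rule:

```python
import numpy as np

def gnmf(V, A, k, lam=0.1, iters=200, eps=1e-9):
    """Graph-regularized NMF: V ~ W H, with adjacency A over the columns of V.
    Multiplicative updates (GSNMF's Lp smoothing term omitted in this sketch)."""
    n, m = V.shape
    D = np.diag(A.sum(axis=1))
    rng = np.random.default_rng(0)
    W, H = rng.random((n, k)), rng.random((k, m))
    for _ in range(iters):
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ V + lam * H @ A) / (W.T @ W @ H + lam * H @ D + eps)
    return W, H

V = np.abs(np.random.default_rng(1).random((50, 30)))
A = (np.random.default_rng(2).random((30, 30)) > 0.9).astype(float)
A = np.maximum(A, A.T)                  # symmetric adjacency over samples
W, H = gnmf(V, A, k=5)
print(np.linalg.norm(V - W @ H))        # reconstruction error
```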
Using the advantages of web crawlers in data collection and of distributed storage technologies, we accessed a wealth of forestry-related data. Combined with mature big data technology at its present stage, Hadoop's distributed file system was selected to solve the storage problem of massive forestry big data, and the memory-based Spark computing framework was used to realize real-time, fast processing of the data. Forestry data contain a wealth of information, and mining this information is of great significance for guiding the development of forestry. We conduct co-word and cluster analyses on the keywords of the forestry data, extract the rules hidden in the data, analyze research hotspots more accurately, and grasp the evolution trend of subject topics, which plays an important role in promoting research and development in subject areas. The co-word analysis and clustering algorithm have important practical significance for the topic structure, research hotspots, and development trends in the field of forestry research. The distributed storage framework and parallel computing greatly improve the performance of the data mining algorithms. Therefore, a forestry big data mining system built on big data technology has important practical significance for promoting the development of intelligent forestry.
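Co-word analysis boils down to counting keyword co-occurrences per document, which parallelizes naturally in Spark; a minimal PySpark sketch over an invented input format (one comma-separated keyword list per paper):

```python
from itertools import combinations
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("co-word").getOrCreate()

# each line: the comma-separated keywords of one paper (inline toy data)
lines = spark.sparkContext.parallelize([
    "forest fire,remote sensing,GIS",
    "remote sensing,GIS,classification",
    "forest fire,remote sensing,risk",
])

pairs = (lines.map(lambda s: sorted(set(s.split(","))))
              .flatMap(lambda kws: combinations(kws, 2))  # all keyword pairs
              .map(lambda p: (p, 1))
              .reduceByKey(lambda a, b: a + b))           # co-occurrence counts

for (k1, k2), n in pairs.takeOrdered(5, key=lambda kv: -kv[1]):
    print(k1, "<->", k2, n)
```

The resulting co-occurrence matrix is what the clustering step then partitions into topic groups.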
Outlier detection has very important applied value in the data mining literature. Different outlier detection algorithms, based on distinct theories, have different definitions and mining processes. By analyzing existing outlier detection algorithms in terms of criteria and theory, a three-dimensional space graph for constructing applied algorithms and an improved GridOf algorithm are proposed.
Data analysis and visualization are an important application area of big data, and visual analysis is an important method for big data analysis. Data visualization refers to presenting data in a visual form, such as a chart or map, to help people understand its meaning; it helps people extract meaning from data quickly and easily. Visualization can fully demonstrate the patterns, trends, and dependencies of data that cannot be found in other displays. Big data visual analysis combines the advantages of computers with interactive analysis methods and interactive technologies, which can be static or interactive, and directly helps people understand the information behind big data effectively. It is indispensable in the era of big data and can be very intuitive if used properly. Graphical analysis turns valuable information into a powerful tool for complex data relationships, and it represents a significant business opportunity. With the rise of big data, important technologies suitable for dealing with complex relationships have emerged, and graphics come in a variety of shapes and sizes for a variety of business problems. The first step in graphical analysis is to get the right data and to answer the right goal. In short, to choose the right method, you must understand the relative strengths and weaknesses of each and understand the data. The key steps to getting data are: target, collect, clean, connect.
To make calculation more efficient in practical hydraulic simulations, an improved algorithm was proposed and applied in the practical water distribution field. The methodology was developed by extending the traditional loop-equation theory to exploit the efficiency advantages of graph theory. The use of the spanning tree technique from graph theory makes the proposed algorithm efficient in calculation and simple to code. The algorithms for topological generation and practical implementation are presented in detail in this paper. In an application to a practical urban system, CPU time and memory consumption were reduced while accuracy was greatly enhanced compared with existing methods.
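The spanning-tree step yields the independent loops directly: every non-tree edge (chord) closes exactly one loop. A small networkx sketch of that decomposition, with the toy pipe network invented for illustration:

```python
import networkx as nx

# toy water network: nodes are junctions, edges are pipes
G = nx.Graph([(1, 2), (2, 3), (3, 4), (4, 1), (2, 4), (4, 5)])

T = nx.minimum_spanning_tree(G)            # spanning tree of the network
tree = {frozenset(e) for e in T.edges()}
chords = [e for e in G.edges() if frozenset(e) not in tree]  # one chord per loop

# each chord plus the tree path between its endpoints is an independent loop
loops = [nx.shortest_path(T, u, v) + [u] for u, v in chords]
print(len(chords), "independent loops:", loops)
```

The loop count equals edges minus nodes plus one (here 6 - 5 + 1 = 2), which is the number of loop equations the hydraulic solver must carry.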
This study investigated the characteristics that people perceive in tables and graphs, and the data types for which people consider each display most appropriate. The participants in the survey were 195 teachers and undergraduates from four universities in Beijing. The results showed people's different attitudes towards the two forms of display.
Purpose: Our work seeks to overcome data quality issues related to incomplete author affiliation data in bibliographic records in order to support accurate and reliable measurement of international research collaboration (IRC). Design/methodology/approach: We propose, implement, and evaluate a method that leverages the Web-based knowledge graph Wikidata to resolve publication affiliation data to particular countries. The method is tested with general and domain-specific data sets. Findings: Our evaluation covers the magnitude of improvement, accuracy, and consistency. Results suggest the method is beneficial, reliable, and consistent, and thus a viable and improved approach to measuring IRC. Research limitations: Though our evaluation suggests the method works with both general and domain-specific bibliographic data sets, it may perform differently with data sets not tested here. Further limitations stem from the use of the R programming language and R libraries for country identification, as well as from imbalanced data coverage and quality in Wikidata, which may also change over time. Practical implications: The new method helps to increase accuracy in IRC studies and provides a basis for further development into a general tool that enriches bibliographic data using the Wikidata knowledge graph. Originality: This is the first attempt to enrich bibliographic data using a peer-produced, Web-based knowledge graph like Wikidata.
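The paper's implementation is in R; the same resolution idea can be sketched in Python against Wikidata's public SPARQL endpoint, where P17 is the "country" property. The exact-label matching below is a simplifying assumption, not the authors' full pipeline:

```python
import requests

ENDPOINT = "https://query.wikidata.org/sparql"

def affiliation_country(affiliation):
    """Resolve an institution name to a country label via Wikidata
    (wdt:P17 = country). Simplified: exact English label match only."""
    query = """
    SELECT ?countryLabel WHERE {
      ?org rdfs:label "%s"@en ;
           wdt:P17 ?country .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    } LIMIT 1
    """ % affiliation
    r = requests.get(ENDPOINT, params={"query": query, "format": "json"},
                     headers={"User-Agent": "irc-affiliation-demo/0.1"})
    rows = r.json()["results"]["bindings"]
    return rows[0]["countryLabel"]["value"] if rows else None

print(affiliation_country("Peking University"))   # e.g., a label for China
```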
With increasingly complex website structures and continuously advancing web technologies, accurate recognition of user clicks from massive HTTP data, which is critical for web usage mining, becomes more difficult. In this paper, we propose a dependency graph model to describe the relationships between web requests. Based on this model, we design and implement a heuristic parallel algorithm to distinguish user clicks with the assistance of cloud computing technology. We evaluate the proposed algorithm on real massive data: a 228.7 GB dataset collected from a mobile core network, covering more than three million users. The experimental results demonstrate that the proposed algorithm achieves higher accuracy than previous methods.
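The dependency-graph idea can be illustrated with a small heuristic: link each request to the earlier request it names as its referrer; requests with no recent parent in the graph are likely user clicks, while tightly trailing dependents are embedded resources. A toy sketch under those assumptions (the log format and gap threshold are invented):

```python
# each record: (timestamp_s, url, referrer or None)
log = [
    (0.00, "http://site/a.html", None),                      # user click (root)
    (0.05, "http://site/style.css", "http://site/a.html"),   # embedded resource
    (0.07, "http://site/logo.png", "http://site/a.html"),
    (4.20, "http://site/b.html", "http://site/a.html"),      # click after a pause
    (4.25, "http://site/app.js", "http://site/b.html"),
]

GAP = 1.0       # seconds: dependents arriving later than this suggest a click
clicks = []
last_seen = {}  # url -> time it was last requested
for t, url, ref in log:
    if ref is None or ref not in last_seen or t - last_seen[ref] > GAP:
        clicks.append(url)          # no recent parent -> treat as a user click
    last_seen[url] = t

print(clicks)   # ['http://site/a.html', 'http://site/b.html']
```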
In this paper, a new approach for visualizing multivariate categorical data is presented. The approach uses a graph to represent multivariate categorical data and draws the graph in such a way that patterns, trends, and relationships within the data can be identified. A mathematical model for the graph layout problem is deduced, and a spectral graph drawing algorithm for visualizing multivariate categorical data is proposed. Experiments show that the drawings produced by the algorithm capture the structures of multivariate categorical data well, and the computation is fast.
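Spectral graph drawing places each vertex at the coordinates given by the Laplacian's lowest nontrivial eigenvectors; a minimal NumPy sketch, with a small category co-occurrence graph invented for illustration:

```python
import numpy as np

# adjacency over categories, e.g., co-occurrence counts of category pairs
A = np.array([[0, 3, 1, 0],
              [3, 0, 2, 0],
              [1, 2, 0, 4],
              [0, 0, 4, 0]], dtype=float)

L = np.diag(A.sum(axis=1)) - A        # graph Laplacian
lam, U = np.linalg.eigh(L)            # eigenvalues in ascending order

# skip the constant eigenvector (lam[0] = 0); next two give a 2-D layout
coords = U[:, 1:3]
for i, (x, y) in enumerate(coords):
    print(f"category {i}: ({x:+.3f}, {y:+.3f})")
```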