Abstract: Background: A task assigned to space exploration satellites involves detecting the physical environment within a certain region of space. However, space detection data are complex and abstract, and are not conducive to researchers' visual perception of the evolution and interaction of events in the space environment. Methods: A time-series dynamic data sampling method for large-scale space was proposed to sample detection data in space and time, and the corresponding relationships between data location features and other attribute features were established. A tone-mapping method based on statistical histogram equalization was proposed and applied to the final attribute feature data. The visualization process was optimized for rendering by merging materials, reducing the number of patches, and performing other operations. Results: Sampling, feature extraction, and uniform visualization were achieved for detection data of complex types, long time spans, and uneven spatial distributions. The real-time visualization of large-scale spatial structures on augmented reality devices, particularly low-performance devices, was also investigated. Conclusions: The proposed visualization system can reconstruct the three-dimensional structure of a large-scale space, express the structure of and changes in the spatial environment using augmented reality, and assist in intuitively discovering spatial environmental events and evolutionary rules.
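The histogram-equalization tone mapping described in the Methods section can be illustrated with a minimal sketch. This is an illustration of the general technique, not the paper's implementation; the function name, bin count, and test data are assumptions.

```python
import numpy as np

def equalize_tone(values, n_bins=256):
    """Map scalar attribute values to [0, 1] via histogram equalization.

    Unevenly distributed detection values are remapped so that the output
    uses the available tone range uniformly, which is the core idea behind
    statistical histogram equalization.
    """
    hist, edges = np.histogram(values, bins=n_bins)
    cdf = np.cumsum(hist).astype(float)
    cdf /= cdf[-1]                      # normalized cumulative distribution
    # Each value is mapped to the CDF of the bin it falls into.
    idx = np.clip(np.digitize(values, edges[1:-1]), 0, n_bins - 1)
    return cdf[idx]

# Skewed data: most mass near 0, a few large outliers.
vals = np.concatenate([np.random.default_rng(0).normal(0, 1, 1000),
                       np.array([50.0, 60.0])])
tones = equalize_tone(vals)
```

Because the mapping follows the cumulative distribution, densely populated value ranges are spread out while sparse outliers no longer compress the rest of the tone range.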
Funding: Supported by a project of the National Natural Science Foundation (No. 41874134).
Abstract: Processing large-scale 3-D gravity data is an important topic in geophysics. Many existing inversion methods cannot handle massive data and lack practical applicability. This study applies GPU parallel processing technology to the focusing inversion method, aiming to improve inversion accuracy while speeding up calculation and reducing memory consumption, thus obtaining fast and reliable inversion results for large, complex models. Equivalent storage of the geometric trellis is used to calculate the sensitivity matrix, and the inversion is based on GPU parallel computing technology. The parallel computing program, optimized by reducing data transfers, access restrictions, and instruction restrictions, as well as by latency hiding, greatly reduces memory usage, speeds up calculation, and makes fast inversion of large models possible. A comparison of the computing speed of the traditional single-threaded CPU method and the CUDA-based GPU parallel implementation verifies the excellent acceleration performance of GPU parallel computing, which offers a path toward practical application for theoretical inversion methods otherwise restricted by computing speed and computer memory. Model tests verify that the focusing inversion method can overcome the severe skin effect and the ambiguity of geological body boundaries. Moreover, increasing the number of model cells and inversion data more clearly depicts the boundary position of the anomalous body and delineates its specific shape.
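Focusing inversion sharpens blurred boundaries by iteratively reweighting the model with a minimum-support stabilizer. A minimal NumPy sketch of one reweighted least-squares step is given below; the sensitivity matrix, regularization weight, and focusing parameter are illustrative assumptions, not values from the paper, and the GPU/CUDA parallelization is omitted.

```python
import numpy as np

def focusing_step(G, d, m, beta=1e-2, eps=1e-3):
    """One iteratively reweighted step of focusing (minimum-support) inversion.

    G    : sensitivity matrix (n_data x n_cells)
    d    : observed data
    m    : current model estimate
    beta : regularization weight (illustrative)
    eps  : focusing parameter; smaller eps -> sharper, more compact models
    """
    # Minimum-support weights: cells with small values are penalized more,
    # driving the solution toward compact, sharp-boundary bodies.
    w = 1.0 / np.sqrt(m**2 + eps**2)
    W = np.diag(w)
    # Solve the damped normal equations (G^T G + beta W^T W) m = G^T d.
    A = G.T @ G + beta * (W.T @ W)
    return np.linalg.solve(A, G.T @ d)

# Tiny synthetic example: recover a compact anomaly from exact data.
rng = np.random.default_rng(1)
G = rng.normal(size=(30, 10))
m_true = np.zeros(10); m_true[4] = 1.0
d = G @ m_true
m = np.full(10, 0.1)
for _ in range(10):
    m = focusing_step(G, d, m)
```

On a GPU, the expensive pieces are the dense matrix products with `G`, which is exactly where the paper's CUDA parallelization and equivalent-storage scheme for the sensitivity matrix pay off.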
Abstract: Social media data have created a paradigm shift in assessing situational awareness during natural disasters and emergencies such as wildfires, hurricanes, and tropical storms. Twitter, as an emerging data source, is an effective and innovative digital platform for observing trends from the perspective of social media users who are direct or indirect witnesses of a calamitous event. This paper collects and analyzes Twitter data related to the recent wildfire in California to perform a trend analysis by identifying firsthand and credible information from Twitter users. It investigates tweets on the wildfire and classifies them by witness type: 1) direct witnesses and 2) indirect witnesses. The collected and analyzed information can be useful to law enforcement agencies and humanitarian organizations for communicating and verifying situational awareness during wildfire hazards. Trend analysis is an aggregated approach that includes sentiment analysis and topic modeling performed through domain-expert manual annotation and machine learning, and it ultimately builds a fine-grained analysis to assess evacuation routes and provide valuable information to firsthand emergency responders.
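The direct/indirect witness split can be sketched with a simple first-person-cue heuristic. This toy rule stands in for the paper's annotation-plus-machine-learning pipeline; the cue list and function name are illustrative assumptions.

```python
import re

# First-person and on-the-ground cues that suggest a direct witness.
DIRECT_CUES = re.compile(
    r"\b(i see|i can see|my house|near me|evacuat(?:e|ing) now|outside my)\b",
    re.IGNORECASE,
)

def witness_type(tweet: str) -> str:
    """Classify a tweet as a 'direct' or 'indirect' witness (toy heuristic)."""
    return "direct" if DIRECT_CUES.search(tweet) else "indirect"

tweets = [
    "I can see flames on the ridge outside my window, evacuating now",
    "News reports say the wildfire has burned 10,000 acres",
]
labels = [witness_type(t) for t in tweets]
# labels -> ['direct', 'indirect']
```

A trained classifier would replace the regular expression with learned features, but the input/output contract — tweet text in, witness label out — is the same.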
Funding: This research was partially supported by the National Natural Science Foundation of China (No. 51175169) and the National Science and Technology Support Program (No. 2012BAF02B01).
Abstract: Faced with a growing number of large-scale data sets, the affinity propagation clustering algorithm must build a similarity matrix during computation, which incurs enormous storage and computational cost. This paper therefore proposes an improved affinity propagation clustering algorithm. First, subtractive clustering is added, using the density values of the data points to obtain the initial cluster points. Then, the similarity distances between the initial cluster points are calculated and, borrowing the idea of semi-supervised clustering, pairwise constraint information is added to construct a sparse similarity matrix. Finally, AP clustering is performed on the cluster representative points until a suitable cluster partition is reached. Experimental results show that the algorithm greatly reduces the amount of computation and the storage required for the similarity matrix, and outperforms the original algorithm in both clustering quality and processing speed.
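The subtractive-clustering step selects initial cluster centers from a density (mountain) function. A minimal sketch follows; the neighborhood radius and the test data are illustrative assumptions, not values from the paper.

```python
import numpy as np

def subtractive_density(X, ra=1.0):
    """Density value of each point under subtractive clustering.

    D_i = sum_j exp(-||x_i - x_j||^2 / (ra/2)^2); the point with the
    highest density becomes the first initial cluster center.
    """
    diff = X[:, None, :] - X[None, :, :]
    dist2 = np.sum(diff**2, axis=-1)
    return np.exp(-dist2 / (ra / 2.0) ** 2).sum(axis=1)

# Two tight groups; the densest point lies inside the larger group.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0]])
center = int(np.argmax(subtractive_density(X)))
```

Running AP only on centers found this way (rather than on all points) is what shrinks the similarity matrix the abstract refers to.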
Funding: This work is supported by a National Social Science Foundation of China project (No. 19BTQ061) and the "Integration and Development of a Next Generation of Open Knowledge Services System and Key Technologies" project (No. 2020XM05).
Abstract: Purpose: The interdisciplinary nature and rapid development of the Semantic Web have led to the mass publication of RDF data in a large number of widely accepted serialization formats, creating the need for purpose-specific RDF data processing. This paper reports an assessment of the chief challenges facing RDF data endpoints and introduces RDFAdaptor, a set of plugins for RDF data processing that covers the whole life-cycle with high efficiency. Design/methodology/approach: RDFAdaptor is built on the prominent ETL tool Pentaho Data Integration, which provides a user-friendly, intuitive interface and allows connection to various data sources and formats, and reuses the Java framework RDF4J as middleware to access data repositories, SPARQL endpoints, and all leading RDF database solutions with SPARQL 1.1 support. It supports effortless services with various configuration templates in multi-scenario applications, and helps extend data-processing tasks in other services or tools to complement missing functions. Findings: The proposed comprehensive RDF ETL solution, RDFAdaptor, provides an easy-to-use and intuitive interface; supports data integration and federation over multi-source heterogeneous repositories or endpoints; and manages linked data in hybrid storage mode. Research limitations: The plugin set supports several RDF data-processing scenarios, but error detection/checking and interaction with other graph repositories remain to be improved. Practical implications: The plugin set provides a user interface and configuration templates that enable its use in various applications of RDF data generation, multi-format data conversion, remote RDF data migration, and RDF graph update in semantic query processes. Originality/value: This is the first attempt to develop components, rather than systems, that can extract, consolidate, and store RDF data on the basis of an ecologically mature data-warehousing environment.
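Multi-format conversion, one of the plugin scenarios above, boils down to parsing triples from one serialization and re-emitting them in another. A toy sketch in pure Python follows; real deployments would use RDF4J, as the paper does, or a library such as rdflib, and the string-based IRI/literal test here is a deliberate simplification.

```python
def to_ntriples(triples):
    """Serialize (s, p, o) triples as N-Triples lines.

    Objects that look like IRIs are angle-bracketed; anything else is
    emitted as a plain literal (a simplification for illustration).
    """
    def obj(o):
        return f"<{o}>" if o.startswith("http") else f'"{o}"'
    return "\n".join(f"<{s}> <{p}> {obj(o)} ." for s, p, o in triples)

triples = [("http://example.org/alice",
            "http://example.org/knows",
            "http://example.org/bob")]
print(to_ntriples(triples))
# -> <http://example.org/alice> <http://example.org/knows> <http://example.org/bob> .
```

An ETL plugin wraps exactly this kind of transform with configurable source/target formats, which is what the configuration templates in RDFAdaptor expose.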
Funding: Financial support from the SERB, DST, Government of India through the project CRG/2019/001110; IUCAA, Pune for support through an associateship program; and IISER Tirupati for support through a postdoctoral fellowship. Funding for the SDSS and SDSS-II has been provided by the Alfred P. Sloan Foundation, the U.S. Department of Energy, the National Aeronautics and Space Administration, the Japanese Monbukagakusho, the Max Planck Society, and the Higher Education Funding Council for England.
Abstract: Major interactions are known to trigger star formation in galaxies and alter their color. We study major interactions in filaments and sheets using SDSS data to understand the influence of large-scale environments on galaxy interactions. We identify the galaxies in filaments and sheets using the local dimension and also find the major pairs residing in these environments. The star formation rate (SFR) and color of the interacting galaxies as a function of pair separation are analyzed separately in filaments and sheets. The analysis is repeated for three volume-limited samples covering different magnitude ranges. The major pairs residing in filaments show a significantly higher SFR and bluer color than those residing in sheets up to a projected pair separation of ~50 kpc. We observe a complete reversal of this behavior for both the SFR and color of galaxy pairs with projected separations larger than 50 kpc. Some earlier studies report that galaxy pairs align with the filament axis. Such alignment inside filaments indicates anisotropic accretion that may cause these differences. We do not observe these trends in the brighter galaxy samples. The pairs in filaments and sheets from the brighter samples trace relatively denser regions of these environments. The absence of these trends in the brighter samples may be explained by the dominant effect of the local density over the effects of the large-scale environment.
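The projected pair separation used above can be estimated from the angular separation on the sky and the pair's redshift. A minimal sketch under a low-redshift, linear-Hubble-law approximation follows; the H0 value and the distance approximation are standard assumptions, not taken from the paper.

```python
import numpy as np

C_KM_S = 299792.458   # speed of light (km/s)
H0 = 70.0             # Hubble constant (km/s/Mpc), illustrative

def projected_separation_kpc(ra1, dec1, ra2, dec2, z):
    """Projected separation (kpc) of a galaxy pair at common redshift z.

    Uses the angular separation times the distance d ~ cz/H0, which is
    valid only at low redshift.
    """
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    # Angular separation via the spherical law of cosines.
    cos_theta = (np.sin(dec1) * np.sin(dec2)
                 + np.cos(dec1) * np.cos(dec2) * np.cos(ra1 - ra2))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    d_mpc = C_KM_S * z / H0
    return theta * d_mpc * 1000.0     # Mpc -> kpc

# Two galaxies ~36 arcsec apart at z = 0.05 (~214 Mpc):
sep = projected_separation_kpc(150.0, 20.0, 150.01064, 20.0, 0.05)
```

At z = 0.05, an arcminute corresponds to roughly 60 kpc, so the ~50 kpc threshold in the abstract translates to sub-arcminute pairs at these redshifts.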
Abstract: The rapid growth of Semantic Web data has made querying RDF data an important research topic. Keyword queries do not require users to master the data schema or a query language, making them better suited to ordinary users. This paper proposes a keyword query method for RDF data called KREAG (Keyword query over RDF data based on Entity-triple Association Graph). To support queries over property and relation names, RDF data are modeled as an entity-triple association graph with labeled vertices. This model guarantees that associations between entities in the RDF data are transformed into paths between vertices of the association graph, and that all text information is encapsulated in the vertex labels. On this basis, the keyword query problem is transformed into finding a directed Steiner tree in the association graph. While guaranteeing an approximation ratio of m (where m is the number of query keywords), an approximation algorithm achieves fast query response. A reasonable scoring scheme measures the relevance of query results and supports top-k queries. The time complexity of the algorithm is O(m·|V|), where |V| is the number of vertices in the entity-triple association graph. Experiments show that KREAG achieves faster response times than other methods while effectively supporting keyword queries over RDF data.
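The entity-triple association graph can be sketched as follows: each triple becomes a labeled vertex, and two vertices are connected when their triples share an entity. This is a toy construction consistent with the description above; the data, helper names, and undirected edges are illustrative simplifications (the paper's graph is directed).

```python
from itertools import combinations

def build_association_graph(triples):
    """Return (labels, edges): one labeled vertex per triple, with an
    edge between two vertices whenever their triples share an entity."""
    labels = {i: " ".join(t) for i, t in enumerate(triples)}
    edges = set()
    for i, j in combinations(range(len(triples)), 2):
        entities_i = {triples[i][0], triples[i][2]}
        entities_j = {triples[j][0], triples[j][2]}
        if entities_i & entities_j:
            edges.add((i, j))
    return labels, edges

triples = [
    ("Alice", "knows", "Bob"),
    ("Bob", "worksAt", "AcmeCorp"),
    ("Carol", "livesIn", "Paris"),
]
labels, edges = build_association_graph(triples)
# Triples 0 and 1 share the entity "Bob"; triple 2 is isolated.
```

Because every triple's full text (including the predicate) lands in a vertex label, keyword matching over properties and relations reduces to matching vertex labels, and answering a query reduces to finding a tree connecting the matched vertices.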
Funding: Weaponry Equipment Pre-Research Foundation of the PLA Equipment Ministry (No. 9140A06050409JB8102) and Pre-Research Foundation of the PLA University of Science and Technology (No. 2009JSJ11).
Abstract: To solve the query-processing correctness problem for semantic-based relational data integration, the semantics of SPARQL (Simple Protocol and RDF Query Language) queries is defined. In the course of query rewriting, all relevant tables are found and decomposed into minimal connectable units. The minimal connectable units are joined according to the semantic queries to produce semantically correct query plans. Algorithms for query rewriting and transformation are presented, and their computational complexity is discussed. In the worst case, the query-decomposition algorithm finishes in O(n^2) time and the query-rewriting algorithm requires O(nm) time. The performance of the algorithms is verified by experiments, whose results show that when the length of a query is less than 8, the query-processing algorithms provide satisfactory performance.
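A toy sketch of the decomposition step: triple patterns that share a variable are grouped into connected components, each serving as one "minimal connectable unit" to be joined into a query plan. The grouping rule and data structures are illustrative assumptions, since the paper's exact definition is not reproduced here.

```python
def minimal_connectable_units(patterns):
    """Group triple patterns into connected components over shared
    variables (terms starting with '?'); each component is one unit."""
    def variables(p):
        return {t for t in p if t.startswith("?")}

    units = []
    for p in patterns:
        # Units already touching this pattern's variables get merged.
        touching = [u for u in units
                    if variables(p) & set().union(*map(variables, u))]
        merged = [p]
        for u in touching:
            merged.extend(u)
            units.remove(u)
        units.append(merged)
    return units

patterns = [
    ("?s", "name", "?n"),
    ("?s", "age", "?a"),
    ("?x", "city", "Paris"),
]
units = minimal_connectable_units(patterns)
# The first two patterns share ?s; the city pattern stands alone.
```

Each resulting unit maps to one (or a few) relational tables, and joining units along their shared variables is what produces the semantically correct query plans the abstract describes.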