Funding: Supported by the National Key Basic Research and Development Program of China under contract No. 2006CB701305, the National Natural Science Foundation of China under contract No. 40571129, and the National High-Technology Program of China under contract Nos. 2002AA639400, 2003AA604040 and 2003AA637030.
Abstract: Marine information has been growing quickly. Traditional database technologies are poorly suited to handling large volumes of marine information that is tied to a three-dimensional position and to time. Recently, greater emphasis has been placed on geographical information systems (GIS) for handling marine information. GIS has been very successful in terrestrial applications over the last decades, but its use in marine fields has been far more restricted. One of the main reasons is that most GIS systems and their data models are designed for land applications and do not cope well with the nature of the marine environment or of marine information. This poses a fundamental challenge to traditional GIS and its data structures. This work designed a data model, the raster-based spatio-temporal hierarchical data model (RSHDM), for marine information systems and for knowledge discovery from spatio-temporal data. The model is based on the nature of marine data and overcomes the shortcomings of current spatio-temporal models when they are used in this field. As an experiment, a marine fishery data warehouse (FDW) for marine fishery management was set up on the basis of the RSHDM. The experiment showed that the RSHDM handles the data well and can easily extract the aggregations that management needs at different levels.
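The multi-level aggregation mentioned above can be illustrated with a minimal sketch. The abstract does not describe the internal structure of the RSHDM, so the `coarsen` helper, the grid layout, and the catch values below are hypothetical; the sketch only shows how values stored per raster cell can be summed into coarser levels of a hierarchy.

```python
def coarsen(grid, factor=2):
    """Sum factor x factor blocks of a 2-D raster into one coarser cell."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    return [
        [
            sum(grid[r + dr][c + dc]
                for dr in range(factor)
                for dc in range(factor))
            for c in range(0, cols, factor)
        ]
        for r in range(0, rows, factor)
    ]


# Hypothetical catch (tonnes) per fine raster cell for a single time step.
catch = [
    [1, 2, 0, 4],
    [3, 0, 1, 1],
    [0, 0, 2, 2],
    [5, 1, 0, 3],
]

level1 = coarsen(catch)    # 2 x 2 grid of regional totals
level2 = coarsen(level1)   # 1 x 1 grid: total for the whole area
```

Repeating the same step per time slice would give one pyramid per period, which is one simple way to serve totals at several spatial levels from a single fine-grained raster.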
Funding: Supported by the National Key Research and Development Program of China (Nos. 2016YFC1402000, 2018YFC1407003, 2017YFC1405300).
Abstract: Offshore waters provide resources for human beings but also threaten them through marine disasters. Ocean stations are part of offshore observation networks, and the quality of their data is of great significance for exploiting and protecting the ocean. We used hourly real-time observations of mean wave height, temperature, and pressure taken at the Xiaomaidao station (in Qingdao, China) from June 1, 2017, to May 31, 2018, to explore the data quality using eight quality control methods and to identify the most effective method for the Xiaomaidao station. After the eight quality control methods were applied, the percentages of the mean wave height, temperature, and pressure data that passed the tests were 89.6%, 88.3%, and 98.6%, respectively. Judging from the marine disaster (wave alarm report) data, values failed the tests mainly because of aging observation equipment and missing data transmissions. The mean wave height is often affected by dynamic marine disasters, so the continuity test is not effective for it. A correlation test against other related parameters would be more useful for the mean wave height.
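As an illustration of one of the test types named above, the following is a minimal sketch of a continuity test on an hourly series. The abstract does not give the station's actual rules or thresholds, so the `continuity_test` function, the `max_step` limit, and the sample temperatures are assumptions made for this example.

```python
def continuity_test(values, max_step):
    """Return one pass/fail flag per value; missing values (None) fail."""
    flags = []
    previous = None  # last value that passed the test
    for v in values:
        if v is None:
            flags.append(False)
            continue
        ok = previous is None or abs(v - previous) <= max_step
        flags.append(ok)
        if ok:
            previous = v
    return flags


# Hypothetical hourly temperatures (deg C) with one gap and one spike.
temps = [15.1, 15.3, None, 15.2, 22.9, 15.4]
flags = continuity_test(temps, max_step=2.0)
pass_rate = 100.0 * sum(flags) / len(flags)
```

Each value is compared with the last accepted value rather than with its immediate predecessor, so a single spike does not also fail the reading that follows it. A rule of this kind flags any genuine abrupt change as well, which is consistent with the abstract's remark that the continuity test is not effective for mean wave height during dynamic marine disasters.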
Funding: Supported by the Foundation for Humanities and Social Sciences Research Planning of the Ministry of Education of Shanghai City (19YJA630058).
Abstract: From the perspective of big data, the growth characteristics of marine science and technology talents were analyzed, and their growth was divided into five periods: study period, adaptation period, growth period, promotion period, and stability period. In addition, suggestions for training marine science and technology talents were proposed from the perspectives of students, families, schools, and society.
Abstract: Using CiteSpace software to create a knowledge map of authors, institutions, and keywords, this study analyzed and discussed the literature in the architectural planning discipline on the big-data-based spatio-temporal behavior of Chinese residents, as published in the China Academic Network Publishing Database (CNKI). The analysis found a lack of communication and cooperation among research institutions and scholars. The research hotspots involved four main areas: "application in tourism research", "application in traffic travel research", "application in work-housing relationship research", and "application in personal family life research".
Funding: Financially supported by the National Natural Science Foundation of China (Grant Nos. U1709202 and 61502069), the Foundation of State Key Laboratory of Robotics (Grant No. 2015-o03), and the Fundamental Research Funds for the Central Universities (Grant Nos. DUT18JC39 and DUT17JC45).
Abstract: This study analyzes and summarizes seven main characteristics of the marine data sampled by multiple underwater gliders. Characteristics such as the large data volume and data sparseness make meaningful applications, such as early warning for the marine environment, extremely difficult. To make full use of the sea trial data, this paper defines two types of marine data cube that can integrate the big marine data sampled by multiple underwater gliders along saw-tooth paths, and proposes a data fitting algorithm based on time extraction and space compression (DFTS) to construct the temperature and conductivity data cubes. This research also presents an early warning algorithm based on the data cube (EWDC) to provide early warning for a newly sampled data file. Experimental results show that the proposed methods are reasonable and effective. Our work is the first study to build realistic applications on data sampled by multiple underwater vehicles, and it provides a research framework for processing and analyzing big marine data for underwater glider applications.
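The idea of integrating scattered glider samples into a data cube can be sketched as follows. This is not the DFTS algorithm, whose details the abstract does not give; the `build_cube` function, the bin sizes, and the sample tuples are hypothetical, and the sketch simply averages samples that fall into the same time and depth bin.

```python
from collections import defaultdict


def build_cube(samples, time_step, depth_step):
    """Average samples that fall into the same (time bin, depth bin) cell."""
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [running total, count]
    for t, depth, value in samples:
        key = (int(t // time_step), int(depth // depth_step))
        sums[key][0] += value
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical samples: (seconds since start, depth in m, temperature in deg C).
samples = [
    (0, 5.0, 20.0),
    (600, 8.0, 19.0),
    (1200, 55.0, 12.0),
    (4000, 7.0, 21.0),
]

cube = build_cube(samples, time_step=3600, depth_step=10.0)
```

Cells that received no samples are simply absent from the dictionary, which makes the sparseness mentioned in the abstract visible: any application reading the cube has to decide how to treat missing cells.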
Funding: This research was funded by the National Key Research and Development Plan (2018YFB0505300), the Guangxi Science and Technology Major Project (AA18118025), and the Opening Foundation of Key Laboratory of Environment Change and Resources Use in Beibu Gulf, Ministry of Education (Nanning Normal University) and Guangxi Key Laboratory of Earth Surface Processes and Intelligent Simulation (Nanning Normal University) (No. NNNU-KLOP-K1905).
Abstract: Marine big data are characterized by large volume and complex structure, which pose great challenges for data management and retrieval. Based on the GeoSOT Grid Code and the composite index structure of the MongoDB database, this paper proposes a spatio-temporal grid index model (STGI) for efficient, optimized querying of marine big data. A spatio-temporal secondary index is created on the spatial code and time code columns to build a composite index in the MongoDB database used to store massive marine data. Multiple comparative experiments demonstrate that the STGI approach improves retrieval efficiency by more than two to three times compared with other index models. Theoretical analysis and experimental verification lead to the conclusion that the STGI model is well suited to retrieving large-scale spatial data with low time frequency, such as marine big data.
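The principle of a composite index on a spatial code and a time code can be sketched in plain Python. The code below does not implement GeoSOT and does not use MongoDB: a simple Z-order (Morton) cell code stands in for the grid code, a sorted list of tuples stands in for the compound index, and all function names, the grid level, and the sample records are assumptions made for illustration.

```python
import bisect


def interleave_bits(x, y, bits):
    """Interleave the bits of x and y into one Z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def grid_code(lon, lat, level=8):
    """Map a lon/lat pair to a cell code on a 2**level x 2**level grid."""
    n = 1 << level
    col = min(int((lon + 180.0) / 360.0 * n), n - 1)
    row = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return interleave_bits(col, row, bits=level)


def build_index(records):
    """Sort (space_code, time_code, value) keys, like a compound index."""
    return sorted((grid_code(lon, lat), t, value)
                  for lon, lat, t, value in records)


def query(index, lon, lat, t_start, t_end):
    """Return values in the cell containing (lon, lat), t_start <= t <= t_end."""
    cell = grid_code(lon, lat)
    lo = bisect.bisect_left(index, (cell, t_start))
    hi = bisect.bisect_right(index, (cell, t_end, float("inf")))
    return [value for _, _, value in index[lo:hi]]


# Hypothetical records: (longitude, latitude, time code, numeric measurement).
records = [
    (120.3, 36.1, 10, 14.2),
    (120.3, 36.1, 20, 14.6),
    (120.3, 36.1, 30, 15.0),
    (-60.0, -30.0, 20, 9.9),
]
index = build_index(records)
result = query(index, 120.3, 36.1, 15, 30)
```

Because the keys are ordered by space code first and time code second, all records for one cell and one time window sit in a contiguous slice, so the query needs two binary searches rather than a full scan. The sketch assumes numeric measurements, since they take part in the tuple ordering.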
Abstract: Building on the digital Weifang geospatial framework, the Smart Weifang spatio-temporal information cloud platform (WFCP) integrated legal person information, population data, place name and address data, macroeconomic data, and so on. It also expanded the data contents to include indoor and outdoor data, overground and underground data, panoramic data, and real data. It further introduced historical geographical information from different periods, real-time location information, address information of sensing equipment, and real-time perception and interpretation information. The platform has overcome difficulties in real-time access to Internet of Things (IoT) perception, multi-node collaboration, 64-bit support, and cluster deployment, and it features spatio-temporal management, on-demand service, big data analysis, and a micro-service architecture. The project built a spatio-temporal information big data center and a spatio-temporal information cloud platform, achieved the convergence and management of distributed big data, and was applied in depth in five areas (land, transportation, environmental protection, police, and subdistricts) by supporting the integrated application of multi-source information and intelligent deep applications. For the hardware environment, following the top-level design and unified arrangement of Smart Weifang, the WFCP was migrated to the Weifang cloud computing center to obtain on-demand computing resources and load-based dynamic scheduling of computing resources, and to support the generalizing load map application.
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2016YFB1000601.
Abstract: With rapidly growing interest in Big Data, improving the performance of Big Data systems is becoming more and more important. Among the many steps involved, the first is to analyze and diagnose the performance bottlenecks of Big Data systems. Currently, there are two major solutions. One is the purely data-driven diagnosis approach, which may be very time-consuming; the other is the rule-based analysis method, which usually requires prior knowledge. For Big Data applications such as Spark workloads, we observe that the tasks in the same stage normally execute the same or similar code on each data partition. On the basis of this stage similarity and the distributed characteristics of Big Data systems, we analyze the behavior of Big Data applications in terms of both system and micro-architectural metrics of each stage. Furthermore, for different performance problems, we propose a hybrid approach that combines prior rules and machine learning algorithms to detect performance anomalies, such as straggler tasks, task assignment imbalance, data skew, abnormal nodes, and outlier metrics. Following this methodology, we design and implement a lightweight, extensible tool named HybridTune, and measure its overhead and anomaly detection effectiveness using the BigDataBench benchmarks. Our experiments show that the overhead of HybridTune is only 5%, and the accuracy of the outlier detection algorithm reaches up to 93%. Finally, we report several use cases diagnosing Spark and Hadoop workloads using BigDataBench, which demonstrate the potential use of HybridTune.
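The stage-similarity observation suggests a simple rule-based check: if tasks in one stage run similar code, a task that takes far longer than its peers is a candidate straggler. The sketch below illustrates that idea only. It is not HybridTune's implementation; the `find_stragglers` function, the 1.5x median threshold, and the task durations are assumptions chosen for the example.

```python
from statistics import median


def find_stragglers(durations_by_task, ratio=1.5):
    """Return ids of tasks whose duration exceeds ratio x the stage median."""
    stage_median = median(durations_by_task.values())
    return sorted(
        task for task, duration in durations_by_task.items()
        if duration > ratio * stage_median
    )


# Hypothetical task durations (seconds) for the tasks of one stage.
stage = {"t0": 10.0, "t1": 11.0, "t2": 9.5, "t3": 31.0, "t4": 10.5}
stragglers = find_stragglers(stage)
```

The median is used rather than the mean so that the slow task itself does not inflate the baseline it is compared against. A hybrid approach of the kind the abstract describes would combine rules like this one with learned models for anomalies that fixed thresholds handle poorly.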