Massive ocean data acquired by various observing platforms and sensors poses new challenges to data management and utilization. Typically, it is difficult to find the desired data from the large number of datasets efficiently and effectively. Most existing methods for data discovery are based on keyword retrieval or direct semantic reasoning, and they are either limited in data access rate or do not take the time cost into account. In this paper, we design and implement a novel system that alleviates the problem by introducing semantics with ontologies, referred to as Data Ontology and List-Based Publishing (DOLP). Specifically, we improve ocean data services in the following three aspects. First, we propose a unified semantic model called OEDO (Ocean Environmental Data Ontology) to represent heterogeneous ocean data by metadata and to publish them as data services. Second, we propose an optimized quick service query list (QSQL) data structure for storing pre-inferred semantically related services, reducing service querying time. Third, we propose two algorithms for optimizing QSQL hierarchically and horizontally, respectively, which extend the semantic relationships of the data services and improve the data access rate. Experimental results show that DOLP outperforms the benchmark methods. First, our QSQL-based data discovery methods obtain a higher recall rate than the keyword-based method and are faster than the traditional semantic method based on direct reasoning. Second, DOLP can handle more complex semantic relationships than existing methods.
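The abstract does not specify the QSQL internals, but the core idea of pre-computing semantically related services so that a query becomes a lookup rather than on-demand ontology reasoning can be sketched as follows (all concept and service names are hypothetical):

```python
from collections import defaultdict

# Hypothetical sketch of a quick service query list (QSQL): for each concept,
# pre-store the services reachable through subclass relationships, so a query
# is a dictionary lookup instead of runtime semantic reasoning.
def build_qsql(subclass_of, services_by_concept):
    """subclass_of: child -> parent concept edges; services_by_concept: direct service bindings."""
    children = defaultdict(list)
    for child, parent in subclass_of.items():
        children[parent].append(child)

    qsql = {}
    def collect(concept):
        # A concept's entry includes its own services plus those of all subconcepts.
        if concept in qsql:
            return qsql[concept]
        found = list(services_by_concept.get(concept, []))
        for sub in children[concept]:
            found.extend(collect(sub))
        qsql[concept] = found
        return found

    for concept in set(subclass_of) | set(subclass_of.values()) | set(services_by_concept):
        collect(concept)
    return qsql

qsql = build_qsql(
    {"SeaSurfaceTemperature": "Temperature", "Temperature": "OceanVariable"},
    {"SeaSurfaceTemperature": ["sst_service"], "Temperature": ["temp_service"]},
)
print(sorted(qsql["OceanVariable"]))  # ['sst_service', 'temp_service']
```

A query for the broad concept "OceanVariable" immediately returns services bound to narrower concepts, which is the kind of pre-inferred expansion the paper credits for the improved recall and query time.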
An ocean state monitor and analysis radar (OSMAR), developed by Wuhan University in China, has been mounted at six stations along the coasts of the East China Sea (ECS) to measure velocities (currents, waves and winds) at the sea surface. Radar-observed surface current is taken as an example to illustrate the operational high-frequency (HF) radar observing and data service platform (OP), presenting an operational flow from data observing, transmitting, processing, and visualizing to end-user service. Three layers (systems), the radar observing system (ROS), the data service system (DSS) and the visualization service system (VSS), as well as the data flow within the platform, are introduced. Surface velocities observed at the stations are synthesized at the radar data receiving and preprocessing center of the ROS and transmitted to the DSS, in which data processing and quality control (QC) are conducted. Users can browse the processed data on the portal of the DSS and access those data files. The VSS aims to better present the data products by displaying the information on a visual globe. By utilizing the OP, the surface currents in the East China Sea are monitored, and their hourly and seasonal variability is investigated.
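The abstract mentions quality control in the DSS without detail; a minimal range check of the kind routinely applied to surface-current observations can be sketched as follows (the threshold and interface are illustrative, not the platform's actual QC):

```python
def qc_range_check(speeds_cm_s, max_speed=250.0):
    """Flag current speeds (cm/s) outside a plausible physical range.

    Returns (value, passed) pairs. The 250 cm/s threshold is a hypothetical
    example; an operational system would tune it per region and season."""
    return [(v, 0.0 <= abs(v) <= max_speed) for v in speeds_cm_s]

obs = [12.3, -87.5, 999.9, 45.0]          # one obviously spurious value
flagged = qc_range_check(obs)
print([v for v, ok in flagged if ok])     # [12.3, -87.5, 45.0]
```

Such automated checks are typically the first stage of a QC chain, before visual inspection or cross-validation against buoys.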
Purpose: This paper describes the definition of data quality procedures for knowledge organizations such as Higher Education Institutions. The main purpose is to present the flexible approach developed for monitoring the data quality of the European Tertiary Education Register (ETER) database, illustrating its functioning and highlighting the main challenges that still have to be faced in this domain. Design/methodology/approach: The proposed data quality methodology is based on two kinds of checks, one to assess the consistency of cross-sectional data and the other to evaluate the stability of multiannual data. This methodology has an operational and empirical orientation, meaning that the proposed checks do not assume any theoretical distribution for determining the threshold parameters that identify potential outliers, inconsistencies, and errors in the data. Findings: We show that the proposed cross-sectional and multiannual checks are helpful for identifying outliers and extreme observations and for detecting ontological inconsistencies not described in the available metadata. For this reason, they may be a useful complement to the processing of the available information. Research limitations: The coverage of the study is limited to European Higher Education Institutions. The cross-sectional and multiannual checks are not yet completely integrated. Practical implications: Considering the quality of the available data and information is important for enhancing data-quality-aware empirical investigations, highlighting problems and areas in which to invest to improve the coverage and interoperability of data in future data collection initiatives. Originality/value: The data-driven quality checks proposed in this paper may be useful as a reference for building and monitoring the data quality of new or existing databases for other countries or systems characterized by high heterogeneity and complexity of the units of analysis, without relying on pre-specified theoretical distributions.
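The two kinds of checks are described only abstractly; a minimal sketch of their flavor, using purely empirical thresholds in line with the paper's no-theoretical-distribution stance, might look like this (the fence factor and ratio bound are hypothetical):

```python
def cross_sectional_outliers(values, k=1.5):
    """Flag observations outside empirical interquartile fences.

    No distributional assumption: the fences come from the data itself."""
    s = sorted(values)
    n = len(s)
    q1, q3 = s[n // 4], s[(3 * n) // 4]   # crude quartiles, adequate for a sketch
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

def multiannual_instability(series_by_year, max_ratio=2.0):
    """Flag consecutive years where a unit's value jumps by more than
    max_ratio in either direction -- a simple stability check."""
    years = sorted(series_by_year)
    flags = []
    for prev, curr in zip(years, years[1:]):
        a, b = series_by_year[prev], series_by_year[curr]
        if a > 0 and (b / a > max_ratio or b / a < 1 / max_ratio):
            flags.append((prev, curr))
    return flags

print(cross_sectional_outliers([10, 11, 9, 12, 10, 95]))           # [95]
print(multiannual_instability({2018: 100, 2019: 105, 2020: 400}))  # [(2019, 2020)]
```

Flagged values are candidates for manual review rather than automatic correction, which matches the paper's emphasis on complementing, not replacing, the processing of available information.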
In existing web-services-based workflows, data exchange across the web services is centralized: the workflow engine mediates at each step of the application sequence. However, many grid applications, especially data-intensive scientific applications, require exchanging large amounts of data across the grid services. Having a central workflow engine relay the data between the services would result in a bottleneck in these cases. This paper proposes a data exchange model for individual grid workflows and multi-workflow composition, respectively. The model enables direct communication of large amounts of data between two grid services. To enable data exchange among multiple workflows, a bridge data service is used.
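The bottleneck-avoidance idea can be sketched in a few lines (the classes and reference format are hypothetical, not the paper's model): the engine forwards only a lightweight data reference, and the consuming service pulls the payload directly from the producer.

```python
# Sketch of direct service-to-service data exchange: the workflow engine
# never relays the payload, only a reference to it.
class GridService:
    def __init__(self, name):
        self.name = name
        self.store = {}

    def produce(self, key, payload):
        self.store[key] = payload
        return {"service": self, "key": key}   # lightweight reference

    def pull(self, ref):
        # Direct transfer between services; the engine is not involved.
        return ref["service"].store[ref["key"]]

engine_log = []
producer, consumer = GridService("simulation"), GridService("analysis")
ref = producer.produce("result-1", b"x" * 10_000_000)  # large payload stays put
engine_log.append(("forward-ref", ref["key"]))         # engine relays the reference only
data = consumer.pull(ref)
print(len(data))  # 10000000
```

In a real grid setting the reference would be an endpoint URL and the pull a network transfer, but the division of labor, control messages through the engine and bulk data point-to-point, is the same.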
For China's telecom industry, 2009 is destined to be an extraordinary year due to the approach of the long-awaited mobile 3G era, which will have a significant impact on current work and lifestyles. 2009 will also be a year full of opportunities and challenges, because the coming 3G era will bring limitless business opportunities and impose more challenges on Chinese telecom operators. The reshuffling of the Chinese telecom market has been brought to an end. The new China Unicom, China Mobile and China Telecom all focus their strategies on broadband mobile data services in order to achieve a smooth transformation from voice services to data services. Technologically, the various 3G technologies and their evolutions are of great concern to telecom operators, while in terms of services, the key for 3G systems is their data services. As a result, high-speed broadband data services are entering an era of rapid development.
What tasks do the technological changes taking place in the world impose on tax administrations, and at the same time, what opportunities do they create for enforcing the principle of public responsibility? How can innovations like the European Digital Identity Wallet (EUDIW) be applied in the authentication environment? What assistance can the authorities provide for the integrity of taxpayers' business data? What developments are seen in the work of the Hungarian tax administration to use transaction-based data to contribute to a more modern public administration system and, last but not least, to a fair sharing of the public burden? How does blockchain as a technology platform support data integrity? How does personalized and easy-to-understand communication revolutionize customer information? These questions are answered in this article.
China began developing its meteorological satellite program in 1969. Over 50 years of growth, 17 Fengyun (FY) meteorological satellites have been launched successfully. At present, seven of them are in orbit providing operational service, including three polar-orbiting meteorological satellites and four geostationary meteorological satellites. Since the last COSPAR report, no new Fengyun satellite has been launched. The information on the on-orbit FY-2, FY-3, and FY-4 series has been updated. The FY-3D and FY-2H satellites completed their commissioning tests and transitioned into operation in 2018. The FY-2E satellite completed its service and was decommissioned in 2019. The web-based users and Direct Broadcasting (DB) users requesting Fengyun satellite data and products keep growing worldwide. A new Mobile Application Service based on cloud technology was launched for Fengyun users in 2018. This report especially addresses the international and regional cooperation that facilitates the Fengyun user community. To strengthen data service in the Belt and Road countries, the Emergency Support Mechanism of the Fengyun satellites (FY_ESM) has been in place since 2018. Meanwhile, a project to recalibrate 30 years of archived Fengyun satellite data has been underway since 2018. This project aims to generate a Fundamental Climate Data Record (FCDR) as a space agency response to the Global Climate Observing System (GCOS). Finally, the future Fengyun program up to 2025 is introduced as well.
China's efforts to develop Fengyun meteorological satellites have made major strides over the past 50 years, with the polar and geostationary meteorological satellite series achieving continuously stable operation to persistently provide data and product services globally. By the end of 2021, 19 Chinese self-developed Fengyun meteorological satellites had been launched successfully. Seven of them are in operation at present, and their data and products are widely applied to weather analysis, numerical weather forecasting and climate prediction, as well as environment and disaster monitoring. Since the last COSPAR report, FY-4B, the first new-generation operational geostationary satellite, and FY-3E, the first early-morning-orbit satellite in China's polar-orbiting meteorological satellite family, were launched in 2021. The characteristics of the two latest satellites and the instruments onboard are addressed in this report. The status of the current Fengyun satellites, product and data services, and international cooperation and supporting activities is introduced as well.
Purpose: The main objective of this work is to show the potential of recently developed approaches for automatic knowledge extraction directly from universities' websites. The information automatically extracted can potentially be updated more frequently than once per year and is safe from manipulation or misinterpretation. Moreover, this approach gives us flexibility in collecting indicators about the efficiency of universities' websites and their effectiveness in disseminating key content. These new indicators can complement traditional indicators of scientific research (e.g. number of articles and number of citations) and teaching (e.g. number of students and graduates) by introducing further dimensions that allow new insights for "profiling" the analyzed universities. Design/methodology/approach: Webometrics relies on web mining methods and techniques to perform quantitative analyses of the web. This study implements an advanced application of the webometric approach, exploiting all three categories of web mining: web content mining, web structure mining, and web usage mining. The information needed to compute our indicators has been extracted from the universities' websites using web scraping and text mining techniques. The scraped information has been stored in a NoSQL DB in a semistructured form to allow information to be retrieved efficiently by text mining techniques. This provides increased flexibility in the design of new indicators, opening the door to new types of analyses. Some data have also been collected by means of batch interrogations of search engines (Bing, www.bing.com) or from a leading provider of web analytics (SimilarWeb, http://www.similarweb.com). The information extracted from the web has been combined with university structural information taken from the European Tertiary Education Register (https://eter.joanneum.at/#/home), a database collecting information on Higher Education Institutions (HEIs) at the European level. All the above was used to perform a clustering of 79 Italian universities based on structural and digital indicators. Findings: The main findings of this study concern the evaluation of the digitalization potential of universities, in particular by presenting techniques for the automatic extraction of information from the web to build indicators of the quality and impact of universities' websites. These indicators can complement traditional indicators and can be used to identify groups of universities with common features by applying clustering techniques to the above indicators. Research limitations: The results reported in this study refer to Italian universities only, but the approach could be extended to other university systems abroad. Practical implications: The approach proposed in this study and its illustration on Italian universities show the usefulness of recently introduced automatic data extraction and web scraping approaches and their practical relevance for characterizing and profiling the activities of universities on the basis of their websites. The approach could be applied to other university systems. Originality/value: This work applies to university websites, for the first time, some recently introduced techniques for automatic knowledge extraction based on web scraping, optical character recognition and nontrivial text mining operations (Bruni & Bianchi, 2020).
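The scrape-then-index pipeline described above can be illustrated with a stdlib-only sketch (the page content, record fields, and the "navigability" indicator are all hypothetical examples, not the paper's actual indicators):

```python
from html.parser import HTMLParser

# Illustrative web-content-mining sketch: parse a page, store a semistructured
# record, and derive a simple website indicator from it.
class PageStats(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += 1

    def handle_data(self, data):
        self.words += len(data.split())

page = "<html><body><p>Courses and admissions info</p><a href='/phd'>PhD</a><a href='/labs'>Labs</a></body></html>"
p = PageStats()
p.feed(page)

# Semistructured record, as would be stored in a NoSQL document store.
record = {"url": "https://example-university.example", "links": p.links, "words": p.words}
# Toy indicator: links per 100 words of content.
indicator = 100 * record["links"] / max(record["words"], 1)
print(record["links"], record["words"])  # 2 6
```

A production pipeline would fetch pages over HTTP and use a full scraping stack, but the shape, raw HTML in, semistructured record out, indicator computed downstream, is the same.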
Various application domains require the integration of distributed real-time or near-real-time systems with non-real-time systems. Smart cities, smart homes, ambient intelligent systems, and network-centric defense systems are among these application domains. The Data Distribution Service (DDS) is a communication mechanism based on the Data-Centric Publish-Subscribe (DCPS) model. It is used for distributed systems with real-time operational constraints. The Java Message Service (JMS) is a messaging standard for enterprise systems using Service-Oriented Architecture (SOA) for non-real-time operations. JMS allows Java programs to exchange messages in a loosely coupled fashion and supports sending and receiving messages using both a messaging queue and a publish-subscribe interface. In this article, we propose an architecture enabling the automated integration of distributed real-time and non-real-time systems. We test the proposed architecture using a distributed Command, Control, Communications, Computers, and Intelligence (C4I) system. The system has DDS-based real-time Combat Management System components deployed on naval warships and SOA-based non-real-time Command and Control components used at headquarters. The proposed solution enables the efficient exchange of data between these two systems. We compare the proposed solution with a similar study; our solution is superior in terms of automation support, ease of implementation, scalability, and performance.
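The bridging pattern at the heart of such an integration can be sketched in miniature (this is a generic adapter illustration, not the paper's architecture; class and topic names are invented): samples published on a DDS-like topic bus are forwarded into a JMS-like point-to-point queue for non-real-time consumers.

```python
from collections import defaultdict
from queue import Queue

# Minimal pub-sub-to-queue adapter sketch.
class PubSubBus:
    """Topic-based publish-subscribe bus (stands in for DDS/DCPS)."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, sample):
        for cb in self.subscribers[topic]:
            cb(sample)

class QueueBridge:
    """Adapter: forwards samples from a pub-sub topic into a queue
    (stands in for a JMS destination consumed at headquarters)."""
    def __init__(self, bus, topic):
        self.queue = Queue()
        bus.subscribe(topic, self.queue.put)

bus = PubSubBus()
bridge = QueueBridge(bus, "track-data")
bus.publish("track-data", {"id": 42, "lat": 39.9, "lon": 26.4})
print(bridge.queue.get()["id"])  # 42
```

The real systems add typed topics, QoS policies, and serialization across process boundaries; the adapter's job of decoupling the two messaging models is unchanged.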
This paper is the second of a series that describes some of the main dataset resources presently shared through the GEOSS Platform. The GEOSS Platform was created as the technological tool to implement interoperability within the Global Earth Observation System of Systems (GEOSS); it is a brokering infrastructure that presently brokers more than 190 autonomous data catalogs and information systems. This paper focuses on the analysis of the NextGEOSS datasets, describing the data publishing process from NextGEOSS to the GEOSS Platform. In particular, both the administrative registration and the technical registration were taken into consideration. Among the most important data shared by the GEOSS Platform are the NextGEOSS datasets: the present study provides some insights in terms of GEOSS user searches for NextGEOSS data.
Research data currently face a huge increase in data objects, with an increasing variety of types (data types, formats) and of workflows by which objects need to be managed across their lifecycle by data infrastructures. Researchers want to shorten the workflows from data generation to analysis and publication, and the full workflow needs to become transparent to multiple stakeholders, including research administrators and funders. This poses challenges for research infrastructures and user-oriented data services in terms of not only making data and workflows findable, accessible, interoperable and reusable, but also doing so in a way that leverages machine support for better efficiency. One primary need to be addressed is findability, and achieving better findability has benefits for other aspects of data and workflow management. In this article, we describe how machine capabilities can be extended to make workflows more findable, in particular by leveraging the Digital Object Architecture, common object operations and machine learning techniques.
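The Digital-Object-style findability mentioned above can be illustrated with a toy resolver (the identifiers, record fields, and `find` operation are hypothetical, sketched in the spirit of handle-based resolution rather than any specific API):

```python
# Hypothetical sketch: each object gets a persistent identifier that resolves
# to a typed record, and common operations are dispatched over the registry
# rather than over storage locations.
registry = {
    "21.T999/abc123": {"type": "workflow", "location": "https://repo.example/wf/abc123",
                       "keywords": ["climate", "regridding"]},
    "21.T999/def456": {"type": "dataset", "location": "https://repo.example/ds/def456",
                       "keywords": ["precipitation"]},
}

def resolve(pid):
    """Resolve a persistent identifier to its object record (handle-style lookup)."""
    return registry[pid]

def find(keyword):
    """A machine-actionable find operation over registered metadata."""
    return [pid for pid, rec in registry.items() if keyword in rec["keywords"]]

print(find("climate"))  # ['21.T999/abc123']
```

Because the identifier, not the location, is the stable handle, objects can move between repositories without breaking discovery, which is the property that makes workflows machine-findable.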
Fengyun meteorological satellites have undergone a series of significant developments over the past 50 years. Two generations, four types, and 21 Fengyun satellites have been developed and launched, with 9 currently operational in orbit. The data obtained from Fengyun satellites are employed in a multitude of applications, including weather forecasting, meteorological disaster prevention and reduction, climate change, global environmental monitoring, and space weather. These data products and services are made available to the global community, resulting in tangible social and economic benefits. In 2023, two Fengyun meteorological satellites were successfully launched. This report presents an overview of the two recently launched Fengyun satellites and the Fengyun satellites currently in orbit, including an evaluation of their remote sensing instruments since 2022. Additionally, it addresses Fengyun satellite data archiving, data services, application services, international cooperation, and supporting activities. Furthermore, the development prospects are outlined.
With the increase in different sensors, applications and customers, data providers and users demand a new geospatial data service model that supports low cost and high dexterity and provides a comprehensive service. Based on such requirements and demands, the 21AT TripleSat constellation terminal and data delivery and management system has been developed by a Beijing-based high-tech enterprise, Twenty First Century Aerospace Technology Co., Ltd. (21AT). The company is the first commercial Earth observation satellite operator and service provider in China. This new geospatial data service model allows the user to directly access multi-source satellite data, manage data orders, and carry out automatic massive data production and delivery. The solution also implements safe and hierarchical user management, statistical data analysis, and automatic information reports. In addition, a mobile application is available for users to easily access system functions. This new geospatial solution has already been successfully applied and installed at many customer sites in China, and is now available globally for international clients interested in fast geospatial solutions. It enables the success of customers' operational services. Besides providing TripleSat constellation images, the multi-source data access system also allows users to access other satellite data sources, based on customized agreements. This paper describes and discusses this new geospatial data service model.
To study the characteristics and market structure of data services and the role of the operator, this paper builds a commercial model by applying the theory of intermediaries and neoclassical economics. Data services have different economic characteristics from voice services. First, the production mode of data services is roundabout production; second, the driving power of data services is economies of specialization; and finally, the management method of data services is impersonal management. In the data service market, information asymmetry and barriers to entry indirectly determine transaction efficiency and the specialization level of service providers. Therefore, the operator should intervene in the market by offering trade services in order to promote the development of service providers. Because of the differing quality of service providers, the market structure of data services must be one in which the trade platform and the intermediary platform built by the operator coexist.
With cloud computing technology becoming more mature, it is essential to combine the big data processing tool Hadoop with the Infrastructure as a Service (IaaS) cloud platform. In this study, we first propose a new Dynamic Hadoop Cluster on IaaS (DHCI) architecture, which includes four key modules: monitoring, scheduling, Virtual Machine (VM) management, and VM migration. The load of both physical hosts and VMs is collected by the monitoring module and can be used to design resource scheduling and data locality solutions. Second, we present a simple load-feedback-based resource scheduling scheme: resource allocation on overburdened physical hosts can be avoided, and strong scalability of the virtual cluster can be achieved by varying the number of VMs. To improve flexibility, we adopt the separated deployment of computation and storage VMs in the DHCI architecture, which negatively impacts data locality. Third, we reuse the method of VM migration and propose a dynamic migration-based data locality scheme using parallel computing entropy. We migrate the computation nodes to the host(s) or rack(s) where the corresponding storage nodes are deployed to satisfy the requirement of data locality. We evaluate our solutions in a realistic scenario based on OpenStack. Substantial experimental results demonstrate the effectiveness of our solutions, which contribute to workload balancing and performance improvement, even under heavily loaded cloud system conditions.
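The two scheduling ideas named above, load-feedback placement and an entropy measure of balance, can be sketched as follows. The threshold, the host names, and the exact entropy formula are illustrative assumptions, not the paper's definitions (a Shannon entropy over the VM distribution is one natural reading of "parallel computing entropy"):

```python
import math

# Load feedback: place a new VM on the least-loaded host that is not
# overburdened; return None if none qualifies (i.e., scale out instead).
def pick_host(host_loads, threshold=0.8):
    candidates = {h: load for h, load in host_loads.items() if load < threshold}
    return min(candidates, key=candidates.get) if candidates else None

# One plausible "parallel computing entropy": Shannon entropy of the
# distribution of computation VMs across hosts -- higher means more balanced.
def parallel_computing_entropy(vms_per_host):
    total = sum(vms_per_host.values())
    probs = [n / total for n in vms_per_host.values() if n > 0]
    return -sum(p * math.log2(p) for p in probs)

print(pick_host({"host1": 0.9, "host2": 0.4, "host3": 0.6}))           # host2
print(round(parallel_computing_entropy({"host1": 2, "host2": 2}), 2))  # 1.0
```

A migration-based locality scheme would then move computation VMs toward the hosts holding their storage counterparts whenever doing so does not push the entropy (balance) below an acceptable level.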
The Virus Outbreak Data Network (VODAN)-Africa aims to contribute to the publication of Findable, Accessible, Interoperable, and Reusable (FAIR) health data under well-defined access conditions. The next step in the VODAN-Africa architecture is to locally deploy the Center for Expanded Data Annotation and Retrieval (CEDAR) and arrange accessibility based on the 'data visiting' concept. Locally curated and deposited machine-actionable data can be visited by queries or algorithms, provided that the conditions of access are met. The goal is to enable the multiple (re)use of data with secure access functionality by clinicians (patient care), an idea aligned with the FAIR-based Personal Health Train (PHT) concept. The privacy and security requirements in relation to the FAIR Data Host and the FAIRification workspace (to produce metadata) or dashboard (for the patient) must be clear in order to design the IT architecture. This article describes a (first) practice, a reference implementation in development, within the VODAN-Africa and Leiden University Medical Center community.
The field of health data management poses unique challenges in relation to data ownership, the privacy of data subjects, and the reusability of data. The FAIR Guidelines have been developed to address these challenges. The Virus Outbreak Data Network (VODAN) architecture builds on these principles, using the European Union's General Data Protection Regulation (GDPR) framework to ensure compliance with local data regulations, while using information knowledge management concepts to further improve data provenance and interoperability. In this article, we provide an overview of the terminology used in the field of FAIR data management, with a specific focus on FAIR-compliant health information management as implemented in the VODAN architecture.
In this paper we present the derivation of Canonical Workflow Modules from current workflows in simulation-based climate science, in support of the elaboration of a corresponding framework for simulation-based research. We first identified the different users and user groups in simulation-based climate science based on their reasons for using the resources provided at the German Climate Computing Center (DKRZ). What is special here is that the DKRZ provides the climate science community with resources like high performance computing (HPC), data storage and specialised services, and hosts the World Data Center for Climate (WDCC); therefore, users can perform their entire research workflows, up to the publication of the data, on the same infrastructure. Our analysis shows that the resources are used by two primary user types: those who require the HPC system to perform resource-intensive simulations and subsequently analyse them, and those who reuse, build on and analyse existing data. We then further subdivided these top-level user categories based on their specific goals and analysed the typical, idealised workflows applied to achieve the respective project goals. We find that, due to the subdivision and finer granularity of the user groups, the workflows show apparent differences. Nevertheless, similar "Canonical Workflow Modules" can clearly be made out. These modules are "Data and Software (Re)use", "Compute", "Data and Software Storing", "Data and Software Publication" and "Generating Knowledge", and in their entirety they form the basis for a Canonical Workflow Framework for Research (CWFR). It is desirable that parts of the workflows in a CWFR act as FDOs, but we view this aspect critically. We also reflect on the question of whether the derivation of Canonical Workflow Modules from the analysis of current user behaviour still holds for future systems and work processes.
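The named modules can be viewed as composable pipeline stages, with each user group's workflow being an ordered selection of them. A toy sketch (the functions, the context dictionary, and the DOI are invented for illustration):

```python
# Canonical-module sketch: each module is a stage that transforms a shared
# workflow context; a concrete workflow is an ordered list of modules.
def data_reuse(ctx):
    ctx["inputs"] = ["wdcc:dataset-1"]          # hypothetical WDCC identifier
    return ctx

def compute(ctx):
    ctx["outputs"] = [f"derived({i})" for i in ctx["inputs"]]
    return ctx

def store(ctx):
    ctx["stored"] = True
    return ctx

def publish(ctx):
    ctx["doi"] = "10.0000/example"              # hypothetical DOI
    return ctx

def run_workflow(modules):
    ctx = {}
    for module in modules:
        ctx = module(ctx)
    return ctx

# A data-reuse-oriented user composes modules without an HPC simulation stage:
result = run_workflow([data_reuse, compute, store, publish])
print(result["doi"])  # 10.0000/example
```

The point of the canonical framing is exactly this reusability: different user groups share module implementations and differ only in which modules they chain together.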
Abstract: To enhance the refined management of hospitals, promote the scientific construction of smart hospitals by medical institutions, and address the growing burden of national, provincial, and municipal data reporting, this paper builds a smart medical data management platform suited to real application scenarios. The platform uses Data Services technology to extract indicators computed in the HANA database into the platform database, and is developed with the Java SSM framework. It enables automatic data reporting by each department and provides refined management of reported data, including business processing, data verification, process management, and statistical analysis. With SAP Data Services as the tool, the platform automatically computes and displays indicators and optimizes workflows, supporting the construction of a data-driven, high-level smart hospital and thereby enhancing the hospital's core competitiveness.
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2018YFB0203801, and the National Natural Science Foundation of China under Grant Nos. 61702529 and 61802424.
Funding: The National Natural Science Foundation of China under contract No. 41206012.
基金support of the European Commission ETER Project (No. 934533-2017-AO8-CH)H2020 RISIS 2 project (No. 824091)。
Abstract: Purpose: This paper concerns the definition of data quality procedures for knowledge organizations such as Higher Education Institutions. The main purpose is to present the flexible approach developed for monitoring the data quality of the European Tertiary Education Register (ETER) database, illustrating its functioning and highlighting the main challenges that still have to be faced in this domain. Design/methodology/approach: The proposed data quality methodology is based on two kinds of checks, one to assess the consistency of cross-sectional data and the other to evaluate the stability of multiannual data. This methodology has an operational and empirical orientation, meaning that the proposed checks do not assume any theoretical distribution for determining the threshold parameters that identify potential outliers, inconsistencies, and errors in the data. Findings: We show that the proposed cross-sectional and multiannual checks are helpful for identifying outliers and extreme observations and for detecting ontological inconsistencies not described in the available metadata. For this reason, they may be a useful complement to the processing of the available information. Research limitations: The coverage of the study is limited to European Higher Education Institutions. The cross-sectional and multiannual checks are not yet completely integrated. Practical implications: Considering the quality of the available data and information is important for data quality-aware empirical investigations, highlighting problems and areas in which to invest to improve the coverage and interoperability of data in future data collection initiatives. Originality/value: The data-driven quality checks proposed in this paper may be useful as a reference for building and monitoring the data quality of new databases, or of existing databases for other countries or systems characterized by high heterogeneity and complexity of the units of analysis, without relying on pre-specified theoretical distributions.
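The two kinds of checks described in the abstract, distribution-free cross-sectional outlier detection and multiannual stability checks, can be sketched as follows. The threshold values and the quartile-fence rule are illustrative assumptions, not the actual ETER parameters:

```python
# Sketch of distribution-free quality checks of the kind described
# above: a cross-sectional check flags outliers with empirical
# quartile fences, and a multiannual check flags implausible
# year-on-year jumps. Thresholds are illustrative, not ETER's.
import statistics

def cross_sectional_outliers(values, k=1.5):
    # empirical fences: no theoretical distribution is assumed
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lo or v > hi]

def multiannual_flags(series, max_ratio=2.0):
    # series: list of (year, value); flag a year whose value changed
    # from the previous year by more than a factor of max_ratio
    return [year for (_, prev), (year, cur) in zip(series, series[1:])
            if prev > 0 and not (1 / max_ratio <= cur / prev <= max_ratio)]
```

For example, `cross_sectional_outliers([10, 11, 12, 11, 10, 100])` flags the last unit, and a threefold jump in student counts between two years would be flagged by `multiannual_flags`.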
Funding: Supported by the National Natural Science Foundation of China (60373072).
Abstract: In existing web services-based workflows, data exchange across the web services is centralized: the workflow engine mediates at each step of the application sequence. However, many grid applications, especially data-intensive scientific applications, require exchanging large amounts of data across grid services. Having a central workflow engine relay the data between the services would result in a bottleneck in these cases. This paper proposes a data exchange model for individual grid workflows and multi-workflow composition, respectively. The model enables direct communication of large amounts of data between two grid services. To enable data exchange among multiple workflows, a bridge data service is used.
Abstract: For China's telecom industry, 2009 is destined to be an extraordinary year due to the approach of the long-awaited mobile 3G era, which will have a significant impact on current work and lifestyles. 2009 will also be a year full of opportunities and challenges, because the coming 3G era will bring limitless business opportunities and impose more challenges on Chinese telecom operators. The reshuffling of the Chinese telecom market has been brought to an end. The new China Unicom, China Mobile, and China Telecom all focus their strategies on broadband mobile data services in order to achieve a smooth transformation from voice services to data services. Technologically, various 3G technologies and their evolutions are of great concern to telecom operators, while in terms of services, the key for 3G systems is their data services. As a result, high-speed broadband data services are entering an era of rapid development.
Abstract: What tasks do the technological changes taking place in the world impose on tax administrations, and at the same time, what opportunities do they create for enforcing the principle of public responsibility? How can innovations like the European Digital Identity Wallet (EUDIW) be applied in the authentication environment? What assistance can the authorities provide in ensuring the integrity of taxpayers' business data? What developments are seen in the work of the Hungarian tax administration to use transaction-based data to contribute to a more modern public administration system and, last but not least, to a fair public burden? How does blockchain as a technology platform support data integrity? How does personalized and easy-to-understand communication revolutionize customer information? These questions are answered in this article.
Funding: Supported by the National Key Research and Development Program of China (2018YFB0504900, 2018YFB0504905).
Abstract: China began to develop its meteorological satellite program in 1969. Over 50 years of growth, 17 Fengyun (FY) meteorological satellites have been launched successfully. At present, seven of them are in orbit providing operational service, including three polar-orbiting meteorological satellites and four geostationary meteorological satellites. Since the last COSPAR report, no new Fengyun satellite has been launched. The information on the on-orbit FY-2, FY-3, and FY-4 series has been updated. The FY-3D and FY-2H satellites completed their commissioning tests and transitioned into operation in 2018. The FY-2E satellite completed its service and was decommissioned in 2019. The numbers of web-based users and Direct Broadcasting (DB) users requesting Fengyun satellite data and products keep growing worldwide. A new mobile application service based on cloud technology was launched for Fengyun users in 2018. This report particularly addresses the international and regional cooperation that facilitates the Fengyun user community. To strengthen data service in the Belt and Road countries, the Emergency Support Mechanism of the Fengyun satellites (FY_ESM) has been in place since 2018. Meanwhile, a project for recalibrating 30 years of archived Fengyun satellite data was founded in 2018. This project aims to generate a Fundamental Climate Data Record (FCDR) as a space agency response to the Global Climate Observing System (GCOS). Finally, the future Fengyun program up to 2025 is introduced as well.
Funding: Supported by the National Key Research and Development Program of China (2018YFB0504900, 2018YFB0504905) and the National Project on Fengyun Meteorological Satellite Development.
Abstract: China's efforts to develop Fengyun meteorological satellites have made major strides over the past 50 years, with the polar-orbiting and geostationary meteorological satellite series achieving continuously stable operation to persistently provide data and product services globally. By the end of 2021, 19 Chinese self-developed Fengyun meteorological satellites had been launched successfully. Seven of them are in operation at present, and their data and products are widely applied to weather analysis, numerical weather forecasting, and climate prediction, as well as environment and disaster monitoring. Since the last COSPAR report, FY-4B, the first new-generation operational geostationary satellite, and FY-3E, the first early-morning-orbit satellite in China's polar-orbiting meteorological satellite family, were launched in 2021. The characteristics of the two latest satellites and the instruments onboard are addressed in this report. The status of the current Fengyun satellites, product and data services, international cooperation, and supporting activities is introduced as well.
Funding: This work was developed with the support of the H2020 RISIS 2 Project (No. 824091) and of the "Sapienza" Research Awards No. RM1161550376E40E of 2016 and RM11916B8853C925 of 2019. This article is a largely extended version of Bianchi et al. (2019), presented at the ISSI 2019 Conference held in Rome, 2-5 September 2019.
Abstract: Purpose: The main objective of this work is to show the potential of recently developed approaches for automatic knowledge extraction directly from universities' websites. The automatically extracted information can potentially be updated more frequently than once per year and is safe from manipulation or misinterpretation. Moreover, this approach gives us flexibility in collecting indicators about the efficiency of universities' websites and their effectiveness in disseminating key contents. These new indicators can complement traditional indicators of scientific research (e.g. number of articles and number of citations) and teaching (e.g. number of students and graduates) by introducing further dimensions that allow new insights for "profiling" the analyzed universities. Design/methodology/approach: Webometrics relies on web mining methods and techniques to perform quantitative analyses of the web. This study implements an advanced application of the webometric approach, exploiting all three categories of web mining: web content mining, web structure mining, and web usage mining. The information used to compute our indicators has been extracted from the universities' websites using web scraping and text mining techniques. The scraped information has been stored in a NoSQL DB in a semi-structured form to allow information to be retrieved efficiently by text mining techniques. This provides increased flexibility in the design of new indicators, opening the door to new types of analyses. Some data have also been collected by means of batch interrogations of search engines (Bing, www.bing.com) or from a leading provider of web analytics (SimilarWeb, http://www.similarweb.com). The information extracted from the web has been combined with the university structural information taken from the European Tertiary Education Register (https://eter.joanneum.at/#/home), a database collecting information on Higher Education Institutions (HEIs) at the European level. All the above was used to perform a clustering of 79 Italian universities based on structural and digital indicators. Findings: The main findings of this study concern the evaluation of the digitalization potential of universities, in particular by presenting techniques for the automatic extraction of information from the web to build indicators of the quality and impact of universities' websites. These indicators can complement traditional indicators and can be used to identify groups of universities with common features using clustering techniques working with the above indicators. Research limitations: The results reported in this study refer to Italian universities only, but the approach could be extended to other university systems abroad. Practical implications: The approach proposed in this study, and its illustration on Italian universities, shows the usefulness of recently introduced automatic data extraction and web scraping approaches and their practical relevance for characterizing and profiling the activities of universities on the basis of their websites. The approach could be applied to other university systems. Originality/value: This work applies, for the first time, to university websites some recently introduced techniques for automatic knowledge extraction based on web scraping, optical character recognition, and nontrivial text mining operations (Bruni & Bianchi, 2020).
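The content-mining step described in the methodology, deriving website indicators from scraped pages, can be illustrated with a toy example. The actual pipeline in the paper uses web scraping, NoSQL storage, and text mining; the indicator names and key terms below are illustrative assumptions only:

```python
# Toy stand-in for the content-mining step: parse a scraped university
# page and derive simple website indicators (link count, page size,
# occurrences of key terms). Indicator names and key terms are
# illustrative assumptions, not the paper's actual indicators.
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = 0

    def handle_starttag(self, tag, attrs):
        # count anchor tags carrying an href attribute
        if tag == "a" and any(k == "href" for k, _ in attrs):
            self.links += 1

def website_indicators(html_text, key_terms=("research", "students")):
    parser = LinkCounter()
    parser.feed(html_text)
    text = html_text.lower()
    return {
        "n_links": parser.links,           # web structure mining
        "n_chars": len(html_text),         # crude page-size proxy
        "key_term_hits": {t: text.count(t) for t in key_terms},  # content mining
    }
```

Indicators of this kind, computed per university and combined with ETER structural data, would then feed the clustering step.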
Abstract: Various application domains require the integration of distributed real-time or near-real-time systems with non-real-time systems. Smart cities, smart homes, ambient intelligence systems, and network-centric defense systems are among these application domains. The Data Distribution Service (DDS) is a communication mechanism based on the Data-Centric Publish-Subscribe (DCPS) model; it is used for distributed systems with real-time operational constraints. The Java Message Service (JMS) is a messaging standard for enterprise systems using Service-Oriented Architecture (SOA) for non-real-time operations. JMS allows Java programs to exchange messages in a loosely coupled fashion, and it supports sending and receiving messages through both a messaging queue and a publish-subscribe interface. In this article, we propose an architecture enabling the automated integration of distributed real-time and non-real-time systems. We test the proposed architecture using a distributed Command, Control, Communications, Computers, and Intelligence (C4I) system. The system has DDS-based real-time Combat Management System components deployed on naval warships and SOA-based non-real-time Command and Control components used at headquarters. The proposed solution enables the efficient exchange of data between these two systems. We compare the proposed solution with a similar study; our solution is superior in terms of automation support, ease of implementation, scalability, and performance.
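The bridging idea in this abstract, relaying data from a topic-based real-time side to a queue-based enterprise side, can be sketched with two in-process stand-ins. Real DDS and JMS use dedicated middleware with QoS policies; everything below only illustrates the data flow, and all names are hypothetical:

```python
# Toy sketch of the DDS-to-JMS bridging idea: a topic-based
# publish-subscribe bus (DCPS-like side) whose messages are forwarded
# into a point-to-point queue (JMS-like side) by a bridge subscriber.
# Real DDS/JMS use dedicated middleware; this only shows the data flow.
from collections import deque

class TopicBus:                      # DCPS-style, real-time side
    def __init__(self):
        self.subscribers = {}        # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, data):
        for cb in self.subscribers.get(topic, []):
            cb(data)

class MessageQueue:                  # JMS-style, non-real-time side
    def __init__(self):
        self.queue = deque()

    def send(self, msg):
        self.queue.append(msg)

    def receive(self):
        return self.queue.popleft() if self.queue else None

bus, mq = TopicBus(), MessageQueue()
# the "bridge": every real-time track update is relayed to the SOA side
bus.subscribe("track_updates", mq.send)
bus.publish("track_updates", {"id": 7, "lat": 41.0, "lon": 29.0})
print(mq.receive())  # -> {'id': 7, 'lat': 41.0, 'lon': 29.0}
```

An automated integration layer like the one proposed would generate such bridge subscriptions from the DDS topic and JMS destination definitions rather than wiring them by hand.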
Funding: The DAB4EDGE (GEO-DAB Support for European Direction in GEOSS Common Infrastructure Enhancements, 2018-2020, ESA Contract No. 4000123005/18/IT/CGD) project, and the Horizon 2020 research and innovation programme under grant agreements No. 776136 (EDGE-European Direction in GEOSS Common Infrastructure Enhancements) and No. 101039118 (GPP-GEOSS Platform Plus).
Abstract: This paper is the second of a series that describes some of the main dataset resources presently shared through the GEOSS Platform. The GEOSS Platform was created as the technological tool to implement interoperability within the Global Earth Observation System of Systems (GEOSS); it is a brokering infrastructure that presently brokers more than 190 autonomous data catalogs and information systems. This paper focuses on the analysis of the NextGEOSS datasets, describing the data publishing process from NextGEOSS to the GEOSS Platform. In particular, both the administrative registration and the technical registration were taken into consideration. Among the most important data shared by the GEOSS Platform are the NextGEOSS datasets: the present study provides some insights in terms of GEOSS user searches for NextGEOSS data.
Abstract: Research data currently face a huge increase in the number of data objects, with an increasing variety of types (data types, formats) and of workflows by which objects need to be managed across their lifecycle by data infrastructures. Researchers desire to shorten the workflows from data generation to analysis and publication, and the full workflow needs to become transparent to multiple stakeholders, including research administrators and funders. This poses challenges for research infrastructures and user-oriented data services, not only in making data and workflows findable, accessible, interoperable, and reusable, but also in doing so in a way that leverages machine support for better efficiency. One primary need to be addressed is findability, and achieving better findability benefits other aspects of data and workflow management. In this article, we describe how machine capabilities can be extended to make workflows more findable, in particular by leveraging the Digital Object Architecture, common object operations, and machine learning techniques.
Funding: Supported by the National Natural Science Foundation of China (42274217).
Abstract: Fengyun meteorological satellites have undergone a series of significant developments over the past 50 years. Two generations, four types, and 21 Fengyun satellites have been developed and launched, with 9 currently operational in orbit. The data obtained from Fengyun satellites are employed in a multitude of applications, including weather forecasting, meteorological disaster prevention and reduction, climate change, global environmental monitoring, and space weather. These data products and services are made available to the global community, resulting in tangible social and economic benefits. In 2023, two Fengyun meteorological satellites were successfully launched. This report presents an overview of the two recently launched Fengyun satellites and those currently in orbit, including an evaluation of their remote sensing instruments since 2022. Additionally, it addresses Fengyun satellite data archiving, data services, application services, international cooperation, and supporting activities. Furthermore, the development prospects are outlined.
Funding: Supported by the project of the Beijing Municipal Science and Technology Commission and Science and Technology Innovation Base of Cultivating and Developing Engineering [grant number Z161100005016069] and the National High Technology Research and Development Program [grant number 2013AA12A303].
Abstract: With the increase in different sensors, applications, and customers, data providers and users demand a new geospatial data service model that supports low cost and high flexibility and provides a comprehensive service. Based on such requirements and demands, the 21AT TripleSat constellation terminal and data delivery and management system has been developed by a Beijing-based high-tech enterprise, Twenty First Century Aerospace Technology Co., Ltd. (21AT). The company is the first commercial Earth observation satellite operator and service provider in China. This new geospatial data service model allows the user to directly access multi-source satellite data, manage data orders, and carry out automatic massive data production and delivery. The solution also implements safe and hierarchical user management, statistical data analysis, and automatic information reports. In addition, a mobile application is available for users to easily access system functions. This new geospatial solution has already been successfully applied and installed at many customer sites in China and is now available globally for international clients interested in fast geospatial solutions, enabling the success of customers' operational services. Besides providing TripleSat constellation images, the multi-source data access system also allows users to access other satellite data sources, based on customized agreements. This paper describes and discusses this new geospatial data service model.
Funding: This work is supported by the National Science Foundation of China (No. 70472073).
Abstract: To study the characteristics and market structure of data services and the role of the operator, this paper builds a commercial model by applying the theory of intermediaries and neoclassical economics. Data services have different economic characteristics from voice services. First, the production mode of data services is roundabout production; second, the driving power of data services is economies of specialization; and finally, the management method of data services is impersonal management. In the data service market, information asymmetry and barriers to entry indirectly determine transaction efficiency and the specialization level of service providers. Therefore, the operator should intervene in the market by offering trade services in order to promote the development of service providers. Because of the differing quality of service providers, the market structure of data services must be one in which the trade platform and the intermediary platform built by the operator coexist.
Funding: Supported by the Open Project Program of the Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks (No. WSNLBKF201503), the Fundamental Research Funds for the Central Universities (No. 2016JBM011 and No. 2014ZD03-03), the Priority Academic Program Development of Jiangsu Higher Education Institutions, and the Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology.
Abstract: With cloud computing technology becoming more mature, it is essential to combine the big data processing tool Hadoop with the Infrastructure as a Service (IaaS) cloud platform. In this study, we first propose a new Dynamic Hadoop Cluster on IaaS (DHCI) architecture, which includes four key modules: monitoring, scheduling, Virtual Machine (VM) management, and VM migration. The load of both physical hosts and VMs is collected by the monitoring module and can be used to design resource scheduling and data locality solutions. Second, we present a simple load feedback-based resource scheduling scheme. Resource allocation on overburdened physical hosts can be avoided, and strong scalability of the virtual cluster can be achieved by varying the number of VMs. To improve flexibility, we adopt the separated deployment of computation and storage VMs in the DHCI architecture, which negatively impacts data locality. Third, we reuse the method of VM migration and propose a dynamic migration-based data locality scheme using parallel computing entropy. We migrate the computation nodes to the host(s) or rack(s) where the corresponding storage nodes are deployed to satisfy the requirement of data locality. We evaluate our solutions in a realistic scenario based on OpenStack. Substantial experimental results demonstrate the effectiveness of our solutions, which contribute to workload balancing and performance improvement, even under heavily loaded cloud system conditions.
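The load feedback-based scheduling idea in this abstract, placing new VMs using load reports from the monitoring module, can be sketched as a simple placement rule. The threshold value and the scalar load metric are illustrative assumptions, not the paper's actual scheme:

```python
# Sketch of load feedback-based VM placement: put a new VM on the
# least-loaded physical host, skipping hosts above a load threshold.
# The threshold and the scalar load metric are illustrative only.
def pick_host(host_loads, threshold=0.8):
    # host_loads: {host_name: load in [0, 1]} reported by the monitor
    candidates = {h: l for h, l in host_loads.items() if l < threshold}
    if not candidates:
        return None  # all hosts overburdened: scale the cluster instead
    return min(candidates, key=candidates.get)

print(pick_host({"host-a": 0.9, "host-b": 0.4, "host-c": 0.6}))  # -> host-b
```

Returning `None` corresponds to the case where the scheduler should fluctuate the number of VMs (scale out) rather than overload an existing host.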
Funding: VODAN-Africa, the Philips Foundation, the Dutch Development Bank FMO, CORDAID, and the GO FAIR Foundation for supporting this research.
Abstract: The Virus Outbreak Data Network (VODAN)-Africa aims to contribute to the publication of Findable, Accessible, Interoperable, and Reusable (FAIR) health data under well-defined access conditions. The next step in the VODAN-Africa architecture is to locally deploy the Center for Expanded Data Annotation and Retrieval (CEDAR) and to arrange accessibility based on the 'data visiting' concept. Locally curated and deposited machine-actionable data can be visited by queries or algorithms, provided that the conditions of access are met. The goal is to enable the multiple (re)use of data with secure access functionality by clinicians (patient care), an idea aligned with the FAIR-based Personal Health Train (PHT) concept. The privacy and security requirements in relation to the FAIR Data Host and the FAIRification workspace (to produce metadata) or dashboard (for the patient) must be clear in order to design the IT architecture. This article describes a first practice, a reference implementation in development, within the VODAN-Africa and Leiden University Medical Center community.
Funding: VODAN-Africa, the Philips Foundation, the Dutch Development Bank FMO, CORDAID, and the GO FAIR Foundation for supporting this research.
Abstract: The field of health data management poses unique challenges in relation to data ownership, the privacy of data subjects, and the reusability of data. The FAIR Guidelines have been developed to address these challenges. The Virus Outbreak Data Network (VODAN) architecture builds on these principles, using the European Union's General Data Protection Regulation (GDPR) framework to ensure compliance with local data regulations, while using information knowledge management concepts to further improve data provenance and interoperability. In this article we provide an overview of the terminology used in the field of FAIR data management, with a specific focus on FAIR-compliant health information management as implemented in the VODAN architecture.
Funding: Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy, EXC 2037 CLICCS (Climate, Climatic Change, and Society), Project No. 390683824.
Abstract: In this paper we present the derivation of Canonical Workflow Modules from current workflows in simulation-based climate science, in support of the elaboration of a corresponding framework for simulation-based research. We first identified the different users and user groups in simulation-based climate science based on their reasons for using the resources provided at the German Climate Computing Center (DKRZ). What is special here is that the DKRZ provides the climate science community with resources such as high-performance computing (HPC), data storage, and specialized services, and hosts the World Data Center for Climate (WDCC); users can therefore perform their entire research workflows, up to the publication of the data, on the same infrastructure. Our analysis shows that the resources are used by two primary user types: those who require the HPC system to perform resource-intensive simulations and subsequently analyze them, and those who reuse, build on, and analyze existing data. We then further subdivided these top-level user categories based on their specific goals and analyzed the typical, idealized workflows applied to achieve the respective project goals. We find that, due to the subdivision and further granulation of the user groups, the workflows show apparent differences. Nevertheless, similar "Canonical Workflow Modules" can clearly be made out. These modules are "Data and Software (Re)use", "Compute", "Data and Software Storing", "Data and Software Publication", and "Generating Knowledge", and in their entirety they form the basis for a Canonical Workflow Framework for Research (CWFR). It is desirable that parts of the workflows in a CWFR act as FAIR Digital Objects (FDOs), but we view this aspect critically. We also reflect on whether the derivation of Canonical Workflow Modules from the analysis of current user behaviour still holds for future systems and work processes.