As an introductory course for the emerging major of big data management and application,“Introduction to Big Data”has not yet formed a curriculum standard and implementation plan that is widely accepted and used by ...As an introductory course for the emerging major of big data management and application,“Introduction to Big Data”has not yet formed a curriculum standard and implementation plan that is widely accepted and used by everyone.To this end,we discuss some of our explorations and attempts in the construction and teaching process of big data courses for the major of big data management and application from the perspective of course planning,course implementation,and course summary.After interviews with students and feedback from questionnaires,students are highly satisfied with some of the teaching measures and programs currently adopted.展开更多
Due to the extensive use of various intelligent terminals and the popularity of network social tools,a large amount of data in the field of medical emerged.How to manage these massive data safely and reliably has beco...Due to the extensive use of various intelligent terminals and the popularity of network social tools,a large amount of data in the field of medical emerged.How to manage these massive data safely and reliably has become an important challenge for the medical network community.This paper proposes a data management framework of medical network community based on Consortium Blockchain(CB)and Federated learning(FL),which realizes the data security sharing between medical institutions and research institutions.Under this framework,the data security sharing mechanism of medical network community based on smart contract and the data privacy protection mechanism based on FL and alliance chain are designed to ensure the security of data and the privacy of important data in medical network community,respectively.An intelligent contract system based on Keyed-Homomorphic Public Key(KH-PKE)Encryption scheme is designed,so that medical data can be saved in the CB in the form of ciphertext,and the automatic sharing of data is realized.Zero knowledge mechanism is used to ensure the correctness of shared data.Moreover,the zero-knowledge mechanism introduces the dynamic group signature mechanism of chosen ciphertext attack(CCA)anonymity,which makes the scheme more efficient in computing and communication cost.In the end of this paper,the performance of the scheme is analyzed fromboth asymptotic and practical aspects.Through experimental comparative analysis,the scheme proposed in this paper is more effective and feasible.展开更多
Connected and autonomous vehicles are seeing their dawn at this moment.They provide numerous benefits to vehicle owners,manufacturers,vehicle service providers,insurance companies,etc.These vehicles generate a large a...Connected and autonomous vehicles are seeing their dawn at this moment.They provide numerous benefits to vehicle owners,manufacturers,vehicle service providers,insurance companies,etc.These vehicles generate a large amount of data,which makes privacy and security a major challenge to their success.The complicated machine-led mechanics of connected and autonomous vehicles increase the risks of privacy invasion and cyber security violations for their users by making them more susceptible to data exploitation and vulnerable to cyber-attacks than any of their predecessors.This could have a negative impact on how well-liked CAVs are with the general public,give them a poor name at this early stage of their development,put obstacles in the way of their adoption and expanded use,and complicate the economic models for their future operations.On the other hand,congestion is still a bottleneck for traffic management and planning.This research paper presents a blockchain-based framework that protects the privacy of vehicle owners and provides data security by storing vehicular data on the blockchain,which will be used further for congestion detection and mitigation.Numerous devices placed along the road are used to communicate with passing cars and collect their data.The collected data will be compiled periodically to find the average travel time of vehicles and traffic density on a particular road segment.Furthermore,this data will be stored in the memory pool,where other devices will also store their data.After a predetermined amount of time,the memory pool will be mined,and data will be uploaded to the blockchain in the form of blocks that will be used to store traffic statistics.The information is then used in two different ways.First,the blockchain’s final block will provide real-time traffic data,triggering an intelligent traffic signal system to reduce congestion.Secondly,the data stored on the blockchain will provide historical,statistical data that can facilitate the analysis of traffic conditions according to past behavior.展开更多
The mining industry faces a number of challenges that promote the adoption of new technologies.Big data,which is driven by the accelerating progress of information and communication technology,is one of the promising ...The mining industry faces a number of challenges that promote the adoption of new technologies.Big data,which is driven by the accelerating progress of information and communication technology,is one of the promising technologies that can reshape the entire mining landscape.Despite numerous attempts to apply big data in the mining industry,fundamental problems of big data,especially big data management(BDM),in the mining industry persist.This paper aims to fill the gap by presenting the basics of BDM.This work provides a brief introduction to big data and BDM,and it discusses the challenges encountered by the mining industry to indicate the necessity of implementing big data.It also summarizes data sources in the mining industry and presents the potential benefits of big data to the mining industry.This work also envisions a future in which a global database project is established and big data is used together with other technologies(i.e.,automation),supported by government policies and following international standards.This paper also outlines the precautions for the utilization of BDM in the mining industry.展开更多
The wealth of user data acts as a fuel for network intelligence toward the sixth generation wireless networks(6G).Due to data heterogeneity and dynamics,decentralized data management(DM)is desirable for achieving tran...The wealth of user data acts as a fuel for network intelligence toward the sixth generation wireless networks(6G).Due to data heterogeneity and dynamics,decentralized data management(DM)is desirable for achieving transparent data operations across network domains,and blockchain can be a promising solution.However,the increasing data volume and stringent data privacy-preservation requirements in 6G bring significantly technical challenge to balance transparency,efficiency,and privacy requirements in decentralized blockchain-based DM.In this paper,we investigate blockchain solutions to address the challenge.First,we explore the consensus protocols and scalability mechanisms in blockchains and discuss the roles of DM stakeholders in blockchain architectures.Second,we investigate the authentication and authorization requirements for DM stakeholders.Third,we categorize DM privacy requirements and study blockchain-based mechanisms for collaborative data processing.Subsequently,we present research issues and potential solutions for blockchain-based DM toward 6G from these three perspectives.Finally,we conclude this paper and discuss future research directions.展开更多
Product data management (PDM) has been accepted as an important tool for the manufacturing industries. In recent years, more and mor e researches have been conducted in the development of PDM. Their research area s in...Product data management (PDM) has been accepted as an important tool for the manufacturing industries. In recent years, more and mor e researches have been conducted in the development of PDM. Their research area s include system design, integration of object-oriented technology, data distri bution, collaborative and distributed manufacturing working environment, secur ity, and web-based integration. However, there are limitations on their rese arches. In particular, they cannot cater for PDM in distributed manufacturing e nvironment. This is especially true in South China, where many Hong Kong (HK) ma nufacturers have moved their production plants to different locations in Pearl R iver Delta for cost reduction. However, they retain their main offices in HK. Development of PDM system is inherently complex. Product related data cover prod uct name, product part number (product identification), drawings, material speci fications, dimension requirement, quality specification, test result, log size, production schedules, product data version and date of release, special tooling (e.g. jig and fixture), mould design, project engineering in charge, cost spread sheets, while process data includes engineering release, engineering change info rmation management, and other workflow related to the process information. Accor ding to Cornelissen et al., the contemporary PDM system should contains manageme nt functions in structure, retrieval, release, change, and workflow. In system design, development and implementation, a formal specification is nece ssary. However, there is no formal representation model for PDM system. Theref ore a graphical representation model is constructed to express the various scena rios of interactions between users and the PDM system. Statechart is then used to model the operations of PDM system, Fig.1. Statechart model bridges the curr ent gap between requirements, scenarios, and the initial design specifications o f PDM system. After properly analyzing the PDM system, a new distributed PDM (DPDM) system is proposed. Both graphical representation and statechart models are constructed f or the new DPDM system, Fig.2. New product data of DPDM and new system function s are then investigated to support product information flow in the new distribut ed environment. It is found that statecharts allow formal representations to capture the informa tion and control flows of both PDM and DPDM. In particular, statechart offers a dditional expressive power, when compared to conventional state transition diagr am, in terms of hierarchy, concurrency, history, and timing for DPDM behavioral modeling.展开更多
Automated performance tuning of data management systems offer various benefits such as improved performance, declined administration costs, and reduced workloads to database administrators (DBAs). Currently, DBAs tune...Automated performance tuning of data management systems offer various benefits such as improved performance, declined administration costs, and reduced workloads to database administrators (DBAs). Currently, DBAs tune the performance of database systems with a little help from the database servers. In this paper, we propose a new technique for automated performance tuning of data management systems. Firstly, we show how to use the periods of low workload time for performance improvements in the periods of high workload time. We demonstrate that extensions of a database system with materialised views and indices when a workload is low may contribute to better performance for a successive period of high workload. The paper proposes several online algorithms for continuous processing of estimated database workloads and for the discovery of the best plan for materialised view and index database extensions and of elimination of the extensions that are no longer needed. We present the results of experiments that show how the proposed automated performance tuning technique improves the overall performance of a data management system. 展开更多
This paper describes how database information and electronic 3D models are integrated to produce power plant designs more efficiently and accurately. Engineering CAD/CAE systems have evolved from strictly 3D modeling ...This paper describes how database information and electronic 3D models are integrated to produce power plant designs more efficiently and accurately. Engineering CAD/CAE systems have evolved from strictly 3D modeling to spatial data management tools. This paper describes how process data, commodities, and location data are disseminated to the various project team members through a central integrated database. The database and 3D model also provide a cache of information that is valuable to the constructor, and operations and maintenance Personnel.展开更多
The next generation of high-power lasers enables repetition of experiments at orders of magnitude higher frequency than what was possible using the prior generation.Facilities requiring human intervention between lase...The next generation of high-power lasers enables repetition of experiments at orders of magnitude higher frequency than what was possible using the prior generation.Facilities requiring human intervention between laser repetitions need to adapt in order to keep pace with the new laser technology.A distributed networked control system can enable laboratory-wide automation and feedback control loops.These higher-repetition-rate experiments will create enormous quantities of data.A consistent approach to managing data can increase data accessibility,reduce repetitive data-software development and mitigate poorly organized metadata.An opportunity arises to share knowledge of improvements to control and data infrastructure currently being undertaken.We compare platforms and approaches to state-of-the-art control systems and data management at high-power laser facilities,and we illustrate these topics with case studies from our community.展开更多
This study analyzed the concept of time efficiency in the data management process associated with the personnel training and competence assessments in one of the quality control (QC) laboratories of Nigeria’s Foods a...This study analyzed the concept of time efficiency in the data management process associated with the personnel training and competence assessments in one of the quality control (QC) laboratories of Nigeria’s Foods and Drugs Authority (NAFDAC). The laboratory administrators were burdened with a lot of mental and paper-based record keeping because the personnel training’s data were managed manually, hence not efficiently processed. The Excel spreadsheet provided by a Purdue doctoral dissertation as a remedial to this challenge was found to be deficient in handling operations in database tables, and therefore did not appropriately address the inefficiencies. Purpose: This study aimed to reduce the time it essentially takes to generate, obtain, manipulate, exchange, and securely store data that are associated with personnel competence training and assessments. Method: The study developed a software system that was integrated with a relational database management system (RDBMS) to improve manual/Excel-based data management procedures. To validate the efficiency of the software the mean operational times in using the Excel-based format were compared with that of the “New” software system. The data were obtained by performing four predefined core tasks for five hypothetical subjects using Excel and the “New” system (the model system) respectively. Results: It was verified that the average time to accomplish the specified tasks using the “New” system (37.08 seconds) was significantly (p = 0.00191, α = 0.05) lower than the time measurements for the Excel system (77.39 seconds) in the ANACHEM laboratory. The RDBMS-based “New” system provided operational (time) efficiency in the personnel training and competence assessment process in the QC laboratory and reduced human errors.展开更多
The Internet of Medical Things(IoMT)is an online device that senses and transmits medical data from users to physicians within a time interval.In,recent years,IoMT has rapidly grown in the medicalfield to provide heal...The Internet of Medical Things(IoMT)is an online device that senses and transmits medical data from users to physicians within a time interval.In,recent years,IoMT has rapidly grown in the medicalfield to provide healthcare services without physical appearance.With the use of sensors,IoMT applications are used in healthcare management.In such applications,one of the most important factors is data security,given that its transmission over the network may cause obtrusion.For data security in IoMT systems,blockchain is used due to its numerous blocks for secure data storage.In this study,Blockchain-assisted secure data management framework(BSDMF)and Proof of Activity(PoA)protocol using malicious code detection algorithm is used in the proposed data security for the healthcare system.The main aim is to enhance the data security over the networks.The PoA protocol enhances high security of data from the literature review.By replacing the malicious node from the block,the PoA can provide high security for medical data in the blockchain.Comparison with existing systems shows that the proposed simulation with BSD-Malicious code detection algorithm achieves higher accuracy ratio,precision ratio,security,and efficiency and less response time for Blockchain-enabled healthcare systems.展开更多
Management of poultry farms in China mostly relies on manual labor.Since such a large amount of valuable data for the production process either are saved incomplete or saved only as paper documents,making it very diff...Management of poultry farms in China mostly relies on manual labor.Since such a large amount of valuable data for the production process either are saved incomplete or saved only as paper documents,making it very difficult for data retrieve,processing and analysis.An integrated cloud-based data management system(CDMS)was proposed in this study,in which the asynchronous data transmission,distributed file system,and wireless network technology were used for information collection,management and sharing in large-scale egg production.The cloud-based platform can provide information technology infrastructures for different farms.The CDMS can also allocate the computing resources and storage space based on demand.A real-time data acquisition software was developed,which allowed farm management staff to submit reports through website or smartphone,enabled digitization of production data.The use of asynchronous transfer in the system can avoid potential data loss during the transmission between farms and the remote cloud data center.All the valid historical data of poultry farms can be stored to the remote cloud data center,and then eliminates the need for large server clusters on the farms.Users with proper identification can access the online data portal of the system through a browser or an APP from anywhere worldwide.展开更多
Effective stewardship of data is a critical precursor to making data FAIR.The goal of this paper is to bring an overview of current state of the art of data management and data stewardship planning solutions(DMP).We b...Effective stewardship of data is a critical precursor to making data FAIR.The goal of this paper is to bring an overview of current state of the art of data management and data stewardship planning solutions(DMP).We begin by arguing why data management is an important vehicle supporting adoption and implementation of the FAIR principles,we describe the background,context and historical development,as well as major driving forces,being research initiatives and funders.Then we provide an overview of the current leading DMP tools in the form of a table presenting the key characteristics.Next,we elaborate on emerging common standards for DMPs,especially the topic of machine-actionable DMPs.As sound DMP is not only a precursor of FAIR data stewardship,but also an integral part of it,we discuss its positioning in the emerging FAIR tools ecosystem.Capacity building and training activities are an important ingredient in the whole effort.Although not being the primary goal of this paper,we touch also the topic of research workforce support,as tools can be just as much effective as their users are competent to use them properly.We conclude by discussing the relations of DMP to FAIR principles,as there are other important connections than just being a precursor.展开更多
Spatial vector data with high-precision and wide-coverage has exploded globally,such as land cover,social media,and other data-sets,which provides a good opportunity to enhance the national macroscopic decision-making...Spatial vector data with high-precision and wide-coverage has exploded globally,such as land cover,social media,and other data-sets,which provides a good opportunity to enhance the national macroscopic decision-making,social supervision,public services,and emergency capabilities.Simultaneously,it also brings great challenges in management technology for big spatial vector data(BSVD).In recent years,a large number of new concepts,parallel algorithms,processing tools,platforms,and applications have been proposed and developed to improve the value of BSVD from both academia and industry.To better understand BSVD and take advantage of its value effectively,this paper presents a review that surveys recent studies and research work in the data management field for BSVD.In this paper,we discuss and itemize this topic from three aspects according to different information technical levels of big spatial vector data management.It aims to help interested readers to learn about the latest research advances and choose the most suitable big data technologies and approaches depending on their system architectures.To support them more fully,firstly,we identify new concepts and ideas from numerous scholars about geographic information system to focus on BSVD scope in the big data era.Then,we conclude systematically not only the most recent published literatures but also a global view of main spatial technologies of BSVD,including data storage and organization,spatial index,processing methods,and spatial analysis.Finally,based on the above commentary and related work,several opportunities and challenges are listed as the future research interests and directions for reference.展开更多
Data availability is of vital importance for marine and oceanographic research but most of the European data are fragmented,not always validated and not easily accessible.In the countries bordering the European seas,m...Data availability is of vital importance for marine and oceanographic research but most of the European data are fragmented,not always validated and not easily accessible.In the countries bordering the European seas,more than 1000 scientific laboratories from governmental organisations and private industry collect data using various sensors on board of research vessels,submarines,fixed and drifting platforms,aeroplanes and satellites to measure physical,geophysical,geological,biological and chemical parameters,biological species and others.SeaDataNet is an Integrated Research Infrastructure Initiative(I3)(2006-2011)in the EU FP6 framework programme.It is developing an efficient distributed Pan-European marine data management infrastructure for managing these large and diverse data sets.It is interconnecting the existing professional data centres of 35 countries,active in data collection and providing integrated databases of standardised quality on-line.This article describes the architecture and the features of the SeaDataNet infrastructure.In particular it describes the way interoperability is achieved between all the contributing data centres.Finally it highlights the on-going developments and challenges.展开更多
Research Data Management(RDM)has become increasingly important for more and more academic institutions.Using the Peking University Open Research Data Repository(PKU-ORDR)project as an example,this paper will review a ...Research Data Management(RDM)has become increasingly important for more and more academic institutions.Using the Peking University Open Research Data Repository(PKU-ORDR)project as an example,this paper will review a library-based university-wide open research data repository project and related RDM services implementation process including project kickoff,needs assessment,partnerships establishment,software investigation and selection,software customization,as well as data curation services and training.Through the review,some issues revealed during the stages of the implementation process are also discussed and addressed in the paper such as awareness of research data,demands from data providers and users,data policies and requirements from home institution,requirements from funding agencies and publishers,the collaboration between administrative units and libraries,and concerns from data providers and users.The significance of the study is that the paper shows an example of creating an Open Data repository and RDM services for other Chinese academic libraries planning to implement their RDM services for their home institutions.The authors of the paper have also observed since the PKU-ORDR and RDM services implemented in 2015,the Peking University Library(PKUL)has helped numerous researchers to support the entire research life cycle and enhanced Open Science(OS)practices on campus,as well as impacted the national OS movement in China through various national events and activities hosted by the PKUL.展开更多
A growing number of research funding organizations(RFOs)are taking responsibility to increase the scientific and social impact of research output.Also reusable research data are recognized as relevant output for gaini...A growing number of research funding organizations(RFOs)are taking responsibility to increase the scientific and social impact of research output.Also reusable research data are recognized as relevant output for gaining impact.RFOs are therefore promoting FAIR research data management and stewardship(RDM)in their research funding cycle.However,the implementation of FAIR RDM still faces important obstacles and challenges.To solve these,stakeholders work together to develop innovative tools and practices.Here we elaborate on the role of RFOs in developing a FAIR funding model to support the FAIR RDM in the funding cycle,integrated with research community specific guidance,criteria and metadata,and enabling automatic assessments of progress and output from RDM.The model facilitates to create research data with a high level of FAIRness that are meaningful for a research community.To fully benefit from the model,RFOs,research institutions and service providers need to implement machine actionability in their FAIR RDM tools and procedures.As many stakeholders still need to get familiar with“human actionable”FAIR data practices,the introduction of the model will be stepwise,with an active role of the RFOs in driving FAIR RDM processes as effectively as possible.展开更多
Data security and privacy issues are magnified by the volume,the variety,and the velocity of Big Data and by the lack,up to now,of a reference data model and related data manipulation languages.In this paper,we focus ...Data security and privacy issues are magnified by the volume,the variety,and the velocity of Big Data and by the lack,up to now,of a reference data model and related data manipulation languages.In this paper,we focus on one of the key data security services,that is,access control,by highlighting the differences with traditional data management systems and describing a set of requirements that any access control solution for Big Data platforms may fulfill.We then describe the state of the art and discuss open research issues.展开更多
Forensic investigations,especially those related to missing persons and unidentified remains,produce different types of data that must be managed and understood.The data collected and produced are extensive and origin...Forensic investigations,especially those related to missing persons and unidentified remains,produce different types of data that must be managed and understood.The data collected and produced are extensive and originate from various sources:the police,non-governmental organizations(NGOs),medical examiner offices,specialised forensic teams,family members,and others.Some examples of information include,but are not limited to,the investigative background information,excavation data of burial sites,antemortem data on missing persons,and postmortem data on the remains of unidentified individuals.These complex data must be stored in a secured place,analysed,compared,shared,and then reported to the investigative actors and the public,especially the families of missing persons,who should be kept informed of the investigation.Therefore,a data management system with the capability of performing the tasks relevant to the goals of the investigation and the identification of an individual,while respecting the deceased and their families,is critical for standardising investigations.Data management is crucial to assure the quality of investigative processes,and it must be recognised as a holistic integrated system.The aim of this article is to discuss some of the most important components of an effective forensic data management system.The discussion is enriched by examples,challenges,and lessons learned from the erratic development and launching of databases for missing and unidentified persons in Brazil.The main objective of this article is to bring attention to the urgent need for an effective and integrated system in Brazil.展开更多
With the wide application of advanced information and communication technology(ICT),power systems are becoming more reliable,more efficient and self-healing.Meanwhile more sophisticated cyber-attacks have appeared,e.g...With the wide application of advanced information and communication technology(ICT),power systems are becoming more reliable,more efficient and self-healing.Meanwhile more sophisticated cyber-attacks have appeared,e.g.false data injection(FDI)attacks,which deeply affect the state estimation of power systems and can lead to destructive consequences.To better manage and protect measurement data in power systems,we propose a blockchain-based multi-chain framework,taking advantage of the existing infrastructure.In this framework,measurements from sensors are mined into blocks by base stations using Practical Byzantine Fault Tolerance(PBFT)as the consensus protocol.We analyze the security of the proposed framework and carry out simulations to show its superiority compared to existing systems.The result of the simulations further provides guidance for how to structure the networking in the proposed framework.展开更多
文摘As an introductory course for the emerging major of big data management and application,“Introduction to Big Data”has not yet formed a curriculum standard and implementation plan that is widely accepted and used by everyone.To this end,we discuss some of our explorations and attempts in the construction and teaching process of big data courses for the major of big data management and application from the perspective of course planning,course implementation,and course summary.After interviews with students and feedback from questionnaires,students are highly satisfied with some of the teaching measures and programs currently adopted.
基金supported by the NSFC(No.62072249)Yongjun Ren received the grant and the URLs to sponsors’websites is https://www.nsfc.gov.cn/.
文摘Due to the extensive use of various intelligent terminals and the popularity of network social tools,a large amount of data in the field of medical emerged.How to manage these massive data safely and reliably has become an important challenge for the medical network community.This paper proposes a data management framework of medical network community based on Consortium Blockchain(CB)and Federated learning(FL),which realizes the data security sharing between medical institutions and research institutions.Under this framework,the data security sharing mechanism of medical network community based on smart contract and the data privacy protection mechanism based on FL and alliance chain are designed to ensure the security of data and the privacy of important data in medical network community,respectively.An intelligent contract system based on Keyed-Homomorphic Public Key(KH-PKE)Encryption scheme is designed,so that medical data can be saved in the CB in the form of ciphertext,and the automatic sharing of data is realized.Zero knowledge mechanism is used to ensure the correctness of shared data.Moreover,the zero-knowledge mechanism introduces the dynamic group signature mechanism of chosen ciphertext attack(CCA)anonymity,which makes the scheme more efficient in computing and communication cost.In the end of this paper,the performance of the scheme is analyzed fromboth asymptotic and practical aspects.Through experimental comparative analysis,the scheme proposed in this paper is more effective and feasible.
基金funded by the Deanship of Scientific Research at King Khalid University,Kingdom of Saudi Arabia for large group Research Project under grant number:RGP2/249/44.
文摘Connected and autonomous vehicles are seeing their dawn at this moment.They provide numerous benefits to vehicle owners,manufacturers,vehicle service providers,insurance companies,etc.These vehicles generate a large amount of data,which makes privacy and security a major challenge to their success.The complicated machine-led mechanics of connected and autonomous vehicles increase the risks of privacy invasion and cyber security violations for their users by making them more susceptible to data exploitation and vulnerable to cyber-attacks than any of their predecessors.This could have a negative impact on how well-liked CAVs are with the general public,give them a poor name at this early stage of their development,put obstacles in the way of their adoption and expanded use,and complicate the economic models for their future operations.On the other hand,congestion is still a bottleneck for traffic management and planning.This research paper presents a blockchain-based framework that protects the privacy of vehicle owners and provides data security by storing vehicular data on the blockchain,which will be used further for congestion detection and mitigation.Numerous devices placed along the road are used to communicate with passing cars and collect their data.The collected data will be compiled periodically to find the average travel time of vehicles and traffic density on a particular road segment.Furthermore,this data will be stored in the memory pool,where other devices will also store their data.After a predetermined amount of time,the memory pool will be mined,and data will be uploaded to the blockchain in the form of blocks that will be used to store traffic statistics.The information is then used in two different ways.First,the blockchain’s final block will provide real-time traffic data,triggering an intelligent traffic signal system to reduce congestion.Secondly,the data stored on the blockchain will provide historical,statistical data that can facilitate the analysis of traffic conditions according to past behavior.
文摘The mining industry faces a number of challenges that promote the adoption of new technologies.Big data,which is driven by the accelerating progress of information and communication technology,is one of the promising technologies that can reshape the entire mining landscape.Despite numerous attempts to apply big data in the mining industry,fundamental problems of big data,especially big data management(BDM),in the mining industry persist.This paper aims to fill the gap by presenting the basics of BDM.This work provides a brief introduction to big data and BDM,and it discusses the challenges encountered by the mining industry to indicate the necessity of implementing big data.It also summarizes data sources in the mining industry and presents the potential benefits of big data to the mining industry.This work also envisions a future in which a global database project is established and big data is used together with other technologies(i.e.,automation),supported by government policies and following international standards.This paper also outlines the precautions for the utilization of BDM in the mining industry.
基金supported by research grants from Huawei Technologies Canada and from the Natural Sciences and Engineering Research Council(NSERC)of Canada.
文摘The wealth of user data acts as a fuel for network intelligence toward the sixth generation wireless networks(6G).Due to data heterogeneity and dynamics,decentralized data management(DM)is desirable for achieving transparent data operations across network domains,and blockchain can be a promising solution.However,the increasing data volume and stringent data privacy-preservation requirements in 6G bring significantly technical challenge to balance transparency,efficiency,and privacy requirements in decentralized blockchain-based DM.In this paper,we investigate blockchain solutions to address the challenge.First,we explore the consensus protocols and scalability mechanisms in blockchains and discuss the roles of DM stakeholders in blockchain architectures.Second,we investigate the authentication and authorization requirements for DM stakeholders.Third,we categorize DM privacy requirements and study blockchain-based mechanisms for collaborative data processing.Subsequently,we present research issues and potential solutions for blockchain-based DM toward 6G from these three perspectives.Finally,we conclude this paper and discuss future research directions.
文摘Product data management (PDM) has been accepted as an important tool for the manufacturing industries. In recent years, more and mor e researches have been conducted in the development of PDM. Their research area s include system design, integration of object-oriented technology, data distri bution, collaborative and distributed manufacturing working environment, secur ity, and web-based integration. However, there are limitations on their rese arches. In particular, they cannot cater for PDM in distributed manufacturing e nvironment. This is especially true in South China, where many Hong Kong (HK) ma nufacturers have moved their production plants to different locations in Pearl R iver Delta for cost reduction. However, they retain their main offices in HK. Development of PDM system is inherently complex. Product related data cover prod uct name, product part number (product identification), drawings, material speci fications, dimension requirement, quality specification, test result, log size, production schedules, product data version and date of release, special tooling (e.g. jig and fixture), mould design, project engineering in charge, cost spread sheets, while process data includes engineering release, engineering change info rmation management, and other workflow related to the process information. Accor ding to Cornelissen et al., the contemporary PDM system should contains manageme nt functions in structure, retrieval, release, change, and workflow. In system design, development and implementation, a formal specification is nece ssary. However, there is no formal representation model for PDM system. Theref ore a graphical representation model is constructed to express the various scena rios of interactions between users and the PDM system. Statechart is then used to model the operations of PDM system, Fig.1. Statechart model bridges the curr ent gap between requirements, scenarios, and the initial design specifications o f PDM system. After properly analyzing the PDM system, a new distributed PDM (DPDM) system is proposed. Both graphical representation and statechart models are constructed f or the new DPDM system, Fig.2. New product data of DPDM and new system function s are then investigated to support product information flow in the new distribut ed environment. It is found that statecharts allow formal representations to capture the informa tion and control flows of both PDM and DPDM. In particular, statechart offers a dditional expressive power, when compared to conventional state transition diagr am, in terms of hierarchy, concurrency, history, and timing for DPDM behavioral modeling.
文摘Automated performance tuning of data management systems offer various benefits such as improved performance, declined administration costs, and reduced workloads to database administrators (DBAs). Currently, DBAs tune the performance of database systems with a little help from the database servers. In this paper, we propose a new technique for automated performance tuning of data management systems. Firstly, we show how to use the periods of low workload time for performance improvements in the periods of high workload time. We demonstrate that extensions of a database system with materialised views and indices when a workload is low may contribute to better performance for a successive period of high workload. The paper proposes several online algorithms for continuous processing of estimated database workloads and for the discovery of the best plan for materialised view and index database extensions and of elimination of the extensions that are no longer needed. We present the results of experiments that show how the proposed automated performance tuning technique improves the overall performance of a data management system.
文摘This paper describes how database information and electronic 3D models are integrated to produce power plant designs more efficiently and accurately. Engineering CAD/CAE systems have evolved from strictly 3D modeling to spatial data management tools. This paper describes how process data, commodities, and location data are disseminated to the various project team members through a central integrated database. The database and 3D model also provide a cache of information that is valuable to the constructor, and operations and maintenance Personnel.
基金A.J.acknowledges the support from DOE Grant#DESC0016804.
文摘The next generation of high-power lasers enables repetition of experiments at orders of magnitude higher frequency than what was possible using the prior generation.Facilities requiring human intervention between laser repetitions need to adapt in order to keep pace with the new laser technology.A distributed networked control system can enable laboratory-wide automation and feedback control loops.These higher-repetition-rate experiments will create enormous quantities of data.A consistent approach to managing data can increase data accessibility,reduce repetitive data-software development and mitigate poorly organized metadata.An opportunity arises to share knowledge of improvements to control and data infrastructure currently being undertaken.We compare platforms and approaches to state-of-the-art control systems and data management at high-power laser facilities,and we illustrate these topics with case studies from our community.
文摘This study analyzed the concept of time efficiency in the data management process associated with the personnel training and competence assessments in one of the quality control (QC) laboratories of Nigeria’s Foods and Drugs Authority (NAFDAC). The laboratory administrators were burdened with a lot of mental and paper-based record keeping because the personnel training’s data were managed manually, hence not efficiently processed. The Excel spreadsheet provided by a Purdue doctoral dissertation as a remedial to this challenge was found to be deficient in handling operations in database tables, and therefore did not appropriately address the inefficiencies. Purpose: This study aimed to reduce the time it essentially takes to generate, obtain, manipulate, exchange, and securely store data that are associated with personnel competence training and assessments. Method: The study developed a software system that was integrated with a relational database management system (RDBMS) to improve manual/Excel-based data management procedures. To validate the efficiency of the software the mean operational times in using the Excel-based format were compared with that of the “New” software system. The data were obtained by performing four predefined core tasks for five hypothetical subjects using Excel and the “New” system (the model system) respectively. Results: It was verified that the average time to accomplish the specified tasks using the “New” system (37.08 seconds) was significantly (p = 0.00191, α = 0.05) lower than the time measurements for the Excel system (77.39 seconds) in the ANACHEM laboratory. The RDBMS-based “New” system provided operational (time) efficiency in the personnel training and competence assessment process in the QC laboratory and reduced human errors.
基金Taif University Researchers Supporting Project Number(TURSP-2020/98),Taif University,Taif,Saudi Arabia.
文摘The Internet of Medical Things(IoMT)is an online device that senses and transmits medical data from users to physicians within a time interval.In,recent years,IoMT has rapidly grown in the medicalfield to provide healthcare services without physical appearance.With the use of sensors,IoMT applications are used in healthcare management.In such applications,one of the most important factors is data security,given that its transmission over the network may cause obtrusion.For data security in IoMT systems,blockchain is used due to its numerous blocks for secure data storage.In this study,Blockchain-assisted secure data management framework(BSDMF)and Proof of Activity(PoA)protocol using malicious code detection algorithm is used in the proposed data security for the healthcare system.The main aim is to enhance the data security over the networks.The PoA protocol enhances high security of data from the literature review.By replacing the malicious node from the block,the PoA can provide high security for medical data in the blockchain.Comparison with existing systems shows that the proposed simulation with BSD-Malicious code detection algorithm achieves higher accuracy ratio,precision ratio,security,and efficiency and less response time for Blockchain-enabled healthcare systems.
基金the“12th Five-Year-Plan”for National Science and Technology for Rural Development in China(No.2014BAD08B05).
文摘Management of poultry farms in China mostly relies on manual labor.Since such a large amount of valuable data for the production process either are saved incomplete or saved only as paper documents,making it very difficult for data retrieve,processing and analysis.An integrated cloud-based data management system(CDMS)was proposed in this study,in which the asynchronous data transmission,distributed file system,and wireless network technology were used for information collection,management and sharing in large-scale egg production.The cloud-based platform can provide information technology infrastructures for different farms.The CDMS can also allocate the computing resources and storage space based on demand.A real-time data acquisition software was developed,which allowed farm management staff to submit reports through website or smartphone,enabled digitization of production data.The use of asynchronous transfer in the system can avoid potential data loss during the transmission between farms and the remote cloud data center.All the valid historical data of poultry farms can be stored to the remote cloud data center,and then eliminates the need for large server clusters on the farms.Users with proper identification can access the online data portal of the system through a browser or an APP from anywhere worldwide.
文摘Effective stewardship of data is a critical precursor to making data FAIR.The goal of this paper is to bring an overview of current state of the art of data management and data stewardship planning solutions(DMP).We begin by arguing why data management is an important vehicle supporting adoption and implementation of the FAIR principles,we describe the background,context and historical development,as well as major driving forces,being research initiatives and funders.Then we provide an overview of the current leading DMP tools in the form of a table presenting the key characteristics.Next,we elaborate on emerging common standards for DMPs,especially the topic of machine-actionable DMPs.As sound DMP is not only a precursor of FAIR data stewardship,but also an integral part of it,we discuss its positioning in the emerging FAIR tools ecosystem.Capacity building and training activities are an important ingredient in the whole effort.Although not being the primary goal of this paper,we touch also the topic of research workforce support,as tools can be just as much effective as their users are competent to use them properly.We conclude by discussing the relations of DMP to FAIR principles,as there are other important connections than just being a precursor.
基金This work is supported by the Strategic Priority Research Program of Chinese Academy of Sciences[grant number XDA19020201].
文摘Spatial vector data with high-precision and wide-coverage has exploded globally,such as land cover,social media,and other data-sets,which provides a good opportunity to enhance the national macroscopic decision-making,social supervision,public services,and emergency capabilities.Simultaneously,it also brings great challenges in management technology for big spatial vector data(BSVD).In recent years,a large number of new concepts,parallel algorithms,processing tools,platforms,and applications have been proposed and developed to improve the value of BSVD from both academia and industry.To better understand BSVD and take advantage of its value effectively,this paper presents a review that surveys recent studies and research work in the data management field for BSVD.In this paper,we discuss and itemize this topic from three aspects according to different information technical levels of big spatial vector data management.It aims to help interested readers to learn about the latest research advances and choose the most suitable big data technologies and approaches depending on their system architectures.To support them more fully,firstly,we identify new concepts and ideas from numerous scholars about geographic information system to focus on BSVD scope in the big data era.Then,we conclude systematically not only the most recent published literatures but also a global view of main spatial technologies of BSVD,including data storage and organization,spatial index,processing methods,and spatial analysis.Finally,based on the above commentary and related work,several opportunities and challenges are listed as the future research interests and directions for reference.
文摘Data availability is of vital importance for marine and oceanographic research but most of the European data are fragmented,not always validated and not easily accessible.In the countries bordering the European seas,more than 1000 scientific laboratories from governmental organisations and private industry collect data using various sensors on board of research vessels,submarines,fixed and drifting platforms,aeroplanes and satellites to measure physical,geophysical,geological,biological and chemical parameters,biological species and others.SeaDataNet is an Integrated Research Infrastructure Initiative(I3)(2006-2011)in the EU FP6 framework programme.It is developing an efficient distributed Pan-European marine data management infrastructure for managing these large and diverse data sets.It is interconnecting the existing professional data centres of 35 countries,active in data collection and providing integrated databases of standardised quality on-line.This article describes the architecture and the features of the SeaDataNet infrastructure.In particular it describes the way interoperability is achieved between all the contributing data centres.Finally it highlights the on-going developments and challenges.
文摘Research Data Management(RDM)has become increasingly important for more and more academic institutions.Using the Peking University Open Research Data Repository(PKU-ORDR)project as an example,this paper will review a library-based university-wide open research data repository project and related RDM services implementation process including project kickoff,needs assessment,partnerships establishment,software investigation and selection,software customization,as well as data curation services and training.Through the review,some issues revealed during the stages of the implementation process are also discussed and addressed in the paper such as awareness of research data,demands from data providers and users,data policies and requirements from home institution,requirements from funding agencies and publishers,the collaboration between administrative units and libraries,and concerns from data providers and users.The significance of the study is that the paper shows an example of creating an Open Data repository and RDM services for other Chinese academic libraries planning to implement their RDM services for their home institutions.The authors of the paper have also observed since the PKU-ORDR and RDM services implemented in 2015,the Peking University Library(PKUL)has helped numerous researchers to support the entire research life cycle and enhanced Open Science(OS)practices on campus,as well as impacted the national OS movement in China through various national events and activities hosted by the PKUL.
文摘A growing number of research funding organizations(RFOs)are taking responsibility to increase the scientific and social impact of research output.Also reusable research data are recognized as relevant output for gaining impact.RFOs are therefore promoting FAIR research data management and stewardship(RDM)in their research funding cycle.However,the implementation of FAIR RDM still faces important obstacles and challenges.To solve these,stakeholders work together to develop innovative tools and practices.Here we elaborate on the role of RFOs in developing a FAIR funding model to support the FAIR RDM in the funding cycle,integrated with research community specific guidance,criteria and metadata,and enabling automatic assessments of progress and output from RDM.The model facilitates to create research data with a high level of FAIRness that are meaningful for a research community.To fully benefit from the model,RFOs,research institutions and service providers need to implement machine actionability in their FAIR RDM tools and procedures.As many stakeholders still need to get familiar with“human actionable”FAIR data practices,the introduction of the model will be stepwise,with an active role of the RFOs in driving FAIR RDM processes as effectively as possible.
文摘Data security and privacy issues are magnified by the volume,the variety,and the velocity of Big Data and by the lack,up to now,of a reference data model and related data manipulation languages.In this paper,we focus on one of the key data security services,that is,access control,by highlighting the differences with traditional data management systems and describing a set of requirements that any access control solution for Big Data platforms may fulfill.We then describe the state of the art and discuss open research issues.
基金This work was partially supported by the CAPES-Science without Borders Scholarship[grant number 99999.013091/2013-01].
文摘Forensic investigations,especially those related to missing persons and unidentified remains,produce different types of data that must be managed and understood.The data collected and produced are extensive and originate from various sources:the police,non-governmental organizations(NGOs),medical examiner offices,specialised forensic teams,family members,and others.Some examples of information include,but are not limited to,the investigative background information,excavation data of burial sites,antemortem data on missing persons,and postmortem data on the remains of unidentified individuals.These complex data must be stored in a secured place,analysed,compared,shared,and then reported to the investigative actors and the public,especially the families of missing persons,who should be kept informed of the investigation.Therefore,a data management system with the capability of performing the tasks relevant to the goals of the investigation and the identification of an individual,while respecting the deceased and their families,is critical for standardising investigations.Data management is crucial to assure the quality of investigative processes,and it must be recognised as a holistic integrated system.The aim of this article is to discuss some of the most important components of an effective forensic data management system.The discussion is enriched by examples,challenges,and lessons learned from the erratic development and launching of databases for missing and unidentified persons in Brazil.The main objective of this article is to bring attention to the urgent need for an effective and integrated system in Brazil.
基金This work was supported in part by the National Key Research and Development Program under grant no.2016YFB0901405Science and Technology Planning Project of Guangdong province(2017B090901072)Key Research and Development Program of Hainan province(ZDYF2018003).
文摘With the wide application of advanced information and communication technology(ICT),power systems are becoming more reliable,more efficient and self-healing.Meanwhile more sophisticated cyber-attacks have appeared,e.g.false data injection(FDI)attacks,which deeply affect the state estimation of power systems and can lead to destructive consequences.To better manage and protect measurement data in power systems,we propose a blockchain-based multi-chain framework,taking advantage of the existing infrastructure.In this framework,measurements from sensors are mined into blocks by base stations using Practical Byzantine Fault Tolerance(PBFT)as the consensus protocol.We analyze the security of the proposed framework and carry out simulations to show its superiority compared to existing systems.The result of the simulations further provides guidance for how to structure the networking in the proposed framework.