Funding: This work was supported by the National Natural Science Foundation of China (U2133208, U20A20161).
Abstract: With the spread of the Internet and the development of technology, cyber threats are increasing day by day. Threats such as malware, hacking, and data breaches have had a serious impact on cybersecurity. In the era of big data, the network security environment is characterized by large data volumes, high diversity, and strict real-time requirements, and traditional security defense methods and tools can no longer cope with complex and changing network security threats. This paper proposes a machine-learning security defense algorithm based on metadata association features, which emphasizes control over unauthorized users through privacy, integrity, and availability. A user model is established, and a mapping between the user model and the metadata of each data source is generated. By analyzing the user model and its mapping relationships, a query against the user model can be decomposed into queries against the individual heterogeneous data sources, realizing the integration of heterogeneous data sources based on metadata association features. Customer information is then defined and classified, sensitive data are automatically identified and perceived, and a behavior audit and analysis platform is built to analyze user behavior trajectories, completing the construction of a machine-learning customer information security defense system. The experimental results show that when the data volume is 5×10³ bits, the data storage integrity of the proposed method is 92%, the data accuracy is 98%, and the success rate of data intrusion is only 2.6%. The authors conclude that the data storage method is secure, that data accuracy remains consistently high, and that data disaster recovery performance is good. The method can effectively resist data intrusion and provides a high level of security for air traffic control. It can not only detect all viruses in user data storage but also perform integrated virus processing, further optimizing the security defense of user big data.
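The query-decomposition step described above can be illustrated with a minimal sketch. It assumes a hypothetical user model in which every user-model attribute is mapped, through metadata, to a (data source, native field) pair; the mapping table, source names, field names, and the `decompose_query` helper are all illustrative and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical metadata mapping: user-model attribute -> (data source, native field).
USER_MODEL_MAPPING = {
    "customer_name": ("crm_db", "cust_nm"),
    "phone": ("crm_db", "tel_no"),
    "last_login": ("auth_log", "login_ts"),
    "flight_no": ("ops_store", "flt_id"),
}


def decompose_query(attributes):
    """Split a query over user-model attributes into per-source sub-queries.

    Returns a dict mapping each data source to the native field names it must
    return. An attribute without a metadata mapping raises KeyError, so gaps in
    the mapping are surfaced instead of being silently dropped.
    """
    sub_queries = defaultdict(list)
    for attr in attributes:
        source, field = USER_MODEL_MAPPING[attr]
        sub_queries[source].append(field)
    return dict(sub_queries)


print(decompose_query(["customer_name", "last_login", "phone"]))
# {'crm_db': ['cust_nm', 'tel_no'], 'auth_log': ['login_ts']}
```

Grouping by source means each heterogeneous store receives one sub-query containing only its own field names; the results would then be joined back at the user-model level.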
Funding: Supported by the National Key Research and Development Program of China (2021YFB2600405).
Abstract: In view of the problems of inconsistent data semantics, inconsistent data formats, and difficult data quality assurance between the railway engineering design phase and the construction and operation phases, as well as the difficulty of fully realizing the value of design results, this paper proposes a design and implementation scheme for a railway engineering collaborative design platform. The platform mainly comprises functional modules for metadata management, design collaboration, design delivery management, a model component library, model rendering services, and Building Information Modeling (BIM) application services. On this basis, the paper studies multi-disciplinary parameterized collaborative design technology for railway engineering, infrastructure data management and delivery technology, and multi-source design data fusion and application technology. The platform is compared with other railway design software to validate its advantages and advanced features. It has been widely applied in multiple railway construction projects, greatly improving design and project management efficiency.
Funding: Supported by the Shandong Province Natural Science Foundation (No. ZR2020QF028).
Abstract: Vast amounts of heterogeneous marine observation data have accumulated with the rapid development of ocean observation technology, and several state-of-the-art methods have been proposed to manage the emerging Internet of Things (IoT) sensor data. However, an inefficient data management strategy during data storage can lead to missing metadata, so that part of the sensor data cannot be indexed or utilized (i.e., a ‘data swamp’). Researchers have focused on optimizing storage procedures to prevent such disasters, but few have attempted to restore the missing metadata. In this study, we propose an AI-based algorithm that reconstructs the metadata of heterogeneous marine data in data swamps. First, a MapReduce algorithm preprocesses the raw marine data and extracts its feature tensors in parallel. Second, the feature tensors are loaded into a machine learning algorithm, a clustering operation is performed, and the similarity between incoming data and the trained clustering results is calculated. Finally, metadata reconstruction is performed based on existing marine observation data processing results. Experiments designed using existing datasets obtained from ocean observing systems verify the effectiveness of the algorithms, and the results demonstrate the excellent performance of the proposed algorithm for the metadata reconstruction of heterogeneous marine observation data.
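The three-stage pipeline above (feature extraction, clustering, similarity-based reconstruction) can be sketched as follows. This is a simplified stand-in, not the authors' algorithm: the feature vector (mean, population standard deviation, min, max), plain k-means, nearest-centroid matching as the similarity step, and the majority-label rule are all assumptions, and a serial `map` replaces the parallel MapReduce stage. The metadata labels are hypothetical.

```python
import math
import statistics
from collections import Counter


def extract_features(series):
    """Summarise one raw sensor series as (mean, population stdev, min, max)."""
    return (statistics.fmean(series), statistics.pstdev(series), min(series), max(series))


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids so runs are repeatable."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [statistics.fmean(dim) for dim in zip(*members)]
    return centroids


def reconstruct_metadata(orphan_series, labelled, k):
    """Guess the metadata label of a series whose metadata was lost.

    `labelled` holds (series, label) pairs whose metadata is intact. The orphan
    is matched to its nearest centroid (the similarity step) and inherits the
    most common label among the labelled series in that cluster.
    """
    # Serial stand-in for the parallel MapReduce feature-extraction step.
    feats = list(map(extract_features, (series for series, _ in labelled)))
    centroids = kmeans(feats, k)

    def nearest(f):
        return min(range(k), key=lambda i: math.dist(f, centroids[i]))

    orphan_cluster = nearest(extract_features(orphan_series))
    votes = Counter(label for f, (_, label) in zip(feats, labelled)
                    if nearest(f) == orphan_cluster)
    return votes.most_common(1)[0][0] if votes else None


labelled = [
    ([15.0, 16.0, 17.0], "sea_water_temperature"),
    ([34.9, 35.0, 35.1], "sea_water_salinity"),
    ([14.5, 15.5, 16.5], "sea_water_temperature"),
    ([34.8, 35.0, 35.2], "sea_water_salinity"),
    ([15.2, 16.1, 17.3], "sea_water_temperature"),
]
print(reconstruct_metadata([15.1, 16.0, 16.9], labelled, k=2))  # sea_water_temperature
```

In a real data swamp the feature set and the number of clusters would have to be chosen from the actual sensor types; the toy series here only show the control flow.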
Abstract: Research and its publication have always been growing, and with the emergence of web technologies the growth rate has become overwhelming. By a rough estimate, more than thirty thousand research journals publish around four million papers annually on average. Search engines, indexing services, and digital libraries search for such publications over the web. Nevertheless, retrieving the articles most relevant to a user's request remains elusive, mainly because articles are not appropriately indexed according to hierarchies of granular subject classification. To overcome this issue, researchers are investigating new techniques for classifying research articles, especially when the complete article text is not available (as with non-open-access articles). The proposed study investigates multilabel classification over the available metadata in the best possible way and assesses “to what extent metadata-based features can perform in contrast to content-based approaches.” To this end, novel techniques for multilabel classification have been proposed, developed, and evaluated on metadata such as the Title and Keywords of the articles. The proposed technique has been assessed on two diverse datasets: one from the Journal of Universal Computer Science (J.UCS) and a benchmark dataset of articles published by the Association for Computing Machinery (ACM). The proposed technique yields encouraging results compared with state-of-the-art techniques in the literature.
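To make "multilabel classification over Title and Keywords" concrete, here is a minimal binary-relevance sketch: one term-frequency profile per subject label, and every label whose cosine similarity to the article passes a threshold is assigned. This is a generic baseline chosen for illustration, not the study's technique; the label names, threshold, and training articles are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(title, keywords):
    """Lower-cased word tokens from an article's title and keyword list."""
    text = title + " " + " ".join(keywords)
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(articles):
    """Build one term-frequency profile per label from (title, keywords, labels) triples."""
    profiles = defaultdict(Counter)
    for title, keywords, labels in articles:
        for label in labels:
            profiles[label].update(tokens(title, keywords))
    return profiles


def classify(title, keywords, profiles, threshold=0.2):
    """Binary relevance: return every label whose profile is similar enough."""
    vec = Counter(tokens(title, keywords))
    return sorted(label for label, prof in profiles.items() if cosine(vec, prof) >= threshold)


profiles = train([
    ("Ranking web documents with inverted indexes", ["search engine", "indexing"], ["information_retrieval"]),
    ("Neural networks for image classification", ["deep learning", "classification"], ["machine_learning"]),
    ("Learning to rank for web search", ["learning to rank", "search engine"], ["information_retrieval", "machine_learning"]),
    ("Routing protocols for wireless sensor networks", ["routing", "wireless"], ["networks"]),
])
print(classify("Learning to rank search results", ["search engine", "ranking"], profiles))
# ['information_retrieval', 'machine_learning']
```

Because each label is scored independently, an article can receive zero, one, or several labels, which is what distinguishes multilabel from single-label classification.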
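Abstract: Traffic big data systems based on e-transportation science handle the storage and computation of massive traffic data by building cluster systems composed of large high-performance computers. Such systems not only impose very strict environmental requirements but are also costly, slow to deploy, and difficult to maintain. Moreover, as data volumes, business complexity, and computational intensity grow, it becomes very difficult to increase the capacity for processing massive traffic data simply by adding servers; the cluster architecture may even have to be redesigned and redeployed, which demands substantial manpower and funding and causes enormous waste. Metadata exchange and deployment capability has become a research focus for today's big-data-driven intelligent transportation systems, and how to store, manage, process, and apply metadata for massive traffic data is a critical problem. The Traffic Big Data Metadata Exchange System (TBMES) proposed in this paper enables distributed exchange of, and mutual access to, traffic information. By connecting real-time traffic data to the traffic information big data platform in real time, the architecture keeps traffic information transmission continuous and authentic. Macroscopic and microscopic traffic data are seamlessly connected, so the system can both analyze the operating status of the road network and evaluate the traffic efficiency of key road nodes, giving a comprehensive view of regional traffic operations. It makes traffic organization and management visual, quantifiable, systematic, and automated, and its outputs can provide theoretical support for decision-makers, promoting scientific traffic decision-making.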
Abstract: Purpose: The purpose of the paper is to provide a framework for addressing the disconnect between metadata and data science. Data science cannot progress without metadata research. This paper takes steps toward advancing the synergy between metadata and data science, and identifies pathways for developing a more cohesive metadata research agenda in data science. Design/methodology/approach: This paper identifies factors that challenge metadata research in the digital ecosystem, defines metadata and data science, and presents the concepts big metadata, smart metadata, and metadata capital as part of a metadata lingua franca connecting to data science. Findings: The "utilitarian nature" and "historical and traditional views" of metadata are identified as two intersecting factors that have inhibited metadata research. Big metadata, smart metadata, and metadata capital are presented as part of a metadata lingua franca to help frame research in the data science research space. Research limitations: There are additional, intersecting factors that likely inhibit metadata research, and other significant metadata concepts to explore. Practical implications: The immediate contribution of this work is that it may elicit response, critique, revision, or, more significantly, motivate research. The work presented can encourage more researchers to consider the significance of metadata as a research-worthy topic within data science and the larger digital ecosystem. Originality/value: Although metadata research has not kept pace with other data science topics, little attention has been directed to this problem. This is surprising, given that metadata is essential for data science endeavors. This examination synthesizes original and prior scholarship to provide new grounding for metadata research in data science.
Abstract: As the informationization of the Resources & Environment Remote Sensing geological survey deepens, several potential problems and deficiencies have emerged: (1) the lack of a uniformly planned operating environment; (2) inconsistent data integration methods; and (3) the drawbacks of performing data integration in different ways. This paper addresses these problems through overall planning and design, constructing a unified operating environment, consistent data integration methods, and a system structure in order to advance the informationization
Funding: Supported by the Science and Technology Research and Development Plan of China State Railway Group Co., Ltd. (L2023Z001).
Abstract: This study addresses issues related to the collection and management of basic data for railway green performance. A railway green performance basic database has been constructed based on metadata and data exchange schemas, and a data classification system has been established from the perspectives of businesses, processes, and entities. A BIM (Building Information Modelling) model data extraction scheme based on field similarity matching and a document content extraction scheme based on image recognition are proposed. A railway green performance basic data collection system has been developed, achieving efficient collection and integrated management of this data. The system can provide data support for applications such as railway carbon emissions accounting, green cost-benefit analysis, and evaluation of green design solutions.
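Field similarity matching, as used in the BIM extraction scheme above, can be sketched with the standard library's `difflib.SequenceMatcher`. The normalisation rule, the 0.6 cut-off, and all field names below are assumptions for illustration; the paper does not state which similarity measure it uses.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case and drop separators so 'Concrete Volume' ~ 'concrete_volume'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def match_fields(bim_fields, db_fields, min_ratio=0.6):
    """Map each BIM property name to the most similar database field, or None."""
    mapping = {}
    for bim in bim_fields:
        best, best_ratio = None, 0.0
        for db in db_fields:
            ratio = SequenceMatcher(None, normalise(bim), normalise(db)).ratio()
            if ratio > best_ratio:
                best, best_ratio = db, ratio
        # Below the cut-off the property is left unmapped for manual review.
        mapping[bim] = best if best_ratio >= min_ratio else None
    return mapping


print(match_fields(["Concrete Volume", "Steel-Weight", "Fire Rating"],
                   ["concrete_volume", "steel_weight", "embodied_carbon"]))
# {'Concrete Volume': 'concrete_volume', 'Steel-Weight': 'steel_weight', 'Fire Rating': None}
```

Returning `None` rather than forcing a match keeps low-confidence BIM properties out of the database until someone confirms where they belong.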
Abstract: Digital broadcasting is a novel paradigm for next-generation broadcasting. Its goal is to provide not only better picture quality but also a variety of services that are impossible in traditional over-the-air broadcasting. Because this new broadcasting environment is distributed, interoperability among broadcasting applications is one of its important factors. Broadcasting metadata therefore becomes increasingly important, and one of the metadata standards for digital broadcasting is TV-Anytime metadata. TV-Anytime metadata is defined using XML Schema, so its instances are XML data. To achieve interoperability, a standard query language is also required, and XQuery is a natural choice. Several studies have dealt with broadcasting metadata; in our previous study, we proposed a method for efficiently managing broadcasting metadata at a service provider. However, the Set-Top Box environment for digital broadcasting is constrained (low cost and low specification), so general approaches to metadata management need special consideration before they can be applied to a Set-Top Box. This paper proposes a method for efficiently managing broadcasting metadata on a Set-Top Box, together with a prototype metadata management system for evaluating the method. The system consists of a storage engine that stores the metadata and an XQuery engine that searches the stored metadata, and it uses a special index for storing and searching. The two engines are designed independently of the hardware platform, so they can be used in any low-cost application that manages broadcasting metadata.
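The idea of indexing XML broadcasting metadata once so that later searches are cheap can be sketched with Python's `xml.etree.ElementTree`. Note the substitutions: ElementTree's limited XPath support stands in for a real XQuery engine, the XML is a heavily simplified fragment only loosely modelled on TV-Anytime `ProgramInformation` (real TV-Anytime genres are classification-scheme references, not plain text), the namespace URI is a placeholder, and the genre index is a hypothetical example rather than the paper's index structure.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Simplified fragment loosely modelled on TV-Anytime ProgramInformation;
# the namespace URI is a placeholder, not the official TV-Anytime one.
NS = {"tva": "urn:example:tva"}
DOC = """<TVAMain xmlns="urn:example:tva">
  <ProgramInformationTable>
    <ProgramInformation programId="crid://example.com/news1">
      <BasicDescription><Title>Evening News</Title><Genre>News</Genre></BasicDescription>
    </ProgramInformation>
    <ProgramInformation programId="crid://example.com/doc7">
      <BasicDescription><Title>Ocean Life</Title><Genre>Documentary</Genre></BasicDescription>
    </ProgramInformation>
    <ProgramInformation programId="crid://example.com/news2">
      <BasicDescription><Title>Morning News</Title><Genre>News</Genre></BasicDescription>
    </ProgramInformation>
  </ProgramInformationTable>
</TVAMain>"""


def build_genre_index(xml_text):
    """Parse once and build a genre -> [(programId, title)] index for cheap lookups."""
    root = ET.fromstring(xml_text)
    index = defaultdict(list)
    for prog in root.iterfind(".//tva:ProgramInformation", NS):
        title = prog.findtext("tva:BasicDescription/tva:Title", default="", namespaces=NS)
        genre = prog.findtext("tva:BasicDescription/tva:Genre", default="", namespaces=NS)
        index[genre].append((prog.get("programId"), title))
    return index


index = build_genre_index(DOC)
print([title for _, title in index["News"]])  # ['Evening News', 'Morning News']
```

On a constrained Set-Top Box, paying the parse cost once and answering repeated genre queries from a dictionary avoids re-scanning the XML for every search.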
Funding: Project 40301042 supported by the Natural Science Foundation of China.
Abstract: This paper discusses the application of spatialization technology to metadata quality checking and updating. A new spatialization-based method is proposed for checking and updating metadata, overcoming the deficiencies of text-based methods by using the powerful spatial query and analysis functions provided by GIS software. The method uses spatialization to transform metadata into a coordinate space and then applies GIS spatial analysis functions to check and update spatial metadata in a visual environment. The basic principle and technical workflow of the method are explained in detail, and an example implementation using the ArcMap GIS software is illustrated with a metadata set of digital raster maps. The results show that the new method, supported by interaction between graphics and text, is much more intuitive and convenient than ordinary text-based methods, and can fully utilize GIS spatial query and analysis functions with greater accuracy and efficiency.
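The core idea of spatialization, turning each metadata record's bounding coordinates into a geometry so that errors show up as spatial anomalies, can be sketched without GIS software. This pure-Python version is a stand-in for the ArcMap workflow; the record fields (`sheet_id`, `west`, `south`, `east`, `north`), the sheet IDs, and the two checks (impossible extents and overlapping sheets within one map series) are illustrative assumptions.

```python
from itertools import combinations


def to_box(record):
    """Spatialize a metadata record: its bounding coordinates become a rectangle."""
    return (record["west"], record["south"], record["east"], record["north"])


def invalid(box):
    """True if the extent is outside lon/lat ranges or its corners are swapped."""
    west, south, east, north = box
    in_range = all(-180 <= x <= 180 for x in (west, east)) and all(-90 <= y <= 90 for y in (south, north))
    return not in_range or west >= east or south >= north


def overlaps(a, b):
    """True if two rectangles share interior area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def check_metadata(records):
    """Flag records with impossible extents and pairs of sheets that overlap."""
    boxes = {r["sheet_id"]: to_box(r) for r in records}
    bad = sorted(sid for sid, box in boxes.items() if invalid(box))
    clashes = sorted((a, b) for a, b in combinations(sorted(boxes), 2)
                     if a not in bad and b not in bad and overlaps(boxes[a], boxes[b]))
    return bad, clashes


records = [
    {"sheet_id": "A", "west": 114.0, "south": 39.5, "east": 114.5, "north": 40.0},
    {"sheet_id": "B", "west": 114.5, "south": 39.5, "east": 115.0, "north": 40.0},
    {"sheet_id": "C", "west": 114.4, "south": 39.5, "east": 114.9, "north": 40.0},
    {"sheet_id": "D", "west": 115.5, "south": 40.0, "east": 115.0, "north": 40.5},
]
print(check_metadata(records))  # (['D'], [('A', 'C'), ('B', 'C')])
```

Sheets A and B tile edge to edge and pass; C overlaps both neighbours, which in a regular sheet series usually indicates a mistyped coordinate, and D has its west and east values swapped.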