Abstract: Privacy protection for big data linking is discussed here in relation to the 'Structure of Earnings Survey - Administrative Data Project' (SESADP), a big data linking project of Ireland's Central Statistics Office (CSO). The project produced datasets and statistical outputs for the years 2011 to 2014 to meet Eurostat's annual earnings statistics requirements and the Structure of Earnings Survey (SES) Regulation. Linking records across the Census and various public sector datasets made it possible to acquire the information needed to meet these Eurostat requirements. However, the risk of statistical disclosure (i.e. identifying an individual in the dataset) is high unless privacy and confidentiality safeguards are built into the data matching process. This paper examines the three methods of linking records on big datasets that were employed in the SESADP. It also considers how to anonymise the data to protect the identity of individuals where potentially disclosive variables exist.
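A common first check for disclosure risk is to count how many records share each combination of potentially disclosive variables (quasi-identifiers). A combination held by fewer than k records points to individuals who could be re-identified. The sketch below follows this k-anonymity-style approach. The variable names, the threshold k = 3 and the masking rule are illustrative assumptions and are not the SESADP's documented method.

```python
from collections import Counter


def disclosure_risk(records, quasi_identifiers, k=3):
    """Return quasi-identifier combinations shared by fewer than k records.

    Records in such small groups are the easiest to re-identify.
    """
    keys = (tuple(r[q] for q in quasi_identifiers) for r in records)
    counts = Counter(keys)
    return {key: n for key, n in counts.items() if n < k}


def suppress_risky(records, quasi_identifiers, k=3, mask="*"):
    """Mask the quasi-identifiers of every record in a group smaller than k."""
    risky = disclosure_risk(records, quasi_identifiers, k)
    cleaned = []
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        if key in risky:
            # Keep the analysis variable (earnings) but hide the identifying combination.
            r = {**r, **{q: mask for q in quasi_identifiers}}
        cleaned.append(r)
    return cleaned


# Hypothetical records; the variable names are illustrative only.
records = [
    {"age_band": "30-39", "county": "Dublin", "sector": "Health", "earnings": 52000},
    {"age_band": "30-39", "county": "Dublin", "sector": "Health", "earnings": 48000},
    {"age_band": "30-39", "county": "Dublin", "sector": "Health", "earnings": 50500},
    {"age_band": "60-69", "county": "Leitrim", "sector": "Defence", "earnings": 61000},
]
qi = ["age_band", "county", "sector"]
print(disclosure_risk(records, qi))  # {('60-69', 'Leitrim', 'Defence'): 1}
```

Suppression is only one option. Coarsening a variable (for example, using wider age bands) or aggregating outputs may keep more analytical detail.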
Abstract: This paper describes how data records can be matched across large datasets using a technique called the Identity Correlation Approach (ICA). It then compares the ICA with a string matching exercise. Both methods were employed in a big data project carried out by the CSO, the SESADP (Structure of Earnings Survey Administrative Data Project), which involved linking the 2011 Irish Census dataset to a large Public Sector Dataset. The ICA provides a mathematical tool for linking the datasets, and the matching rate for an exact match can be calculated before the matching process begins. In the ICA, this matching rate is given by the MRUI (Matching Rate for Unique Identifier) formula, based on the number of variables and the size of the population, and false positives are eliminated. Because the ICA uses no string matching, names are not required in the dataset. This makes the data more secure and helps ensure confidentiality. The SESADP was highly successful using the ICA technique. The results of the string matching exercise and the ICA for the SESADP are compared here.
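The abstract does not give the MRUI formula. The sketch below therefore illustrates only the general idea behind name-free exact matching. It measures empirically how often a combination of non-name variables is unique, and it links two datasets only on keys that are unique on both sides. The function names, variables and records are hypothetical, and `unique_key_rate` is an empirical uniqueness share, not the MRUI.

```python
from collections import Counter


def _key(record, variables):
    return tuple(record[v] for v in variables)


def unique_key_rate(records, variables):
    """Share of records whose combination of variables is unique in the dataset."""
    counts = Counter(_key(r, variables) for r in records)
    return sum(1 for n in counts.values() if n == 1) / len(records)


def link_exact_unique(left, right, variables):
    """Exact-match two datasets on a variable combination, without names.

    A pair is linked only when its key occurs exactly once on each side, so an
    ambiguous key never produces a link.
    """
    left_counts = Counter(_key(r, variables) for r in left)
    right_counts = Counter(_key(r, variables) for r in right)
    right_by_key = {_key(r, variables): r for r in right
                    if right_counts[_key(r, variables)] == 1}
    links = []
    for r in left:
        k = _key(r, variables)
        if left_counts[k] == 1 and k in right_by_key:
            links.append((r["id"], right_by_key[k]["id"]))
    return links


# Hypothetical census and payroll extracts.
census = [
    {"id": "C1", "dob": "1980-02-14", "sex": "F", "county": "Cork"},
    {"id": "C2", "dob": "1975-07-01", "sex": "M", "county": "Galway"},
    {"id": "C3", "dob": "1975-07-01", "sex": "M", "county": "Galway"},
    {"id": "C4", "dob": "1990-11-30", "sex": "F", "county": "Dublin"},
]
payroll = [
    {"id": "P9", "dob": "1980-02-14", "sex": "F", "county": "Cork"},
    {"id": "P7", "dob": "1975-07-01", "sex": "M", "county": "Galway"},
    {"id": "P3", "dob": "1990-11-30", "sex": "F", "county": "Dublin"},
]
variables = ["dob", "sex", "county"]
print(link_exact_unique(census, payroll, variables))  # [('C1', 'P9'), ('C4', 'P3')]
print(unique_key_rate(census, variables))             # 0.5
```

C2 and C3 share a key, so neither is linked to P7. Adding more variables makes keys more likely to be unique, which is the trade-off a formula such as MRUI quantifies in advance.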
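Abstract: [Purpose/Significance] To evaluate the scholarly contribution of the book Linked Open Data Enabled Bibliographical Data (LODE-BD) 3.0 to enabling bibliographic data with linked open data, and to help readers master the skills needed to apply linked open data. [Method/Process] The paper explains why this application guide to linked open data was compiled and interprets the practical recommendations of LODE-BD. It considers how bibliographic data can be represented as linked open data, so that users gain open access to related bibliographic resources and those resources become interlinked. [Result/Conclusion] The book is a mature, practical guide to choosing appropriate encoding strategies for producing linked-open-data-enabled bibliographic data, with considerable theoretical value, methodological guidance and practical significance.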
Funding: Supported in part by the program for the "Industrial IoT and Emergency Collaboration" Innovative Research Team in CUMT (No. 2020ZY002), in part by the Postgraduate Research & Practice Innovation Program of Jiangsu Province (2021WLKXJ054), and by the Postgraduate Research & Practice Innovation Program of China University of Mining and Technology (KYCX21_2242).
Abstract: Underwater magnetic induction (MI)-assisted acoustic cooperative multiple-input multiple-output (MIMO) has recently been proposed as a promising technique for underwater wireless sensor networks (UWSNs). Moreover, the energy utilization of energy-constrained sensor nodes is a key issue in UWSNs because it directly affects the network lifetime. In this paper, we present an energy-efficient data collection scheme for underwater MI-assisted acoustic cooperative MIMO wireless sensor networks (WSNs). The scheme covers both the formation of cooperative MIMO groups and the establishment of relay links. First, cooperative MIMO groups are formed by considering their expected transmission range and the energy balance of their member nodes. In particular, based on node energy consumption, we propose an expected cooperative MIMO size and a method for selecting the master node (MN). Next, to improve network coverage and prolong network lifetime, relay links are established by a relay selection algorithm based on matching theory. Finally, simulation results show that the proposed scheme improves data collection efficiency, reduces the energy consumption of the master node, improves network coverage, and extends network lifetime.
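Matching-theory relay selection is often built on deferred acceptance (the Gale-Shapley algorithm), which produces a stable assignment. In a stable assignment, no master node and relay would both rather be paired with each other than with their current partners. The sketch below assumes a one-to-one matching, with master nodes proposing to relays. The abstract does not specify the actual matching model or preference functions. The preference lists here are hypothetical and could, for example, be ranked by link quality or residual energy.

```python
from collections import deque


def stable_match(node_prefs, relay_prefs):
    """Deferred acceptance (Gale-Shapley) with master nodes proposing to relays.

    node_prefs:  {master_node: [relays, most preferred first]}
    relay_prefs: {relay: [master_nodes, most preferred first]}
    Each relay accepts at most one master node. Returns {master_node: relay}.
    """
    rank = {r: {n: i for i, n in enumerate(prefs)} for r, prefs in relay_prefs.items()}
    next_choice = {n: 0 for n in node_prefs}
    holder = {}  # relay -> master node it currently holds
    free = deque(node_prefs)
    while free:
        node = free.popleft()
        prefs = node_prefs[node]
        if next_choice[node] >= len(prefs):
            continue  # list exhausted: this master node stays unmatched
        relay = prefs[next_choice[node]]
        next_choice[node] += 1
        current = holder.get(relay)
        if node not in rank[relay]:
            free.append(node)  # relay finds this node unacceptable
        elif current is None:
            holder[relay] = node
        elif rank[relay][node] < rank[relay][current]:
            holder[relay] = node  # relay trades up; previous node is freed
            free.append(current)
        else:
            free.append(node)  # rejected; try the next relay later
    return {node: relay for relay, node in holder.items()}


# Hypothetical preferences for two master nodes and two relays.
node_prefs = {"M1": ["R1", "R2"], "M2": ["R1", "R2"]}
relay_prefs = {"R1": ["M2", "M1"], "R2": ["M1", "M2"]}
print(stable_match(node_prefs, relay_prefs))  # {'M2': 'R1', 'M1': 'R2'}
```

Both master nodes prefer R1, but R1 prefers M2, so M1 ends up with R2. Neither pair has an incentive to deviate.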
Funding: Supported by the Defense Industrial Technology Development Program (Grant G20210513) and the Shaanxi Provincial Department of Science and Technology (Grants 2021KW-07 and 2022 QFY01-14).
Abstract: Given the escalating demand for, and complexity of, services in contemporary land, sea, and air combat operations, airborne cluster communication networks need higher service quality and efficiency. Software-Defined Networking (SDN) offers a viable solution for managing cooperative communication transmission across operational domains in complex combat settings, because it can allocate network resources flexibly and administer them centrally. This study focuses on optimizing SDN controller deployment within airborne data link clusters and proposes a collaborative multi-controller architecture for such clusters. Within this architecture, controller deployment is reframed as a two-part problem: subdomain partitioning and central interaction node selection. We propose a subdomain partitioning approach based on node value ranking (NDVR). We also propose a central interaction node selection method based on an improved Artificial Fish Swarm Algorithm (IAFSA). The resulting NDVR-IAFSA (node value ranking-improved artificial fish swarm algorithm) uses a chaos algorithm for population initialization, which increases population diversity and helps avoid premature convergence. The algorithm integrates adaptive strategies and incorporates the crossover and mutation operations of the genetic algorithm, which makes its search range more adaptable. This increases the likelihood of finding globally optimal solutions while also improving cluster reliability. Simulation results confirm the advantages of NDVR-IAFSA. It achieves better load balancing, improves the reliability of the aviation data link cluster, and reduces the average propagation delay by 12.8% and the disconnection rate by 11.7%. These results indicate that the optimization scheme is of practical significance and can meet the demanding requirements that modern sea, land, and air operations place on airborne communication networks.
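The abstract says only that "a chaos algorithm" initializes the population. A frequently used choice is the logistic map at mu = 4, which is assumed in the sketch below. Each iterate in (0, 1) is scaled into the search bounds of one dimension. The function name, seed and bounds are hypothetical. The fish-swarm behaviours and the crossover and mutation steps of the improved algorithm are not shown.

```python
def chaotic_population(pop_size, lower, upper, mu=4.0, x0=0.37):
    """Initial population from the logistic map x_{n+1} = mu * x_n * (1 - x_n).

    With mu = 4 the map is chaotic on (0, 1); each iterate is scaled into the
    search interval [lower[d], upper[d]] of dimension d.
    """
    # The seed must avoid values that collapse to a fixed point, e.g. 0, 0.25, 0.5, 0.75.
    x = x0
    population = []
    for _ in range(pop_size):
        fish = []
        for lo, hi in zip(lower, upper):
            x = mu * x * (1.0 - x)
            fish.append(lo + x * (hi - lo))
        population.append(fish)
    return population


# Hypothetical 2-D search space.
pop = chaotic_population(20, lower=[0.0, -5.0], upper=[10.0, 5.0])
print(len(pop), pop[0])
```

Compared with uniform random sampling, the chaotic sequence is deterministic for a given seed. It is also ergodic over (0, 1), which is the property usually cited for better initial coverage.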
Funding: College of Communication and Information (CCI) Research and Creative Activity Fund, Kent State University.
Abstract: Purpose: To develop a set of metrics and identify criteria for assessing the functionality of LOD KOS (linked open data knowledge organization systems) products. The metrics also provide common guiding principles that LOD KOS producers and users can apply to maximize the functions and usage of these products. Design/methodology/approach: Data collection and analysis were conducted in three periods: 2015–16, 2017 and 2019. The sample used in the comprehensive data analysis comprises all datasets tagged as types of KOS in the Datahub, extracted through their respective SPARQL endpoints. A comparative study of the LOD KOS collected from the terminology services Linked Open Vocabularies (LOV) and BioPortal was also performed. Findings: The study proposes a set of Functional, Impactful and Transformable (FIT) metrics for LOD KOS as value vocabularies. The FAIR principles, with additional recommendations, are presented for LOD KOS as open data. Research limitations: The metrics need to be further tested and aligned with the best practices and international standards of both open data and various types of KOS. Practical implications: Assessment with the FAIR and FIT metrics supports the creation and delivery of user-friendly, discoverable and interoperable LOD KOS datasets. Such datasets can be used for innovative applications, act as a knowledge base, serve as a foundation for semantic analysis and entity extraction, and enhance research in the sciences and the humanities. Originality/value: Our research provides best practice guidelines for LOD KOS as value vocabularies.
Abstract: Switzerland is one of the most desirable European destinations for Chinese tourists, so a better understanding of these tourists is essential for successful business practice. In China, the largest and leading social media platform, Sina Weibo, is a hybrid of Twitter and Facebook with more than 600 million users. Weibo's high market penetration suggests that tourism operators and marketers need to understand how to build effective and sustainable communications on Chinese social media platforms. To offer a better decision support platform to tourism destination managers and to Chinese tourists, we propose a framework that uses linked data on Sina Weibo. Linked Data refers to using the Internet to connect related data. We show how it can be used and how an ontology can be designed to include users' context (e.g., GPS locations). Our framework provides a theoretical foundation for further understanding Chinese tourists' expectations, experiences, behaviors and new trends in Switzerland.
Abstract: A large number of ontologies have been introduced by the biomedical community in recent years. Knowledge discovery for entity identification from ontologies has become an important research area. A question of particular interest is how associations are established to connect concepts within a single ontology or across multiple ontologies. However, owing to the exponential growth of biomedical big data and the complexity of their associations, detecting key associations among entities efficiently and dynamically has become very challenging. There is therefore a gap between the growing need for association detection and the large volume of biomedical ontologies. To bridge this gap, this paper presents BioBroker, a knowledge discovery framework that groups entities to support biomedical knowledge discovery in an intelligent way. Specifically, we developed an innovative knowledge discovery algorithm that combines a graph clustering method with an indexing technique. The algorithm efficiently discovers knowledge patterns over a set of interlinked data sources. We demonstrate BioBroker's query execution capabilities with a use case study on a subset of the Bio2RDF life science linked data.
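The abstract names neither the graph clustering method nor the indexing technique. The sketch below substitutes the simplest pairing of the two ideas. It groups linked entities into connected components with union-find, then builds an entity-to-cluster index so that a lookup returns related entities without traversing the graph. Connected components is a stand-in chosen for illustration, and the entity identifiers are hypothetical.

```python
def cluster_and_index(edges):
    """Group linked entities into clusters and build an entity -> cluster index.

    edges: iterable of (entity_a, entity_b) links between ontology concepts.
    Clusters here are connected components (a stand-in for a real clustering method).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    index = {node: find(node) for node in list(parent)}
    clusters = {}
    for node, root in index.items():
        clusters.setdefault(root, set()).add(node)
    return index, clusters


def related_entities(index, clusters, entity):
    """Return every other entity in the same cluster (empty set if unknown)."""
    return clusters.get(index.get(entity), set()) - {entity}


# Hypothetical links between concepts from different sources.
edges = [("ex:drugA", "ex:targetB"), ("ex:targetB", "ex:pathwayC"), ("ex:geneD", "ex:diseaseE")]
index, clusters = cluster_and_index(edges)
print(sorted(related_entities(index, clusters, "ex:drugA")))  # ['ex:pathwayC', 'ex:targetB']
```

The index turns each query into two dictionary lookups. This is the general benefit of pairing clustering with indexing, although BioBroker's actual structures may differ.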
Abstract: This paper focuses on developing a system that helps presentation authors retrieve slides for reuse from a large volume of existing presentation materials. We assume that authors' episodic memories can serve as contextual keywords in queries. Such queries can locate the desired slides more efficiently than keyword queries based only on parts of the slide descriptions. The proposed system has two parts. The first is a new slide repository containing slide material collections, slide content data, and information from the authors' episodic memories related to each slide and presentation. The second is a slide retrieval application that lets authors use these episodic memories as part of their queries. Our experimental results show that queries using episodic memories offer better discoverability than keyword-based queries. We also discuss a model for further improving slide retrieval efficiency. It extends the episodic memory model in the repository with links to author- and slide-related data and to events posted on private and social media sites.
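To make the idea of episodic memories as query context concrete, the sketch below ranks slides by combining two kinds of hits. It counts keyword hits in the slide text and hits on memory annotations attached to each slide (for example, where or when the slide was presented). The data layout, the `memory_weight` parameter and the additive score are hypothetical and are not taken from the paper's repository design.

```python
def rank_slides(slides, content_terms, memory_terms, memory_weight=1.0):
    """Rank slides by keyword hits in slide text plus hits on episodic-memory annotations.

    Text matching is a simple whitespace split (no stemming or punctuation handling).
    Memory terms are matched against whole annotation phrases, case-insensitively.
    """
    scored = []
    for slide in slides:
        words = set(slide["text"].lower().split())
        memories = {m.lower() for m in slide["memories"]}
        content_hits = sum(1 for t in content_terms if t.lower() in words)
        memory_hits = sum(1 for t in memory_terms if t.lower() in memories)
        score = content_hits + memory_weight * memory_hits
        if score > 0:
            scored.append((score, slide["id"]))
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by id
    return [slide_id for _, slide_id in scored]


# Hypothetical repository entries.
slides = [
    {"id": "s1", "text": "Results of the user study", "memories": ["Kyoto workshop", "rainy day"]},
    {"id": "s2", "text": "Results overview", "memories": ["internal seminar"]},
    {"id": "s3", "text": "Future work", "memories": ["Kyoto workshop"]},
]
print(rank_slides(slides, ["results"], []))                  # ['s1', 's2']
print(rank_slides(slides, ["results"], ["kyoto workshop"]))  # ['s1', 's2', 's3']
```

Slide s3 contains none of the content keywords, so only the memory term ("shown at the Kyoto workshop") can surface it. This mirrors the discoverability gain the abstract reports.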
Abstract: This paper introduces a new data access method for reading nuclear instrument data at high speed and in real time within a small storage space. The method applies the "linked list" data storage mode to a Micro Control Unit (MCU) system and implements pointer-based access to nuclear data within the MCU's small storage space. Experimental results show that the method overcomes some problems of traditional data storage methods. Its advantages include simple program design, stable performance, accurate data, strong repeatability and reduced storage space.
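On an MCU without dynamic allocation, a linked list is typically kept in statically allocated arrays. In that layout, a "pointer" is an array index and freed slots are chained into a free list. The Python sketch below mirrors that layout to show the idea. A real MCU implementation would presumably be written in C, and the class and method names here are hypothetical.

```python
class PoolLinkedList:
    """FIFO linked list stored in preallocated arrays; 'pointers' are array indices."""

    NIL = -1

    def __init__(self, capacity):
        self.data = [0] * capacity
        # Initially every slot is on the free list: slot i points to slot i + 1.
        self.next = list(range(1, capacity)) + [self.NIL]
        self.free = 0 if capacity > 0 else self.NIL
        self.head = self.NIL
        self.tail = self.NIL

    def append(self, value):
        """Store a sample at the tail; return False when the pool is exhausted."""
        if self.free == self.NIL:
            return False
        slot = self.free
        self.free = self.next[slot]
        self.data[slot] = value
        self.next[slot] = self.NIL
        if self.tail == self.NIL:
            self.head = slot
        else:
            self.next[self.tail] = slot
        self.tail = slot
        return True

    def pop_front(self):
        """Read and release the oldest sample, returning its slot to the free list."""
        if self.head == self.NIL:
            return None
        slot = self.head
        value = self.data[slot]
        self.head = self.next[slot]
        if self.head == self.NIL:
            self.tail = self.NIL
        self.next[slot] = self.free
        self.free = slot
        return value

    def __iter__(self):
        slot = self.head
        while slot != self.NIL:
            yield self.data[slot]
            slot = self.next[slot]


buf = PoolLinkedList(3)
for count in (10, 20, 30):
    buf.append(count)
print(buf.append(40), buf.pop_front(), buf.append(40), list(buf))  # False 10 True [20, 30, 40]
```

Memory use is fixed at construction, and every operation is constant-time index manipulation. These properties are what make the structure suitable for real-time acquisition in small storage.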
Funding: Aeronautical Science Foundation of China (2007ZC53030).
Abstract: Building on M-ary spread spectrum (M-ary-SS), direct sequence spread spectrum (DS-SS), and orthogonal frequency division multiplexing (OFDM), this paper proposes a novel anti-jamming scheme called orthogonal code time division multi-subchannels spread spectrum modulation (OC-TDMSCSSM). The scheme is designed to enhance the anti-jamming ability of unmanned aerial vehicle (UAV) data links. The anti-jamming system and its mathematical model are presented first, and the transmitter and receiver signal formats are then derived. The receiver's bit error rate (BER) is evaluated, and anti-jamming performance is analyzed in an additive white Gaussian noise (AWGN) channel. Theoretical analysis and simulation results show that the proposed scheme outperforms a hybrid direct sequence/frequency hopping spread spectrum (DS/FH SS) system in anti-jamming performance. In a Rician channel, the jamming margin of the OC-TDMSCSSM system is 5 dB higher than that of the DS/FH SS system under full-band jamming and 6 dB higher under partial-band jamming.
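The jamming-margin comparison rests on the generic spread-spectrum relations. The processing gain is G_p = 10 log10(B_ss / R_b) dB, and the jamming margin is M_J = G_p - (L_sys + (E_b/N_0)_req), all in dB. For reference, the BPSK bit error rate in AWGN is P_b = 0.5 erfc(sqrt(E_b/N_0)). The sketch below evaluates these textbook formulas with hypothetical numbers. It does not reproduce the OC-TDMSCSSM derivation or the reported 5 dB and 6 dB gains.

```python
import math


def processing_gain_db(spread_bandwidth_hz, data_rate_bps):
    """G_p = 10 log10(B_ss / R_b), in dB."""
    return 10 * math.log10(spread_bandwidth_hz / data_rate_bps)


def jamming_margin_db(processing_gain, required_ebn0_db, system_losses_db):
    """M_J = G_p - (L_sys + (Eb/N0)_req), all in dB."""
    return processing_gain - (system_losses_db + required_ebn0_db)


def bpsk_ber_awgn(ebn0_db):
    """Bit error rate of coherent BPSK in AWGN: 0.5 * erfc(sqrt(Eb/N0))."""
    ebn0 = 10 ** (ebn0_db / 10)
    return 0.5 * math.erfc(math.sqrt(ebn0))


# Hypothetical link: 10 MHz spread bandwidth, 10 kbit/s data, 2 dB losses,
# and about 9.6 dB Eb/N0 needed for a BER near 1e-5 with BPSK.
gp = processing_gain_db(10e6, 10e3)
print(round(gp, 1), round(jamming_margin_db(gp, 9.6, 2.0), 1))  # 30.0 18.4
print(f"{bpsk_ber_awgn(9.6):.1e}")
```

Each decibel of jamming margin means the link tolerates a jammer one decibel stronger relative to the signal. This is the sense in which the abstract's 5 dB and 6 dB differences should be read.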