Funding: Supported by the Major Special Project of Sichuan Science and Technology Department (2020YFG0460) and the Central University Project of China (ZYGX2020ZB020, ZYGX2020ZB019).
Abstract: To achieve high availability of health data in erasure-coded cloud storage systems, the data update performance of erasure coding should be continuously optimized. However, data update performance is often bottlenecked by constrained cross-rack bandwidth. Various techniques have been proposed in the literature to improve network bandwidth efficiency, including delta transmission, relay, and batch update. These techniques were largely proposed individually, and in this work we seek to use them jointly. To mitigate cross-rack update traffic, we propose DXR-DU, which builds on four techniques: (i) delta transmission, (ii) XOR-based data update, (iii) relay, and (iv) batch update. We also offer two selective update approaches: (1) data-delta-based update and (2) parity-delta-based update. DXR-DU is evaluated via trace-driven local testbed experiments. The experiments show that DXR-DU can significantly improve data update throughput while mitigating cross-rack update traffic.
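The delta and batch ideas named in this abstract can be illustrated with a minimal Python sketch. It assumes a single XOR parity block per stripe, which is a simplification chosen for illustration and not the code or protocol used by DXR-DU; all function names (`xor_bytes`, `encode_parity`, `data_delta`, `batch_parity_delta`, `apply_delta`) are hypothetical. The sketch shows that several data deltas can be merged into one parity delta before being sent, so only one block needs to reach the parity node.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(blocks):
    """Single XOR parity over all data blocks (illustrative assumption)."""
    return reduce(xor_bytes, blocks)


def data_delta(old: bytes, new: bytes) -> bytes:
    """Delta of one data block: old XOR new."""
    return xor_bytes(old, new)


def batch_parity_delta(deltas):
    """Merge several data deltas into one parity delta (batch update)."""
    return reduce(xor_bytes, deltas)


def apply_delta(parity: bytes, delta: bytes) -> bytes:
    """Patch the parity in place of re-encoding the whole stripe."""
    return xor_bytes(parity, delta)


# A stripe of three data blocks and its parity.
stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = encode_parity(stripe)

# Two blocks are updated; only the merged delta travels to the parity node.
updates = {0: b"\xff\x02\x03\x04", 2: b"\xaa\xbb\x00\x00"}
deltas = [data_delta(stripe[i], new) for i, new in updates.items()]
new_parity = apply_delta(parity, batch_parity_delta(deltas))

# The patched parity matches a full re-encode of the updated stripe.
new_stripe = [updates.get(i, blk) for i, blk in enumerate(stripe)]
assert new_parity == encode_parity(new_stripe)
```

With more general linear codes, each parity would apply its own coefficient to the data delta; the XOR-only case above corresponds to all coefficients being 1.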
Funding: Supported by the China 973 Program (2014CB340303), NSFC (Nos. 61170238, 60903190), and the National 863 Program (2013AA01A601).
Abstract: Retrieving data from mobile source vehicles is a crucial routine operation for a wide spectrum of vehicular network applications, including road surface monitoring and sharing. Network coding has been widely exploited and is an effective technique for diffusing information over a network. This paper explores the use of network coding to improve data availability in vehicular networks. With random linear network codes, simple replication is avoided; instead, a node forwards a coded block that is a random combination of all data received by the node. To determine the feasibility of this approach, we conducted an empirical study with extensive simulations based on two real vehicular GPS traces, both of which contain records from thousands of vehicles over more than a year. We observed that, despite significant improvement in data availability, there is a serious issue with linear correlation between the received codes, which reduces the data-retrieval success rate. By analyzing the real vehicular traces, we discovered that there is a strong community structure within a real vehicular network, and we verify that this structure contributes to the issue of linear dependence. Finally, we point out opportunities to improve the network-coding-based approach by developing community-aware code-distribution techniques.
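The linear-dependence issue described in this abstract can be made concrete with a short Python sketch. It assumes coding over GF(2), where a random combination is an XOR of a random subset of packets; the paper does not state its field size, so this is an illustrative choice, and the names `encode`, `gf2_rank`, and `can_decode` are hypothetical. A receiver can recover all k source packets only if the coefficient vectors it has collected have rank k.

```python
import random


def encode(packets, rng):
    """Return (coeffs, payload): a random nonzero GF(2) combination.

    Bit i of coeffs says whether packets[i] is XORed into the payload.
    """
    k = len(packets)
    coeffs = 0
    while coeffs == 0:
        coeffs = rng.getrandbits(k)
    payload = 0
    for i in range(k):
        if (coeffs >> i) & 1:
            payload ^= packets[i]
    return coeffs, payload


def gf2_rank(vectors):
    """Rank of bit-vector integers over GF(2) via incremental elimination."""
    basis = {}  # highest set bit -> basis vector with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in basis:
                basis[top] = v
                break
            v ^= basis[top]  # cancel the leading bit and keep reducing
    return len(basis)


def can_decode(coeff_vectors, k):
    """All k source packets are recoverable only at full rank."""
    return gf2_rank(coeff_vectors) == k


# Three coded blocks, but the third is the XOR of the first two,
# so it carries no new information for the receiver.
received = [0b011, 0b101, 0b110]
print(gf2_rank(received), can_decode(received, 3))
```

In this toy case the receiver holds as many blocks as there are source packets yet still cannot decode, which is the kind of failure the abstract attributes to correlated codes circulating within a community.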
Abstract: The Swiss Agency for Development and Cooperation (SDC) has funded the Rural Water and Sanitation Support Programme (RWSSP), which has increased access to public water supply throughout Kosovo, Europe’s youngest state, in the past ten years. The Programme, implemented by Dorsch International Consultants GmbH and Community Development Initiatives, has, among other activities, implemented groundwater protection methods. Nevertheless, groundwater protection remains a challenge in Kosovo. According to the water law, water source protection is similar to German rules, yet modelling-based planning of water source protection zones remains challenging. The present study describes in detail the development of the hydrogeological and the mathematical groundwater models for the technical delineation of the wellhead protection area for the Ferizaj well fields under limited data availability. The study shows that even when not all data are available, it is possible and necessary to use mathematical groundwater models to delineate wellhead protection areas.
Abstract: Data availability statements can provide useful information about how researchers actually share research data. We used unsupervised machine learning to analyze 124,000 data availability statements submitted by research authors to 176 Wiley journals between 2013 and 2019. We categorized the data availability statements and looked at trends over time. We found expected increases in the number of data availability statements submitted over time, and marked increases that correlate with policy changes made by journals. Our open data challenge is now to use what we have learned to present researchers with relevant and easy options that help them share, and make an impact with, new research data.
Abstract: At the beginning of 2020, human activities were interrupted by a new virus, identified as SARS-CoV-2, which causes COVID-19. The scientific community was no exception: for a certain period, researchers around the world were forced to leave their laboratories and work remotely. There was a global need for alternative ways of generating knowledge and publishing data, so repositories of scientific information, such as databases, provided strong support. In the specific case of the life sciences, different strategies allowed rapid compilation of data and its sharing worldwide. In this work, the impact of the SARS-CoV-2 pandemic on the number of peer-reviewed papers published during the COVID-19 period was analyzed, along with the role of databases. Our results indicate that the number of papers increased across different knowledge fields, most significantly in the medical field. In addition, the complete genome of the new virus was sequenced, and repositories were created with sufficient data for monitoring, preventing, and controlling its dissemination. These data supported the development of vaccines as well as potential drug candidates against COVID-19. Although vaccines allowed a gradual return to normal activities in 2021, databases and the creation of further repositories remain key to facing new strains and adapting to a new reality. Finally, this paper discusses joint efforts to tackle the obstacles of the pandemic, not only from a medical point of view but also with regard to the fight against misinformation.
Abstract: Peer-to-peer (P2P) networking is a distributed architecture that partitions tasks or data between peer nodes. This paper proposes a Hypercube Sequential Matrix Partition (HS-MP) framework for efficient data sharing in P2P networks, using a tokenizer method to address the problems of larger P2P networks. The availability of data is first measured by the tokenizer using Dynamic Hypercube Organization, which efficiently coordinates and assists the peers in the P2P network, ensuring data availability at many locations. Each data item in a peer is then assigned a valid ID by the tokenizer using the Sequential Self-Organizing (SSO) ID generation model. This ensures data sharing with other nodes in a large P2P network at a minimum time interval, which is obtained through proximity of data availability. To validate the HS-MP framework, its performance is evaluated using traffic traces collected from data sharing applications. Simulations conducted using Network Simulator 2 show that the proposed framework outperforms the conventional streaming models. The performance of the proposed system is analyzed in terms of energy consumption, average latency, and average data availability rate with respect to the number of peer nodes, data size, amount of data shared, and execution time. The proposed method reduces energy consumption by 43.35% for transpose traffic, 35.29% for bitrev traffic, and 25% for bitcomp traffic patterns.
Abstract: This article investigates the dynamic relationship between technology and artificial intelligence (AI) and the role that societal requirements play in pushing AI research and adoption. Technology has advanced dramatically over the years, providing the groundwork for the rise of AI. AI systems have achieved remarkable results in various disciplines thanks to advancements in computing power, data availability, and complex algorithms. At the same time, society’s needs for efficiency, enhanced healthcare, environmental sustainability, and personalized experiences have worked as powerful accelerators of AI’s progress. This article examines how technology empowers AI and how societal needs dictate its progress, emphasizing their symbiotic relationship. The findings underline the significance of responsible AI research, which considers both technological prowess and ethical issues, to ensure that AI continues to serve the greater good.
Funding: Partially funded by Germany’s Federal Ministry of Education and Research within the framework of IKARIM and the PARADeS project, grant number 13N15273, the ARSINOE project (GA 101037424), and the MIRACA project (GA 101093854) under the European Union’s H2020 innovation action programme.
Abstract: Natural hazards impact interdependent infrastructure networks that keep modern society functional. While a variety of modelling approaches are available to represent critical infrastructure networks (CINs) on different scales and analyse the impacts of natural hazards, a recurring challenge for all modelling approaches is the availability and accessibility of sufficiently high-quality input and validation data. The resulting data gaps often require modellers to assume specific technical parameters, functional relationships, and system behaviours. In other cases, expert knowledge from one sector is extrapolated to other sectoral structures or even cross-sectorally applied to fill data gaps. The uncertainties introduced by these assumptions and extrapolations and their influence on the quality of modelling outcomes are often poorly understood and difficult to capture, thereby eroding the reliability of these models to guide resilience enhancements. Additionally, ways of overcoming the data availability challenges in CIN modelling, with respect to each modelling purpose, remain an open question. To address these challenges, a generic modelling workflow is derived from existing modelling approaches to examine model definition and validations, as well as the six CIN modelling stages, including mapping of infrastructure assets, quantification of dependencies, assessment of natural hazard impacts, response and recovery, quantification of CI services, and adaptation measures. The data requirements of each stage were systematically defined, and the literature on potential sources was reviewed to enhance data collection and raise awareness of potential pitfalls. The application of the derived workflow funnels into a framework to assess data availability challenges. This is shown through three case studies, taking into account their different modelling purposes: hazard hotspot assessments, hazard risk management, and sectoral adaptation. Based on the three model purpose types provided, a framework is suggested to explore the implications of data scarcity for certain data types, as well as their reasons and consequences for CIN model reliability. Finally, a discussion on overcoming the challenges of data scarcity is presented.
Funding: Supported by the Science and Technology Innovation Commission of Shenzhen, China (20231121191245001 and JCYJ20210324095607021 to HX) and the Special Project of Key Fields of Universities in Guangdong Province, China (2021ZDZX2047 to HX).
Abstract: The number of multi-drug-resistant bacteria has increased over the last few decades, with a detrimental impact on public health worldwide. To counter the development of antibiotic resistance among different bacterial communities, new antimicrobial agents and nanoparticle-based strategies need to be designed, given the slow discovery of new functioning antibiotics. Advanced research studies have revealed the significant disinfection potential of two-dimensional nanomaterials (2D NMs), which can serve as effective antibacterial agents due to their unique physicochemical properties. This review covers the current research progress of 2D NM-based antibacterial strategies, with an inclusive explanation of the impact of 2D NMs as antibacterial agents, including a detailed introduction to each well-known possible antibacterial mechanism. The impact of the physicochemical properties of 2D NMs on their antibacterial activities is discussed, along with the toxic effects of 2D NMs and their biomedical significance, dysbiosis, and cellular nanotoxicity. Among the challenges, we also discuss the major issues regarding the current quality and availability of nanotoxicity data. Smart advancements are still required to fabricate biocompatible 2D antibacterial NMs and exploit their potential to combat bacterial resistance clinically.
Funding: Supported by the National Basic Research Program (973) of China (No. 2011CB302303) and the National High-Tech R&D Program (863) of China (No. 2013AA013203).
Abstract: Among redundant arrays of independent disks (RAID)-6 codes, maximum distance separable (MDS) based RAID-6 codes are popular because they have the optimal storage efficiency. Although vertical MDS codes exhibit better load balancing compared to horizontal MDS codes in partial stripes, an I/O unbalancing problem still exists in some vertical codes. To address this issue, we propose a novel efficient data layout, uniform P-code (UPC), to support highly balanced I/Os among P-coded disk arrays (i.e., PC). In UPC, the nonuniformly distributed information symbols in each parity chain of P-code are moved along their columns to other rows, thus enabling the parity chain to keep original parity relationships and tolerate double disk failures. The UPC scheme not only achieves optimal storage efficiency, computational complexity, and update complexity, but also supports better I/O balancing in the context of large-scale storage systems. We also conduct a performance study on reconstruction algorithms using an analytical model. Besides extensive theoretical analysis, comparative performance experiments are conducted by replaying real-world workloads under various configurations. Experimental results illustrate that our UPC scheme significantly outperforms the PC scheme in terms of average user response time. In particular, in the case of a 12-disk array, the UPC scheme can improve the access performance of the RAID-6 storage system by 29.9% compared to the PC scheme.
Funding: National Natural Science Foundation of China (No. 62172327).
Abstract: To ensure the reliability and availability of data, redundancy strategies are always required for distributed storage systems. Erasure coding, one of the representative redundancy strategies, has the advantage of low storage overhead, which facilitates its employment in distributed storage systems. Among the various erasure coding schemes, XOR-based erasure codes are becoming popular due to their high computing speed. When a single-node failure occurs in such coding schemes, a process called data recovery takes place to retrieve the failed node’s lost data from surviving nodes. However, data transmission during the data recovery process usually requires a considerable amount of time. Current research has focused mainly on reducing the amount of data needed for data recovery to reduce the time required for data transmission, but it has encountered problems such as significant complexity and local optima. In this paper, we propose a random search recovery algorithm, named SA-RSR, to speed up single-node failure recovery of XOR-based erasure codes. SA-RSR uses a simulated annealing technique to search for an optimal recovery solution that reads and transmits a minimum amount of data. In addition, this search process can be done in polynomial time. We evaluate SA-RSR with a variety of XOR-based erasure codes in simulations and in a real storage system, Ceph. Experimental results in Ceph show that SA-RSR reduces the amount of data required for recovery by up to 30.0% and improves the performance of data recovery by up to 20.36% compared to the conventional recovery method.
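The kind of search this abstract describes can be sketched in Python under stated assumptions. The sketch assumes that each lost symbol can be rebuilt from any one of several candidate sets of surviving symbols, and that the cost of a recovery plan is the number of distinct symbols read. This formulation, the cooling schedule, and the names `cost` and `anneal` are illustrative assumptions, not the SA-RSR algorithm as published.

```python
import math
import random


def cost(choice, options):
    """Number of distinct surviving symbols read by a recovery plan."""
    read = set()
    for opts, c in zip(options, choice):
        read |= opts[c]
    return len(read)


def anneal(options, rng, t0=2.0, cooling=0.95, steps=500):
    """Simulated annealing over one candidate choice per lost symbol."""
    choice = [rng.randrange(len(o)) for o in options]
    cur = cost(choice, options)
    best, best_cost = list(choice), cur
    t = t0
    for _ in range(steps):
        i = rng.randrange(len(options))
        old = choice[i]
        choice[i] = rng.randrange(len(options[i]))  # random neighbour
        new = cost(choice, options)
        # Always accept improvements; accept worse plans with a
        # probability that shrinks as the temperature drops.
        if new <= cur or rng.random() < math.exp((cur - new) / t):
            cur = new
            if cur < best_cost:
                best, best_cost = list(choice), cur
        else:
            choice[i] = old  # reject the move
        t = max(t * cooling, 1e-3)
    return best, best_cost


# Two lost symbols, each with two hypothetical candidate recovery sets.
options = [
    [{"d1", "d2", "p0"}, {"d3", "d4", "q0"}],
    [{"d1", "d2", "p1"}, {"d1", "d5", "q1"}],
]
plan, reads = anneal(options, random.Random(1))
print(plan, reads)
```

Plans whose candidate sets overlap read fewer distinct symbols, which is why the choice for one lost symbol cannot be optimized independently of the others; occasionally accepting a worse plan is what lets the search leave the local optima the abstract mentions.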