Funding: Supported by the National Natural Science Foundation of China (No. 61972261) and the Basic Research Foundations of Shenzhen (Nos. JCYJ20210324093609026 and JCYJ20200813091134001).
Abstract: Distributed computing frameworks are the fundamental component of distributed computing systems. They provide an essential way to support the efficient processing of big data on clusters or in the cloud. The size of big data is increasing faster than the big data processing capacity of clusters. Thus, distributed computing frameworks based on the MapReduce computing model are not adequate to support big data analysis tasks, which often require running complex analytical algorithms on extremely large data sets of terabytes in size. In performing such tasks, these frameworks face three challenges: computational inefficiency due to high I/O and communication costs, non-scalability to big data due to memory limits, and a limited range of analytical algorithms, because many serial algorithms cannot be implemented in the MapReduce programming model. New distributed computing frameworks need to be developed to overcome these challenges. In this paper, we review MapReduce-type distributed computing frameworks that are currently used in handling big data and discuss their problems when conducting big data analysis. In addition, we present a non-MapReduce distributed computing framework that has the potential to overcome big data analysis challenges.
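For readers unfamiliar with the programming model the abstract above refers to, the following is a minimal single-process sketch of the map, shuffle, and reduce phases, using word counting as the example. It is an illustration only and is not taken from the reviewed paper; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. In a real framework the shuffle step moves intermediate pairs between machines, which is where much of the I/O and communication cost arises.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_phase(records: Iterable[str],
              mapper: Callable[[str], Iterator[Tuple[str, int]]]
              ) -> Iterator[Tuple[str, int]]:
    """Apply the mapper to every record independently."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key (done over the network in a cluster)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]],
                 reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Apply the reducer to each key and its grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.split():
        yield word.lower(), 1


def sum_reducer(key: str, values: List[int]) -> int:
    """Sum all counts emitted for one word."""
    return sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases into one job."""
    return reduce_phase(shuffle(map_phase(lines, word_mapper)), sum_reducer)
```

Each record is mapped without reference to any other record, and each key is reduced without reference to any other key. An algorithm whose steps depend on global state carried from one record to the next does not fit this shape directly, which is one way to read the abstract's remark that many serial algorithms cannot be implemented in the model.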
Funding: Superior Farms sheep producers; IBEST for their support; financial support from the Idaho Global Entrepreneurial Mission.
Abstract: Background: Pan-genomics is a recently emerging strategy that can provide a more comprehensive characterization of genetic variation. Joint calling is routinely used to combine identified variants across multiple related samples. However, the improvement in variant identification gained from the mutual support information of multiple samples remains quite limited for population-scale genotyping. Results: In this study, we developed a computational framework for joint calling of genetic variants from 5,061 sheep by incorporating the sequencing error and optimizing the mutual support information from multiple samples' data. The variants were accurately identified from multiple samples in four steps: (1) probabilities of variants from two widely used algorithms, GATK and Freebayes, were calculated by a Poisson model incorporating base sequencing error potential; (2) the variants with high mapping quality or consistently identified from at least two samples by GATK and Freebayes were used to construct the raw high-confidence identification (rHID) variants database; (3) the high-confidence variants identified in a single sample were ordered by probability value and controlled by false discovery rate (FDR) using the rHID database; (4) to avoid the elimination of potentially true variants from the rHID database, the variants that failed FDR were reexamined to rescue potentially true variants and ensure highly accurate variant identification. The results indicated that the percentage of concordant SNPs and indels from Freebayes and GATK after applying our new method improved significantly, by 12%-32%, compared with raw variants, and that the method advantageously found low-frequency variants of individual sheep involved in several traits, including nipple number (GPC5), scrapie pathology (PAPSS2), seasonal reproduction and litter size (GRM1), coat color (RAB27A), and lentivirus susceptibility (TMEM154). Conclusion: The new method used a computational strategy to reduce the number of false positives and simultaneously improve the identification of genetic variants. This strategy did not incur any extra cost from additional samples or sequencing data and advantageously identified rare variants, which can be important for practical applications in animal breeding.
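Step (3) of the abstract above, ordering single-sample calls by probability and controlling the FDR against the rHID database, can be pictured with the sketch below. The abstract does not give the exact procedure, so this is one plausible reading under stated assumptions: a call absent from the rHID set is counted as a putative false discovery, and the longest ranked prefix whose running fraction of such calls stays at or below `alpha` is kept. The function name `fdr_filter`, the `Variant` tuple layout, and the default `alpha` are all hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def fdr_filter(calls: Iterable[Tuple[Variant, float]],
               rhid: Set[Variant],
               alpha: float = 0.05) -> Tuple[List[Variant], List[Variant]]:
    """Split calls into (passed, failed) using an empirical FDR estimate.

    Assumption: a call not present in the rHID set counts as a putative
    false discovery. This is an illustrative rule, not the authors' method.
    """
    # Rank calls from most to least probable.
    ranked = sorted(calls, key=lambda call: call[1], reverse=True)

    cutoff = 0   # length of the longest prefix that satisfies the FDR bound
    misses = 0   # calls seen so far that are absent from rHID
    for rank, (variant, _prob) in enumerate(ranked, start=1):
        if variant not in rhid:
            misses += 1
        if misses / rank <= alpha:
            cutoff = rank

    passed = [variant for variant, _ in ranked[:cutoff]]
    failed = [variant for variant, _ in ranked[cutoff:]]
    return passed, failed
```

The `failed` list corresponds to the calls that step (4) would reexamine in order to rescue potentially true variants; that rescue logic is not described in enough detail in the abstract to sketch here.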
Abstract: Nowadays, smart buildings rely on Internet of Things (IoT) technology derived from the cloud and fog computing paradigms to coordinate and collaborate between connected objects. Fog is characterized by low latency and by widely spread, geographically distributed nodes that support mobility, real-time interaction, and location-based services. To provide optimum quality of life for users in modern buildings, we rely on a holistic framework designed to decrease latency and improve energy saving and service efficiency with different capabilities. Discrete EVent system Specification (DEVS) is a formalism used to describe simulation models in a modular way. In this work, the sub-models of connected objects in the building are accurately and independently designed, and after assembling them, we easily obtain an integrated model that is subject to the fog computing framework. Simulation results show that this new approach significantly improves the energy efficiency of buildings and reduces latency. Additionally, with DEVS, we can easily add or remove sub-models to or from the overall model, allowing us to continually improve our designs.
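To make the modularity that the abstract above attributes to DEVS more concrete, here is a minimal sketch of one atomic model with the four standard DEVS functions (time advance, external transition, output, internal transition) and a tiny single-model simulation loop. The presence-activated light, its `hold_time` parameter, and the `simulate` helper are hypothetical illustrations and do not come from the paper. When an internal and an external event coincide, this sketch simply processes the internal one first.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

INFINITY = math.inf


@dataclass
class LightModel:
    """Hypothetical atomic DEVS model of a presence-activated light."""
    hold_time: float = 30.0   # seconds the light stays on after presence
    phase: str = "off"        # current state: "on" or "off"
    sigma: float = INFINITY   # time left until the next internal event

    def time_advance(self) -> float:
        """ta(s): how long the model stays in its current state."""
        return self.sigma

    def external_transition(self, elapsed: float, event: str) -> None:
        """delta_ext: react to an input that arrives after `elapsed` seconds."""
        if event == "presence":
            self.phase = "on"
            self.sigma = self.hold_time   # restart the hold timer
        else:
            self.sigma -= elapsed         # unrelated input: keep the schedule

    def output(self) -> str:
        """lambda(s): emitted just before an internal transition."""
        return "light_off"

    def internal_transition(self) -> None:
        """delta_int: the hold timer expired, so switch off and go passive."""
        self.phase = "off"
        self.sigma = INFINITY


def simulate(model: LightModel,
             inputs: List[Tuple[float, str]],
             end_time: float) -> List[Tuple[float, str]]:
    """Run one atomic model against time-sorted (time, event) inputs."""
    outputs: List[Tuple[float, str]] = []
    pending = list(inputs)
    last = 0.0   # time of the most recent transition
    while True:
        next_internal = last + model.time_advance()
        next_external = pending[0][0] if pending else INFINITY
        now = min(next_internal, next_external)
        if math.isinf(now) or now > end_time:
            break
        if next_internal <= next_external:
            outputs.append((now, model.output()))
            model.internal_transition()
        else:
            _, event = pending.pop(0)
            model.external_transition(now - last, event)
        last = now
    return outputs
```

Because the model exposes only these four functions, a coordinator can couple it with other sub-models (sensors, heaters, fog nodes) without knowing their internals, which is the property the abstract relies on when it describes adding or removing sub-models. Coupled models are not shown in this sketch.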
Abstract: Practical real-world scenarios such as the Internet, social networks, and biological networks present the challenges of data scarcity and complex correlations, which limit the applications of artificial intelligence. Although the graph structure is a typical tool used to formulate such correlations, it is incapable of modeling high-order correlations among different objects in systems; thus, the graph structure cannot fully convey the intricate correlations among objects. Confronted with these two challenges, hypergraph computation models high-order correlations among data, knowledge, and rules through hyperedges and leverages these high-order correlations to enhance the data. Additionally, hypergraph computation achieves collaborative computation using data and high-order correlations, thereby offering greater modeling flexibility. In particular, we introduce three types of hypergraph computation methods: ① hypergraph structure modeling, ② hypergraph semantic computing, and ③ efficient hypergraph computing. We then specify how to adopt hypergraph computation in practice by focusing on specific tasks such as three-dimensional (3D) object recognition, revealing that hypergraph computation can reduce the data requirement by 80% while achieving comparable performance, or improve the performance by 52% given the same data, compared with a traditional data-based method. A comprehensive overview of the applications of hypergraph computation in diverse domains, such as intelligent medicine and computer vision, is also provided. Finally, we introduce an open-source deep learning library, DeepHypergraph (DHG), which can serve as a tool for the practical usage of hypergraph computation.
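The difference between a graph edge and a hyperedge mentioned in the abstract above can be illustrated with a small pure-Python sketch. A hyperedge is a set of any number of vertices, and one round of feature smoothing passes information from vertices to hyperedges and back. The mean-aggregation rule used here is a common simplification chosen for illustration; it is not claimed to be the paper's method, and it does not use the DeepHypergraph (DHG) API. The function name `hypergraph_smooth` is hypothetical, and every hyperedge is assumed to be non-empty.

```python
from typing import List, Sequence, Set


def hypergraph_smooth(features: Sequence[Sequence[float]],
                      hyperedges: Sequence[Set[int]]) -> List[List[float]]:
    """One vertex -> hyperedge -> vertex mean-aggregation pass.

    features[v] is the feature vector of vertex v; each hyperedge is a
    non-empty set of vertex indices (an assumption of this sketch).
    """
    num_vertices = len(features)
    dim = len(features[0]) if num_vertices else 0

    # Vertex -> hyperedge: each hyperedge takes the mean of its members.
    edge_feats: List[List[float]] = []
    for edge in hyperedges:
        members = list(edge)
        edge_feats.append([
            sum(features[v][d] for v in members) / len(members)
            for d in range(dim)
        ])

    # Record which hyperedges each vertex belongs to.
    incident: List[List[int]] = [[] for _ in range(num_vertices)]
    for e, edge in enumerate(hyperedges):
        for v in edge:
            incident[v].append(e)

    # Hyperedge -> vertex: each vertex takes the mean of its hyperedges.
    smoothed: List[List[float]] = []
    for v in range(num_vertices):
        if not incident[v]:
            smoothed.append(list(features[v]))   # isolated vertex: unchanged
            continue
        smoothed.append([
            sum(edge_feats[e][d] for e in incident[v]) / len(incident[v])
            for d in range(dim)
        ])
    return smoothed
```

In a call such as `hypergraph_smooth([[0.0], [3.0], [6.0], [10.0]], [{0, 1, 2}, {2, 3}])`, the hyperedge `{0, 1, 2}` ties three vertices together in a single relation. An ordinary graph could only approximate this with three pairwise edges, which loses the fact that the three objects are related jointly.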
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61471338), the Knowledge Innovation Program of the Chinese Academy of Sciences, the Youth Innovation Promotion Association CAS, the President Fund of UCAS, and the CRSRI Open Research Program (Grant No. CKWV2015217/KY).
Abstract: Three-dimensional discontinuous deformation analysis (3D-DDA) is a promising numerical method for both static and dynamic analyses of rock systems. Because it lacks mature software, its popularity lags far behind its capability. To address this problem, this paper presents a new software architecture from a software engineering viewpoint. Based on 3D-DDA characteristics, the implementation of the proposed architecture has the following merits. Firstly, the software architecture separates data, computing, visualization, and signal control into individual modules. Secondly, data storage and parallel access are fully considered for different conditions. Thirdly, an open computing framework is provided that supports most numerical computing methods; common tools for equation solving and parallel computing are provided for further development. Fourthly, efficient visualization functions are provided by integrating a variety of visualization algorithms. A user-friendly graphical user interface is designed to improve the user experience. Finally, through a set of examples, the software is verified against both analytical solutions and the original code by Dr. Shi Gen Hua.