Our ability to perceive correlations among different substances in the world is one of the key aspects of human intelligence. Transferring this faculty to artificial intelligence (AI) is arguably one of the long-standing challenges in applying AI to scientific problems. To meet this challenge in the burgeoning field of AI for chemistry, we may adopt the knowledge graph paradigm. Herein, focusing on catalytic chemical reactions, we have developed a semantic knowledge graph framework based on both structured and unstructured data, the latter of which are extracted from the text of 220,000 articles on catalysts for organic molecules. The framework captures the latent knowledge of reactant-catalyst-product relationships and can therefore provide accurate recommendations of potential catalysts for a target reaction, which is especially helpful for research involving large molecules. This study presents a viable pathway towards implementing literature-based data management in a catalyst recommendation platform.
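As a rough illustration of how reactant-catalyst-product relationships can drive a recommendation, the sketch below stores triples and ranks catalysts by how often they co-occur with a given reactant-product pair. It is not the authors' framework: the class `ReactionGraph`, its methods, the frequency-based ranking, and the placeholder names such as `reactant_A` are all hypothetical, and a real semantic knowledge graph would carry far richer relations than simple counts.

```python
from collections import Counter, defaultdict


class ReactionGraph:
    """Toy reactant-catalyst-product store (hypothetical, for illustration only)."""

    def __init__(self):
        # Maps (reactant, product) to a Counter of catalysts seen with that pair.
        self._catalysts = defaultdict(Counter)

    def add_triple(self, reactant, catalyst, product):
        """Record one reactant-catalyst-product observation."""
        self._catalysts[(reactant, product)][catalyst] += 1

    def recommend(self, reactant, product, top_k=3):
        """Return up to top_k catalysts, most frequently observed first."""
        counts = self._catalysts.get((reactant, product), Counter())
        # Sort by descending count, then by name so ties are deterministic.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return [name for name, _ in ranked[:top_k]]


# Placeholder triples; these are not real chemistry data.
graph = ReactionGraph()
for triple in [
    ("reactant_A", "catalyst_X", "product_B"),
    ("reactant_A", "catalyst_X", "product_B"),
    ("reactant_A", "catalyst_Y", "product_B"),
    ("reactant_C", "catalyst_Z", "product_D"),
]:
    graph.add_triple(*triple)

print(graph.recommend("reactant_A", "product_B"))
```

Counting co-occurrences is the simplest possible ranking signal; it is used here only to make the triple structure concrete.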
With cloud computing technology becoming more mature, it is essential to combine the big data processing tool Hadoop with the Infrastructure as a Service (IaaS) cloud platform. In this study, we first propose a new Dynamic Hadoop Cluster on IaaS (DHCI) architecture, which includes four key modules: monitoring, scheduling, Virtual Machine (VM) management, and VM migration. The monitoring module collects the load of both physical hosts and VMs, which can be used to design resource scheduling and data locality solutions. Second, we present a simple load feedback-based resource scheduling scheme. Resource allocation on overburdened physical hosts can be avoided, and strong scalability of the virtual cluster can be achieved by varying the number of VMs. To improve flexibility, we deploy the computation and storage VMs separately in the DHCI architecture, which negatively impacts data locality. Third, we reuse the method of VM migration and propose a dynamic migration-based data locality scheme using parallel computing entropy. We migrate the computation nodes to the host(s) or rack(s) where the corresponding storage nodes are deployed to satisfy the requirement of data locality. We evaluate our solutions in a realistic scenario based on OpenStack. Substantial experimental results demonstrate the effectiveness of our solutions, which help balance the workload and improve performance, even under heavily loaded cloud conditions.
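The load feedback idea can be sketched roughly as follows, assuming the monitoring module reports each load as a fraction between 0 and 1. The function names `pick_host` and `scale_decision` and all threshold values are hypothetical choices for illustration; the abstract does not specify the actual scheduling rules used in DHCI.

```python
def pick_host(host_loads, overload_threshold=0.8):
    """Return the least-loaded host below the threshold, or None if all are overloaded.

    host_loads maps a host name to its load fraction (assumed 0.0 to 1.0).
    """
    candidates = {h: load for h, load in host_loads.items() if load < overload_threshold}
    if not candidates:
        return None
    # Break ties on host name so the choice is deterministic.
    return min(candidates, key=lambda h: (candidates[h], h))


def scale_decision(vm_loads, high=0.75, low=0.25):
    """Return +1 to add a VM, -1 to remove one, or 0 to keep the cluster size."""
    if not vm_loads:
        return 1  # An empty cluster needs at least one VM.
    mean_load = sum(vm_loads) / len(vm_loads)
    if mean_load > high:
        return 1
    if mean_load < low and len(vm_loads) > 1:
        return -1
    return 0


# Hypothetical monitoring snapshot.
hosts = {"host1": 0.9, "host2": 0.5, "host3": 0.6}
print(pick_host(hosts))
print(scale_decision([0.9, 0.8]))
```

Keeping host selection separate from the scale-up or scale-down decision mirrors the two effects the abstract describes: avoiding overburdened physical hosts and varying the number of VMs.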
Funding (catalyst knowledge graph study): supported by the Guangdong Basic and Applied Basic Research Foundation (2023A1515011391 and 2020A1515110843), the Soft Science Research Project of Guangdong Province (2017B030301013), the National Key Research and Development Program of China (2022YFB2702301), the Key-Area Research and Development Program of Guangdong Province (2020B0101090003), and the Major Science and Technology Infrastructure Project of Material Genome Big-science Facilities Platform supported by the Municipal Development and Reform Commission of Shenzhen.
Funding (DHCI study): supported by the Open Project Program of Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks (No. WSNLBKF201503), the Fundamental Research Funds for the Central Universities (No. 2016JBM011), the Fundamental Research Funds for the Central Universities (No. 2014ZD03-03), the Priority Academic Program Development of Jiangsu Higher Education Institutions, and the Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology.