Funding: Ningxia Hui Autonomous Region Key R&D Plan East-West Cooperation Project (No. 2018BFG02011); National Natural Science Foundation of China (No. 41674047); China Earthquake Science Experiment Site Project, CEA (Nos. 2019CSES0105 and 2019CSES0106).
Abstract: Based on the modern earthquake catalogue, the incomplete centroidal Voronoi tessellation (ICVT) method was used in this study to estimate the seismic hazard in the Sichuan-Yunnan region of China. We calculated the spatial distributions of the total seismic hazard and the background seismic hazard in this area. The Bayesian Delaunay tessellation smoothing method put forward by Ogata was used to calculate the spatial distribution of the b-value. The results show that seismic hazards in the Sichuan-Yunnan region are high, and areas with relatively high hazard values are distributed along the main faults, while seismic hazards in the Sichuan Basin are relatively low.
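For readers unfamiliar with the b-value, the sketch below shows how a single b-value can be estimated from a list of magnitudes. It deliberately swaps in the simpler Aki maximum-likelihood estimator (with Utsu's half-bin correction) and is not the Bayesian Delaunay tessellation smoothing used in the study. The function name, parameter names, and magnitudes are all assumptions made up for illustration.

```python
import math


def aki_b_value(magnitudes, completeness_mag, bin_width=0.0):
    """Aki maximum-likelihood b-value with Utsu's half-bin correction.

    This is a stand-in estimator for illustration, not the spatial
    smoothing method described in the abstract.
    """
    # Keep only events at or above the completeness magnitude.
    usable = [m for m in magnitudes if m >= completeness_mag]
    if not usable:
        raise ValueError("no events at or above the completeness magnitude")
    mean_mag = sum(usable) / len(usable)
    # b = log10(e) / (mean magnitude - (Mc - bin_width / 2))
    return math.log10(math.e) / (mean_mag - (completeness_mag - bin_width / 2.0))


# Made-up magnitudes for illustration only (not data from the study).
example_mags = [3.0, 3.1, 3.2, 3.4, 3.7, 4.1, 4.6]
b_estimate = aki_b_value(example_mags, completeness_mag=3.0, bin_width=0.1)
print(round(b_estimate, 2))
```

A spatial b-value map would repeat an estimate of this kind locally and smooth it across the region; how that smoothing is done is exactly where the method in the abstract differs from this sketch.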
Funding: supported by JSPS KAKENHI Grant Number 18K13459; Grace S. Shieh was supported in part by MOST 106-2118-M-001-017 and MOST 107-2118-M-001-009-MY2.
Abstract: For structural comparisons of paired prokaryotic genomes, an important topic in synthetic and evolutionary biology, the locations of shared orthologous genes (henceforth orthologs) are observed as binned data. These and other data, e.g., wind directions recorded at monitoring sites and intensive care unit arrival times on the 24-hour clock, are counted in binned circular arcs, so they must be modeled by discrete circular distributions (DCDs). We propose a novel method to construct a DCD from a base continuous circular distribution (CCD). The probability mass function is defined to take the normalized values of the probability density function at pre-fixed equidistant points on the circle. Five families of constructed DCDs that have normalizing constants in closed form are presented. Simulation studies show that DCDs outperform the corresponding CCDs in modeling grouped (discrete) circular data, and that minimum chi-square estimation outperforms maximum likelihood estimation for parameters. We apply the constructed DCDs, the invariant wrapped Poisson and the wrapped discrete skew Laplace to compare the structures of paired bacterial genomes. Specifically, the discrete four-parameter wrapped Cauchy (nonnegative trigonometric sums) distribution models multi-modal shared orthologs in Clostridium (Sulfolobus) better than the others considered, in terms of AIC and Freedman’s goodness-of-fit test. The result that different DCDs fit the shared orthologs is consistent with the fact that they belong to two kingdoms. Nevertheless, these prokaryotes have a common favored site around 70° on the unit circle; this finding is important for building synthetic prokaryotic genomes in synthetic biology. These DCDs can also be applied to other binned circular data.
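The construction described in the abstract (evaluate a continuous circular density at equidistant points, then normalize) can be sketched as follows. This is a minimal illustration under stated assumptions: the base density is the ordinary two-parameter wrapped Cauchy rather than the four-parameter family named above, the normalizing constant is computed numerically rather than in closed form, and the function names and parameter values are hypothetical.

```python
import math


def wrapped_cauchy_pdf(theta, mu, rho):
    """Density of the two-parameter wrapped Cauchy distribution on the circle."""
    return (1.0 - rho ** 2) / (
        2.0 * math.pi * (1.0 + rho ** 2 - 2.0 * rho * math.cos(theta - mu))
    )


def discretize_circular(pdf, m):
    """Build a discrete circular distribution on m equidistant points.

    Returns (points, pmf), where pmf[j] is pdf(2*pi*j/m) divided by the
    sum of the density over all m points.
    """
    points = [2.0 * math.pi * j / m for j in range(m)]
    density = [pdf(t) for t in points]
    total = sum(density)  # numerical normalizing constant (an assumption here)
    return points, [d / total for d in density]


# Illustrative parameters only: mean direction 90 degrees, concentration 0.5,
# and 36 bins of 10 degrees each.
points, pmf = discretize_circular(
    lambda t: wrapped_cauchy_pdf(t, mu=math.pi / 2.0, rho=0.5), m=36
)
mode_index = max(range(len(pmf)), key=pmf.__getitem__)
print(mode_index, round(sum(pmf), 6))
```

Because the probabilities are taken at the grid points themselves, the mode of the discrete distribution falls on the grid point nearest the mean direction of the base density.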
Funding: This work was supported in part by the “Materials Research by Information Integration” Initiative (MI2I) project of the Support Program for Starting Up Innovation Hub from the Japan Science and Technology Agency (JST) and a Grant-in-Aid for Scientific Research (B) 15H02672 from the Japan Society for the Promotion of Science (JSPS). S.W. gratefully acknowledges financial support from JSPS KAKENHI Grant Number JP18K18017. K.H. gratefully acknowledges financial support from JSPS KAKENHI Grant Number JP17K17762, a Grant-in-Aid for Scientific Research on Innovative Areas (16H06439) and PRESTO (JPMJPR16NA). C.S. gratefully acknowledges financial support from the Ministry of Education and Science of the Russian Federation (Grant 14.Y26.31.0019). J.M. acknowledges partial financial support by JSPS KAKENHI Grant Number JP16K06768.
Abstract: The use of machine learning in computational molecular design has great potential to accelerate the discovery of innovative materials. However, its practical benefits remain unproven in real-world applications, particularly in polymer science. We demonstrate the successful discovery of new polymers with high thermal conductivity, inspired by machine-learning-assisted polymer chemistry. This discovery was made through the interplay between machine intelligence trained on a substantially limited amount of polymeric properties data, expertise from laboratory synthesis and advanced technologies for thermophysical property measurements. Using a molecular design algorithm trained to recognize quantitative structure-property relationships with respect to thermal conductivity and other targeted polymeric properties, we identified thousands of promising hypothetical polymers. From these candidates, three were selected for monomer synthesis and polymerization because of their synthetic accessibility and their potential for ease of processing in further applications. The synthesized polymers reached thermal conductivities of 0.18–0.41 W/mK, which are comparable to those of state-of-the-art polymers in non-composite thermoplastics.
Funding: This work was supported by the National Natural Science Foundation of China (Grant No. U2039204) and the National Key R&D Program of China (Grant No. 2018YFC1504203).
Abstract: Earthquakes are among the natural disasters that pose a major threat to human lives and property. Earthquake prediction has driven the construction and development of modern seismology; however, current deterministic earthquake prediction is limited by numerous difficulties. Identifying the temporal and spatial statistical characteristics of earthquake occurrences and constructing statistical prediction models of earthquake risk have become significant, particularly for evaluating earthquake risks and addressing seismic planning requirements such as the design of cities and lifeline projects based on the insight obtained. Since the beginning of the 21st century, the occurrence of a series of strong earthquakes in certain areas predicted to be low-risk, represented by the Wenchuan M8 earthquake in 2008, has caused seismologists worldwide to reflect on traditional seismic hazard assessment. This article briefly reviews the development of statistical seismology, analyzes in detail the research results and existing problems of statistical seismology in seismic hazard assessment, and discusses the direction of its development. The analysis shows that seismic hazard assessment based on modern earthquake catalogues should be effective in most regions. In particular, seismic hazard assessment based on ETAS (epidemic type aftershock sequence) should be the easiest and most effective method for compiling seismic hazard maps in large urban agglomeration areas and in low seismic hazard areas with thick sedimentary zones.
Funding: This work is partly supported by the Elements Strategy Initiative Centre for Magnetic Materials (ESICMM) under the outsourcing project of the Ministry of Education, Culture, Sports, Science and Technology (MEXT). This work is supported in part by the ‘Materials Research by Information Integration’ Initiative (MI2I) project of the Support Program for Starting Up Innovation Hub from the Japan Science and Technology Agency (JST). H.H. is partly supported by JST CREST grant number JPMJCR1761. Y.S. is supported by JST, ACT-I, grant number JPMJPR18UE. K.O. gratefully acknowledges the financial support by Toyota Motor Corporation.
Abstract: Materials informatics has significantly accelerated the discovery and analysis of materials in the past decade. One of the key contributors to accelerated materials discovery is the use of on-the-fly data analysis with high-throughput experiments, which has given rise to the need for fast and accurate automated estimation of the properties of materials. In this regard, spectroscopic data are widely used for materials discovery because these data include essential information about materials. An important requirement for the realisation of the automated estimation of materials parameters is the selection of a similarity measure, or kernel function. The required measure should be robust to peak shifting, peak broadening, and noise. However, the determination of appropriate similarity measures for spectra and the automated estimation of materials parameters from these spectra currently remain unresolved. We examined major similarity measures to evaluate the similarity of both X-ray absorption and electron energy-loss spectra. All of the similarity measures show good correspondence with the materials parameter, that is, the crystal-field parameter. Pearson's correlation coefficient was the most robust against noise and peak broadening. We obtained a regression model for the crystal-field parameter 10Dq from the similarity of the spectra. The regression model enabled the materials parameter, that is, 10Dq, to be automatically estimated from the spectra. With further research progress in similarity measures, this methodology would make it possible to extract the materials parameter from a large-scale dataset of experimental data.
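To make the idea of a spectral similarity measure concrete, the sketch below computes Pearson's correlation coefficient between a reference spectrum and two distorted copies. The spectra are synthetic single Gaussian peaks invented for illustration, not X-ray absorption or electron energy-loss data, and the function names and peak parameters are assumptions.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def gaussian_peak(energies, center, width):
    """Synthetic single-peak spectrum (illustrative stand-in for real data)."""
    return [math.exp(-0.5 * ((e - center) / width) ** 2) for e in energies]


# Energy grid from 0.0 to 19.9 in steps of 0.1 (arbitrary units).
energies = [0.1 * i for i in range(200)]
reference = gaussian_peak(energies, center=10.0, width=1.0)
broadened = gaussian_peak(energies, center=10.0, width=1.5)  # same peak, wider
shifted = gaussian_peak(energies, center=12.0, width=1.0)    # same shape, moved

r_same = pearson(reference, reference)
r_broad = pearson(reference, broadened)
r_shift = pearson(reference, shifted)
print(round(r_same, 3), round(r_broad, 3), round(r_shift, 3))
```

In this toy case the broadened peak stays much closer to the reference than the shifted one, which is the kind of behaviour a robustness comparison between similarity measures would examine.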
Abstract: Accurately estimating the effective reproduction number is crucial for characterizing the transmissibility of infectious diseases and for optimizing interventions and responses during epidemic outbreaks. In this study, we improve the estimation of the effective reproduction number through two main approaches. First, we derive a discrete model to represent a time series of case counts and propose an estimation method based on this framework. We also conduct numerical experiments to demonstrate the effectiveness of the proposed discretization scheme. By doing so, we approximate the underlying epidemic process more accurately than previous methods, even when the counting period is similar to the mean generation time of an infectious disease. Second, we employ a negative binomial distribution to model the variability of count data and thereby accommodate overdispersion. Specifically, given that observed incidence counts follow a negative binomial distribution, the posterior distribution of secondary infections is obtained as a Dirichlet multinomial distribution. With this formulation, we establish posterior uncertainty bounds for the effective reproduction number. Finally, we demonstrate the effectiveness of the proposed method using incidence data from the COVID-19 pandemic.
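As background for the quantity being estimated, the sketch below computes a basic point estimate of the effective reproduction number from the discrete renewal equation, R_t = I_t / sum_s w_s I_{t-s}. This is a simpler stand-in and not the discretization scheme or the negative binomial and Dirichlet multinomial formulation proposed in the study. The function name, the generation-interval weights, and the case counts are made up for illustration.

```python
def renewal_rt(incidence, gen_weights):
    """Point estimate of R_t from the discrete renewal equation.

    gen_weights[s - 1] is the assumed probability that a secondary case
    occurs s counting periods after its infector. Returns None where the
    infection pressure is zero and the ratio is undefined.
    """
    estimates = []
    for t in range(len(incidence)):
        # Infection pressure: past incidence weighted by the generation interval.
        pressure = sum(
            w * incidence[t - s]
            for s, w in enumerate(gen_weights, start=1)
            if t - s >= 0
        )
        estimates.append(incidence[t] / pressure if pressure > 0 else None)
    return estimates


# Made-up counts that double every period; all transmission is assumed to
# happen exactly one period after infection.
incidence = [10, 20, 40, 80, 160]
gen_weights = [1.0]
rt = renewal_rt(incidence, gen_weights)
print(rt)
```

A ratio of this kind gives no uncertainty bounds and ignores overdispersion in the counts, which are the two gaps the approach in the abstract is designed to address.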
Funding: The numerical calculations were conducted on five supercomputer systems: Fugaku at the RIKEN Center for Computational Science, Kobe, Japan; the supercomputer at the Research Center for Computational Science, Okazaki, Japan (Project: 21-IMS-C126, 22-IMS-C125); the supercomputer Ohtaka at the Supercomputer Center, the Institute for Solid State Physics, the University of Tokyo, Tokyo, Japan; the supercomputer TSUBAME3.0 at the Tokyo Institute of Technology, Tokyo, Japan; and the supercomputer ABCI at the National Institute of Advanced Industrial Science and Technology, Tsukuba, Japan. This work was supported by the following five grants: a JST CREST (Grant Number JPMJCR19I3 to J.M. and R.Y.); the MEXT “Program for Promoting Researches on the Supercomputer Fugaku” (Project ID: hp210264 to R.Y.); the Grant-in-Aid for Scientific Research (A) from the Japan Society for the Promotion of Science (19H01132 to R.Y.); the Grant-in-Aid for Scientific Research (C) from the Japan Society for the Promotion of Science (22K11949 to Y.H.); and the HPCI System Research Project (Project ID: hp210213 to Y.H.).
Abstract: The spread of data-driven materials research has increased the need for systematically designed materials property databases. However, the development of polymer databases has lagged far behind that of other material systems. We present RadonPy, an open-source library that can automate the complete process of all-atom classical molecular dynamics (MD) simulations applicable to a wide variety of polymeric materials. Herein, 15 different properties were calculated for more than 1000 amorphous polymers. The MD-calculated properties were systematically compared with experimental data to validate the calculation conditions; the bias and variance in the MD-calculated properties were successfully calibrated by a machine learning technique. During the high-throughput data production, we identified eight amorphous polymers with extremely high thermal conductivity (> 0.4 W·m⁻¹·K⁻¹) and their underlying mechanisms. Similar to the advancement of materials informatics since the advent of computational property databases for inorganic crystals, database construction using RadonPy will promote the development of polymer informatics.
Funding: This work was supported by JST-Mirai Program Grant Numbers JPMJMI19G1 and JPMJMI21G2. T.U. acknowledges the support of JSPS KAKENHI Grant Number JP18K13984 and the QST President’s Strategic Grant (Exploratory Research). H.H. acknowledges the support of NEDO Grant Number JPNP18002 and JST CREST Grant Number JPMJCR1761. This work was carried out under the ISM Cooperative Research Program (H30-J-4302 and 2019-ISMCRP-4206). The XAS experiment was performed under the approval of the Photon Factory Program Advisory Committee (Proposal No. 2018MP001). The authors thank Dr. Yasuo Takeichi for the support of the experiments at the Photon Factory.
Abstract: The automated stopping of a spectral measurement with active learning is proposed. The optimal stopping of the measurement is realised with a stopping criterion based on the upper bound of the posterior average of the generalisation error of the Gaussian process regression. It is revealed that the automated stopping criterion of the spectral measurement gives an approximated X-ray absorption spectrum with sufficient accuracy and reduced data size. The proposed method is not only a proof-of-concept of the optimal stopping problem in active learning but also the key to enhancing the efficiency of spectral measurements for high-throughput experiments in the era of materials informatics.
Funding: The authors acknowledge financial support from the National Key Research and Development Program of China (No. 2016YFB0700500); the National Science Foundation of China (No. 51574027, No. 61572075, No. 6170203, No. 61873299); the Finance Science and Technology Project of Hainan Province (No. ZDYF2019009); and the Fundamental Research Funds for the University of Science and Technology Beijing (No. FRF-BD-19-012A, No. FRF-TP-19-043A2).
Abstract: Recent progress in material data mining has been driven by high-capacity models trained on large datasets. However, collecting experimental data (real data) has been extremely costly owing to the amount of human effort and expertise required. Here, we develop a novel transfer learning strategy to address problems of small or insufficient data. This strategy fuses real and simulated data and augments the training data in a data mining procedure. For the specific task of grain instance image segmentation, this strategy generates synthetic data by fusing the images obtained from simulating the physical mechanism of grain formation with the “image style” information in real images. The results show that a model trained with the acquired synthetic data and only 35% of the real data can already achieve segmentation performance competitive with that of a model trained on all of the real data. Because the time required to perform grain simulation and to generate synthetic data is almost negligible compared with the effort of obtaining real data, our proposed strategy is able to exploit the strong prediction power of deep learning without significantly increasing the experimental burden of training data preparation.