Funding: Project supported by the National Key Research and Development Program of China (Grant No. 2023YFF1204402), the National Natural Science Foundation of China (Grant Nos. 12074079 and 12374208), the Natural Science Foundation of Shanghai (Grant No. 22ZR1406800), and the China Postdoctoral Science Foundation (Grant No. 2022M720815).
Abstract: The rapid advancement and broad application of machine learning (ML) have driven a groundbreaking revolution in computational biology. One of the most cutting-edge and important applications of ML is its integration with molecular simulations to improve the sampling efficiency of the vast conformational space of large biomolecules. This review focuses on recent studies that utilize ML-based techniques to explore the protein conformational landscape. We first highlight the recent development of ML-aided enhanced sampling methods, including heuristic algorithms and neural networks designed to refine the selection of reaction coordinates for the construction of bias potentials, or to facilitate the exploration of unsampled regions of the energy landscape. We then review autoencoder-based methods that combine molecular simulations and deep learning to expand the search for protein conformations. Lastly, we discuss cutting-edge methodologies for the one-shot generation of protein conformations with precise Boltzmann weights. Collectively, this review demonstrates the promising potential of machine learning to revolutionize our insight into the complex conformational ensembles of proteins.
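To make the autoencoder-based strategy concrete, below is a minimal sketch of the general idea: train an autoencoder on simulation coordinates, then decode perturbed latent vectors into candidate conformations for further simulation. The architecture, dimensions, and random stand-in "trajectory" are illustrative assumptions, not the method of any specific paper covered by the review.

```python
# Minimal autoencoder sketch for conformational exploration: compress
# simulation frames into a low-dimensional latent space, then decode
# perturbed latent vectors into candidate conformations. The architecture,
# dimensions, and random stand-in "trajectory" are illustrative assumptions.
import torch
import torch.nn as nn

n_atoms, latent_dim = 50, 2                      # hypothetical system size
frames = torch.randn(1000, n_atoms * 3)          # stand-in for flattened MD frames

encoder = nn.Sequential(nn.Linear(n_atoms * 3, 64), nn.ReLU(), nn.Linear(64, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, n_atoms * 3))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for _ in range(200):                             # reconstruct frames through the bottleneck
    recon = decoder(encoder(frames))
    loss = ((recon - frames) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Perturb latent codes and decode them to propose conformations outside the
# training data; in practice these would be relaxed and re-simulated.
with torch.no_grad():
    z = encoder(frames)
    candidates = decoder(z + 0.5 * torch.randn_like(z))
print(candidates.shape)
```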
Abstract: In recent years, the water content of oilfield production fluid has been high, producing large volumes of oily sewage. To improve treatment capacity, a demulsifier is commonly used to process the oily sewage. This article uses simulated water samples to test the treatment effect of an optimized reverse demulsifier at different oscillation times. As action time and oscillation increase, the average droplet size increases and the number of droplets smaller than 1 μm decreases.
Funding: NSAF (Grant No. U1530122), the Aeronautical Science Foundation of China (Grant No. ASFC-20170968002), and the Fundamental Research Funds for the Central Universities of China (XMU, 20720180072).
Abstract: Simulation methods are widely used in structural reliability analysis, and the statistical characteristics of their failure probability estimates have been well investigated. In this study, the sensitivities of the failure probability estimate and its statistical characteristics with respect to each sample, called 'contribution indexes', are proposed to measure the contribution of individual samples. The contribution indexes are derived and analyzed for four widely used simulation methods: Monte Carlo simulation (MCS), importance sampling (IS), line sampling (LS), and subset simulation (SS). The proposed indexes provide valuable information for understanding these methods more deeply and suggest potential improvements to them. It is found that the main differences between the investigated methods lie in the contribution indexes of the safe samples, which are the main factor in the efficiency of the methods. Numerical examples are used to validate these findings.
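The abstract does not reproduce the exact definition of the contribution indexes; as a rough illustration of the underlying idea, the sketch below computes, for plain MCS, each sample's contribution to the failure probability estimate p_hat = (1/N) * sum I[g(x_i) < 0] and to its variance estimator, using a hypothetical limit-state function g.

```python
# Rough illustration of per-sample contributions in plain Monte Carlo
# simulation; the limit-state function g and the index definitions below
# are assumptions for demonstration, not the paper's exact formulas.
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
x = rng.normal(size=(N, 2))                  # two standard-normal inputs

def g(x):
    """Hypothetical limit state: failure when g(x) < 0."""
    return 3.0 - x[:, 0] - x[:, 1]

indicator = (g(x) < 0).astype(float)         # 1 for failure samples, 0 for safe ones
p_hat = indicator.mean()                     # p_hat = (1/N) * sum of indicators

# Each sample contributes I_i / N to the estimate; safe samples contribute
# nothing to p_hat but still enter the variance of the estimator.
contrib_estimate = indicator / N
contrib_variance = (indicator - p_hat) ** 2 / (N * (N - 1))

print(f"p_hat = {p_hat:.4e} from {int(indicator.sum())} failure samples")
print(f"estimated variance of p_hat = {contrib_variance.sum():.3e}")
```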
Funding: the National Natural Science Foundation of China (Grant No. 69983005).
Abstract: Event correlation is a key technique in network fault management. To address the event sample acquisition problem in event correlation, a novel approach is proposed that collects samples by constructing a network simulation platform. The platform can set various kinds of network faults according to the user's demands and generate a large number of network fault events, which will benefit research on efficient event correlation techniques.
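As a toy illustration of the sample-generation idea (not the authors' platform), the sketch below injects faults into a small invented topology and emits the correlated alarm events a fault manager would observe, yielding labeled samples.

```python
# Toy sketch of generating labeled fault-event samples on a simulated
# network; the topology, fault type, and propagation rule are invented
# for illustration, not the authors' platform.
import random

random.seed(1)
topology = {"router1": ["switch1", "switch2"],
            "switch1": ["host1", "host2"],
            "switch2": ["host3"]}

def inject_fault(node):
    """Emit the root-cause event plus correlated secondary alarms downstream."""
    events = [(node, "LINK_DOWN")]
    for child in topology.get(node, []):
        if random.random() < 0.9:             # assumed alarm-propagation probability
            events.append((child, "UNREACHABLE"))
    return {"root_cause": node, "events": events}

# Each sample pairs observed events with their known root cause, which is
# exactly the labeled data that event correlation research needs.
samples = [inject_fault(random.choice(list(topology))) for _ in range(5)]
for s in samples:
    print(s)
```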
Abstract: Binary logistic regression models are commonly used to assess the association between outcomes and covariates. Many covariates are inherently continuous and have a variety of distributions, including those that are heavily skewed to the left or right. Existing theoretical formulas, criteria, and simulation programs cannot accurately estimate the sample size and power for non-standard distributions. We have therefore developed a simulation program that uses Monte Carlo methods to estimate the exact power of a binary logistic regression model. This power calculation can be used for distributions of any shape and covariates of any type (continuous, ordinal, and nominal), and can account for nonlinear relationships between covariates and outcomes. For illustrative purposes, the simulation program is applied to real data from a study of the influence of smoking on 90-day outcomes after acute atherothrombotic stroke. Our program is applicable to all effect sizes and, with some modifications, supports related statistical methods such as Bayesian inference for logistic regression.
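A minimal sketch of such a Monte Carlo power calculation is shown below: draw a skewed continuous covariate, simulate binary outcomes from an assumed logistic model, refit the model, and count significant replicates. The effect size, sample size, and covariate distribution are placeholder assumptions, not values from the stroke study.

```python
# Minimal Monte Carlo power estimate for binary logistic regression with a
# skewed covariate; effect size, n, and the covariate distribution are
# placeholder assumptions, not the authors' program.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n, beta0, beta1, n_rep, alpha = 200, -1.0, 0.5, 1000, 0.05

significant = 0
for _ in range(n_rep):
    x = rng.lognormal(mean=0.0, sigma=1.0, size=n)    # right-skewed covariate
    p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))    # true logistic model
    y = rng.binomial(1, p)
    fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
    significant += fit.pvalues[1] < alpha             # Wald test on the slope

print(f"estimated power: {significant / n_rep:.3f}")
```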
Abstract: Water conservation initiatives promote the installation of water-efficient and low-flow appliances in wastewater collection systems. This has resulted in lower flow rates in those systems than the intended design loading, causing solid deposition and sedimentation in some areas. A joint UKWIR/EPSRC CASE grant (14440031) funded the work described in this paper, which investigates sedimentation and solid deposition in building drainage system pipes. The purpose of this paper is to detail the design, calibration, and operation of a sediment dosing apparatus used to simulate sedimentation rates and to explore possible solutions to this issue with a full-scale laboratory model based on real site data. The methodology is experimental: tests were conducted on the sediment dosing apparatus, based on calculations and observations, to determine a sediment dosing regime representative of typical systems. Further tests were conducted with the addition of everyday household products to investigate their effects on sedimentation. The results indicated that a suitable dosing rate was approximately 12% weight-to-volume (w/v) of a fine sand with a known particle size distribution, diluted 1:5 in a clean-water base flow. It was also shown that the addition of the household products worsened sedimentation within the drainage systems. The results correlate closely with real site data, with deposition depth and distribution matching measured site data to within 10%. The deposition was achieved within three hours, which approximated six weeks of deposition at the live site used in the study. This straightforward investigation details the design, construction, and testing of a device to cause accelerated sedimentation in a full-scale model of a building drainage system. It is the first step in updating the research underpinning our understanding of how these systems behave under the low flow rates caused by water conservation, sedimentation, and the use of common household additives, and it will be used to improve the simulation of water flow and solid transport in sediment-laden systems. Specifically, the results will be used to determine the refinements required to a specific drainage simulation model (DRAINET), which currently has an unquantified sedimentation component. This work is part of a larger body of current research funded by two joint EPSRC/UKWIR grants.
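For reference, the dosing arithmetic implied by the abstract can be made explicit; the snippet below shows the effective sediment concentration under the two common readings of a "1:5" dilution, since the abstract does not state which convention applies.

```python
# Effective sediment concentration implied by the reported dosing regime;
# "1:5" dilution has two common readings, so both are shown.
stock = 0.12                                   # 12% w/v stock = 120 g/L fine sand
print(f"1 part stock in 5 parts total: {stock / 5 * 100:.1f}% w/v ({stock / 5 * 1000:.0f} g/L)")
print(f"1 part stock + 5 parts water:  {stock / 6 * 100:.1f}% w/v ({stock / 6 * 1000:.0f} g/L)")
```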
Abstract: A composite random variable is a product (or sum of products) of statistically distributed quantities. Such a variable can represent the solution to a multi-factor quantitative problem submitted to a large, diverse, independent, anonymous group of non-expert respondents (the "crowd"). The objective of this research is to examine the statistical distribution of solutions from a large crowd to a quantitative problem involving image analysis and object counting. Theoretical analysis by the author, covering a range of conditions and types of factor variables, predicts that composite random variables are distributed log-normally to an excellent approximation. If the factors in a problem are themselves distributed log-normally, then their product is rigorously log-normal. A crowdsourcing experiment devised by the author and implemented with the assistance of a BBC (British Broadcasting Corporation) television show yielded a sample of approximately 2000 responses consistent with a log-normal distribution. The sample mean was within ~12% of the true count. However, a Monte Carlo simulation (MCS) of the experiment, employing either normal or log-normal random variables as factors to model the processes by which a crowd of 1 million might arrive at its estimates, resulted in a visually perfect log-normal distribution with a mean response within ~5% of the true count. The results of this research suggest that a well-modeled MCS, by simulating a sample of responses from a large, rational, and incentivized crowd, can provide a more accurate solution to a quantitative problem than might be attainable by directly sampling a smaller crowd, or an uninformed crowd of any size that guesses randomly.
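A minimal sketch of the simulation idea follows: model each respondent's estimate as a product of independent log-normal factors, so the composite is exactly log-normal. The number of factors and their parameters are arbitrary stand-ins for the image-counting task, not the author's model.

```python
# Sketch of the crowd model: each of 1 million respondents multiplies a few
# independently guessed factors; log-normal factors give an exactly
# log-normal composite. Factor count and parameters are arbitrary stand-ins.
import numpy as np

rng = np.random.default_rng(7)
n_respondents, n_factors = 1_000_000, 3
true_count = 10_000.0

# Factors whose product has geometric mean equal to the true count.
factors = rng.lognormal(mean=np.log(true_count) / n_factors, sigma=0.3,
                        size=(n_respondents, n_factors))
responses = factors.prod(axis=1)

log_r = np.log(responses)
print("sample mean:", responses.mean())            # biased above the true count
print("geometric mean:", np.exp(log_r.mean()))     # ~ true_count
print("skewness of log-responses (~ 0 for log-normal):",
      ((log_r - log_r.mean()) ** 3).mean() / log_r.std() ** 3)
```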
Abstract: Fluorescence tomography can obtain a sufficient dataset and optimal three-dimensional images when projections are captured over 360° by a CCD camera. Herein, a non-stop dynamic sampling mode for fluorescence tomography is proposed in an attempt to improve the optical measurement speed of the traditional imaging system and the stability of the object being imaged. A series of simulations is carried out to evaluate the accuracy of the dataset acquired in the dynamic sampling mode. Reconstruction with the corresponding data obtained in the dynamic-mode process is also performed with a phantom. The results demonstrate the feasibility of such an imaging mode when the angular velocity is set to an appropriate value, laying the foundation for real experiments to verify the superiority of this new imaging mode over the traditional one.
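A back-of-envelope sketch of the trade-off such simulations evaluate: with a continuously rotating object, each CCD exposure integrates over an angular arc of omega * t_exp, so the angular velocity bounds both the blur per projection and the total scan time. All numbers below are assumed for illustration.

```python
# Back-of-envelope trade-off for non-stop dynamic sampling: a continuously
# rotating object smears each CCD exposure over omega * t_exp degrees.
# All numbers are assumed for illustration only.
t_exp = 0.5                                   # exposure time per projection, s (assumed)
for omega in (1.0, 5.0, 10.0):                # candidate angular velocities, deg/s
    blur = omega * t_exp                      # angular smear within one projection, deg
    scan_time = 360.0 / omega                 # time for one full revolution, s
    print(f"omega = {omega:5.1f} deg/s   blur/projection = {blur:4.2f} deg   "
          f"full scan = {scan_time:6.1f} s")
```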
Funding: The National Natural Science Foundation of China under contract No. 31772852 and the Fundamental Research Funds for the Central Universities under contract No. 201612004.
Abstract: This study used an Ecopath model of Jiaozhou Bay as an example to evaluate the effect of the stomach sample size of three fish species on the projections of the model. The derived ecosystem indices were classified into three categories: (1) direct indices, such as the trophic level of a species, influenced directly by stomach sample size; (2) indirect indices, such as the ecological efficiency (EE) of invertebrates, influenced through multiple prey-predator relationships; and (3) systemic indices, such as total system throughput (TST), describing the status of the whole ecosystem. The influences of different stomach sample sizes on these indices were evaluated. The results suggest that the systemic indices of the ecosystem model were robust to stomach sample size, whereas species-specific indices showed low accuracy and precision when stomach samples were insufficient. The indices became more uncertain when the stomach sample sizes varied for more species. This study enhances the understanding of how the quality of diet composition data influences ecosystem modeling outputs. The results can also guide the design of stomach content analysis for developing ecosystem models.
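As a hedged sketch of how such a sensitivity analysis can be set up (not the authors' procedure), the snippet below repeatedly simulates stomach samples of several sizes from an assumed diet and reports the spread of the estimated composition, which is what propagates into the direct indices. Prey categories and proportions are invented.

```python
# Sketch of a stomach-sample-size sensitivity check: repeatedly draw
# stomach samples from an assumed "true" diet and watch the spread of the
# estimated composition; prey categories and proportions are invented.
import numpy as np

rng = np.random.default_rng(3)
true_diet = np.array([0.5, 0.3, 0.2])          # proportions of three prey groups

for n_stomachs in (10, 50, 200):
    estimates = [rng.multinomial(n_stomachs, true_diet) / n_stomachs
                 for _ in range(1000)]          # 1000 simulated sampling campaigns
    sd = np.array(estimates).std(axis=0)
    print(f"n = {n_stomachs:4d}   SD of estimated diet proportions: {np.round(sd, 3)}")
```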
Abstract: An analytical solution is derived for the probability that a random pair of individuals from a panmictic population of size N will share ancestors who lived G generations previously. The analysis is extended to obtain 1) the probability that a sample of size s will contain at least one pair of (G-1)th cousins, and 2) the expected number of pairs of (G-1)th cousins in that sample. Solutions are given for both monogamous and promiscuous (non-monogamous) cases. Simulation results for a population size of N = 20,000 closely approximate the analytical expectations. Simulation results also agree very well with previously derived expectations for the proportion of unrelated individuals in a sample. The analysis is broadly consistent with genetic estimates of relatedness in a sample of 406 Danish school children, but suggests that a different genetic study of a heterogeneous sample of Europeans overestimates the frequency of cousin pairs by as much as one order of magnitude.
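A compact sketch of the kind of simulation these analytical results can be checked against is given below: build G generations of random monogamous pairings and test whether two random present-day individuals share an ancestor. N and G are scaled down from the paper's N = 20,000 for speed, and the pairing scheme is a simplified stand-in for the paper's model.

```python
# Compact simulation of shared ancestry G generations back under monogamy;
# N and G are scaled down, and the pairing scheme is a simplified stand-in.
import random

random.seed(0)
N, G, trials = 1000, 4, 2000

def founder_ancestors(N, G):
    """For each present-day individual, the set of founders G generations back."""
    ancestors = [{i} for i in range(N)]              # founders are their own ancestors
    for _ in range(G):
        order = list(range(N))
        random.shuffle(order)                        # random monogamous couples
        couples = [(order[2 * i], order[2 * i + 1]) for i in range(N // 2)]
        next_gen = []
        for _ in range(N):                           # constant population size
            m, f = random.choice(couples)            # both parents come from one couple
            next_gen.append(ancestors[m] | ancestors[f])
        ancestors = next_gen
    return ancestors

anc = founder_ancestors(N, G)
shared = sum(bool(anc[a] & anc[b])
             for a, b in (random.sample(range(N), 2) for _ in range(trials)))
print(f"P(random pair shares an ancestor {G} generations back) ~ {shared / trials:.3f}")
```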
Abstract: This paper discusses how to effectively suppress intersymbol interference by optimizing the filter design, so as to achieve distortion-free output, and how to compensate the transmission characteristics of a baseband transmission system in a non-ideal channel environment so as to minimize the impact of intersymbol interference. A simulation model of optimal digital baseband transmission and the overall structure of the system are designed on the MATLAB simulation platform, and the parameters of each module in the simulation model are set. The working process and performance of the optimal digital baseband transmission system are simulated, and the conditions for and performance of optimal digital baseband transmission are verified against the simulation results.
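A minimal sketch of the classic zero-ISI (Nyquist) design that such filter optimization typically targets is given below: shape random symbols with a raised-cosine pulse and confirm that, at the symbol-spaced sampling instants, neighbouring pulses contribute nothing. The roll-off, span, and oversampling factor are textbook choices, not the paper's settings.

```python
# Minimal zero-ISI demonstration: raised-cosine (Nyquist) pulse shaping of
# random binary symbols. Roll-off, span, and oversampling are textbook
# choices, not the paper's settings.
import numpy as np

sps, span, beta = 8, 8, 0.35                       # samples/symbol, half-span in symbols, roll-off
t = np.arange(-span * sps, span * sps + 1) / sps   # time axis in symbol periods

def raised_cosine(t, beta):
    """h(t) = sinc(t) * cos(pi*beta*t) / (1 - (2*beta*t)^2), with the t = ±1/(2*beta) limit."""
    num = np.sinc(t) * np.cos(np.pi * beta * t)
    den = 1.0 - (2.0 * beta * t) ** 2
    limit = np.pi / 4.0 * np.sinc(1.0 / (2.0 * beta))
    return np.where(np.abs(den) > 1e-8, num / np.where(den == 0, 1.0, den), limit)

h = raised_cosine(t, beta)
symbols = np.random.default_rng(0).choice([-1.0, 1.0], size=100)
upsampled = np.zeros(len(symbols) * sps)
upsampled[::sps] = symbols                          # impulse train at the symbol rate
waveform = np.convolve(upsampled, h)                # pulse-shaped transmit signal

# Sample at the symbol instants, offset by the filter delay: the Nyquist
# pulse is zero at every nonzero symbol-spaced instant, so the recovered
# values match the transmitted symbols to machine precision.
delay = span * sps
recovered = waveform[delay:delay + len(symbols) * sps:sps]
print("max deviation from transmitted symbols:", np.abs(recovered - symbols).max())
```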