Abstract: Genome-wide epigenomic datasets allow us to validate the biological function of motifs and to understand regulatory mechanisms more comprehensively. How different motifs determine whether transcription factors (TFs) can bind to DNA at a specific position is a critical research question. In this project, we apply computational techniques from Natural Language Processing (NLP) to predict Transcription Factor Bound Regions (TFBRs) given motif instances. Most existing motif prediction methods based on deep neural networks use one-hot-encoded base sequences as the input feature for TFBR identification, which yields low resolution and only an indirect view of binding mechanisms. Moreover, the collective effect of motifs on binding sites is difficult to determine. In our pipeline, we apply the Word2Vec algorithm, with the names of motifs as input, and use a Convolutional Neural Network (CNN) to predict TFBRs as a binary classification task on the ENCODE dataset. We treat different types of motifs as separate “words” and their corresponding TFBRs as the meanings of “sentences”. Each “sentence” is simply a combination of motifs, and all “sentences” together compose the whole “passage”. For each binding site, we perform binary classification within different cell types to show the performance of our model across binding sites and cell types. Each “word” has a corresponding high-dimensional vector, and the distances between vectors can be computed, so we can extract the similarity between motifs and an explicit binding mechanism from our model. The CNN extracts features from the motif vectors produced by Word2Vec through its mapping and pooling steps, and the model reaches a peak accuracy of 87%.
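The idea that each motif “word” has a vector, and that distances between vectors express motif similarity, can be illustrated with a minimal sketch. It assumes cosine similarity as the distance measure, which the abstract does not specify. The motif names and the three-dimensional vectors are made up for illustration; real embeddings would come from a trained Word2Vec model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_similar(query, embeddings):
    """List the other motifs by decreasing cosine similarity to `query`."""
    q = embeddings[query]
    scores = [
        (name, cosine_similarity(q, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy embeddings (not from the paper): one vector per motif "word".
toy_embeddings = {
    "motif_A": [1.0, 0.0, 0.0],
    "motif_B": [0.9, 0.1, 0.0],
    "motif_C": [0.0, 1.0, 0.0],
}

ranking = rank_similar("motif_A", toy_embeddings)
for name, score in ranking:
    print(name, round(score, 3))
```

In this toy setting motif_B ranks above motif_C because its vector points in nearly the same direction as that of motif_A.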
Funding: Project supported by the National Scaling Program, the National Eighth Five-Year Tackling Key Problems Program, and the Doctoral Fund of the National Education Commission.
Abstract: The mathematical model of heat conduction in three-dimensional semiconductor devices is described by a system of four quasilinear partial differential equations posed as an initial-boundary value problem. One equation, of elliptic type, is for the electric potential; two equations of convection-dominated diffusion type are for the electron and hole concentrations; and one heat conduction equation is for the temperature. Characteristic finite difference schemes for two kinds of boundary value problems are put forward. By using thick and thin grids to form a complete set, together with product threefold-quadratic interpolation, a variable time step method with the boundary condition, calculus of variations, and the theory and techniques of prior estimates, optimal error estimates in the L^2 norm are derived for the approximate solutions.
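The core step of a characteristic finite difference scheme, following each grid point backward along the flow and interpolating the old solution at the foot of the characteristic, can be sketched in one dimension. This is a simplified illustration, not the paper's scheme: it assumes pure convection with a constant velocity, a periodic uniform grid, and linear interpolation in place of the threefold-quadratic interpolation named in the abstract. The function and parameter names are hypothetical.

```python
import math


def characteristic_step(u, velocity, dt, h):
    """Advance u_t + velocity * u_x = 0 by one time step on a periodic grid.

    Each new value is the old solution evaluated at the foot of the
    characteristic, x_i - velocity * dt, using linear interpolation.
    """
    n = len(u)
    new_u = []
    for i in range(n):
        foot = i - velocity * dt / h      # foot position in grid-index units
        left = math.floor(foot)           # grid index just left of the foot
        weight = foot - left              # fractional distance from `left`
        value = (1.0 - weight) * u[left % n] + weight * u[(left + 1) % n]
        new_u.append(value)
    return new_u


# A unit pulse moved by exactly one cell (velocity * dt equals h).
shifted = characteristic_step([0.0, 1.0, 0.0, 0.0], velocity=1.0, dt=0.5, h=0.5)

# The same pulse moved by half a cell is split between two neighbours.
half_shifted = characteristic_step([0.0, 1.0, 0.0, 0.0], velocity=1.0, dt=0.25, h=0.5)
```

Because the update follows the flow rather than differencing across it, the time step is not limited by the convection speed in this sketch, which is the usual motivation for characteristic methods in convection-dominated problems.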
Funding: This work is supported by the Major State Basic Research Program of China (19990328), the National Tackling Key Problem Program, the National Science Foundation of China (10271066 and 0372052), and the Doctorate Foundation of the Ministry of Education of China (20030422047).
Abstract: Characteristic finite difference fractional step schemes are put forward. The electric potential equation is discretized by a seven-point finite difference scheme, and the electron and hole concentration equations are treated by characteristic finite difference fractional step methods. The temperature equation is discretized by a fractional step method. Thick and thin grids are used to form a complete set. Piecewise threefold quadratic interpolation, symmetrical extension, calculus of variations, commutativity of operator products, decomposition of high-order difference operators, and prior estimates are also used. Optimal order estimates in the l^2 norm are derived for the error of the approximate solution. The well-known problem is thoroughly and completely solved.
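A seven-point finite difference scheme in three dimensions combines the value at a grid point with its six axis-aligned neighbours. The sketch below shows the simplest instance, the discrete Laplacian on a uniform grid with constant coefficients. That setting is an assumption made for illustration; the potential equation in the paper may carry variable coefficients, and the function name is hypothetical.

```python
def seven_point_laplacian(u, i, j, k, h):
    """Approximate the Laplacian of u at interior point (i, j, k).

    Uses the centre value and its six nearest neighbours on a uniform
    grid with spacing h in every direction.
    """
    neighbours = (
        u[i + 1][j][k] + u[i - 1][j][k]
        + u[i][j + 1][k] + u[i][j - 1][k]
        + u[i][j][k + 1] + u[i][j][k - 1]
    )
    return (neighbours - 6.0 * u[i][j][k]) / (h * h)


# Sample u(x, y, z) = x^2 + y^2 + z^2, whose exact Laplacian is 6 everywhere.
n, h = 5, 0.1
grid = [
    [[(i * h) ** 2 + (j * h) ** 2 + (k * h) ** 2 for k in range(n)]
     for j in range(n)]
    for i in range(n)
]
value = seven_point_laplacian(grid, 2, 2, 2, h)
```

Central differences are exact for quadratic functions, so `value` agrees with 6 up to floating-point rounding.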
Funding: Project supported by the National Scaling Program and the National Eighth Five-Year Tackling Key Problems Program.
Abstract: A 2-dimensional, multicomponent, multiphase, incompressible compositional reservoir simulator has been developed and applied to chemical flooding (surfactants, alcohol and polymers) and convergence analysis. The model of 2-dimensional enhanced oil recovery treated by the characteristic finite difference methods can be described as a coupled system of nonlinear partial differential equations. For the generic case with cross interference on a bounded region, we put forward a class of characteristic finite difference schemes and make use of thick and thin grids to form a complete set, calculus of variations, and the theory and techniques of prior estimates. Optimal order estimates in the L^2 norm are derived for the error in the approximate solutions. Thus we have thoroughly solved the well-known theoretical problem proposed by a famous scientist, J. Douglas, Jr.