We develop various statistical methods important for multidimensional genetic data analysis. Theorems justifying application of these methods are established. We concentrate on the multifactor dimensionality reduction, logic regression, random forests, stochastic gradient boosting along with their new modifications. We use complementary approaches to study the risk of complex diseases such as cardiovascular ones. The roles of certain combinations of single nucleotide polymorphisms and non-genetic risk factors are examined. To perform the data analysis concerning the coronary heart disease and myocardial infarction the Lomonosov Moscow State University supercomputer “Chebyshev” was employed.
The behavior of the Kozachenko–Leonenko estimates for the (differential) Shannon entropy is studied when the number of i.i.d. vector-valued observations tends to infinity. The asymptotic unbiasedness and L^2-consistency of the estimates are established. The conditions employed involve the analogues of the Hardy–Littlewood maximal function. It is shown that the results are valid in particular for the entropy estimation of any nondegenerate Gaussian vector.
Funding: Supported by the Russian Science Foundation (Grant No. 14-21-00162).
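As an illustration of the Kozachenko–Leonenko entropy estimates described above, the following is a minimal sketch, not the authors' code. The abstract does not state the estimator's formula, so the sketch assumes the classical single-nearest-neighbour form, H_N = (1/N) Σ log(ρ_i^d · V_d · (N − 1)) + γ, where ρ_i is the distance from the i-th observation to its nearest neighbour, V_d is the volume of the unit ball in R^d, and γ is the Euler–Mascheroni constant. The function names, the sample size, and the choice of a standard Gaussian vector in R^2 are all hypothetical choices made for the example.

```python
import math
import random


def kl_entropy(points):
    """Nearest-neighbour (Kozachenko-Leonenko type) entropy estimate, in nats.

    Assumes the classical k = 1 form; `points` is a list of equal-length
    tuples with no repeated observations.
    """
    n = len(points)
    d = len(points[0])
    # Volume of the unit Euclidean ball in R^d.
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    euler_gamma = 0.5772156649015329
    total = 0.0
    for i, x in enumerate(points):
        # Brute-force distance to the nearest other observation (O(n^2) overall).
        rho = min(math.dist(x, y) for j, y in enumerate(points) if j != i)
        total += math.log(rho ** d * unit_ball * (n - 1))
    return total / n + euler_gamma


def standard_gaussian_entropy(d):
    """Exact differential entropy of a standard Gaussian vector in R^d."""
    return 0.5 * d * math.log(2 * math.pi * math.e)


# Hypothetical check: i.i.d. standard Gaussian observations in R^2.
rng = random.Random(0)
sample = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(800)]
estimate = kl_entropy(sample)
exact = standard_gaussian_entropy(2)
print(f"estimate = {estimate:.3f}, exact = {exact:.3f}")
```

With a sample of this size the estimate should land near the exact Gaussian entropy, which is consistent with the asymptotic unbiasedness the abstract claims; a single run illustrates the result but proves nothing about it. The brute-force neighbour search keeps the sketch dependency-free; a k-d tree would be the usual choice for larger samples.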