Cryo-electron microscopy (cryo-EM) provides a powerful tool to resolve the structure of biological macromolecules in their native state. One advantage of cryo-EM technology is that different conformational states of a protein complex can be built simultaneously, and the distribution of these states can be measured. This pushes cryo-EM beyond merely resolving protein structures toward obtaining the thermodynamic properties of protein machines. Here, we used a deep manifold learning framework to obtain the conformational landscape of KaiC proteins, and further derived the thermodynamic properties of this central oscillator component of the circadian clock by means of statistical physics. This work was supported by the National Natural Science Foundation of China (Grant No. 12090054).
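To make the statistical-physics step concrete, the following minimal sketch applies Boltzmann inversion to hypothetical conformational-state populations; the state counts, temperature, and reference state are illustrative assumptions, not values from the study.

```python
import numpy as np

# Hypothetical occupation counts of discrete conformational states,
# e.g. per-state particle counts estimated from a cryo-EM data set.
counts = np.array([52000, 31000, 12000, 5000])
p = counts / counts.sum()

kT = 0.593  # thermal energy k_B*T in kcal/mol at ~298 K

# Boltzmann inversion: relative free energy of each state
# with respect to the most populated one.
delta_G = -kT * np.log(p / p.max())
for i, (pi, gi) in enumerate(zip(p, delta_G)):
    print(f"state {i}: population {pi:.3f}, dG = {gi:.2f} kcal/mol")
```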
Traditional global sensitivity analysis (GSA) neglects the epistemic uncertainties associated with the probabilistic characteristics (i.e., distribution type and its parameters) of input rock properties that arise from the small size of datasets while mapping the relative importance of properties to the model response. This paper proposes an augmented Bayesian multi-model inference (BMMI) coupled with GSA methodology (BMMI-GSA) to address this issue by estimating the imprecision in the moment-independent sensitivity indices of rock structures arising from the small size of input data. The methodology employs BMMI to quantify the epistemic uncertainties associated with the model type and parameters of input properties. The estimated uncertainties are propagated into the imprecision of moment-independent Borgonovo's indices by employing a reweighting approach on candidate probabilistic models. The proposed methodology is showcased for a rock slope prone to stress-controlled failure in the Himalayan region of India. It proved superior to conventional GSA (which neglects all epistemic uncertainties) and Bayesian coupled GSA (B-GSA) (which neglects model uncertainty) owing to its capability to incorporate the uncertainties in both the model type and the parameters of the properties. Imprecise Borgonovo's indices estimated via the proposed methodology provide confidence intervals of the sensitivity indices instead of fixed-point estimates, which better informs data collection efforts. Analyses performed with varying sample sizes suggested that the uncertainties in the sensitivity indices reduce significantly as the sample size increases, and accurate importance ranking of the properties was only possible with large samples. Further, the impact of prior knowledge in terms of prior ranges and distributions was significant; hence, any related assumption should be made carefully.
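As a rough illustration of the moment-independent measure at the heart of the method, the sketch below estimates Borgonovo's delta index for a toy two-input model by comparing conditional and unconditional output densities; the model, input distributions, and binning scheme are assumptions, and the Bayesian reweighting over candidate models described in the paper is not reproduced.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Toy model standing in for a rock-slope response (e.g. a factor of safety);
# the inputs and the model are illustrative, not the paper's slope model.
def model(c, phi):
    return 0.8 * c + 0.5 * np.tan(np.radians(phi))

n = 5000
c = rng.normal(30.0, 5.0, n)      # cohesion-like property
phi = rng.normal(35.0, 3.0, n)    # friction-angle-like property
y = model(c, phi)

def borgonovo_delta(x, y, n_bins=20, grid=200):
    """Moment-independent (Borgonovo) index via conditioning on quantile bins of x."""
    ygrid = np.linspace(y.min(), y.max(), grid)
    dy = ygrid[1] - ygrid[0]
    f_y = gaussian_kde(y)(ygrid)
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    shift = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (x >= lo) & (x <= hi)
        if mask.sum() < 10:
            continue
        f_cond = gaussian_kde(y[mask])(ygrid)
        # L1 distance between unconditional and conditional output densities
        shift += mask.mean() * np.sum(np.abs(f_y - f_cond)) * dy
    return 0.5 * shift

print("delta(c)  =", round(borgonovo_delta(c, y), 3))
print("delta(phi)=", round(borgonovo_delta(phi, y), 3))
```

Conditioning by quantile bins keeps each conditional density estimate supported on a comparable number of samples.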
Human life would be impossible without good air quality. Consistent advancements in practically every aspect of contemporary human life have harmed air quality. Everyday industrial, transportation, and home activities release dangerous contaminants into our surroundings. This study investigated two years' worth of air quality data from two Indian cities for outlier detection. Studies on air pollution have used numerous types of methodologies, with the gases treated as a vector whose components are the gas concentration values for each observation performed. In our technique, we use curves to represent the monthly average of daily gas emissions. The approach, which is based on functional depth, was used to find outliers in the gas emissions of Delhi and Kolkata, and the outcomes were compared with those from the traditional method. In the evaluation and comparison of these models' performances, the functional approach performed well.
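A minimal sketch of depth-based outlier detection for functional data is shown below, using the modified band depth on hypothetical monthly pollution curves; the data, the depth definition (J = 2), and the 10% flagging threshold are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical monthly curves: 24 months x 30 daily values of a pollutant concentration.
curves = rng.normal(50, 5, size=(24, 30)) + 5 * np.sin(np.linspace(0, 2 * np.pi, 30))
curves[7] += 40            # inject one clearly anomalous month

def modified_band_depth(X):
    """Modified band depth (J = 2): fraction of time each curve lies inside
    the band spanned by every pair of other curves."""
    n, m = X.shape
    depth = np.zeros(n)
    for i in range(n):
        inside = 0.0
        for j in range(n):
            for k in range(j + 1, n):
                lo = np.minimum(X[j], X[k])
                hi = np.maximum(X[j], X[k])
                inside += np.mean((X[i] >= lo) & (X[i] <= hi))
        depth[i] = inside / (n * (n - 1) / 2)
    return depth

d = modified_band_depth(curves)
threshold = np.quantile(d, 0.1)   # flag the shallowest 10% as candidate outliers
print("outlier months:", np.where(d <= threshold)[0])
```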
The aim of this study is to establish the prevailing conditions of changing climatic trends and change-point dates at four selected meteorological stations (Uyo, Benin, Port Harcourt, and Warri) in the Niger Delta region of Nigeria. Daily (24-hourly) annual maximum series (AMS) data were used, and the Indian Meteorological Department (IMD) and modified Chowdury Indian Meteorological Department (MCIMD) models were adopted to downscale the time series data. The Mann-Kendall (MK) trend and Sen's Slope Estimator (SSE) tests showed a statistically significant trend for Uyo and Benin, while Port Harcourt and Warri showed mild trends. The Sen's slope magnitudes and variation rates were 21.6, 10.8, 6.0 and 4.4 mm/decade, respectively. The trend change-point analysis showed the initial rainfall change-point dates as 2002, 2005, 1988, and 2000 for Uyo, Benin, Port Harcourt, and Warri, respectively. These results confirm changing climatic conditions for rainfall in the study area. The analysis and design of erosion and flood control facilities in the Niger Delta will therefore require non-stationary IDF modelling.
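The trend tests can be illustrated with a hedged sketch of the Mann-Kendall statistic and Sen's slope estimator on a synthetic annual maximum series; the data, the normal approximation, and the neglect of tie corrections are simplifying assumptions.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical annual maximum rainfall series (mm); a station's AMS data would go here.
rng = np.random.default_rng(2)
years = np.arange(1980, 2015)
x = 80 + 0.9 * (years - years[0]) + rng.normal(0, 10, years.size)

def mann_kendall(x):
    """Mann-Kendall S statistic, normal-approximation Z and two-sided p-value
    (tie corrections omitted for brevity)."""
    n = len(x)
    s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
    p = 2 * (1 - norm.cdf(abs(z)))
    return s, z, p

def sens_slope(x, t):
    """Median of all pairwise slopes (Sen's slope estimator)."""
    slopes = [(x[j] - x[i]) / (t[j] - t[i])
              for i in range(len(x) - 1) for j in range(i + 1, len(x))]
    return np.median(slopes)

s, z, p = mann_kendall(x)
slope = sens_slope(x, years)
print(f"S = {s}, Z = {z:.2f}, p = {p:.4f}")
print(f"Sen's slope = {slope:.2f} mm/yr ({10 * slope:.1f} mm/decade)")
```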
Noise is a significant component of a millimeter-wave molecular line datacube. Analyzing the noise improves our understanding of its characteristics and further contributes to scientific discoveries. We measured the noise level of a single datacube from MWISP and performed statistical analyses. We identified the major factors that increase the noise level of a single datacube, including bad channels, edge effects, baseline distortion and line contamination. Cleaning algorithms were applied to remove or reduce these noise components. As a result, we obtained a cleaned datacube in which the noise follows a positively skewed normal distribution. We further analyzed the noise structure distribution of a 3D mosaicked datacube in the range l = 40.7° to 43.3° and b = -2.3° to 0.3° and found that noise in the final mosaicked datacube is mainly characterized by noise fluctuation among the cells. This work was supported by the National Key R&D Program of China (2017YFA0402701) and the Key Research Program of Frontier Sciences of CAS (QYZDJ-SSW-SLH047), and partially supported by the National Natural Science Foundation of China (Grant No. U2031202).
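A minimal sketch of characterizing such noise is given below: it fits a positively skewed (skew-normal) distribution to hypothetical per-pixel noise values with scipy; the noise values and parameter choices are assumptions, not MWISP measurements.

```python
import numpy as np
from scipy.stats import skewnorm

rng = np.random.default_rng(3)

# Hypothetical per-pixel RMS noise values (K) from a cleaned datacube;
# real values would come from line-free channels of the survey cube.
noise = skewnorm.rvs(a=4.0, loc=0.45, scale=0.08, size=20000, random_state=rng)

# Fit a skew-normal distribution and inspect the skewness parameter.
a, loc, scale = skewnorm.fit(noise)
print(f"shape a = {a:.2f} (a > 0 means positively skewed), loc = {loc:.3f} K, scale = {scale:.3f} K")

# Compare empirical and fitted percentiles as a quick goodness-of-fit check.
for q in (0.16, 0.5, 0.84):
    print(f"q = {q}: data {np.quantile(noise, q):.3f} K, model {skewnorm.ppf(q, a, loc, scale):.3f} K")
```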
This paper analyzes the application value of big-data statistical analysis methods in economic management from the macro and micro perspectives, and analyzes their specific application in three areas: economic trends, industrial operations and marketing strategies.
There has been significant advancement in the application of statistical tools in plant pathology during the past four decades. These tools include multivariate analysis of disease dynamics involving principal component analysis, cluster analysis, factor analysis, pattern analysis, discriminant analysis, multivariate analysis of variance, correspondence analysis, canonical correlation analysis, redundancy analysis, genetic diversity analysis, and stability analysis, which involves joint regression, additive main effects and multiplicative interactions, and genotype-by-environment interaction biplot analysis. Advanced statistical tools, such as non-parametric analysis of disease association, meta-analysis, Bayesian analysis, and decision theory, occupy an important place in the analysis of disease dynamics. Disease forecasting by simulation models has great potential in practical disease control strategies. Common mathematical models such as the monomolecular, exponential, logistic, Gompertz and linked differential equations occupy an important place in growth curve analysis of disease epidemics. The highly informative means of displaying a range of numerical data through the construction of box-and-whisker plots has been suggested. The probable applications of recent tools for linear and non-linear mixed modelling, such as the linear mixed model, generalized linear model, and generalized linear mixed model, have been presented. The most recent technologies, such as microarray analysis, though cost-effective, provide estimates of gene expression for thousands of genes simultaneously and need attention from molecular biologists. Some of these advanced tools can be well applied in different branches of rice research, including crop improvement, crop production, crop protection, the social sciences, and agricultural engineering. Rice research scientists should take full advantage of these new opportunities by adopting these highly promising advanced technologies when planning experimental designs, data collection, analysis and interpretation of their research data sets.
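Since the growth-curve models are central to disease-progress analysis, the sketch below fits logistic and Gompertz curves to hypothetical severity data with scipy's curve_fit; the data and starting values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical disease-progress data: assessment times (days) and disease severity (proportion).
t = np.array([0, 7, 14, 21, 28, 35, 42, 49])
y = np.array([0.01, 0.03, 0.08, 0.20, 0.45, 0.70, 0.85, 0.92])

def logistic(t, y0, r):
    """Logistic disease-progress model: dy/dt = r*y*(1 - y)."""
    return 1.0 / (1.0 + ((1.0 - y0) / y0) * np.exp(-r * t))

def gompertz(t, y0, r):
    """Gompertz disease-progress model: dy/dt = -r*y*ln(y), with carrying capacity 1."""
    return np.exp(np.log(y0) * np.exp(-r * t))

for name, f in (("logistic", logistic), ("gompertz", gompertz)):
    params, _ = curve_fit(f, t, y, p0=(0.01, 0.1), maxfev=10000)
    rss = np.sum((y - f(t, *params)) ** 2)
    print(f"{name:9s}: y0 = {params[0]:.4f}, r = {params[1]:.4f}, RSS = {rss:.4f}")
```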
A fundamental goal in cellular signaling is to understand allosteric communication, the process by which signals originating at one site in a protein propagate reliably to affect distant functional sites. The general principles of protein structure that underlie this process remain unknown. Statistical coupling analysis (SCA) is a statistical technique that uses evolutionary data of a protein family to measure correlation between distant functional sites, suggesting allosteric communication. In proteins, many small interactions between distant collections of amino acids provide the communication that can be important for the signaling process. In this paper, we present the SCA of the protein alignment of the esterase family (Pfam ID: PF00756), which contains the sequence of antigen 85C secreted by Mycobacterium tuberculosis, to identify a subset of interacting residues. Clustering analysis of the pairwise correlations highlighted seven important residue positions in the esterase family alignment. These residues were then mapped onto the crystal structure of antigen 85C (PDB ID: 1DQZ). The mapping revealed correlation between three distant residues (Asp38, Leu123 and Met125) and suggests allosteric communication between them. This information can be used in the search for a new drug against this fatal disease.
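A simplified stand-in for the coupling calculation is sketched below: it scores pairwise coupling between alignment columns by mutual information on a toy alignment; the toy sequences and the use of unweighted mutual information (rather than SCA's evolution-weighted correlation) are assumptions.

```python
import numpy as np
from collections import Counter

# Toy multiple sequence alignment (rows = sequences, columns = positions);
# a real analysis would load the Pfam PF00756 alignment instead.
msa = np.array([list(s) for s in [
    "MKLAVD", "MKLAVE", "MRLAVD", "MKIAVD",
    "MKLGVD", "MRLGVE", "MKLAVD", "MRIAVE",
]])

def column_mi(a, b):
    """Mutual information between two alignment columns, a crude stand-in
    for the evolution-weighted coupling used in SCA."""
    n = len(a)
    pa, pb = Counter(a), Counter(b)
    pab = Counter(zip(a, b))
    mi = 0.0
    for (x, y), c in pab.items():
        pxy = c / n
        mi += pxy * np.log2(pxy / ((pa[x] / n) * (pb[y] / n)))
    return mi

L = msa.shape[1]
coupling = np.zeros((L, L))
for i in range(L):
    for j in range(i + 1, L):
        coupling[i, j] = coupling[j, i] = column_mi(msa[:, i], msa[:, j])

# Positions participating in the strongest couplings are candidates for allosteric linkage.
i, j = np.unravel_index(np.argmax(coupling), coupling.shape)
print(f"strongest coupling: positions {i} and {j} (MI = {coupling[i, j]:.2f} bits)")
```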
The extraction of high-temperature regions in active regions (ARs) is an important means of helping to understand the mechanism of coronal heating. The key observational tracer of high-temperature radiation in ARs is the main emission line of Fe XVIII in the 94 Å channel of the Atmospheric Imaging Assembly. However, the diagnostic algorithms for Fe XVIII, including the differential emission measure (DEM) and the linear diagnostic proposed by Del based on the DEM, have long been limited, and their results differ from the predictions. In this paper, building on previous research, we use an outlier detection method to establish the nonlinear correlation between the 94 Å channel and the 171, 193 and 211 Å channels. A neural network based on the 171, 193 and 211 Å channels is constructed to reproduce the low-temperature emission in the ARs of the 94 Å channel. The predicted values are regarded as the low-temperature component of 94 Å and are subtracted from the observed 94 Å emission to obtain its outlier component, i.e., Fe XVIII. The outlier components obtained by the neural network are then compared with the Fe XVIII obtained by the DEM and by Del's method; a high similarity is found, which demonstrates the reliability of the neural network for extracting the high-temperature components of ARs, although many differences remain. To analyze the differences among the Fe XVIII maps obtained by the three methods, we subtract the Fe XVIII obtained by the DEM and Del's method from the Fe XVIII obtained by the neural network to obtain residual values, and compare them with the results for Fe XIV in the temperature range of 6.1-6.45 MK. A great similarity is found, which shows that the Fe XVIII obtained by the DEM and Del's method still contains a large low-temperature component dominated by Fe XIV, whereas the Fe XVIII obtained by the neural network is relatively pure. This work was supported by the National Natural Science Foundation of China under Grant Nos. U2031140, 11873027, and 12073077.
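The residual-based extraction can be sketched as follows with a small scikit-learn regression network on synthetic channel intensities; the synthetic data, network size, and thresholds are assumptions, and the real workflow would train on co-aligned AIA images rather than this toy generator.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)

# Synthetic stand-in for co-aligned AIA intensities (DN/s) in an active region:
# columns of x are the 171, 193 and 211 A channels; y94 is the 94 A channel.
n = 5000
x = rng.lognormal(mean=5.0, sigma=0.5, size=(n, 3))
warm = 0.002 * x[:, 0] + 0.004 * x[:, 1] + 0.006 * x[:, 2]            # "low-temperature" part of 94 A
hot = np.where(rng.random(n) < 0.2, rng.lognormal(1.0, 0.5, n), 0.0)  # Fe XVIII-like excess
y94 = warm + hot

# Train the network to predict 94 A from the cooler channels; it mostly captures the warm part.
model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
model.fit(np.log(x), y94)

# The positive residual (observed minus predicted) is taken as the hot Fe XVIII component.
fe18 = np.clip(y94 - model.predict(np.log(x)), 0.0, None)
print(f"pixels with significant Fe XVIII signal: {(fe18 > 1.0).sum()} of {n}")
```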
It is well known that nonparametric estimation of the regression function is highly sensitive to the presence of even a small proportion of outliers in the data. To address the problem of atypical observations when the covariates of the nonparametric component are functional, robust estimates for the regression parameter and regression operator are introduced. The main purpose of the paper is to consider data-driven methods of selecting the number of neighbors in order to make the proposed procedures fully automatic. We use the k-Nearest Neighbors (kNN) procedure to construct the kernel estimator of the proposed robust model. Under some regularity conditions, we state consistency results for the kNN functional estimators, which are uniform in the number of neighbors (UINN). Furthermore, a simulation study and an empirical application to a real data analysis of octane gasoline predictions are carried out to illustrate the higher predictive performance and the usefulness of the kNN approach.
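A hedged sketch of the kNN idea follows: functional covariates are compared by their L2 distance and the local fit is made robust by replacing the mean with a median; the curves, responses, and the specific robust estimator are illustrative assumptions rather than the estimator analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical functional covariates: each observation is a curve sampled on a grid
# (loosely inspired by spectrometric curves used to predict octane numbers).
n, m = 200, 100
grid = np.linspace(0, 1, m)
curves = rng.normal(size=(n, 1)) * np.sin(2 * np.pi * grid) + rng.normal(0, 0.1, (n, m))
y = 85 + 3 * curves[:, 25] + rng.normal(0, 0.3, n)
y[::40] += 15                     # a few gross outliers in the responses

def knn_functional_predict(x_new, X, y, k=10):
    """kNN functional regression with a robust local fit: distances are L2 norms
    between curves, and the prediction is the median response of the k nearest
    curves (the median replaces the mean for robustness to outliers)."""
    d = np.linalg.norm(X - x_new, axis=1)
    idx = np.argsort(d)[:k]
    return np.median(y[idx])

x_new = curves[0]
print("robust kNN prediction:", round(knn_functional_predict(x_new, curves[1:], y[1:], k=15), 2))
print("true response        :", round(y[0], 2))
```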
In atmospheric data assimilation systems, the forecast error covariance model is an important component. However, the parameters required by a forecast error covariance model are difficult to obtain due to the absence of the truth. This study applies an error statistics estimation method to the Physical-space Statistical Analysis System (PSAS) height-wind forecast error covariance model. This method consists of two components: the first computes the error statistics by using the National Meteorological Center (NMC) method, a lagged-forecast difference approach, within the framework of the PSAS height-wind forecast error covariance model; the second obtains a calibration formula to rescale the error standard deviations provided by the NMC method. The calibration is against the error statistics estimated by a maximum-likelihood estimation (MLE) with rawinsonde height observed-minus-forecast residuals. A complete set of formulas for estimating the error statistics and for the calibration is applied to a one-month-long dataset generated by a general circulation model of the Global Modeling and Assimilation Office (GMAO), NASA. There is a clear constant relationship between the error statistics estimates of the NMC method and the MLE. The final product provides a full set of 6-hour error statistics required by the PSAS height-wind forecast error covariance model over the globe. The features of these error statistics are examined and discussed.
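A minimal sketch of the NMC (lagged-forecast difference) idea and a simple rescaling step is given below on synthetic forecasts; the grid, error levels, and the stand-in for the MLE-based target are assumptions, not PSAS or GMAO quantities.

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic stand-in for 30 days of geopotential-height forecasts on a small grid:
# f24[d] and f48[d] are the 24-h and 48-h forecasts valid at the same time on day d.
days, npts = 30, 500
truth = rng.normal(5500, 50, (days, npts))
f24 = truth + rng.normal(0, 8, (days, npts))
f48 = truth + rng.normal(0, 12, (days, npts))

# NMC (lagged-forecast difference) method: the forecast-error standard deviation is
# estimated, up to a scaling factor, from the spread of f48 - f24 at each grid point.
diff = f48 - f24
sigma_nmc = diff.std(axis=0)

# Calibration: rescale the NMC estimate to match an independent error estimate
# (here a single assumed value standing in for the MLE from observed-minus-forecast residuals).
sigma_mle = 8.0
alpha = sigma_mle / sigma_nmc.mean()
sigma_calibrated = alpha * sigma_nmc
print(f"raw NMC sigma ~ {sigma_nmc.mean():.1f} m, calibration factor = {alpha:.2f}")
```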
The rejected specimens from the Emergency Department of the Center of Clinical Laboratory from January 1, 2022 to January 1, 2023 were analyzed in order to reduce specimen rejection rates and to improve the quality of inspection. The results showed that there were 1488 rejected specimens, for a non-conforming rate of 0.58%. The departments involved were mainly the Emergency Department, the Hematology Department, the Cardiology Department, the Intensive Care Department, and the Brain Surgery Department. Among the reasons for rejection, blood hemolysis accounted for 43.15%, blood coagulation accounted for 26.61%, and insufficient specimens accounted for 17.14%. The sample rejection rate was highest for arterial blood gas analysis, at 1.74%, followed by specimens for coagulation testing, at 1.18%. These results indicate that the main cause of rejected specimens is failure to follow the standard operating procedure. Specimen rejection can largely be avoided if the standards for specimen collection are strictly followed.
Some geophysical parameters, such as those related to gravitation and the geomagnetic field, could change during solar eclipses. In order to observe geomagnetic fluctuations, geomagnetic measurements were carried out in a limited time frame during the partial solar eclipse that occurred on 2011 January 4 and was observed in Canakkale and Ankara, Turkey. Additionally, records of the geomagnetic field spanning 24 hours, obtained from another observatory (in Iznik, Turkey), were also analyzed to check for any peculiar variations. In the data processing stage, a polynomial fit, following the application of a running average routine, was applied to the geomagnetic field data sets. The geomagnetic field data sets indicated a characteristic decrease at the beginning of the solar eclipse, and this decrease correlates well with previous geomagnetic field measurements taken during the total solar eclipse observed in Turkey on 2006 March 29. The behavior of the geomagnetic field is also consistent with previous observations in the literature. As a result of these analyses, it can be suggested that eclipses can cause a shielding effect on the geomagnetic field of the Earth.
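The two-stage smoothing described above can be sketched as follows on a synthetic magnetometer record: a running average followed by a low-order polynomial fit, with the residual exposing an eclipse-like depression; the sampling, window length, and polynomial degree are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical total-field magnetometer readings (nT) sampled once per minute.
t = np.arange(0, 360)                               # minutes
field = 47000 + 5 * np.sin(2 * np.pi * t / 360) + rng.normal(0, 1.5, t.size)
field[150:210] -= 3.0                               # injected eclipse-like depression

# Step 1: running average to suppress short-period noise.
window = 15
kernel = np.ones(window) / window
smoothed = np.convolve(field, kernel, mode="same")

# Step 2: low-order polynomial fit to capture the slow daily variation.
coeffs = np.polyfit(t, smoothed, deg=3)
trend = np.polyval(coeffs, t)

# Departures of the smoothed series from the fitted trend highlight the anomaly.
residual = smoothed - trend
print(f"largest negative departure: {residual.min():.2f} nT at t = {t[residual.argmin()]} min")
```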
Geomechanical data are never sufficient in quantity or adequately precise and accurate for design purposes in mining and civil engineering. The objective of this paper is to show the variability of rock properties at the sampled point in the roadway's roof, and then how the statistical processing of the available geomechanical data can affect the results of numerical modelling of the roadway's stability. Four cases were applied in the numerical analysis, using the average value (the most common choice in geomechanical data analysis), the average minus the standard deviation, the median, and the average minus the statistical error. The study shows that different approaches to the same geomechanical data set can change the modelling results considerably. The average minus the standard deviation is the most conservative and least risky case: it gives a zone of displacements and yielded elements four times broader than the average-value scenario, which is the least conservative option. The two other cases need further study; their results lie between the most favorable and most adverse values. Taking the average values corrected by the statistical error for the numerical analysis seems to be the best solution. Moreover, the confidence level can be adjusted depending on the importance of the object and the assumed risk level.
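The four statistical cases can be made concrete with the short sketch below on hypothetical strength data, where the "statistical error" is interpreted as the half-width of a Student-t confidence interval on the mean; the data, the confidence level, and that interpretation are assumptions.

```python
import numpy as np
from scipy import stats

# Hypothetical laboratory strength results (MPa) from one sampling point in the roadway roof.
ucs = np.array([38.2, 41.5, 35.9, 44.1, 39.7, 36.8, 42.3, 40.0, 37.5, 43.2])

mean = ucs.mean()
std = ucs.std(ddof=1)
median = np.median(ucs)

# "Statistical error" taken here as the half-width of a confidence interval on the mean;
# the confidence level can be raised for more important objects or a lower accepted risk.
conf_level = 0.95
sem = std / np.sqrt(ucs.size)
t_crit = stats.t.ppf(0.5 + conf_level / 2, df=ucs.size - 1)
stat_error = t_crit * sem

print(f"average              : {mean:.1f} MPa")
print(f"average - std dev    : {mean - std:.1f} MPa   (most conservative case)")
print(f"median               : {median:.1f} MPa")
print(f"average - stat. error: {mean - stat_error:.1f} MPa ({int(conf_level * 100)}% confidence)")
```

Raising conf_level widens the interval and hence lowers the design value, mirroring the risk-level adjustment mentioned above.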
The increasing volume of data in the environmental sciences needs analysis and interpretation. Among the challenges generated by this “data deluge”, the development of efficient strategies for knowledge discovery is an important issue. Here, statistical tools and tools from computational intelligence are applied to analyze large data sets from meteorology and climate sciences. Our approach provides a geographical mapping of the statistical properties that is easily interpreted by meteorologists. Our data analysis comprises two main steps of knowledge extraction, applied successively in order to reduce the complexity of the original data set. The goal is to identify a much smaller subset of climatic variables that might still be able to describe, or even predict, the probability of occurrence of an extreme event. The first step applies a class comparison technique: p-value estimation. The second step consists of a decision tree (DT) configured from the available data and the p-value analysis. The DT is used as a predictive model, identifying the climate variables most statistically significant for the precipitation intensity. The methodology is employed to study the climatic causes of an extreme precipitation event that occurred in the Alagoas and Pernambuco States (Brazil) in June 2010.
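The two knowledge-extraction steps can be sketched with scipy and scikit-learn on synthetic data: a per-variable p-value screen followed by a shallow decision tree; the data generator, test choice (Mann-Whitney U), and thresholds are assumptions rather than the paper's configuration.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(8)

# Synthetic stand-in: 40 candidate climate variables observed on 500 days,
# with a binary label marking days of extreme precipitation.
n_days, n_vars = 500, 40
X = rng.normal(size=(n_days, n_vars))
label = (X[:, 3] + 0.8 * X[:, 17] + rng.normal(0, 0.5, n_days) > 1.5).astype(int)

# Step 1 (class comparison): keep variables whose distributions differ between classes.
p_values = np.array([
    mannwhitneyu(X[label == 1, j], X[label == 0, j]).pvalue for j in range(n_vars)
])
selected = np.where(p_values < 0.01)[0]
print("variables passing the p-value screen:", selected)

# Step 2: fit a shallow decision tree on the reduced variable set as a predictive model.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X[:, selected], label)
print("training accuracy of the decision tree:", round(tree.score(X[:, selected], label), 3))
```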
We develop various statistical methods important for multidimensional genetic data analysis, and establish theorems justifying their application. We concentrate on multifactor dimensionality reduction, logic regression, random forests, and stochastic gradient boosting, along with their new modifications. We use complementary approaches to study the risk of complex diseases such as cardiovascular ones. The roles of certain combinations of single nucleotide polymorphisms and non-genetic risk factors are examined. To perform the data analysis concerning coronary heart disease and myocardial infarction, the Lomonosov Moscow State University supercomputer “Chebyshev” was employed.
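A hedged sketch of two of the listed ensemble methods on synthetic genotype data follows, using scikit-learn's random forest and gradient boosting with cross-validation; the simulated SNPs, risk factors, and interaction structure are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(9)

# Synthetic stand-in for genetic data: SNP genotypes coded 0/1/2 plus two
# non-genetic risk factors, with disease status driven by an interaction.
n, n_snps = 600, 30
snps = rng.integers(0, 3, size=(n, n_snps))
age = rng.normal(55, 10, (n, 1))
smoking = rng.integers(0, 2, (n, 1))
X = np.hstack([snps, age, smoking])
risk = 0.8 * (snps[:, 4] * snps[:, 11]) + 0.03 * age[:, 0] + smoking[:, 0]
y = (risk + rng.normal(0, 1, n) > np.median(risk)).astype(int)

for name, clf in [("random forest", RandomForestClassifier(n_estimators=300, random_state=0)),
                  ("gradient boosting", GradientBoostingClassifier(random_state=0))]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name:17s}: 5-fold CV accuracy = {acc:.3f}")
```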
Mediterranean anemia is a genetic disease whose diagnosis currently relies heavily on expert clinical experience, which makes it insufficiently precise. This paper proposes two modeling methods to predict whether patients have Mediterranean anemia. The first uses Principal Component Analysis (PCA) to reduce the dimensionality of the data, followed by logistic regression modeling (PCA-LR) on the reduced dataset. The second builds a Partial Least Squares Regression (PLS) model. Experimental results show that the prediction accuracy of the PCA-LR model is 87.5% (degree = 2, λ = 4) and that of the PLS model is 92.5% (ncomp = 4), indicating good predictive performance of the models.
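Both modeling routes can be sketched with scikit-learn as follows; the synthetic blood-count features, the number of components, and the 0.5 threshold on the PLS output are assumptions, not the paper's dataset or tuning.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(10)

# Synthetic stand-in for routine blood-count features (e.g. MCV, MCH, RBC, HbA2, ...)
# with a binary label for Mediterranean anemia (thalassemia) status.
n, p = 200, 12
X = rng.normal(size=(n, p))
y = (X[:, 0] - 0.8 * X[:, 3] + rng.normal(0, 0.7, n) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Method 1: PCA for dimensionality reduction followed by logistic regression (PCA-LR).
pca_lr = make_pipeline(StandardScaler(), PCA(n_components=4), LogisticRegression())
pca_lr.fit(X_tr, y_tr)
print("PCA-LR test accuracy:", round(pca_lr.score(X_te, y_te), 3))

# Method 2: partial least squares regression (PLS); threshold the continuous output at 0.5.
pls = PLSRegression(n_components=4)
pls.fit(X_tr, y_tr)
y_hat = (pls.predict(X_te).ravel() > 0.5).astype(int)
print("PLS test accuracy   :", round((y_hat == y_te).mean(), 3))
```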
This paper focuses on how to extract physically meaningful information from climate data, with emphasis placed on adaptive and local analysis. It is argued that many traditional statistical analysis methods with rigorous mathematical footing may not be efficient in extracting essential physical information from climate data; rather, adaptive and local analysis methods that agree well with fundamental physical principles are more capable of capturing the key information in climate data. To illustrate the improved power of adaptive and local analysis of climate data, we also briefly introduce the empirical mode decomposition and its later developments. This work was supported by US National Science Foundation Grant No. AGS-1139479.
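A minimal sketch of empirical mode decomposition on a synthetic climate-like record is given below, assuming the third-party PyEMD package (installed as EMD-signal) is available; the signal and the interpretation of the slowest component as the trend are illustrative assumptions.

```python
import numpy as np
from PyEMD import EMD   # assumes the EMD-signal (PyEMD) package is installed

t = np.linspace(0, 10, 2000)
# Synthetic "climate-like" record: a slow trend, an oscillation with drifting
# amplitude and frequency, and noise.
signal = 0.3 * t + np.sin(2 * np.pi * (1.0 + 0.05 * t) * t) * (1 + 0.3 * np.sin(0.5 * t))
signal += np.random.default_rng(11).normal(0, 0.2, t.size)

emd = EMD()
imfs = emd.emd(signal, t)   # adaptive, data-driven intrinsic mode functions

print(f"number of components: {imfs.shape[0]}")
# The slowest (last) component typically captures the residual trend,
# while earlier components capture locally defined oscillations.
trend = imfs[-1]
print(f"trend change over the record: {trend[-1] - trend[0]:.2f}")
```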