Extracting and parameterizing ionospheric waves globally and statistically is a longstanding problem. Building on the multichannel maximum entropy method (MMEM) used in previous studies of ionospheric waves, we calculate ionospheric wave parameters by applying the MMEM to numerous temporally and spatially close triples of Global Positioning System radio occultation total electron content profiles. These triples were provided by the unique clustered satellite flights of 2006–2007, right after the launch of the Constellation Observing System for Meteorology, Ionosphere, and Climate (COSMIC) mission. The results show that the amplitude of ionospheric waves is larger at low and high latitudes (~0.15 TECU) and smaller at mid-latitudes (~0.05 TECU). The vertical wavelength is longer at mid-latitudes (e.g., ~50 km at altitudes of 200–250 km) and shorter at low and high latitudes (e.g., ~35 km at altitudes of 200–250 km). The horizontal wavelength shows a similar pattern (e.g., ~1400 km at mid-latitudes and ~800 km at low and high latitudes).
The application of statistical tools in plant pathology has advanced significantly during the past four decades. These tools include multivariate analysis of disease dynamics (principal component analysis, cluster analysis, factor analysis, pattern analysis, discriminant analysis, multivariate analysis of variance, correspondence analysis, canonical correlation analysis, redundancy analysis, and genetic diversity analysis) and stability analysis, which involves joint regression, additive main effects and multiplicative interactions, and genotype-by-environment interaction biplot analysis. Advanced statistical tools such as non-parametric analysis of disease association, meta-analysis, Bayesian analysis, and decision theory play an important role in the analysis of disease dynamics. Disease forecasting by simulation models has great potential in practical disease control strategies. Common mathematical tools such as the monomolecular, exponential, logistic, and Gompertz models and linked differential equations play an important role in growth curve analysis of disease epidemics. Box-and-whisker plots have been suggested as a highly informative way of displaying a range of numerical data. Probable applications of recent linear and non-linear mixed models, such as the linear mixed model, the generalized linear model, and generalized linear mixed models, are presented. Recent technologies such as microarray analysis, though cost-effective, provide expression estimates for thousands of genes simultaneously and deserve attention from molecular biologists. Some of these advanced tools can be applied in different branches of rice research, including crop improvement, crop production, crop protection, social sciences, and agricultural engineering. Rice research scientists should take advantage of these opportunities by adopting the new, high-potential technologies when planning experimental designs and when collecting, analyzing, and interpreting their research data sets.
Traffic tunnels include tunnel works for traffic and transport in the areas of railway, highway, and rail transit. With many mountains and nearly one fifth of the global population, China has numerous large cities and megalopolises with rapidly growing economies and huge traffic demands. As a result, a great many railway, highway, and rail transit facilities are required in the country. In the past, the construction of these facilities mainly involved subgrade and bridge works; in recent years.
A statistical map is usually used to show the quantitative features of various socio-economic phenomena among regions, drawn on a base map of administrative divisions or on another base map tied to the statistical unit. Using geographic information system (GIS) techniques and supported by AutoCAD software, the author puts forward a practical method for making statistical maps and has developed a software tool (SMT), written in C, for making small-scale statistical maps.
The development of adaptation measures to climate change relies on data from climate models or impact models. Analyzing these large data sets, or an ensemble of them, requires statistical methods. This paper describes a methodological approach to collecting, structuring, and publishing the methods that have been used or developed by former or present adaptation initiatives. The intention is to communicate the knowledge achieved and thus support future users. A key component is the participation of users in the development process. The main elements of the approach are standardized, template-based descriptions of the methods, including their specific applications, references, and a method assessment. All contributions have been quality checked, sorted, and placed in a larger context. The result is a report on statistical methods that is freely available in print or online. Examples of how to use the methods are presented in this paper and are also included in the brochure.
A novel approach for detecting and filtering out an unhealthy dataset from a matrix of datasets is developed, tested, and validated. The technique employs a new type of self-organizing map, the Accumulative Statistical Spread Map (ASSM), to establish the destructive effect a dataset would have on the rest of the matrix if it stayed within that matrix. The ASSM is supported by training a neural network engine, which determines which dataset is responsible for the network's inability to learn, classify, and predict. The experiments showed that a neural system was unable to learn in the presence of an unhealthy dataset with deviated characteristics, even though that dataset was produced under the same conditions and through the same process as the other datasets in the matrix; hence, it should be disqualified and either removed completely or transferred to another matrix. This approach is useful for pattern recognition of datasets and features that do not belong to their source, and it could serve as an effective tool for detecting suspicious activity in many areas of secure filing, communication, and data storage.
Geomechanical data are never sufficient in quantity, nor adequately precise and accurate, for design purposes in mining and civil engineering. The objective of this paper is to show the variability of rock properties at the sampled point in a roadway's roof and then to show how the statistical processing of the available geomechanical data can affect the results of numerical modelling of the roadway's stability. Four cases were used in the numerical analysis: average values (the most common choice in geomechanical data analysis), average minus standard deviation, median, and average minus statistical error. The study shows that different approaches to the same geomechanical data set can change the modelling results considerably. The average-minus-standard-deviation case is the most conservative and least risky: it gives displacements and a zone of yielded elements about four times larger than the average-values scenario, which is the least conservative option. The two other cases need further study; their results fall between the most favorable and most adverse values. Taking average values corrected by the statistical error seems to be the best solution for the numerical analysis. Moreover, the confidence level can be adjusted according to the importance of the object and the assumed risk level.
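The four input cases named in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not define "statistical error", so the sketch assumes it is the standard error of the mean scaled by a confidence factor `z`; the function name `design_values` and the sample values are hypothetical and not taken from the paper.

```python
import math
import statistics


def design_values(samples, z=1.96):
    """Return four candidate input values for one rock property.

    Assumption: "statistical error" is taken as z times the standard
    error of the mean; the source abstract does not define it.
    """
    mean = statistics.mean(samples)
    std = statistics.stdev(samples)            # sample standard deviation
    std_err = std / math.sqrt(len(samples))    # standard error of the mean
    return {
        "average": mean,
        "average_minus_std": mean - std,
        "median": statistics.median(samples),
        "average_minus_error": mean - z * std_err,
    }


# Made-up strength values (MPa), for illustration only.
strength_mpa = [40.0, 50.0, 60.0]
for name, value in design_values(strength_mpa).items():
    print(f"{name}: {value:.2f}")
```

Changing `z` is one way to express the adjustable confidence level the abstract mentions: a larger `z` gives a more conservative input value.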
Predicting the seeing of astronomical observations can indicate the quality of optical imaging in the near future and facilitate flexible scheduling of observation tasks to maximize the use of astronomical observatories. Traditional approaches to seeing prediction mostly rely on regional weather models to capture in-dome optical turbulence patterns. Thanks to the development of data gathering and aggregation facilities at astronomical observatories in recent years, data-driven approaches to predicting astronomical seeing are becoming increasingly feasible and attractive. This paper systematically investigates data-driven approaches to seeing prediction by applying various big data techniques, from traditional statistical modeling and machine learning to newly emerging deep learning methods, to the monitoring data of the Large sky Area Multi-Object fiber Spectroscopic Telescope (LAMOST). The raw monitoring data are preprocessed to allow for big data modeling. We then formulate the seeing prediction task under each type of modeling framework and develop seeing prediction models using representative techniques: ARIMA and Prophet for statistical modeling; MLP and XGBoost for machine learning; and LSTM, GRU, and Transformer for deep learning. We perform empirical studies on the developed models with a variety of feature configurations, yielding notable insights into the applicability of big data techniques to the seeing prediction task.
The most common way to analyze economic data is to use statistics software and spreadsheets. This paper presents the opportunities that a modern Geographical Information System (GIS) offers for the analysis of marketing, statistical, and macroeconomic data. It considers existing tools and models and their applications in various sectors. The advantage is that statistical data can be combined with geographic views, maps, and additional data derived from the GIS. As a result, a programming system is developed that uses GIS for the analysis of marketing, statistical, and macroeconomic data and for real-time risk assessment and prevention. The system has been successfully implemented as a web-based software application designed for use on a variety of hardware platforms (mobile devices, laptops, and desktop computers). The software is written mainly in Python, which offers good structure and support for developing large applications. The system optimizes the analysis and visualization of macroeconomic and statistical data by region for different kinds of business research. It is designed with a Geographical Information System for settlements in their respective countries and regions. Integration with external software packages for statistical calculation and analysis is implemented in order to share data analysis, processing, and forecasting. Technologies and processes for loading data from different sources, and tools for data analysis, are developed. The resulting system allows qualitative data analysis to be implemented.
This paper analyzes the value of applying big data statistical analysis methods in economic management from macro and micro perspectives, and examines their specific application in three areas: economic trends, industrial operations, and marketing strategies.
Bearing-only passive tracking is regarded as a hard nonlinear tracking problem, for which no fully satisfactory solution exists so far. Based on the current statistical model, a novel solution that uses the particle filter (PF) and the unscented Kalman filter (UKF) is proposed. The new solution fuses data from two observers to increase the observability of passive tracking. It applies a residual resampling step to reduce the degeneracy of the PF, and it introduces Markov chain Monte Carlo (MCMC) methods to reduce the effect of sample impoverishment. Based on the current statistical model, the EKF, the UKF, and particle filters with various proposal distributions are compared in passive tracking experiments with two observers. The simulation results demonstrate the good performance of the proposed filtering methods with the novel techniques.
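The residual resampling step mentioned in this abstract can be illustrated generically. The sketch below is the textbook form of residual resampling, not the authors' implementation; the function name `residual_resample` and the example weights are hypothetical. Each particle is first copied floor(N·w) times deterministically, and only the remaining slots are filled by random draws proportional to the leftover fractional weights, which lowers the resampling variance compared with purely multinomial draws.

```python
import math
import random


def residual_resample(weights, rng=None):
    """Return particle indices chosen by residual resampling.

    Generic textbook sketch; not taken from the paper summarized above.
    """
    rng = rng or random.Random()
    n = len(weights)
    total = sum(weights)
    norm = [w / total for w in weights]          # normalize the weights

    # Deterministic part: floor(n * w_i) guaranteed copies of particle i.
    counts = [math.floor(n * w) for w in norm]
    indices = [i for i, c in enumerate(counts) for _ in range(c)]

    # Random part: fill the remaining slots from the fractional residuals.
    remaining = n - len(indices)
    if remaining > 0:
        residuals = [n * w - c for w, c in zip(norm, counts)]
        indices.extend(rng.choices(range(n), weights=residuals, k=remaining))
    return indices


# Made-up weights for four particles, for illustration only.
print(residual_resample([0.7, 0.1, 0.1, 0.1], random.Random(0)))
```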
Natural soil-forming factors such as landforms, parent materials, and biota lead to high variability in soil properties. However, little research has quantified which environmental factors are most relevant for predicting soil properties at the catchment scale in semi-arid areas. This research therefore investigates the ability of multivariate statistical analyses to distinguish which soil properties follow a clear spatial pattern conditioned by specific environmental characteristics in a semi-arid region of Iran. To this end, we digitized parent materials and landforms from recent orthophotography. We also extracted ten topographical attributes from a digital elevation model (DEM) and five remote sensing variables from the Landsat Enhanced Thematic Mapper (ETM). These factors were contrasted for 334 soil samples (depth of 0–30 cm). Cluster analysis and soil maps reveal that Cluster 1 comprises limestones, massive limestones, and mixed deposits of conglomerates with low soil organic carbon (SOC) and clay contents, while Cluster 2 is composed of soils that originated from Quaternary and early Quaternary parent materials such as terraces, alluvial fans, lake deposits, and marls or conglomerates, and that register the highest SOC content and the lowest sand and silt contents. Further, soils with the highest SOC and clay contents are located in wetlands, lagoons, alluvial fans, and piedmonts, whereas soils with the lowest SOC and clay contents are located in dissected alluvial fans, eroded hills, rock outcrops, and steep hills. Principal component analysis of the remote sensing data and topographical attributes identifies five main components, which explain 73.3% of the total variability of soil properties. Environmental factors such as hillslope morphology and all of the remote sensing variables can largely explain SOC variability, but no significant correlation is found for soil texture or calcium carbonate equivalent contents. We therefore conclude that SOC can be considered the best-predicted soil property in semi-arid regions.
This work correlated detailed work zone location and time data from the WisLCS system with five-minute inductive loop detector data. A one-sample percentile value test and a two-sample Kolmogorov-Smirnov (K-S) test were applied to compare speed and flow characteristics between work zone and non-work zone conditions. Furthermore, we analyzed the mobility characteristics of freeway work zones within the urban area of Milwaukee, WI, USA. More than 50% of the investigated work zones experienced speed reduction, and 15%–30% experienced reduced volumes. Speed reduction was more significant within and downstream of work zones than upstream.
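The two-sample K-S comparison named in this abstract rests on a simple statistic: the largest vertical gap between the empirical distribution functions of the two samples. The sketch below computes only that statistic, using the standard library; it does not compute a p-value, and the function name `ks_statistic` and the speed values are hypothetical, not data from the study.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|.

    Generic sketch of the statistic only; no p-value is computed.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The maximum gap between two step functions occurs at a data point.
    for x in a + b:
        f_a = bisect.bisect_right(a, x) / len(a)   # empirical CDF of a at x
        f_b = bisect.bisect_right(b, x) / len(b)   # empirical CDF of b at x
        d = max(d, abs(f_a - f_b))
    return d


# Made-up five-minute mean speeds (mph), for illustration only.
work_zone = [42.0, 45.0, 47.0, 50.0]
non_work_zone = [55.0, 58.0, 60.0, 62.0]
print(ks_statistic(work_zone, non_work_zone))
```

A value near 0 means the two speed distributions are nearly identical; a value near 1 means they barely overlap.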
A fundamental goal in cellular signaling is to understand allosteric communication, the process by which signals originating at one site in a protein propagate reliably to affect distant functional sites. The general principles of protein structure that underlie this process remain unknown. Statistical coupling analysis (SCA) is a statistical technique that uses the evolutionary data of a protein family to measure correlation between distant functional sites and thereby suggest allosteric communication. In proteins, small interactions between distant collections of amino acids provide communication that can be important for the signaling process. In this paper, we present the SCA of a protein alignment of the esterase family (Pfam ID: PF00756), which contains the sequence of antigen 85C secreted by Mycobacterium tuberculosis, to identify a subset of interacting residues. Clustering analysis of the pairwise correlations highlighted seven important residue positions in the esterase family alignment. These residues were then mapped onto the crystal structure of antigen 85C (PDB ID: 1DQZ). The mapping revealed correlation between three distant residues (Asp38, Leu123, and Met125) and suggests allosteric communication between them. This information can be used in developing a new drug against this fatal disease.
A body frame composed of thin sheet metal is a crucial structure that determines the safety performance of a vehicle. Designing an automotive body of appropriate weight and high performance is an emerging engineering problem. To improve the performance of the automotive frame, we attempt to reconstruct its design criteria using statistical and mechanical approaches. First, a fundamental study of frame strength is conducted, and a cross-sectional shape optimization problem is formulated for designing an automobile frame cross-section with very high mass efficiency for strength. Shape optimization is carried out using the nonlinear finite element method and a meta-modeling-based genetic algorithm. The obtained set of optimal results is analyzed to identify the dominant design variables by means of smoothing spline analysis of variance, principal component analysis, and the self-organizing map technique. The relationship between the cross-sectional shape and the objective function is also analyzed by hierarchical clustering. A design guideline is derived from the results of these statistical approaches. The statistically obtained guideline is then compared with the conventional one, which is based on designers' experience, through mechanical interpretation of the optimal cross-sectional frame. Finally, a mechanically reasonable, general-purpose design guideline is proposed for the cross-sectional shape of the automotive frame.
Spatial downscaling methods are widely used to produce bioclimatic variables (e.g., temperature and precipitation) in studies of species ecological niches and of drainage basin management and planning. This study applied three statistical methods, namely moving window regression (MWR), nonparametric multiplicative regression (NPMR), and the generalized linear model (GLM), to downscale annual mean temperature (Bio1) and annual precipitation (Bio12) in central Iran from a coarse scale (1 km × 1 km) to a fine scale (250 m × 250 m). Elevation, aspect, distance from the sea, and the normalized difference vegetation index (NDVI) were used as covariates to create the downscaled bioclimatic variables. The models were assessed by comparing their outputs with observational data from weather stations. The coefficient of determination (R²), bias, and root-mean-square error (RMSE) were used to evaluate the models and covariates. Elevation effectively explained the changes in the bioclimatic factors related to temperature and precipitation. All three models downscaled the annual mean temperature data with similar R², RMSE, and bias values. The MWR had the best performance and the highest accuracy in downscaling annual precipitation (R² = 0.70; RMSE = 123.44). In general, the two nonparametric models, MWR and NPMR, can be reliably used to downscale bioclimatic variables, which have wide applications in species distribution modeling.
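The three evaluation measures named in this abstract can be computed as follows. This is a minimal sketch, assuming R² is defined as 1 − SS_res/SS_tot and bias as the mean of (predicted − observed); the abstract does not state which definitions the authors used. The function name `evaluate` and the numbers are hypothetical.

```python
import math


def evaluate(observed, predicted):
    """Return R^2, bias, and RMSE for paired observations and predictions.

    Assumptions (not stated in the source abstract):
    R^2 = 1 - SS_res / SS_tot, bias = mean(predicted - observed).
    """
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    bias = sum(errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {"r2": 1.0 - ss_res / ss_tot, "bias": bias, "rmse": rmse}


# Made-up station observations and model outputs, for illustration only.
station = [1.0, 2.0, 3.0, 4.0]
model = [2.0, 3.0, 4.0, 5.0]
print(evaluate(station, model))
```

Under these definitions a model that is uniformly one unit too high has a bias and RMSE of 1, while its R² depends on how much the observations themselves vary.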
The fundamental problem of similarity studies, in the context of data mining, is to examine and detect similar items in very large articles, papers, and books. In this paper, we are interested in the probabilistic, statistical, and algorithmic aspects of the study of texts. We use the approach of k-shinglings, a k-shingling being defined as a sequence of k consecutive characters extracted from a text (k ≥ 1). The main challenge in this field is to find accurate and fast algorithms that compute similarity in a short time. This is achieved by using approximation methods. The first approximation method is statistical and is based on the Glivenko-Cantelli theorem. The second is the banding technique. The third is a modification of the algorithm proposed by Rajaraman et al. ([1]), denoted here as (RUM). The Jaccard index is the similarity measure used in this paper. Finally, we illustrate the results of the paper on the four Gospels. The results are very conclusive.
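The two definitions this abstract relies on, k-shinglings and the Jaccard index, can be sketched directly. The code below is the exact (non-approximate) computation, shown only to fix ideas; it does not implement the Glivenko-Cantelli, banding, or RUM approximations described in the paper, and the function names `shingles` and `jaccard` are hypothetical.

```python
def shingles(text, k):
    """Return the set of all sequences of k consecutive characters in text."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(set_a, set_b):
    """Jaccard index |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Toy strings, for illustration only.
a = shingles("abcd", 2)    # {"ab", "bc", "cd"}
b = shingles("bcde", 2)    # {"bc", "cd", "de"}
print(jaccard(a, b))       # 2 shared shingles out of 4 distinct ones
```

For long texts the shingle sets become very large, which is why approximation methods such as those studied in the paper are of interest.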
The basic inference function of mathematical statistics, the score function, is a vector function. The author has introduced the scalar score, a simple scalar inference function that reflects the main features of a continuous probability distribution. Its simplicity makes it possible to introduce new, relevant numerical characteristics of continuous distributions. The t-mean and the score variance describe distributions without the drawbacks of the mean and variance, which may not exist even for regular distributions. Their sample counterparts appear to be alternative descriptions of the observed data. The scalar score itself appears to be a new mathematical tool that could be used to solve traditional statistical problems for models far from the normal one, such as skewed and heavy-tailed models.
Within the framework of its Statistical Capacity Building Program, the African Development Bank (AfDB) is supporting the development and improvement of statistical business registers (SBRs) in African countries. As a first step, the AfDB prepared a document entitled Guidelines for Building Statistical Business Registers in Africa, which describes SBR design, construction, introduction, use, and maintenance. To support dissemination, interpretation, and effective use of the Guidelines, the AfDB is now sponsoring a programme of review and recommendations for enhancements to SBRs in selected African national statistical offices. The paper outlines the content of the Guidelines and experiences in their application. The views expressed are those of the authors and do not necessarily reflect an official position of the AfDB.
Funding (ionospheric wave study): Supported by the National Natural Science Foundation of China under Grant Nos. 41774158, 41474129 and 41704148; the Chinese Meridian Project; and the Youth Innovation Promotion Association of the Chinese Academy of Sciences under Grant No. 2011324.
文摘Extracting and parameterizing ionospheric waves globally and statistically is a longstanding problem. Based on the multichannel maximum entropy method(MMEM) used for studying ionospheric waves by previous work, we calculate the parameters of ionospheric waves by applying the MMEM to numerously temporally approximate and spatially close global-positioning-system radio occultation total electron content profile triples provided by the unique clustered satellites flight between years 2006 and 2007 right after the constellation observing system for meteorology, ionosphere, and climate(COSMIC) mission launch. The results show that the amplitude of ionospheric waves increases at the low and high latitudes(~0.15 TECU) and decreases in the mid-latitudes(~0.05 TECU). The vertical wavelength of the ionospheric waves increases in the mid-latitudes(e.g., ~50 km at altitudes of 200–250 km) and decreases at the low and high latitudes(e.g., ~35 km at altitudes of 200–250 km).The horizontal wavelength shows a similar result(e.g., ~1400 km in the mid-latitudes and ~800 km at the low and high latitudes).
文摘There has been a significant advancement in the application of statistical tools in plant pathology during the past four decades. These tools include multivariate analysis of disease dynamics involving principal component analysis, cluster analysis, factor analysis, pattern analysis, discriminant analysis, multivariate analysis of variance, correspondence analysis, canonical correlation analysis, redundancy analysis, genetic diversity analysis, and stability analysis, which involve in joint regression, additive main effects and multiplicative interactions, and genotype-by-environment interaction biplot analysis. The advanced statistical tools, such as non-parametric analysis of disease association, meta-analysis, Bayesian analysis, and decision theory, take an important place in analysis of disease dynamics. Disease forecasting methods by simulation models for plant diseases have a great potentiality in practical disease control strategies. Common mathematical tools such as monomolecular, exponential, logistic, Gompertz and linked differential equations take an important place in growth curve analysis of disease epidemics. The highly informative means of displaying a range of numerical data through construction of box and whisker plots has been suggested. The probable applications of recent advanced tools of linear and non-linear mixed models like the linear mixed model, generalized linear model, and generalized linear mixed models have been presented. The most recent technologies such as micro-array analysis, though cost effective, provide estimates of gene expressions for thousands of genes simultaneously and need attention by the molecular biologists. Some of these advanced tools can be well applied in different branches of rice research, including crop improvement, crop production, crop protection, social sciences as well as agricultural engineering. 
The rice research scientists should take advantage of these new opportunities adequately in adoption of the new highly potential advanced technologies while planning experimental designs, data collection, analysis and interpretation of their research data sets.
文摘Traffic tunnels include tunnel works for traffic and transport in the areas of railway, highway, and rail transit. With many mountains and nearly one fifth of the global population, China possesses numerous large cities and megapolises with rapidly growing economies and huge traffic demands. As a result, a great deal of railway, highway, and rail transit facilities are required in this country. In the past, the construction of these facilities mainly involved subgrade and bridge works; in recent years.
文摘The statistical map is usually used to indicate the quantitative features of various socio economic phenomena among regions on the base map of administrative divisions or on other base maps which connected with statistical unit. Making use of geographic information system (GIS) techniques, and supported by Auto CAD software, the author of this paper has put forward a practical method for making statistical map and developed a software (SMT) for the making of small scale statistical map using C language.
文摘The development of adaptation measures to climate change relies on data from climate models or impact models. In order to analyze these large data sets or an ensemble of these data sets, the use of statistical methods is required. In this paper, the methodological approach to collecting, structuring and publishing the methods, which have been used or developed by former or present adaptation initiatives, is described. The intention is to communicate achieved knowledge and thus support future users. A key component is the participation of users in the development process. Main elements of the approach are standardized, template-based descriptions of the methods including the specific applications, references, and method assessment. All contributions have been quality checked, sorted, and placed in a larger context. The result is a report on statistical methods which is freely available as printed or online version. Examples of how to use the methods are presented in this paper and are also included in the brochure.
文摘A novel approach to detect and filter out an unhealthy dataset from a matrix of datasets is developed, tested, and proved. The technique employs a new type of self organizing map called Accumulative Statistical Spread Map (ASSM) to establish the destructive and negative effect a dataset will have on the rest of the matrix if stayed within that matrix. The ASSM is supported by training a neural network engine, which will determine which dataset is responsible for its inability to learn, classify and predict. The carried out experiments proved that a neural system was not able to learn in the presence of such an unhealthy dataset that possessed some deviated characteristics, even though it was produced under the same conditions and through the same process as the rest of the datasets in the matrix, and hence, it should be disqualified, and either removed completely or transferred to another matrix. Such novel approach is very useful in pattern recognition of datasets and features that do not belong to their source and could be used as an effective tool to detect suspicious activities in many areas of secure filing, communication and data storage.
Abstract: Geomechanical data are never sufficient in quantity or adequately precise and accurate for design purposes in mining and civil engineering. The objective of this paper is to show the variability of rock properties at the sampled point in the roadway's roof, and then how the statistical processing of the available geomechanical data can affect the results of numerical modelling of the roadway's stability. Four cases were applied in the numerical analysis, using average values (the most common in geomechanical data analysis), average minus standard deviation, median, and average value minus statistical error. The study shows that different approaches to the same geomechanical data set can change the modelling results considerably. The case shows that average minus standard deviation is the most conservative and least risky. It gives displacements and a yielded elements zone in a range four times broader than the average values scenario, which is the least conservative option. The two other cases need to be studied further. The results obtained from them lie between the most favorable and most adverse values. Taking the average values corrected by statistical error for the numerical analysis seems to be the best solution. Moreover, the confidence level can be adjusted depending on the importance of the object and the assumed risk level.
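The four input cases named in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function name `design_values`, the multiplier `z`, and the sample readings are hypothetical, and the abstract does not say how the "statistical error" was computed, so a confidence-scaled standard error of the mean is assumed here.

```python
import math
import statistics


def design_values(samples, z=1.96):
    """Return four candidate input values for one rock property.

    z is a hypothetical confidence multiplier (about 95% under a normal
    assumption). The form of the "statistical error" is an assumption.
    """
    n = len(samples)
    mean = statistics.mean(samples)
    std = statistics.stdev(samples)        # sample standard deviation
    std_error = z * std / math.sqrt(n)     # assumed "statistical error"
    return {
        "average": mean,
        "average_minus_std": mean - std,
        "median": statistics.median(samples),
        "average_minus_error": mean - std_error,
    }


# Made-up strength readings in MPa, for illustration only.
ucs = [42.0, 55.0, 38.0, 61.0, 47.0, 50.0, 44.0, 58.0]
values = design_values(ucs)
for name, value in values.items():
    print(f"{name}: {value:.2f}")
```

Raising `z` widens the correction, which is one way to read the abstract's remark that the confidence level can be adjusted to the assumed risk level.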
Funding: Supported by the National Natural Science Foundation of China (U1931207, 61602278 and 61702306); the Sci. & Tech. Development Fund of Shandong Province of China (2016ZDJS02A11, ZR2017BF015 and ZR2017MF027); the Humanities and Social Science Research Project of the Ministry of Education (18YJAZH017); the Taishan Scholar Program of Shandong Province; and the Science and Technology Support Plan of Youth Innovation Team of Shandong Higher School (2019KJN024).
Abstract: Predicting the seeing of astronomical observations can provide hints of the quality of optical imaging in the near future, and facilitate flexible scheduling of observation tasks to maximize the use of astronomical observatories. Traditional approaches to seeing prediction mostly rely on regional weather models to capture the in-dome optical turbulence patterns. Thanks to the development of data gathering and aggregation facilities at astronomical observatories in recent years, data-driven approaches are becoming increasingly feasible and attractive for predicting astronomical seeing. This paper systematically investigates data-driven approaches to seeing prediction by leveraging various big data techniques, from traditional statistical modeling and machine learning to newly emerging deep learning methods, on the monitoring data of the Large sky Area Multi-Object fiber Spectroscopic Telescope (LAMOST). The raw monitoring data are preprocessed to allow for big data modeling. We then formulate the seeing prediction task under each type of modeling framework and develop seeing prediction models using representative big data techniques, including ARIMA and Prophet for statistical modeling, MLP and XGBoost for machine learning, and LSTM, GRU and Transformer for deep learning. We perform empirical studies on the developed models with a variety of feature configurations, yielding notable insights into the applicability of big data techniques to the seeing prediction task.
Abstract: The most common way to analyze economic data is to use statistics software and spreadsheets. The paper presents the opportunities offered by modern Geographical Information Systems (GIS) for the analysis of marketing, statistical, and macroeconomic data. It considers existing tools and models and their applications in various sectors. The advantage is that the statistical data can be combined with geographic views, maps and additional data derived from the GIS. As a result, a programming system is developed that uses GIS for the analysis of marketing, statistical, and macroeconomic data, and for real-time risk assessment and prevention. The system has been successfully implemented as a web-based software application designed for use with a variety of hardware platforms (mobile devices, laptops, and desktop computers). The software is mainly written in the programming language Python, which offers better structure and support for the development of large applications. Optimization of the analysis and visualization of macroeconomic and statistical data by region for different business research is achieved. The system is designed with a Geographical Information System for settlements in their respective countries and regions. Integration of the information system with external software packages for statistical calculations and analysis is implemented in order to share data analysis, processing, and forecasting. Technologies and processes for loading data from different sources, and tools for data analysis, are developed. The successfully developed system allows implementation of qualitative data analysis.
Abstract: This paper analyzes the application value of big data statistical analysis methods in economic management from the macro and micro perspectives, and analyzes their specific application in three areas: economic trends, industrial operations and marketing strategies.
Funding: This work is supported by the National 863 Project (No. 2001AA422420 02).
Abstract: Bearing-only passive tracking is regarded as a hard nonlinear tracking problem, and no fully satisfactory solution to it exists so far. Based on the current statistical model, a novel solution to this problem utilizing the particle filter (PF) and the unscented Kalman filter (UKF) is proposed. The new solution adopts data fusion from two observers to increase the observability of passive tracking. It applies the residual resampling step to reduce the degeneracy of the PF, and it introduces Markov Chain Monte Carlo (MCMC) methods to reduce the effect of "sample impoverishment". Based on the current statistical model, the EKF, the UKF and the particle filter with various proposal distributions are compared in passive tracking experiments with two observers. The simulation results demonstrate the good performance of the proposed new filtering methods with the novel techniques.
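The residual resampling step mentioned in this abstract can be illustrated with a minimal sketch. It shows the textbook form of the technique, not the authors' implementation; the function name `residual_resample` and the example weights are hypothetical.

```python
import random


def residual_resample(weights, rng=random):
    """Return the indices of the particles kept by residual resampling."""
    n = len(weights)
    total = sum(weights)
    w = [x / total for x in weights]              # normalize the weights
    counts = [int(n * wi) for wi in w]            # deterministic copies: floor(n * w_i)
    residual = [n * wi - c for wi, c in zip(w, counts)]
    remaining = n - sum(counts)                   # slots still to be filled
    indices = [i for i, c in enumerate(counts) for _ in range(c)]
    if remaining > 0:
        # Fill the remaining slots by sampling in proportion to the residuals.
        indices += rng.choices(range(n), weights=residual, k=remaining)
    return indices


# Illustrative weights: the first particle dominates.
rng = random.Random(0)
kept = residual_resample([0.7, 0.1, 0.1, 0.1], rng)
print(kept)
```

Because each particle first receives its guaranteed number of copies, only the residual slots are random, which lowers the variance of the resampling step compared with purely multinomial resampling.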
Funding: Financial support of Isfahan University of Technology (IUT) for this research.
Abstract: Natural soil-forming factors such as landforms, parent materials or biota lead to high variability in soil properties. However, there is not enough research quantifying which environmental factor(s) are most relevant to predicting soil properties at the catchment scale in semi-arid areas. Thus, this research aims to investigate the ability of multivariate statistical analyses to distinguish which soil properties follow a clear spatial pattern conditioned by specific environmental characteristics in a semi-arid region of Iran. To achieve this goal, we digitized parent materials and landforms from recent orthophotography. We also extracted ten topographical attributes and five remote sensing variables from a digital elevation model (DEM) and the Landsat Enhanced Thematic Mapper (ETM), respectively. These factors were contrasted for 334 soil samples (depth of 0–30 cm). Cluster analysis and soil maps reveal that Cluster 1 comprises limestones, massive limestones and mixed deposits of conglomerates with low soil organic carbon (SOC) and clay contents, and Cluster 2 is composed of soils that originated from quaternary and early quaternary parent materials such as terraces, alluvial fans, lake deposits, and marls or conglomerates that register the highest SOC content and the lowest sand and silt contents. Further, it is confirmed that soils with the highest SOC and clay contents are located in wetlands, lagoons, alluvial fans and piedmonts, while soils with the lowest SOC and clay contents are located in dissected alluvial fans, eroded hills, rock outcrops and steep hills. The results of principal component analysis using the remote sensing data and topographical attributes identify five main components, which explain 73.3% of the total variability of soil properties. Environmental factors such as hillslope morphology and all of the remote sensing variables can largely explain SOC variability, but no significant correlation is found for soil texture and calcium carbonate equivalent contents. Therefore, we conclude that SOC can be considered the best-predicted soil property in semi-arid regions.
Funding: Project (61620106002) supported by the National Natural Science Foundation of China; Project (2016YFB0100906) supported by the National Key R&D Program in China; Project (2015364X16030) supported by the Information Technology Research Project of Ministry of Transport of China; Project (2242015K42132) supported by the Fundamental Sciences of Southeast University, China.
Abstract: This work correlated the detailed work zone location and time data from the WisLCS system with five-minute inductive loop detector data. A one-sample percentile value test and a two-sample Kolmogorov-Smirnov (K-S) test were applied to compare the speed and flow characteristics between work zone and non-work zone conditions. Furthermore, we analyzed the mobility characteristics of freeway work zones within the urban area of Milwaukee, WI, USA. More than 50% of the investigated work zones experienced speed reduction, and 15%-30% experienced reduced volumes. Speed reduction was more significant within and downstream of work zones than upstream.
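The two-sample K-S comparison named in this abstract rests on the largest gap between two empirical distribution functions. The sketch below computes only that statistic in plain Python. The function name `ks_statistic` and the speed readings are hypothetical, and the significance test applied in the study is not reproduced.

```python
def ks_statistic(a, b):
    """Two-sample K-S statistic: the largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value not exceeding x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Made-up five-minute mean speeds in mph, for illustration only.
work_zone = [48, 50, 55, 61]
non_work_zone = [58, 60, 62, 65]
d_stat = ks_statistic(work_zone, non_work_zone)
print(d_stat)
```

A statistic near 0 means the two speed distributions are nearly indistinguishable, while a value near 1 means they barely overlap.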
Abstract: A fundamental goal in cellular signaling is to understand allosteric communication, the process by which signals originating at one site in a protein propagate reliably to affect distant functional sites. The general principles of protein structure that underlie this process remain unknown. Statistical coupling analysis (SCA) is a statistical technique that uses evolutionary data of a protein family to measure correlation between distant functional sites and suggest allosteric communication. In proteins, very distant and small interactions between collections of amino acids provide the communication that can be important for the signaling process. In this paper, we present the SCA of the protein alignment of the esterase family (pfam ID: PF00756) containing the sequence of antigen 85C secreted by Mycobacterium tuberculosis to identify a subset of interacting residues. Clustering analysis of the pairwise correlation highlighted seven important residue positions in the esterase family alignments. These residues were then mapped on the crystal structure of antigen 85C (PDB ID: 1DQZ). The mapping revealed correlation between three distant residues (Asp38, Leu123 and Met125) and suggests allosteric communication between them. This information can be used in developing a new drug against this fatal disease.
Abstract: A body frame composed of thin sheet metal is a crucial structure that determines the safety performance of a vehicle. Designing an automotive body with the correct weight and high performance is an emerging engineering problem. To improve the performance of the automotive frame, we attempt to reconstruct its design criteria based on statistical and mechanical approaches. First, a fundamental study on frame strength is conducted, and a cross-sectional shape optimization problem is developed for designing the cross-sectional shape of an automobile frame with very high mass efficiency for strength. Shape optimization is carried out using the nonlinear finite element method and a meta-modeling-based genetic algorithm. Data analysis of the obtained set of optimal results is performed to identify the dominant design variables by employing smoothing spline analysis of variance, principal component analysis, and the self-organizing map technique. The relationship between the cross-sectional shape and the objective function is also analyzed by hierarchical clustering. A design guideline is obtained from the results of these statistical approaches. The statistically obtained design guideline is compared with the conventional one, which is based on designers' experience, through mechanical interpretation of the optimal cross-sectional frame. Finally, a mechanically reasonable new general-purpose design guideline is proposed for the cross-sectional shape of the automotive frame.
Abstract: Spatial downscaling methods are widely used for the production of bioclimatic variables (e.g. temperature and precipitation) in studies related to species ecological niche and drainage basin management and planning. This study applied three different statistical methods, i.e. moving window regression (MWR), nonparametric multiplicative regression (NPMR), and the generalized linear model (GLM), to downscale the annual mean temperature (Bio1) and annual precipitation (Bio12) in central Iran from coarse scale (1 km × 1 km) to fine scale (250 m × 250 m). Elevation, aspect, distance from sea and normalized difference vegetation index (NDVI) were used as covariates to create downscaled bioclimatic variables. Model assessment was performed by comparing model outcomes with observational data from weather stations. Coefficients of determination (R²), bias, and root-mean-square error (RMSE) were used to evaluate models and covariates. Elevation could effectively explain the changes in bioclimatic factors related to temperature and precipitation. All three models could downscale the mean annual temperature data with similar R², RMSE, and bias values. The MWR had the best performance and highest accuracy in downscaling annual precipitation (R² = 0.70; RMSE = 123.44). In general, the two nonparametric models, i.e. MWR and NPMR, can be reliably used for the downscaling of bioclimatic variables, which have wide applications in species distribution modeling.
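The three evaluation measures named in this abstract can be computed as in the sketch below. It assumes R² is defined as 1 − SS_res/SS_tot, whereas the study may have used the squared correlation coefficient instead. The function name `evaluate` and the station values are hypothetical.

```python
import math


def evaluate(observed, predicted):
    """Return R2, bias and RMSE of predictions against station observations."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    bias = sum(errors) / n                          # mean signed error
    ss_res = sum(e * e for e in errors)             # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot                      # assumed definition of R2
    return {"r2": r2, "bias": bias, "rmse": rmse}


# Made-up annual precipitation values in mm, for illustration only.
station = [120.0, 250.0, 310.0, 180.0]
downscaled = [135.0, 240.0, 330.0, 170.0]
scores = evaluate(station, downscaled)
print(scores)
```

Bias shows whether a model systematically over- or under-predicts, while RMSE penalizes large individual errors, so the two are usually read together.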
Abstract: The fundamental problem of similarity studies, in the frame of data mining, is to examine and detect similar items in very large articles, papers, and books. In this paper, we are interested in the probabilistic, statistical and algorithmic aspects of the study of texts. We use the approach of k-shinglings, a k-shingling being defined as a sequence of k consecutive characters extracted from a text (k ≥ 1). The main stake in this field is to find accurate and fast algorithms that compute the similarity in a short time. This is achieved by using approximation methods. The first approximation method is statistical and is based on the Glivenko-Cantelli theorem. The second is the banding technique. The third concerns a modification of the algorithm proposed by Rajaraman et al. ([1]), denoted here as (RUM). The Jaccard index is the similarity measure used in this paper. We finally illustrate the results of the paper on the four Gospels. The results are very conclusive.
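The k-shingling definition and the Jaccard index from this abstract can be sketched exactly, without the approximation methods the paper studies. The function names `shingles` and `jaccard` and the sample strings are hypothetical.

```python
def shingles(text, k):
    """Return the set of all sequences of k consecutive characters in text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


first = shingles("in the beginning was the word", 4)
second = shingles("in the beginning god created", 4)
similarity = jaccard(first, second)
print(round(similarity, 3))
```

The exact computation compares full shingle sets, which becomes expensive for long texts; that cost is the motivation for the approximation methods the abstract lists.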
Abstract: The basic inference function of mathematical statistics, the score function, is a vector function. The author has introduced the scalar score, a scalar inference function that reflects the main features of a continuous probability distribution and is simple. Its simplicity makes it possible to introduce new relevant numerical characteristics of continuous distributions. The t-mean and score variance describe distributions without the drawbacks of the mean and variance, which may not exist even for regular distributions. Their sample counterparts appear to be alternative descriptions of the observed data. The scalar score itself appears to be a new mathematical tool that could be used to solve traditional statistical problems for skewed and heavy-tailed models far from the normal one.
Abstract: Within the framework of its Statistical Capacity Building Program, the African Development Bank (AfDB) is supporting the development and improvement of statistical business registers (SBRs) in African countries. As a first step, the AfDB prepared a document entitled Guidelines for Building Statistical Business Registers in Africa, which describes SBR design, construction, introduction, use and maintenance. To support dissemination, interpretation and effective use of the Guidelines, the AfDB is now sponsoring a programme of review and recommendations for enhancements to SBRs in selected African national statistical offices. The paper outlines the content of the Guidelines and experiences in their application. The views expressed are those of the authors and do not necessarily reflect an official position of the AfDB.