A factor analysis was applied to soil geochemical data to define anomalies related to buried Pb-Zn mineralization. A favorable main factor with a strong association of the elements Zn, Cu and Pb, related to mineralization, was selected for interpretation. The median + 2 MAD (median absolute deviation) method of exploratory data analysis (EDA) and C-A (concentration-area) fractal modeling were then applied to the Mahalanobis distance, as defined by Zn, Cu and Pb from the factor analysis, to set the thresholds for defining multi-element anomalies. The median + 2 MAD method identified the Pb-Zn mineralization more successfully than the C-A fractal model. The soil anomaly identified by the median + 2 MAD method on the Mahalanobis distances defined by three principal elements (Zn, Cu and Pb), rather than thirteen elements (Co, Zn, Cu, V, Mo, Ni, Cr, Mn, Pb, Ba, Sr, Zr and Ti), was the more favorable reflection of the ore body. The identified soil geochemical anomalies were compared with the in situ economic Pb-Zn ore bodies for validation. The results showed that the median + 2 MAD approach is capable of mapping both strong and weak geochemical anomalies related to buried Pb-Zn mineralization and is therefore useful at the reconnaissance drilling stage.
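The thresholding rule named in this abstract can be illustrated with a minimal sketch. It assumes the unscaled MAD, that is, the median of absolute deviations from the median with no 1.4826 consistency factor; the abstract does not say which variant was used. The function names and the sample distances are hypothetical, and the Mahalanobis distances are taken as already computed.

```python
from statistics import median


def median_mad_threshold(values, k=2.0):
    """Return median + k * MAD, where MAD = median(|x - median(x)|).

    Assumption: the MAD is left unscaled (no 1.4826 factor).
    """
    values = list(values)
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return med + k * mad


def flag_anomalies(values, k=2.0):
    """Mark each value that lies strictly above the median + k * MAD threshold."""
    values = list(values)
    threshold = median_mad_threshold(values, k)
    return [v > threshold for v in values]


# Hypothetical Mahalanobis distances for seven soil samples.
distances = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 20.0]
threshold = median_mad_threshold(distances)
flags = flag_anomalies(distances)
```

Because both the median and the MAD are robust to extreme values, a single very large distance shifts the threshold far less than it would shift a mean plus standard deviation rule.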
Identifying the subcellular localization of proteins is particularly helpful in the functional annotation of gene products. In this study, we use Machine Learning and Exploratory Data Analysis (EDA) techniques to examine and characterize amino acid sequences of human proteins localized in nine cellular compartments. A dataset of 3,749 protein sequences representing human proteins was extracted from the SWISS-PROT database. Feature vectors were created to capture specific amino acid sequence characteristics. Relative to a Support Vector Machine, a Multi-layer Perceptron, and a Naive Bayes classifier, the C4.5 Decision Tree algorithm was the most consistent performer across all nine compartments in reliably predicting the subcellular localization of proteins based on their amino acid sequences (average Precision = 0.88; average Sensitivity = 0.86). Furthermore, EDA graphics characterized essential features of proteins in each compartment. For example, proteins localized to the plasma membrane had higher proportions of hydrophobic amino acids; cytoplasmic proteins had higher proportions of neutral amino acids; and mitochondrial proteins had higher proportions of neutral amino acids and lower proportions of polar amino acids. These data showed that the C4.5 classifier and EDA tools can be effective for characterizing and predicting the subcellular localization of human proteins based on their amino acid sequences.
The Yellow River Basin of China is a key region that contains myriad interactions between human activities and the natural environment. Industrialization and urbanization promote socio-economic development, but they have also generated a series of environmental and ecological issues in this basin. Previous studies have evaluated urban resilience at the national, regional, urban agglomeration, city, and prefecture levels, but not at the watershed level. To address this research gap and raise the Yellow River Basin's urban resilience level, we constructed an urban resilience evaluation index system with five dimensions: industrial resilience, social resilience, environmental resilience, technological resilience, and organizational resilience. The entropy weight method was used to comprehensively evaluate urban resilience in the Yellow River Basin. The exploratory spatial data analysis method was employed to study the spatiotemporal differences in urban resilience in the Yellow River Basin in 2010, 2015, and 2020. Furthermore, the grey correlation analysis method was used to explore the factors influencing these differences. The results of this study are as follows: (1) the overall level of urban resilience in the Yellow River Basin was relatively low but showed an increasing trend during 2010–2015, and significant spatial distribution differences were observed, with a higher resilience level in the eastern region and a low-medium resilience level in the western region; (2) the differences in urban resilience were noticeable, with industrial resilience and social resilience being relatively highly developed, whereas organizational resilience and environmental resilience were relatively weak; and (3) the correlation ranking of the factors influencing resilience was as follows: science and technology level > administrative power > openness > market forces. This research can provide a basis for improving the resilience level of cities in the Yellow River Basin and contribute to the high-quality development of the region.
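The entropy weight method mentioned in this abstract is commonly computed in four steps: min-max scaling of each indicator, conversion to shares, Shannon entropy per indicator, and normalization of the divergences (1 minus entropy) into weights. The sketch below follows that common textbook form; it assumes all indicators are benefit-type (larger is better) and is not necessarily the exact variant the authors used. The function name and input layout are hypothetical.

```python
import math


def entropy_weights(matrix):
    """Entropy weights for a table with rows = cities and columns = indicators.

    Assumptions: every indicator is benefit-type and is min-max scaled;
    a constant indicator carries no information and gets zero divergence.
    """
    n = len(matrix)
    m = len(matrix[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        lo, hi = min(column), max(column)
        if hi == lo:
            divergences.append(0.0)
            continue
        scaled = [(v - lo) / (hi - lo) for v in column]
        total = sum(scaled)
        shares = [s / total for s in scaled]
        # By convention 0 * ln(0) is treated as 0, so zero shares are skipped.
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    total_divergence = sum(divergences)
    if total_divergence == 0:
        raise ValueError("all indicators are constant; weights are undefined")
    return [d / total_divergence for d in divergences]
```

An indicator whose values are concentrated in a few cities has low entropy and therefore receives a larger weight than one spread evenly across cities.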
Urban resilience assesses a city's ability to withstand unknown risks. Existing assessments of urban resilience are not comprehensive and give little consideration to population resilience. This study took 110 prefecture-level cities in the Yangtze River Economic Belt (YREB) as its study area. We calculated the YREB's level of urban resilience based on the aspects of "economy-society-population-ecology-infrastructure", which makes the evaluation of urban resilience more comprehensive. The spatio-temporal evolution of urban resilience was analyzed using exploratory spatial data analysis. Geodetectors were used to investigate the impact of several indicators, focusing on economic, social, population, ecological, and infrastructure factors, on urban resilience. The results showed that the urban resilience of the YREB maintained a slow upward trend from 2005 to 2018, with the average urban resilience rising from 0.2442 to 0.2560. The resilience gap between cities in the study region increased initially and then decreased. The dominant factors in the spatial differentiation of urban resilience were economic, followed by population factors. The study clarifies the concept of urban resilience and constructs an evaluation index system, which can provide a reference for the evaluation of urban resilience in other countries. On this basis, factors that optimize urban resilience can be configured, and regional and national sustainable development can be promoted.
Exploratory data analysis plays a major role in obtaining insights from data. Over the last two decades, researchers have proposed several visual data exploration tools that can assist with each step of the analysis process. Nevertheless, in recent years, data analysis requirements have changed significantly. With the constantly increasing size and variety of data to be analyzed, scalability and analysis duration are now among the primary concerns of researchers. Moreover, in order to minimize the analysis cost, businesses need data analysis tools that can be used with limited analytical knowledge. To address these challenges, traditional data exploration tools have evolved within the last few years. In this paper, through an in-depth analysis of an industrial tabular dataset, we identify a set of additional exploratory requirements for large datasets. We then present a comprehensive survey of recent advancements in the emerging field of exploratory data analysis. We investigate 50 academic and non-academic visual data exploration tools with respect to their utility in the six fundamental steps of the exploratory data analysis process. We also examine the extent to which these modern data exploration tools fulfill the additional requirements for analyzing large datasets. Finally, we identify and present a set of research opportunities in the field of visual exploratory data analysis.
A significant Geographic Information Science (GIS) issue is closely related to spatial autocorrelation, a burning question in the phase of information extraction from the statistical analysis of georeferenced data. At present, spatial autocorrelation presents two types of measures: continuous and discrete. Is it possible to use Moran's I and the Moran scatterplot with continuous data? Is it possible to use the same methodology with discrete data? A particular and cumbersome problem is the choice of the spatial-neighborhood matrix (W) for point data. This paper addresses these issues by introducing the concept of covariogram contiguity, where each weight is based on the variogram model for that particular dataset: (1) the variogram, whose range equals the distance with the highest Moran's I value, defines the weights for points separated by less than the estimated range, and (2) weights equal zero for points separated by more than the variogram range considered. After the W matrix is computed, the Moran location scatterplot is created in an iterative process. Across various lag distances, Moran's I is presented as a good search factor for the optimal neighborhood area. Uncertainty/transition regions are also emphasized. At the same time, a new Exploratory Spatial Data Analysis (ESDA) tool is developed, the Moran variance scatterplot, since the conventional Moran scatterplot is not sensitive to neighbor variance. This computer-mapping framework allows the study of spatial patterns, outliers, changeover areas, and trends in an ESDA process. All these tools were implemented in a free web e-Learning program for quantitative geographers called SAKWeb© (or, in the near future, myGeooffice.org).
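For readers unfamiliar with the statistic at the center of this abstract, the following is a minimal sketch of global Moran's I for a given weight matrix W. It assumes a simple binary adjacency matrix for illustration; it does not implement the covariogram-contiguity weights proposed in the paper, and the function name and sample data are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij * z_i * z_j / sum_i z_i**2.

    values  : list of n observations
    weights : n x n list of lists (spatial weight matrix W, zero diagonal)
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]          # deviations from the mean
    s0 = sum(sum(row) for row in weights)   # sum of all weights
    cross = sum(
        weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n)
    )
    denom = sum(d * d for d in z)
    return (n / s0) * (cross / denom)


# Four points on a line; each point neighbors the adjacent ones (assumed W).
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend = morans_i([1.0, 2.0, 3.0, 4.0], W)          # smooth gradient
checkerboard = morans_i([1.0, -1.0, 1.0, -1.0], W)  # alternating values
```

A smooth gradient yields a positive value and an alternating pattern a negative one, which is the sense in which Moran's I can be scanned over lag distances to search for a suitable neighborhood range.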
Churn prediction is a common task for machine learning applications in business. In this paper, this task is adapted to address the low efficiency of massive open online courses (only 5% of all students finish their course). The approach is demonstrated on the course "Methods and algorithms of the graph theory", held on the national online education platform in Russia. This paper covers all the steps needed to build an intelligent system that predicts which students are active during the course but not likely to finish it. The first part consists of constructing the right sample for prediction, EDA, and choosing the most appropriate week of the course on which to make predictions. The second part is about choosing the right metric and building models. In addition, an approach using ensembles such as stacking is proposed to increase the accuracy of predictions. As a result, a general approach to building a churn prediction model for an online course is reviewed. This approach can be used to make the process of online education adaptive and intelligent for an individual student.
Rural development inequality is an important practical issue in the course of fully establishing a "moderately well-off society" in modern China, and an objective understanding and evaluation of the status and regional inequality of rural development can provide a scientific basis for "building a new countryside" and for the coordinated development of rural and urban regions. Based on county-level data for 2000, 2005 and 2009, this paper examines the rural development inequality of Jilin Province in Northeast China by establishing a rural development index. The spatio-temporal dynamic patterns and dominant factors are discussed using exploratory spatial data analysis and a multiple regression model. The results are as follows. Firstly, most of the counties were at a lower development level, accounting for 58.3%, 62.5% and 66.7% of all counties in 2000, 2005 and 2009, respectively. Spatial inequality was very obvious at the county level. For example, the rural development level of Changchun Proper and the urban districts of seven prefecture-level cities was much higher than that of the surrounding regions. The counties in eastern and northern Jilin Province had the lowest rural development level, while the central counties were the areas of rapid growth in the rural economy. Secondly, Moran's I of the rural development index (RDI) was 0.01, –0.16 and –0.06 in 2000, 2005 and 2009, respectively, which indicated that spatial agglomeration of RDI was not obvious in Jilin Province and that RDI was close to randomly distributed. The counties where both the unit and its adjacent units had a higher development level (HH) shifted from the western areas to the eastern areas, while the counties where both the unit and its adjacent units had a lower development level (LL) spread from eastern to central and western Jilin Province. Finally, the multiple regression analysis showed that the improvement of agricultural production conditions, the development of the agricultural economy and the adjustment of the industrial structure were the dominant factors affecting rural development inequality in Jilin Province over the last ten years.
This paper principally focuses on the morphological differences, spatial pattern and regional types of rural settlements in Xuzhou City of Jiangsu Province in China. Using satellite images of Xuzhou City taken in 2007 and 2008 and models of exploratory spatial data analysis (ESDA) and spatial metrics, the paper conducts a quantitative analysis of the morphological pattern of rural settlements and finds significant characteristics. First, rural settlements in Xuzhou City are significantly agglomerated in terms of their spatial distribution; meanwhile, there is significant variation in the geographical density distribution. Second, the scale of rural settlements in Xuzhou City is larger than the average in Jiangsu Province, and the histogram of the scale data is more even and more like a gamma distribution. There is a significant high-value cluster in the scale distribution, and a local negative correlation between the scale and density distributions of rural settlements in Xuzhou City. Third, the morphology of rural settlements in Xuzhou City shows relative regularity with good connection and integrity, but the spatial variation of the morphology is anisotropic. Finally, according to the characteristics of density, scale, and form, the rural settlements of Xuzhou City are divided into three types: a high-density and point-scattered type, a low-density and cluster-like type, and a mass-like and sparse type. The research findings could be used as a scientific foundation for rural planning and community rebuilding, particularly in less-developed areas.
With the increasing effects of global climate change and fishing activities, the spatial distribution of the neon flying squid (Ommastrephes bartramii) is changing in the traditional fishing ground of 150°-160°E and 38°-45°N in the northwest Pacific Ocean. This research aims to identify the spatial hot and cold spots (i.e. spatial clusters) of O. bartramii to reveal its spatial structure using commercial fishery data from 2007 to 2010 collected by Chinese mainland squid-jigging fleets. A relatively strongly clustered distribution for O. bartramii was observed using an exploratory spatial data analysis (ESDA) method. The results show two hot spots and one cold spot in 2007, while only one hot spot and one cold spot were identified each year from 2008 to 2010. The hot and cold spots in 2007 occupied 8.2% and 5.6% of the study area, respectively; these percentages for hot and cold spot areas were 5.8% and 3.1% in 2008, 10.2% and 2.9% in 2009, and 16.4% and 11.9% in 2010, respectively. Nearly half (>45%) of the squid reported by Chinese fleets from 2007 to 2009 were caught in hot spot areas, while this percentage reached its peak at 68.8% in 2010, indicating that the hot spot areas are central fishing grounds. A further change analysis shows that the area centered at 156°E/43.5°N was persistent as a hot spot over the whole period from 2007 to 2010. Furthermore, the hot spots were mainly identified in areas with sea surface temperature (SST) in the range of 15-20℃ around the warm Kuroshio Current and with chlorophyll-a (chl-a) concentration above 0.3 mg/m³. The outcome of this research improves our understanding of spatiotemporal hot spots and their variation for O. bartramii and is useful for sustainable exploitation, assessment, and management of this squid.
Quality of life (QOL) is a hot-spot issue that has attracted increasing attention from the Chinese Government and scholars; it is also a vital issue that should be addressed in the course of "establishing an overall well-off society". Northeast China is one of the most important old industrial bases in China. However, an industrial structure dominated by heavy and chemical industry and a development mode of "production first, living last" have led to a series of social problems, which have also become a serious bottleneck to social stability and sustainable economic development. By applying a BP neural network, exploratory spatial data analysis (ESDA) and a spatial regression model, this paper examines the space-time dynamics of the QOL of residents in Northeast China. We first investigate the QOL indexes of the residents and then use ESDA methods to visualize their space-time relationship. We found a spatial agglomeration of residents' QOL in middle-southern Liaoning Province, central Jilin Province and the Harbin-Qiqihar-Daqing area of Heilongjiang Province. Two thirds of the counties show low-low spatial correlation, and the correlation type of about 60% of the prefecture-level areas remains stable, indicating that the QOL of residents in Northeast China shows a degree of path dependence or spatial lock-in. We also found that economic strength and the development level of the service industry have a positive and obvious effect on residents' QOL, while the effects of indexes such as the social service level and the proportion of tertiary industries are smaller.
This paper examines the visualization of symbolic data and considers the challenges arising from its complex structure. Symbolic data is usually aggregated from large data sets and used to hide entry-specific details and to transform huge amounts of data (like big data) into analyzable quantities. It is also used to offer an overview in places where general trends are more important than individual details. Symbolic data comes in many forms, such as intervals, histograms, categories and modal multi-valued objects. Symbolic data can also be considered as a distribution. Currently, the de facto visualization approach for symbolic data is zoomstars, which has many limitations. The biggest limitation is that the default distributions (histograms) are not supported in 2D, as an additional dimension is required. This paper proposes several improvements for zoomstars that would enable it to visualize histograms in 2D by using a quantile or an equivalent interval approach. In addition, several improvements for categorical and modal variables are proposed for a clearer indication of the presented categories. Recommendations for different approaches to zoomstars are offered depending on the data type and the desired goal. Furthermore, an alternative approach that allows visualizing the whole data set in a comprehensive table-like graph, called shape encoding, is proposed. These visualizations and their usefulness are verified with three symbolic data sets in the exploratory data mining phase to identify trends, similar objects and important features, and to detect outliers and discrepancies in the data.
The aim of this work is to describe and compare three exploratory chemometric tools: principal components analysis, independent components analysis and common components analysis, the last one being a modification of the multi-block statistical method known as common components and specific weights analysis. The three methods were applied to a set of data to show the differences and similarities of the results obtained, highlighting their complementarity.
In 2007, China surpassed the USA to become the largest carbon emitter in the world. China has promised a 60%–65% reduction in carbon emissions per unit GDP by 2030, compared to the baseline of 2005. Therefore, it is important to obtain accurate dynamic information on the spatial and temporal patterns of carbon emissions and carbon footprints to support the formulation of effective national carbon emission reduction policies. This study attempts to build a carbon emission panel data model that simulates carbon emissions in China from 2000–2013 using nighttime lighting data and carbon emission statistics data. By applying the Exploratory Spatial-Temporal Data Analysis (ESTDA) framework, this study analyzed the spatial patterns and dynamic spatial-temporal interactions of carbon footprints from 2001–2013. The improved Tapio decoupling model was adopted to investigate the levels of coupling or decoupling between the carbon emission load and economic growth in 336 prefecture-level units. The results show that, firstly, the model achieved high accuracy in simulating carbon emissions. Secondly, the total carbon footprints and carbon deficits across China increased with average annual growth rates of 4.82% and 5.72%, respectively. The overall carbon footprints and carbon deficits were larger in the North than in the South. There were extremely significant spatial autocorrelation features in the carbon footprints of prefecture-level units. Thirdly, the relative lengths of the Local Indicators of Spatial Association (LISA) time paths were longer in the North than in the South, and they increased from the coastal to the central and western regions. Lastly, the overall decoupling index was mainly of the weak decoupling type, but the number of cities with this weak decoupling continued to decrease. The unsustainable development trend of China's economic growth and carbon emission load will continue for some time.
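The decoupling states referred to in this abstract can be made concrete with a short sketch of the basic Tapio elasticity, e = (ΔC/C₀) / (ΔG/G₀), classified with the conventional 0.8 and 1.2 cut-offs. This is the standard form of the model, not the "improved" variant the authors adopted, whose details the abstract does not give. The function names and the sample figures are hypothetical.

```python
def tapio_elasticity(c0, c1, g0, g1):
    """Decoupling elasticity: relative change in emissions over relative change in GDP."""
    emission_growth = (c1 - c0) / c0
    gdp_growth = (g1 - g0) / g0
    if gdp_growth == 0:
        raise ValueError("GDP growth is zero; elasticity is undefined")
    return emission_growth / gdp_growth


def tapio_state(c0, c1, g0, g1):
    """Classify a period into one of the eight standard Tapio states.

    Assumption: conventional thresholds of 0.8 and 1.2 on the elasticity.
    """
    e = tapio_elasticity(c0, c1, g0, g1)
    growing = g1 > g0
    if growing:
        if e < 0:
            return "strong decoupling"
        if e < 0.8:
            return "weak decoupling"
        if e <= 1.2:
            return "expansive coupling"
        return "expansive negative decoupling"
    if e < 0:
        return "strong negative decoupling"
    if e < 0.8:
        return "weak negative decoupling"
    if e <= 1.2:
        return "recessive coupling"
    return "recessive decoupling"
```

For instance, emissions rising 4% while GDP rises 10% gives an elasticity of about 0.4, which falls in the weak decoupling band that the abstract reports as the dominant type.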
How do people talk about COVID-19 online? To address this question, we offer an unsupervised framework that allows us to examine Twitter framings of the pandemic. Our approach employs a network-based exploration of social media data to identify, categorize, and understand communication patterns about the novel coronavirus on Twitter. The simplest structure that emerges from our analysis is the distinction between the internal/personal, external/global, and generic threat framings of the pandemic. This structure replicates in different Twitter samples and is validated using the variation of information measure, reflecting the significance and stability of our findings. Such an exploratory study is useful for understanding the contours of the natural, non-random structure in this online space. We contend that this understanding of structure is necessary to address a host of causal, supervised, and related questions downstream.
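The variation of information used for validation in this abstract compares two partitions of the same items: VI = H(A|B) + H(B|A), which is zero when the partitions agree up to relabeling. The sketch below computes it in nats from two label lists; how the authors applied it to their Twitter samples is not described in the abstract, so the function name and the example labels are hypothetical.

```python
import math
from collections import Counter


def variation_of_information(labels_a, labels_b):
    """Variation of information (in nats) between two clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        # Contributions to H(B|A) and H(A|B) from this cell of the joint table.
        vi -= p_ab * (
            math.log(p_ab / (count_a[a] / n)) + math.log(p_ab / (count_b[b] / n))
        )
    return vi


same = variation_of_information([0, 0, 1, 1], [1, 1, 0, 0])       # relabeled copy
crossed = variation_of_information([0, 0, 1, 1], [0, 1, 0, 1])    # independent split
```

A small value between clusterings obtained from different samples indicates that the recovered structure is stable rather than an artifact of one sample.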
Based on the connotation and structure of science and technology (S&T) resources and on data for more than 286 cities at prefecture level and above during 2001-2010, this paper uses a modified Data Envelopment Analysis (DEA) method to calculate the S&T resource allocation efficiency of different cities in different periods, which reveals the distributional differences and patterns of change in S&T resource allocation efficiency in time and space. On that basis, the paper analyzes the spatial distribution pattern and evolution trend of S&T resource allocation efficiency across cities by means of Exploratory Spatial Data Analysis (ESDA). The results are as follows. (1) The average S&T resource allocation efficiency of cities at prefecture level and above has remained low; it fluctuates repeatedly between high and low values and shows a decreasing trend year by year. In addition, the gap between the East and the West is widening. (2) The uneven distribution of S&T resource allocation efficiency presents a spatial pattern that decreases successively from Eastern China through Central China to Western China. The cities whose S&T resource allocation efficiency is at the higher and high levels are distributed in clusters, which fits well with the 23 forming urban agglomerations in China. (3) S&T resource allocation efficiency and the economic environment are positively correlated to a certain degree, but they do not coincide completely. The differentiation of S&T resource allocation efficiency is common in regional development; its existence and evolution are directly or indirectly influenced by, and reflect, many elements, such as geographical location and the natural endowment and environment of S&T resources. (4) From the perspective of the evolution of spatial structure, the S&T resource allocation efficiency of cities at prefecture level and above shows notable spatial autocorrelation, which is positive in every period. The spatial distribution of S&T resource allocation efficiency in neighboring cities tends to be similar within groups, and this similarity tends to increase stepwise. Meanwhile, the overall differentiation across geographical space tends to diminish. (5) The LISA agglomeration maps of S&T resource allocation efficiency in different periods show that the four agglomeration types have changed differently in spatial location and in the extent of spatial agglomeration, and that the continuity of S&T resource allocation efficiency in geographical space is gradually increasing.
Influenced by globalization, rural transition in developed Western countries has experienced processes of productivism, post-productivism, and multifunctional development. By contrast, rural transition in most developing countries has been accompanied by rapid urbanization, which has become a core topic in geography research. As the world's largest developing country, China has undergone profound development since the reform and opening-up. Moreover, rural spaces in some eastern coastal areas have entered the stage of reconstruction after decades of industrialization and urbanization. This paper takes Suzhou as the case area and measures the process of rural transition from 1990 to 2015 by constructing an index system. It then analyzes the characteristics of space-time evolution using exploratory spatial data analysis (ESDA) methods to reveal the influence of economic and social development on rural transition. The results show that rural transition, which generally entails the weakening of rurality and enhancing of urbanity on a macro scale, tends to be heterogeneous across different regions on a micro scale. This paper argues that multifunctionality will be the main future trend of rural transition in rapidly urbanizing areas. The experience in Suzhou could provide an example for establishing policies on sustainable development in rural spaces and achieving urban-rural co-governance.
It is urgent and important to explore the dynamic evolution of comprehensive transportation green efficiency (CTGE) in the context of green development. We constructed a social development index that reflects the social benefits of transportation services and incorporated it into the comprehensive transportation efficiency evaluation framework as an expected output. Based on panel data for 30 regions in China from 2003-2018, the CTGE in China was measured using the slacks-based measure-data envelopment analysis (SBM-DEA) model. Further, the dynamic evolution trends of CTGE were determined using the spatial Markov model and the exploratory spatio-temporal data analysis (ESTDA) technique from a spatio-temporal perspective. The results showed that the CTGE follows a U-shaped trend but with an overall low level and significant regional differences. The state transition of CTGE has a strong spatial dependence, and there is a phenomenon of "club convergence". Neighbourhood background has a significant impact on the CTGE transition types, and the spatial spillover effect is pronounced. The CTGE shows an obvious positive spatial correlation and spatial agglomeration characteristics. The geometric characteristics of the LISA time path show that the evolution of the local spatial structure and local spatial dependence of China's CTGE is stable, but the integration of spatial evolution is weak. The spatio-temporal transition results of LISA indicate that the CTGE has obvious transfer inertia and certain path-dependence and spatial locking characteristics, which will become the major difficulty in improving the CTGE.
The aim of this paper is to study the spatial-temporal differentiation of industrial eco-efficiency in China. Using methods based on the data envelopment analysis (DEA) model and exploratory spatial data analysis (ESDA), and data from 1985, 1995, 2005, and 2008 for 30 provinces in China, the spatial-temporal pattern changes in industrial eco-efficiency are discussed. The results show that: first, the patterns of industrial eco-efficiency are dominated by clustering of relatively low-efficiency provinces; second, spatial relationships between the industrial eco-efficiencies of different provinces changed slightly throughout the period, and the provinces persistently exhibit spatial concentration of relatively low industrial eco-efficiency; finally, there is an obvious trend toward polarization of industrial eco-efficiency, i.e., the higher-level spatial units are concentrated in eastern China, and the lower-level spatial units are mainly in western and central China.
Abstract: Factor analysis was applied to soil geochemical data to define anomalies related to buried Pb-Zn mineralization. A favorable main factor with a strong association of the elements Zn, Cu and Pb, related to mineralization, was selected for interpretation. The median + 2 MAD (median absolute deviation) method of exploratory data analysis (EDA) and C-A (concentration-area) fractal modeling were then applied to the Mahalanobis distance, as defined by Zn, Cu and Pb from the factor analysis, to set the thresholds for defining multi-element anomalies. The median + 2 MAD method identified the Pb-Zn mineralization more successfully than the C-A fractal model. The soil anomaly identified by the median + 2 MAD method on the Mahalanobis distances defined by three principal elements (Zn, Cu and Pb), rather than thirteen elements (Co, Zn, Cu, V, Mo, Ni, Cr, Mn, Pb, Ba, Sr, Zr and Ti), reflected the ore body more favorably. The identified soil geochemical anomalies were compared with the in situ economic Pb-Zn ore bodies for validation. The results showed that the median + 2 MAD approach is capable of mapping both strong and weak geochemical anomalies related to buried Pb-Zn mineralization and is therefore useful at the reconnaissance drilling stage.
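The thresholding step in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names and the sample distances are hypothetical, and the sketch assumes the raw (unscaled) MAD, since the abstract does not say whether a consistency constant was applied.

```python
from statistics import median


def median_mad_threshold(values, k=2.0):
    """Return median + k * MAD, using the raw (unscaled) median absolute deviation."""
    values = list(values)
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return med + k * mad


def flag_anomalies(values, k=2.0):
    """Return the values that exceed the median + k * MAD threshold."""
    threshold = median_mad_threshold(values, k)
    return [v for v in values if v > threshold]


# Hypothetical Mahalanobis distances for five soil samples (illustrative only).
distances = [1.0, 2.0, 3.0, 4.0, 100.0]
threshold = median_mad_threshold(distances)  # median 3.0, MAD 1.0
anomalies = flag_anomalies(distances)
```

Because both the median and the MAD are resistant to extreme values, a single very large distance does not inflate the threshold the way a mean plus standard deviation rule would.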
Abstract: Identifying the subcellular localization of proteins is particularly helpful in the functional annotation of gene products. In this study, we use Machine Learning and Exploratory Data Analysis (EDA) techniques to examine and characterize amino acid sequences of human proteins localized in nine cellular compartments. A dataset of 3,749 protein sequences representing human proteins was extracted from the SWISS-PROT database. Feature vectors were created to capture specific amino acid sequence characteristics. Relative to a Support Vector Machine, a Multi-layer Perceptron, and a Naive Bayes classifier, the C4.5 Decision Tree algorithm was the most consistent performer across all nine compartments in reliably predicting the subcellular localization of proteins based on their amino acid sequences (average Precision=0.88; average Sensitivity=0.86). Furthermore, EDA graphics characterized essential features of proteins in each compartment. As examples, proteins localized to the plasma membrane had higher proportions of hydrophobic amino acids; cytoplasmic proteins had higher proportions of neutral amino acids; and mitochondrial proteins had higher proportions of neutral amino acids and lower proportions of polar amino acids. These data showed that the C4.5 classifier and EDA tools can be effective for characterizing and predicting the subcellular localization of human proteins based on their amino acid sequences.
Funding: Supported by the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences.
Abstract: The Yellow River Basin of China is a key region that contains myriad interactions between human activities and the natural environment. Industrialization and urbanization promote socio-economic development, but they have also generated a series of environmental and ecological issues in this basin. Previous studies have evaluated urban resilience at the national, regional, urban agglomeration, city, and prefecture levels, but not at the watershed level. To address this research gap and elevate the Yellow River Basin’s urban resilience level, we constructed an urban resilience evaluation index system from five dimensions: industrial resilience, social resilience, environmental resilience, technological resilience, and organizational resilience. The entropy weight method was used to comprehensively evaluate urban resilience in the Yellow River Basin. The exploratory spatial data analysis method was employed to study the spatiotemporal differences in urban resilience in the Yellow River Basin in 2010, 2015, and 2020. Furthermore, the grey correlation analysis method was utilized to explore the influencing factors of these differences. The results of this study are as follows: (1) the overall level of urban resilience in the Yellow River Basin was relatively low but showed an increasing trend during 2010–2015, and significant spatial distribution differences were observed, with a higher resilience level in the eastern region and a low-medium resilience level in the western region; (2) the differences in urban resilience were noticeable, with industrial resilience and social resilience being relatively highly developed, whereas organizational resilience and environmental resilience were relatively weak; and (3) the correlation ranking of resilience influencing factors was as follows: science and technology level > administrative power > openness > market forces. This research can provide a basis for improving the resilience level of cities in the Yellow River Basin and contribute to the high-quality development of the region.
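The entropy weight method named in this abstract can be illustrated with a short sketch. It is a textbook form of the method, not the authors' code: the function name and the toy matrix are hypothetical, and the sketch assumes the indicators are already positive and oriented so that larger is better (published applications usually apply a min-max standardization first).

```python
import math


def entropy_weights(matrix):
    """matrix[i][j] is the positive value of indicator j for city i.

    Returns one weight per indicator; indicators whose values vary more
    across cities (lower entropy) receive larger weights.
    """
    n = len(matrix)
    m = len(matrix[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        proportions = [x / total for x in column]
        entropy = -sum(p * math.log(p) for p in proportions if p > 0) / math.log(n)
        # Clamp tiny negative values caused by floating-point rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    total_div = sum(divergences)
    if total_div == 0:
        # Every indicator is constant across cities: fall back to equal weights.
        return [1.0 / m] * m
    return [d / total_div for d in divergences]


# Hypothetical data: indicator 0 is identical in all cities, indicator 1 varies.
weights = entropy_weights([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
```

An indicator that takes the same value everywhere carries no information for ranking cities, so its weight collapses to (numerically) zero and the varying indicator takes essentially all of the weight.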
Funding: I would like to thank the National Natural Science Foundation of China (Grant No. 42061041) for the funding.
Abstract: Urban resilience measures a city’s ability to withstand unknown risks. Existing assessments of urban resilience are not comprehensive and lack consideration of population resilience. This study took 110 prefecture-level cities in the Yangtze River Economic Belt (YREB) as its study area. We calculated the YREB’s level of urban resilience based on the aspects of “economy-society-population-ecology-infrastructure”, which ensured that the evaluation of urban resilience is complete and sufficient. The spatio-temporal evolution of urban resilience was analyzed using exploratory spatial data analysis. Geodetectors were used to investigate the impact of several indicators, focusing on economic, social, population, ecological, and infrastructure factors, on urban resilience. The results showed that the urban resilience of the YREB maintained a slow upward trend from 2005 to 2018, with the average urban resilience rising from 0.2442 to 0.2560. The resilience gap between cities in the study region increased initially and then decreased. Economic factors were the dominant driver of the spatial differentiation of urban resilience, followed by population factors. Urban resilience has been clarified and an evaluation index system has been constructed, which can provide an effective reference for the evaluation of urban resilience in countries around the world. On this basis, the factors that optimize urban resilience can be configured, and regional and national sustainable development can be promoted.
Abstract: Exploratory data analysis plays a major role in obtaining insights from data. Over the last two decades, researchers have proposed several visual data exploration tools that can assist with each step of the analysis process. Nevertheless, in recent years, data analysis requirements have changed significantly. With constantly increasing size and types of data to be analyzed, scalability and analysis duration are now among the primary concerns of researchers. Moreover, in order to minimize the analysis cost, businesses are in need of data analysis tools that can be used with limited analytical knowledge. To address these challenges, traditional data exploration tools have evolved within the last few years. In this paper, with an in-depth analysis of an industrial tabular dataset, we identify a set of additional exploratory requirements for large datasets. Later, we present a comprehensive survey of the recent advancements in the emerging field of exploratory data analysis. We investigate 50 academic and non-academic visual data exploration tools with respect to their utility in the six fundamental steps of the exploratory data analysis process. We also examine the extent to which these modern data exploration tools fulfill the additional requirements for analyzing large datasets. Finally, we identify and present a set of research opportunities in the field of visual exploratory data analysis.
Abstract: A significant Geographic Information Science (GIS) issue is closely related to spatial autocorrelation, a burning question in the phase of information extraction from the statistical analysis of georeferenced data. At present, spatial autocorrelation presents two types of measures: continuous and discrete. Is it possible to use Moran’s I and the Moran scatterplot with continuous data? Is it possible to use the same methodology with discrete data? A particular and cumbersome problem is the choice of the spatial-neighborhood matrix (W) for point data. This paper addresses these issues by introducing the concept of covariogram contiguity, where each weight is based on the variogram model for that particular dataset: (1) the variogram, whose range equals the distance with the highest Moran’s I value, defines the weights for points separated by less than the estimated range, and (2) weights equal zero for points separated by more than the variogram range considered. After the W matrix is computed, the Moran location scatterplot is created in an iterative process. In accordance with various lag distances, Moran’s I is presented as a good search factor for the optimal neighborhood area. Uncertainty/transition regions are also emphasized. At the same time, a new Exploratory Spatial Data Analysis (ESDA) tool is developed, the Moran variance scatterplot, since the conventional Moran scatterplot is not sensitive to neighbor variance. This computer-mapping framework allows the study of spatial patterns, outliers, changeover areas, and trends in an ESDA process. All these tools were implemented in a free web e-Learning program for quantitative geographers called SAKWeb#(or, in the near future, myGeooffice.org).
Abstract: Churn prediction is a common task for machine learning applications in business. In this paper, this task is adapted to address the low efficiency of massive open online courses (only 5% of all students finish their course). The approach is demonstrated on the course “Methods and algorithms of the graph theory”, held on the national platform of online education in Russia. This paper includes all the steps needed to build an intelligent system that predicts which students are active during the course but not likely to finish it. The first part consists of constructing the right sample for prediction, EDA, and choosing the most appropriate week of the course on which to make predictions. The second part is about choosing the right metric and building models. In addition, an approach using ensembles such as stacking is proposed to increase the accuracy of predictions. As a result, a general approach to building a churn prediction model for an online course is reviewed. This approach can be used to make the process of online education adaptive and intelligent for an individual student.
Funding: Under the auspices of the Key Research Program of the Chinese Academy of Sciences (No. KZZD-EW-06-03, KSZD-EW-Z-021-03) and the National Key Science and Technology Support Program of China (No. 2008BAH31B06).
Abstract: Rural development inequality is an important practical issue in the course of fully establishing a 'moderately well-off society' in modern China, and an objective understanding and evaluation of the status and regional inequality of rural development can provide a scientific basis for 'building a new countryside' and for the coordinated development of rural-urban regions. Based on county-level data for 2000, 2005 and 2009, this paper examines the rural development inequality of Jilin Province in Northeast China by establishing a rural development index (RDI). The spatio-temporal dynamic patterns and dominant factors are discussed using exploratory spatial data analysis and a multiple regression model. The results are as follows. Firstly, most of the counties were at a lower development level, accounting for 58.3%, 62.5% and 66.7% of the total counties in 2000, 2005 and 2009, respectively. Spatial inequality was very obvious at the county level. For example, the rural development level of Changchun Proper and the city proper of seven prefecture-level cities was much higher than that of the surrounding regions. The counties in eastern and northern Jilin Province had the lowest rural development level, while the middle counties were the areas of rapid growth in the rural economy. Secondly, Moran's I of the RDI was 0.01, –0.16 and –0.06 in 2000, 2005 and 2009, respectively, which indicated that spatial agglomeration of the RDI was not obvious in Jilin Province and that its distribution was close to random. The counties where both the unit and its adjacent units had a higher development level (HH) shifted from the western areas to the eastern areas, while the counties where both the unit and its adjacent units had a lower development level (LL) diffused from eastern to middle and western Jilin Province. Finally, the multiple regression analysis showed that the improvement of agricultural production conditions, the development of the agricultural economy and the adjustment of industrial structure were the dominant factors affecting rural development inequality in Jilin Province over the last ten years.
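The global Moran's I values reported in this abstract come from a standard formula that can be sketched briefly. The function name, the four-unit chain of neighbors and the sample values below are hypothetical; the abstract does not state which spatial weights matrix the authors used.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weights matrix w_ij.

    I = (n / S0) * sum_ij w_ij * z_i * z_j / sum_i z_i**2,
    where z_i = x_i - mean(x) and S0 is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    denom = sum(d * d for d in z)
    return (n / s0) * (cross / denom)


# Hypothetical example: four units in a row, each adjacent to its immediate neighbors.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend = morans_i([1, 2, 3, 4], chain)          # similar values sit side by side
alternating = morans_i([1, -1, 1, -1], chain)  # neighbors always differ
```

Values near zero, like the 0.01 to –0.16 range quoted above, indicate little spatial clustering, whereas clearly positive values indicate that similar units sit next to each other and clearly negative values indicate a checkerboard-like pattern.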
Funding: Under the auspices of the National Natural Science Foundation of China (No. 41071116) and the Humanity and Social Science Foundation of the Ministry of Education (No. 09YJC790225, 11YJA630008).
Abstract: This paper focuses on the morphological differences, spatial pattern and regional types of rural settlements in Xuzhou City, Jiangsu Province, China. Using satellite images of Xuzhou City taken in 2007 and 2008 together with exploratory spatial data analysis (ESDA) models and spatial metrics, the paper conducts a quantitative analysis of the morphological pattern of rural settlements and finds significant characteristics. First, rural settlements in Xuzhou City are significantly agglomerated in their spatial distribution; meanwhile, there is significant variation in the geographical density distribution. Second, the scale of rural settlements in Xuzhou City is larger than the average in Jiangsu Province, and the histogram of the scale data is more even and more like a gamma distribution. There is a significant high-value cluster in the scale distribution, and a local negative correlation between the scale and density distributions of rural settlements in Xuzhou City. Third, the morphology of rural settlements in Xuzhou City shows relative regularity with good connection and integrity, but the spatial variation of the morphology is anisotropic. Finally, according to the characteristics of density, scale, and form, the rural settlements of Xuzhou City are divided into three types: a high-density and point-scattered type, a low-density and cluster-like type, and a mass-like and sparse type. The research findings could be used as a scientific foundation for rural planning and community rebuilding, particularly in less-developed areas.
Funding: Supported by the National Natural Science Foundation of China (Nos. 41406146, 41476129), the Natural Science Foundation of Shanghai Municipality (No. 13ZR1419300), the Research Fund for the Doctoral Program of Higher Education of China (No. 20123104120002), and the Shanghai Universities First-Class Disciplines Project-Fisheries (A).
Abstract: With the increasing effects of global climate change and fishing activities, the spatial distribution of the neon flying squid (Ommastrephes bartramii) is changing in the traditional fishing ground of 150°-160°E and 38°-45°N in the northwest Pacific Ocean. This research aims to identify the spatial hot and cold spots (i.e. spatial clusters) of O. bartramii to reveal its spatial structure using commercial fishery data from 2007 to 2010 collected by Chinese mainland squid-jigging fleets. A relatively strongly clustered distribution for O. bartramii was observed using an exploratory spatial data analysis (ESDA) method. The results show two hot spots and one cold spot in 2007, while only one hot spot and one cold spot were identified each year from 2008 to 2010. The hot and cold spots in 2007 occupied 8.2% and 5.6% of the study area, respectively; these percentages for hot and cold spot areas were 5.8% and 3.1% in 2008, 10.2% and 2.9% in 2009, and 16.4% and 11.9% in 2010, respectively. Nearly half (>45%) of the squid reported by Chinese fleets from 2007 to 2009 were caught in hot spot areas, while this percentage reached its peak at 68.8% in 2010, indicating that the hot spot areas are central fishing grounds. A further change analysis shows that the area centered at 156°E/43.5°N persisted as a hot spot over the whole period from 2007 to 2010. Furthermore, the hot spots were mainly identified in areas with sea surface temperature (SST) in the range of 15-20℃ around warm Kuroshio Currents, as well as with chlorophyll-a (chl-a) concentration above 0.3 mg/m³. The outcome of this research improves our understanding of spatiotemporal hot spots and their variation for O. bartramii and is useful for sustainable exploitation, assessment, and management of this squid.
Funding: Under the auspices of the Key Research Program of the Chinese Academy of Sciences (No. KZZD-EW-06-03, KSZD-EW-Z-021-03), the Advantage Discipline Project of Hainan Normal University (No. 305010048), the Key Discipline Project of Hainan (No. 3050107048), the National Natural Science Foundation of China (No. 41201160, 41329001), and the Natural Science Foundation of Hainan Province (No. 414189).
Abstract: Quality of life (QOL) is a hot issue that has attracted increasing attention from the Chinese Government and scholars; it is also a vital issue that should be addressed in the course of 'establishing an overall well-off society'. Northeast China is one of the most important old industrial bases in China. However, an industrial structure dominated by heavy and chemical industry and a development mode of 'production first, living last' have led to a series of social problems, which have become a serious bottleneck to social stability and sustainable economic development. Applying a BP neural network, exploratory spatial data analysis (ESDA) and a spatial regression model, this paper examines the space-time dynamics of the QOL of residents in Northeast China. We first investigate the QOL indexes of the residents and then use ESDA methods to visualize their space-time relationships. We found a spatial agglomeration of residents' QOL in middle-southern Liaoning Province, central Jilin Province and the Harbin-Qiqihar-Daqing area of Heilongjiang Province. Two thirds of the counties show low-low spatial correlation, and the correlation type of about 60% of the prefecture-level areas remains stable, indicating that the QOL of residents in Northeast China shows a certain degree of path dependence or spatial lock-in. We also found that economic strength and the development level of the service industry have positive and obvious effects on residents' QOL, while the effects of indexes such as the social service level and the proportion of tertiary industries are smaller.
Abstract: This paper examines the visualization of symbolic data and considers the challenges arising from its complex structure. Symbolic data is usually aggregated from large data sets and used to hide entry-specific details and to transform huge amounts of data (like big data) into analyzable quantities. It is also used to offer an overview in places where general trends are more important than individual details. Symbolic data comes in many forms, such as intervals, histograms, categories and modal multi-valued objects. Symbolic data can also be considered as a distribution. Currently, the de facto visualization approach for symbolic data is zoomstars, which has many limitations. The biggest limitation is that the default distributions (histograms) are not supported in 2D, as an additional dimension is required. This paper proposes several new improvements for zoomstars that would enable it to visualize histograms in 2D by using a quantile or an equivalent interval approach. In addition, several improvements for categorical and modal variables are proposed for a clearer indication of the presented categories. Recommendations for different approaches to zoomstars are offered depending on the data type and the desired goal. Furthermore, an alternative approach that allows visualizing the whole data set in a comprehensive table-like graph, called shape encoding, is proposed. These visualizations and their usefulness are verified with three symbolic data sets in the exploratory data mining phase to identify trends, similar objects and important features, and to detect outliers and discrepancies in the data.
Abstract: The aim of this work is to describe and compare three exploratory chemometric tools: principal components analysis, independent components analysis and common components analysis, the last one being a modification of the multi-block statistical method known as common components and specific weights analysis. The three methods were applied to a set of data to show the differences and similarities of the results obtained, highlighting their complementarity.
Funding: National Natural Science Foundation of China Youth Science Foundation Project, No. 41701170; National Natural Science Foundation of China, No. 41661025, No. 42071216; Fundamental Research Funds for the Central Universities, No. 18LZUJBWZY068.
Abstract: In 2007, China surpassed the USA to become the largest carbon emitter in the world. China has promised a 60%–65% reduction in carbon emissions per unit GDP by 2030, compared to the baseline of 2005. Therefore, it is important to obtain accurate dynamic information on the spatial and temporal patterns of carbon emissions and carbon footprints to support the formulation of effective national carbon emission reduction policies. This study attempts to build a carbon emission panel data model that simulates carbon emissions in China from 2000–2013 using nighttime lighting data and carbon emission statistics. By applying the Exploratory Spatial-Temporal Data Analysis (ESTDA) framework, this study analyzed the spatial patterns and dynamic spatial-temporal interactions of carbon footprints from 2001–2013. The improved Tapio decoupling model was adopted to investigate the levels of coupling or decoupling between the carbon emission load and economic growth in 336 prefecture-level units. The results show that, firstly, the model achieved high accuracy in simulating carbon emissions. Secondly, the total carbon footprints and carbon deficits across China increased with average annual growth rates of 4.82% and 5.72%, respectively. The overall carbon footprints and carbon deficits were larger in the North than in the South. There were extremely significant spatial autocorrelation features in the carbon footprints of prefecture-level units. Thirdly, the relative lengths of the Local Indicators of Spatial Association (LISA) time paths were longer in the North than in the South, and they increased from the coastal to the central and western regions. Lastly, the overall decoupling index was mainly of the weak decoupling type, but the number of cities with this weak decoupling continued to decrease. The unsustainable development trend of China’s economic growth and carbon emission load will continue for some time.
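For readers unfamiliar with the decoupling vocabulary used above, the following sketch classifies one period with the standard Tapio elasticity and its usual 0.8 and 1.2 cut-offs. It is an illustration only: the abstract refers to an improved Tapio model whose details are not given here, and the function name and input figures are hypothetical.

```python
def tapio_state(c0, c1, g0, g1):
    """Classify decoupling between emissions (c0 -> c1) and GDP (g0 -> g1).

    Elasticity e = (relative change in emissions) / (relative change in GDP).
    Thresholds 0.8 and 1.2 follow the standard Tapio scheme (assumed here).
    """
    dc = (c1 - c0) / c0
    dg = (g1 - g0) / g0
    if dg == 0:
        raise ValueError("GDP change is zero; elasticity is undefined")
    e = dc / dg
    if dg > 0:  # growing economy
        if e < 0:
            return e, "strong decoupling"
        if e < 0.8:
            return e, "weak decoupling"
        if e <= 1.2:
            return e, "expansive coupling"
        return e, "expansive negative decoupling"
    # shrinking economy
    if e < 0:
        return e, "strong negative decoupling"
    if e < 0.8:
        return e, "weak negative decoupling"
    if e <= 1.2:
        return e, "recessive coupling"
    return e, "recessive decoupling"


# Hypothetical city: emissions grow 4% while GDP grows 10%.
elasticity, state = tapio_state(100.0, 104.0, 100.0, 110.0)
```

Weak decoupling, the dominant type reported in the abstract, means emissions are still rising but more slowly than the economy.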
Abstract: How do people talk about COVID-19 online? To address this question, we offer an unsupervised framework that allows us to examine Twitter framings of the pandemic. Our approach employs a network-based exploration of social media data to identify, categorize, and understand communication patterns about the novel coronavirus on Twitter. The simplest structure that emerges from our analysis is the distinction between the internal/personal, external/global, and generic threat framings of the pandemic. This structure replicates in different Twitter samples and is validated using the variation of information measure, reflecting the significance and stability of our findings. Such an exploratory study is useful for understanding the contours of the natural, non-random structure in this online space. We contend that this understanding of structure is necessary to address a host of causal, supervised, and related questions downstream.
Funding: Key Projects of Philosophy of the Social Science funded by the Ministry of Education, No. 11JD039; National Key Public Bidding Project for Soft Science Research Plan, No. 2012GXS1D002; National Natural Science Foundation of China, No. 41001083.
Abstract: Based on the connotation and structure of science and technology (S&T) resources and relevant data for more than 286 cities at prefecture level and above during 2001-2010, the S&T resource allocation efficiency of different cities in different periods was calculated using a modified data envelopment analysis (DEA) method, which uncovers the distributional differences and patterns of change in S&T resource allocation efficiency across time and space. On that basis, this paper analyzes and discusses the spatial distribution pattern and evolution trend of S&T resource allocation efficiency in different cities by means of exploratory spatial data analysis (ESDA). The results are as follows. (1) The average S&T resource allocation efficiency in cities at prefecture level and above has always stayed at a low level, with repeated fluctuations between high and low, and shows a decreasing trend year by year. In addition, the gap between the East and the West is widening. (2) The asymmetrical distribution of S&T resource allocation efficiency presents a spatial pattern that decreases successively from Eastern China through Central China to Western China. The cities whose S&T resource allocation efficiency is at the higher and high levels show a clustered distribution, which fits well with the 23 forming urban agglomerations in China. (3) S&T resource allocation efficiency and the economic environment are positively correlated to a certain degree, but the two do not coincide completely. The differentiation of S&T resource allocation efficiency is common in regional development; its existence and evolution are directly or indirectly influenced by, and reflect, many elements, such as geographical location and the natural endowment and environment of S&T resources. (4) From the perspective of the evolution of spatial structure, the S&T resource allocation efficiency of cities at prefecture level and above shows a notable spatial autocorrelation, which is positive in every period. The spatial distribution of S&T resource allocation efficiency in neighboring cities tends to be similar within groups, and this similarity tends to strengthen step by step. Meanwhile, the overall differentiation across geographical space has a diminishing tendency. (5) According to the LISA agglomeration maps of S&T resource allocation efficiency in different periods, the four agglomeration types have changed differently in spatial location and in the range of spatial agglomeration, and the continuity of S&T resource allocation efficiency in geographical space is gradually increasing.
Funding: National Social Science Foundation of China, No. 21FSHB014; National Natural Science Foundation of China, No. 42001196.
Abstract: Influenced by globalization, rural transition in developed Western countries has experienced processes of productivism, post-productivism, and multifunctional development. By contrast, rural transition in most developing countries has been accompanied by rapid urbanization, which has become a core topic in geography research. As the world’s largest developing country, China has undergone profound development since the reform and opening-up. Moreover, rural spaces in some eastern coastal areas have entered the stage of reconstruction after decades of industrialization and urbanization. This paper takes Suzhou as the case area and measures the process of rural transition from 1990 to 2015 by constructing an index system. It then analyzes the characteristics of space-time evolution using exploratory spatial data analysis (ESDA) methods to reveal the influence of economic and social development on rural transition. The results show that rural transition, which generally entails the weakening of rurality and the enhancing of urbanity on a macro scale, tends to be heterogeneous across different regions on a micro scale. This paper argues that multifunctionality will be the main future trend of rural transition in rapidly urbanizing areas. The experience in Suzhou could provide an example for establishing policies on sustainable development in rural spaces and achieving urban-rural co-governance.
Funding: National Key Research and Development Program of China (2019YFB1600400); National Natural Science Foundation of China (72174035); National Natural Science Foundation of China (71774018); Liaoning Revitalization Talents Program (XLYC2008030); Liaoning Provincial Natural Science Foundation Shipping Joint Foundation Program (2020-HYLH-20).
Abstract: It is urgent and important to explore the dynamic evolution of comprehensive transportation green efficiency (CTGE) in the context of green development. We constructed a social development index that reflects the social benefits of transportation services and incorporated it into the comprehensive transportation efficiency evaluation framework as an expected output. Based on panel data for 30 regions in China from 2003-2018, the CTGE in China was measured using the slacks-based measure-data envelopment analysis (SBM-DEA) model. Further, the dynamic evolution trends of CTGE were determined using the spatial Markov model and the exploratory spatio-temporal data analysis (ESTDA) technique from a spatio-temporal perspective. The results showed that the CTGE follows a U-shaped trend but with an overall low level and significant regional differences. The state transition of CTGE has a strong spatial dependence, and there exists the phenomenon of “club convergence”. Neighbourhood background has a significant impact on the CTGE transition types, and the spatial spillover effect is pronounced. The CTGE has an obvious positive correlation and spatial agglomeration characteristics. The geometric characteristics of the LISA time path show that the evolution of the local spatial structure and local spatial dependence of China’s CTGE is stable, but the integration of spatial evolution is weak. The spatio-temporal transition results of LISA indicate that the CTGE has obvious transfer inertness and certain path-dependence and spatial locking characteristics, which will become the major difficulty in improving the CTGE.
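The state-transition analysis mentioned in this abstract rests on a Markov transition matrix, which the sketch below estimates from discretized efficiency classes. It shows only the classic, unconditioned matrix; a spatial Markov model would, as an assumption about the usual approach, estimate one such matrix per class of neighbouring regions' efficiency. The function name and the three region histories are hypothetical.

```python
def transition_matrix(sequences, n_states):
    """Estimate Markov transition probabilities from sequences of state labels.

    sequences: one list of integer states (0 .. n_states - 1) per region,
    ordered in time. Entry [a][b] is the share of moves from state a to b.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States that were never left get a row of zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical efficiency classes (0 = low, 1 = medium, 2 = high) for three regions.
histories = [
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [2, 2, 1, 1],
]
probabilities = transition_matrix(histories, 3)
```

Large diagonal entries correspond to the transfer inertness described in the abstract: regions tend to stay in the class they already occupy.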
Funding: This work was supported by the Ministry of Environmental Protection of China (No. 2110203) and the National Natural Science Foundation of China (Grant No. 41101138).
Abstract: The aim of this paper is to study the spatial-temporal differentiation of industrial eco-efficiency in China. Using methods based on the data envelopment analysis (DEA) model and exploratory spatial data analysis (ESDA), and data for 30 provinces in China from 1985, 1995, 2005, and 2008, the spatial-temporal pattern changes in industrial eco-efficiency are discussed. The results show that: first, the patterns of industrial eco-efficiency are dominated by clustering of relatively low efficiency provinces; second, spatial relationships between the industrial eco-efficiencies of different provinces changed slightly throughout the period, and the provinces persistently exhibit spatial concentration of relatively low industrial eco-efficiency; finally, there is an obvious trend toward polarization of industrial eco-efficiency, i.e., the higher level spatial units are concentrated in eastern China, and the lower level spatial units are mainly in western and central China.