The growth of geo-technologies and the development of methods for spatial data collection have resulted in large spatial data repositories that require techniques for spatial information extraction, in order to transform raw data into useful, previously unknown information. However, because spatial data mining is highly complex and requires an understanding of spatial relationships and their characteristics, efforts have been directed towards improving algorithms to increase performance and the quality of results. Likewise, spatial data mining has been applied to several areas, including environmental management, which is the focus of this paper. The main original contribution of this work is the demonstration of spatial data mining using a novel algorithm with a multi-relational approach, applied to a database of water resources from a region of São Paulo State, Brazil, together with a discussion of the results obtained. Some characteristics involving the location of water resources and the profile of those administering water exploration were discovered and discussed.
The traditional generalization-based knowledge discovery method is introduced. A new method for mining multilevel spatial association rules based on the cloud model is presented. The cloud model integrates the vagueness and randomness of linguistic terms in a unified way. With these models, spatial and nonspatial attribute values are well generalized at multiple levels, allowing discovery of strong spatial association rules. Combining the cloud-model-based method with the Apriori algorithm for mining association rules from a spatial database proves effective and flexible.
Association rule mining methods, an important set of data mining tools, can be used to mine spatial association rules from spatial data. However, their application is limited because the mining results contain a large number of redundant rules. In this paper, a new method named Geo-Filtered Association Rules Mining (GFARM) is proposed to effectively eliminate the redundant rules. GFARM is applied in a case study in which association rules are discovered between building land distribution and potential driving factors in Wuhan, China from 1995 to 2015. Ten sets of regular sampling grids of different sizes are used to detect the influence of scale on GFARM. Results show that the proposed method can filter 50%–70% of redundant rules. GFARM is also successful in discovering spatial association patterns between building land distribution and driving factors.
The authors designed a spatial data mining system for ore-forming prediction based on the theory and methods of data mining and on spatial database techniques, in combination with the characteristics of geological information data. The system consists of data management, data mining and knowledge discovery, and knowledge representation. It can effectively integrate multi-source geoscience data, such as geology, geochemistry, geophysics, and remote sensing (RS). The system digitizes geological information data as data layer files consisting of two numerical values and stores these files in the system database. Based on the combination of the characteristics of geological information, metallogenic prognosis was realized, taking an area in Heilongjiang Province as an example. The prospect area of a hydrothermal copper deposit was determined.
In data mining from transaction databases, the relationships between attributes have been the focus, while the relationships between tuples have not been taken into account. In a spatial database, there are relationships both between attributes and between tuples, and most associations occur between tuples, such as adjacency, intersection, overlap and other topological relationships. The tasks of spatial association rule mining therefore include mining the relationships between attributes of spatial objects, called vertical-direction data mining (DM), and the relationships between tuples, called horizontal-direction DM. This paper analyzes the storage models of spatial data, draws on data mining techniques for transaction databases, defines the spatial association rule, including the vertical-direction, horizontal-direction and two-direction association rules, discusses the measurement of spatial association rule interestingness, and puts forward a workflow for spatial association rule mining. For two-direction spatial association rule mining, an algorithm is proposed to obtain non-spatial itemsets. Through spatial analysis, the spatial relations are converted into non-spatial associations and the non-spatial itemsets are obtained. From these non-spatial itemsets, the Apriori algorithm or other algorithms can be used to find the frequent itemsets, from which the spatial association rules are derived. The algorithm was validated by mining spatial association rules from a spatial database, and the test results show that it is efficient and can mine interesting spatial rules.
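The last stage of the workflow in the abstract above, finding frequent itemsets once spatial relations have been turned into non-spatial items, can be illustrated with a minimal level-wise Apriori sketch. This is not the paper's algorithm: the predicate strings such as `close_to(road)`, the toy transactions, and the function name `apriori` are all hypothetical and serve only to show the candidate-generation and pruning steps.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]  # level-1 candidates
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those that are frequent.
        level = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a, b in combinations(list(level), 2)
                  if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in joined
                   if all(frozenset(s) in level
                          for s in combinations(c, k - 1))]
    return frequent


# Hypothetical transactions: one per reference object, each holding the
# spatial predicates that spatial analysis found to be true for it.
transactions = [
    {"is_a(town)", "close_to(road)", "adjacent_to(river)"},
    {"is_a(town)", "close_to(road)"},
    {"is_a(town)", "adjacent_to(river)"},
    {"close_to(road)", "adjacent_to(river)"},
    {"is_a(town)", "close_to(road)", "adjacent_to(river)"},
]
frequent = apriori(transactions, min_support=0.5)
for itemset, support in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(support, 2))
```

With a minimum support of 0.5, the three single predicates and the three pairs are frequent in this toy data, while the full triple (support 0.4) is dropped.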
Hotspots (active fires) indicate the spatial distribution of fires. Determining the factors that influence hotspot occurrence is essential so that fire events can be predicted from the characteristics of a given area. This study identifies possible influence factors on the occurrence of fire events using the Apriori association rule algorithm in the study area of Rokan Hilir, Riau Province, Indonesia. The Apriori algorithm was applied to a forest fire dataset containing data on the physical environment (land cover, river, road and city center), socio-economic factors (income source, population, and number of schools), weather (precipitation, wind speed, and screen temperature), and peatlands. The experimental results revealed 324 multidimensional association rules indicating relationships between hotspot occurrence and other factors. The associations between hotspot occurrence and other geographical objects were discovered for a minimum support of 10% and a minimum confidence of 80%. The results show strong relations between hotspot occurrence and influence factors, with a support of about 12.42%, a confidence of 1, and a lift of 2.26. These factors are precipitation greater than or equal to 3 mm/day, wind speed in [1 m/s, 2 m/s), non-peatland area, screen temperature in [297 K, 298 K), number of schools per km² less than or equal to 0.1, and distance from each hotspot to the nearest road less than or equal to 2.5 km.
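The support, confidence and lift figures quoted in the abstract above follow the usual association-rule definitions: support is the fraction of records containing both sides of the rule, confidence is that support divided by the support of the antecedent, and lift is the confidence divided by the support of the consequent. Under these standard definitions, a confidence of 1 with a lift of 2.26 would correspond to a consequent support of roughly 1/2.26 ≈ 0.44. The sketch below computes the three measures on made-up records; the item labels and the function names are hypothetical and the data are not from the study.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for antecedent -> consequent."""
    s_rule = support(transactions, antecedent | consequent)
    s_ante = support(transactions, antecedent)
    s_cons = support(transactions, consequent)
    if s_ante == 0 or s_cons == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    confidence = s_rule / s_ante
    lift = confidence / s_cons
    return s_rule, confidence, lift


# Toy records (illustrative only): each set lists the conditions that hold
# for one grid cell or observation.
records = [
    {"near_road", "non_peatland", "hotspot"},
    {"near_road", "non_peatland", "hotspot"},
    {"near_road", "hotspot"},
    {"non_peatland"},
    {"near_road"},
]
s, c, l = rule_metrics(records, {"near_road", "non_peatland"}, {"hotspot"})
print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

A lift above 1, as here, indicates that the consequent occurs more often when the antecedent holds than it does overall.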
Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, and image processing. We combine sampling with the DBSCAN algorithm to cluster large spatial databases, and develop two sampling-based DBSCAN (SDBSCAN) algorithms. One algorithm introduces sampling inside DBSCAN, and the other applies a sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large-scale spatial databases.
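For orientation, the baseline that the sampling-based variants build on can be sketched as follows. This is plain DBSCAN with a brute-force neighbourhood query, written under the usual definitions of `eps` and `min_pts`; it does not reproduce either SDBSCAN algorithm, whose sampling details are not given in the abstract. The function names and the toy points are illustrative assumptions.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is a core point: expand
                queue.extend(j_neighbours)
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The brute-force query makes this quadratic in the number of points, which is the kind of cost that motivates sampling or spatial indexing on large databases.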
With the deployment of modern infrastructure for public transportation, several studies have analyzed movement patterns of people using smart card data and have characterized different areas. In this paper, we propose the “movement purpose hypothesis” that each movement arises from two causes: where the person is and what the person wants to do at a given moment. We formulate this hypothesis as a synthesis model in which two network graphs generate a movement network graph. We then develop two novel embedding models to assess the hypothesis, and demonstrate that the models obtain a vector representation of a geospatial area using movement patterns of people from large-scale smart card data. We conducted an experiment using smart card data for a large network of railroads in the Kansai region of Japan. We obtained a vector representation of each railroad station and each purpose using the developed embedding models. Results show that network embedding methods are suitable for large-scale movement data, and that the developed models perform better than existing embedding methods in the task of multi-label classification of train stations on the purpose-of-use data set. Our proposed models can contribute to the prediction of people flows by discovering underlying representations of geospatial areas from mobility data.
Using multi-source and multi-temporal high-resolution remote sensing data and related techniques of remote sensing and geographic information systems, this paper analyzes the spatial and temporal changes of land occupation caused by mine development in four mining areas of eastern Hubei Province from 2011 to 2014: the Chengchao-Tieshan iron-copper polymetallic deposit area, the Daye-Yangxin iron-copper polymetallic deposit area, the E-Nan mining area, and the Wuxue-Yangxin non-metallic mining area along the Yangtze River. The results show that, in the research area, land occupation by energy mine exploitation is small and scattered, with coal mines occupying the largest area, showing a downward trend over the four years; land occupation by metal mines is large and concentrated, with iron mines and copper mines occupying the largest area, showing a downward trend over the four years; non-metallic mines are large and numerous, with mines of limestone for building and limestone occupying the largest area, showing an upward trend over the four years.
This paper summarizes a few spatial statistical analysis methods for measuring spatial autocorrelation and spatial association, and discusses the criteria for identifying spatial association using the global Moran Coefficient, Local Moran and Local Geary. Furthermore, a user-friendly statistical module, combining spatial statistical analysis methods with GIS visual techniques, is developed in Arcview using Avenue. An example is also given to show the usefulness of this module in identifying and quantifying the underlying spatial association patterns between economic units.
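As a reminder of what the global Moran coefficient measures, its textbook form is I = (n / W) · Σᵢⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where wᵢⱼ are spatial weights and W is their sum. The sketch below is a minimal pure-Python version of that formula, not the Avenue module described in the abstract; the binary contiguity weights for four units in a row and the function name `morans_i` are illustrative assumptions.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a square weight matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Cross-products of deviations for every weighted pair of units.
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    if total_weight == 0 or variance_sum == 0:
        raise ValueError("weights must be non-zero and values must vary")
    return (n / total_weight) * (cross / variance_sum)


# Four units in a row; neighbours share an edge (binary contiguity weights).
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend = morans_i([1, 2, 3, 4], w)        # similar values sit together
alternating = morans_i([1, 0, 1, 0], w)  # neighbours always differ
print(trend, alternating)
```

Positive values indicate that similar values cluster in space and negative values indicate that neighbours tend to differ; assessing significance would additionally require a permutation or analytical test, which is omitted here.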
Spatial autocorrelation methodologies, including Global Moran’s I and the Local Indicators of Spatial Association (LISA) statistic, were used to describe and map spatial clusters of 13 leading malignant neoplasms in Taiwan. A logistic regression model was also fitted to identify similar characteristics over time. Two time periods (1995-1998 and 2005-2008) were compared in an attempt to formulate common spatio-temporal risks. Spatial cluster patterns were identified using local spatial autocorrelation analysis. We found significant spatio-temporal variation between the leading malignant neoplasms and well-documented spatial risk factors. For instance, cancer of the oral cavity in males was found to be clustered in locations in central Taiwan, with distinct differences between the two time periods. Stomach cancer morbidity clustered in aboriginal townships, where the prevalence of Helicobacter pylori is high, and marked differences between the two time periods were found. A method that combines LISA statistics and logistic regression is an effective tool for detecting space-time patterns in discontinuous data. Spatio-temporal mapping comparison helps to clarify issues such as the spatial aspects of both time periods for leading malignant neoplasms. This helps planners to assess spatio-temporal risk factors and to ascertain which types of health care policy would be most advantageous for the planning and implementation of health care services. These issues can greatly affect the performance and effectiveness of health care services and also provide a clear outline for understanding the results in depth.
Exploratory data analysis is increasingly necessary as larger spatial data sets are managed in electro-magnetic media. Spatial clustering, the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases, is one of the most important spatial data mining techniques. So far, many spatial clustering algorithms have been proposed for applications such as pattern recognition, data analysis, and image processing. However, most of the well-known clustering algorithms have drawbacks, presented later, when applied to large spatial databases. To overcome these limitations, in this paper we propose a robust spatial clustering algorithm named NSCABDT (Novel Spatial Clustering Algorithm Based on Delaunay Triangulation). A Delaunay diagram is used to determine neighborhoods, and spatial association rules and collocations are defined on the basis of this neighborhood notion. NSCABDT demonstrates several important advantages over previous work. First, it discovers clusters of arbitrary shape. Second, executing NSCABDT does not require any prior knowledge of the distribution. Third, experiments show that, like DBSCAN, NSCABDT does not require much CPU processing time. Finally, it handles outliers efficiently.
Spatial downscaling methods are widely used to produce bioclimatic variables (e.g. temperature and precipitation) in studies related to species ecological niches and drainage basin management and planning. This study applied three different statistical methods, i.e. moving window regression (MWR), nonparametric multiplicative regression (NPMR), and the generalized linear model (GLM), to downscale the annual mean temperature (Bio1) and annual precipitation (Bio12) in central Iran from a coarse scale (1 km × 1 km) to a fine scale (250 m × 250 m). Elevation, aspect, distance from sea and the normalized difference vegetation index (NDVI) were used as covariates to create downscaled bioclimatic variables. Model assessment was performed by comparing model outcomes with observational data from weather stations. The coefficient of determination (R²), bias, and root-mean-square error (RMSE) were used to evaluate models and covariates. Elevation could effectively explain the changes in bioclimatic factors related to temperature and precipitation. All three models could downscale the mean annual temperature data with similar R², RMSE, and bias values. The MWR had the best performance and highest accuracy in downscaling annual precipitation (R² = 0.70; RMSE = 123.44). In general, the two nonparametric models, i.e. MWR and NPMR, can be reliably used for the downscaling of bioclimatic variables, which have wide applications in species distribution modeling.
Relative to hospitalized patient information, outpatient admission information is relatively simple. It only includes the patient's admission time, place of residence and other basic information. Traditionally, this information has not been mined sufficiently. However, when the admission times and residence information of a large number of patients are considered together and data mining techniques are applied, some previously ignored regularities are likely to be found. Using 5 years of data mining research and admission data from a paediatric department at a large women’s and children’s hospital in China, we found important fluctuation rules regarding admissions by applying wavelet analysis to hospital admission data across different scales of cyclical fluctuation. Method: The seasonal distribution of patient numbers was analysed using the Haar wavelet transformation, and the level 3 and level 2 wavelets were extracted to fit the data. The distribution function of hospitalized patients was visualized by kernel density estimation. Linear regression and ARIMA (autoregressive integrated moving average) models were used to predict the seasonal number of patients in the future. Results: The data analysis demonstrates that the total surge of inpatients was decomposed into one mother wavelet and five small wavelets, each of which represents a different time frequency. In addition, as distance from the hospital increases, the number of patients decreases exponentially. Seasonal factors are the largest time factor influencing changes in patient numbers. Conclusion: With wavelet analysis and the improved prediction model, we could forecast the future trend in inpatient numbers and show that factors such as geographic position influence the number of inpatients. Additionally, the concept of data mining based on spatial distribution and spectral analysis could be applied to other aspects of social management.
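To make the Haar step in the Method above concrete, the sketch below performs a multi-level orthonormal Haar decomposition in pure Python: at each level, neighbouring pairs are replaced by a scaled sum (the smooth trend) and a scaled difference (the fluctuation at that scale). It is a generic illustration, not the study's pipeline; the admission counts, the choice of three levels, and the function names are hypothetical.

```python
import math


def haar_step(signal):
    """One Haar level: pairwise scaled sums (approx) and differences (detail)."""
    scale = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / scale
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale
              for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Return (final approximation, [detail at level 1, level 2, ...])."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2:
            raise ValueError("signal length must be even at every level")
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# Made-up admission counts for eight consecutive periods.
admissions = [120, 130, 150, 160, 110, 100, 90, 95]
approx, details = haar_decompose(admissions, levels=3)
print("coarsest approximation:", [round(a, 2) for a in approx])
for level, d in enumerate(details, start=1):
    print(f"level {level} detail:", [round(x, 2) for x in d])
```

Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared input values, which gives a quick sanity check on an implementation.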
The advent of Big Data has led to rapid growth in the use of parallel clustering algorithms that work over distributed computing frameworks such as MPI, MapReduce, and Spark. An important step for any parallel clustering algorithm is the distribution of data amongst the cluster nodes. This step governs the methodology and performance of the entire algorithm. Researchers typically use a random strategy, or a spatial/geometric distribution strategy such as kd-tree based partitioning or grid-based partitioning, according to the requirements of the algorithm. However, these strategies are generic and are not tailor-made for any specific parallel clustering algorithm. In this paper, we give a comprehensive literature survey of MPI-based parallel clustering algorithms with special reference to the specific data distribution strategies they employ. We also propose three new data distribution strategies: Parameterized Dimensional Split for parallel density-based clustering algorithms like DBSCAN and OPTICS; Cell-Based Dimensional Split for dGridSLINK, a grid-based hierarchical clustering algorithm that is efficient for disjoint spatial distributions; and Projection-Based Split, a generic distribution strategy. All of these preserve spatial locality, achieve disjoint partitioning, and ensure good data load balancing. The experimental analysis shows the benefits of using the proposed data distribution strategies for the algorithms they are designed for, based on which we give appropriate recommendations for their usage.
Clustering is an unsupervised learning problem: a procedure that partitions data objects into groups. Many algorithms cannot overcome the problems of morphology, overlapping and a large number of clusters at the same time. Many scientific communities have approached clustering from the perspective of density, which is one of the best approaches to clustering. This study proposes a density-based spatial clustering of applications with noise (DBSCAN) algorithm based on selected high-density areas, automatic fuzzy-DBSCAN (AFD), which works with the initialization of two parameters. AFD, using fuzzy and DBSCAN features, is modeled by the selection of high-density areas and automatically generates two parameters for merging and separating. The two generated parameters provide a state of sub-cluster rules in the Cartesian coordinate system for the dataset. The model simultaneously overcomes clustering problems such as morphology, overlapping, and the number of clusters in a dataset. In the experiments, all algorithms are run 30 times on each of eight data sets. Three of these are overlapping real datasets and the rest are morphologic and synthetic datasets. It is demonstrated that the AFD algorithm outperforms other recently developed clustering algorithms.
Funding (GFARM study): Under the auspices of the Special Fund of the Ministry of Land and Resources of China in Public Interest (No. 201511001)
Funding (two-direction spatial association rule study): The work is supported by the Natural Science Foundation of Chongqing (No. CSTC 2005BB2065)
Funding (SDBSCAN study): Supported by the Open Research Fund Program of LIESMARS (WKL(00)0302)
文摘Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, image processing, and etc. We combine sampling technique with DBSCAN algorithm to cluster large spatial databases, and two sampling based DBSCAN (SDBSCAN) algorithms are developed. One algorithm introduces sampling technique inside DBSCAN, and the other uses sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large scale spatial databases.
文摘With the deployment of modern infrastructure for public transportation, several studies have analyzed movement patterns of people using smart card data and have characterized different areas. In this paper, we propose the “movement purpose hypothesis” that each movement occurs from two causes: where the person is and what the person wants to do at a given moment. We formulate this hypothesis to a synthesis model in which two network graphs generate a movement network graph. Then we develop two novel-embedding models to assess the hypothesis, and demonstrate that the models obtain a vector representation of a geospatial area using movement patterns of people from large-scale smart card data. We conducted an experiment using smart card data for a large network of railroads in the Kansai region of Japan. We obtained a vector representation of each railroad station and each purpose using the developed embedding models. Results show that network embedding methods are suitable for a large-scale movement of data, and the developed models perform better than existing embedding methods in the task of multi-label classification for train stations on the purpose of use data set. Our proposed models can contribute to the prediction of people flows by discovering underlying representations of geospatial areas from mobility data.
文摘By using multi-source and multi-temporal high resolution remote sensing data and related techniques of remote sensing and geographic information systems, this paper analyzes the spatial and temporal changes of land occupation caused by mine development in four mining areas of eastern Hubei Province from 2011 to 2014, including Chengchao-Tieshan iron-copper polymetallic deposit area, Daye-Yangxin iron-copper polymetallic deposit area, E-Nan mining area, and Wuxue-Yangxin non-metallic mining area along the Yangtze River. The results show that: In the research area, land occupation of energy mine exploitation is small and in scattered distribution, with coal mine occupying the largest area, showing a downward trend in four years; land occupation of metal mines is large and in centralized distribution, with iron mine and copper mine occupying the largest area, showing a downward trend in four years; non-metallic mines are large and in great quantity, with mines of limestone for building and limestone occupying the largest area, showing a upward trend in four years.
Abstract: This paper summarizes several spatial statistical analysis methods for measuring spatial autocorrelation and spatial association, and discusses the criteria for identifying spatial association using the global Moran coefficient, local Moran, and local Geary statistics. Furthermore, a user-friendly statistical module that combines spatial statistical analysis methods with GIS visualization techniques is developed in ArcView using Avenue. An example is given to show the usefulness of this module in identifying and quantifying the underlying spatial association patterns between economic units.
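The global Moran coefficient named in the abstract above can be illustrated with a short sketch. This is not the ArcView/Avenue module the paper describes; it is a minimal pure-Python version of the standard formula, and the function name `global_morans_i`, the list-of-lists weight matrix, and the four-unit example are assumptions made for illustration.

```python
def global_morans_i(values, weights):
    """Global Moran's I for observations x_i and spatial weights w_ij.

    I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2
    where S0 is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]                 # deviations from the mean
    s0 = sum(sum(row) for row in weights)            # total weight S0
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))  # weighted cross-products
    variance_term = sum(d * d for d in dev)
    return (n / s0) * cross / variance_term


# Hypothetical example: four units in a row, each adjacent to its neighbours.
CHAIN_WEIGHTS = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

trend = global_morans_i([1, 2, 3, 4], CHAIN_WEIGHTS)          # similar neighbours
checkerboard = global_morans_i([1, -1, 1, -1], CHAIN_WEIGHTS)  # dissimilar neighbours
```

With these inputs the smoothly increasing series gives a positive coefficient (1/3) and the alternating series gives -1, which matches the usual reading of positive versus negative spatial autocorrelation.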
Abstract: Spatial autocorrelation methodologies, including Global Moran’s I and the Local Indicators of Spatial Association (LISA) statistic, were used to describe and map spatial clusters of 13 leading malignant neoplasms in Taiwan. A logistic regression model was also fitted to identify similar characteristics over time. Two time periods (1995-1998 and 2005-2008) were compared in an attempt to formulate common spatio-temporal risks. Spatial cluster patterns were identified using local spatial autocorrelation analysis. We found significant spatio-temporal variation between the leading malignant neoplasms and well-documented spatial risk factors. For instance, cancer of the oral cavity in males was found to be clustered in locations in central Taiwan, with distinct differences between the two time periods. Stomach cancer morbidity clustered in aboriginal townships, where the prevalence of Helicobacter pylori is high, and quite marked differences between the two time periods were found. A method that combines LISA statistics and logistic regression is an effective tool for detecting space-time patterns in discontinuous data. Comparing spatio-temporal maps helps to clarify issues such as the spatial aspects of both time periods for leading malignant neoplasms. This helps planners to assess spatio-temporal risk factors and to ascertain which types of health care policy would be most advantageous for planning and implementing health care services. These issues can greatly affect the performance and effectiveness of health care services, and they also provide a clear outline that helps us understand the results in greater depth.
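As a rough illustration of the LISA statistic mentioned in the abstract above, the sketch below computes a local Moran value for each areal unit. It follows the common definition I_i = (z_i / m2) * sum_j w_ij z_j with m2 = sum_i z_i^2 / n; the function name, the row-standardized weights, and the four-unit data are assumptions, and the permutation-based significance test that a full LISA analysis requires is omitted.

```python
def local_morans_i(values, weights):
    """Local Moran's I_i for each unit, without significance testing."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]           # deviations from the mean
    m2 = sum(d * d for d in z) / n           # second moment of the deviations
    result = []
    for i in range(n):
        lag = sum(weights[i][j] * z[j] for j in range(n))  # spatial lag of z
        result.append(z[i] / m2 * lag)
    return result


# Hypothetical rates for four units in a row; weights are row-standardized,
# so each row sums to 1.
RATES = [10, 9, 1, 2]
ROW_STD_WEIGHTS = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]

local_values = local_morans_i(RATES, ROW_STD_WEIGHTS)
```

In this toy example the two end units receive positive values because each sits next to a similar neighbour (high next to high, low next to low), while the two middle units receive zero because their high and low neighbours cancel out.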
Abstract: Exploratory data analysis is increasingly necessary as larger spatial data sets are managed on electro-magnetic media. Spatial clustering, the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases, is one of the most important spatial data mining techniques. So far, many spatial clustering algorithms have been proposed for applications such as pattern recognition, data analysis, and image processing. However, most of the well-known clustering algorithms have drawbacks, presented later, when applied to large spatial databases. To overcome these limitations, in this paper we propose a robust spatial clustering algorithm named NSCABDT (Novel Spatial Clustering Algorithm Based on Delaunay Triangulation). A Delaunay diagram is used to determine neighborhoods; based on this neighborhood notion, spatial association rules and collocations are defined. NSCABDT demonstrates several important advantages over previous work. First, it discovers clusters of arbitrary shape. Second, executing NSCABDT does not require any prior knowledge of the nature of the distribution. Third, experiments show that, like DBSCAN, NSCABDT does not require much CPU processing time. Finally, it handles outliers efficiently.
Abstract: Spatial downscaling methods are widely used to produce bioclimatic variables (e.g. temperature and precipitation) in studies related to species ecological niches and to drainage basin management and planning. This study applied three statistical methods, i.e. moving window regression (MWR), nonparametric multiplicative regression (NPMR), and the generalized linear model (GLM), to downscale the annual mean temperature (Bio1) and annual precipitation (Bio12) in central Iran from a coarse scale (1 km × 1 km) to a fine scale (250 m × 250 m). Elevation, aspect, distance from the sea, and the normalized difference vegetation index (NDVI) were used as covariates to create the downscaled bioclimatic variables. Model assessment was performed by comparing model outcomes with observational data from weather stations. The coefficient of determination (R²), bias, and root-mean-square error (RMSE) were used to evaluate the models and covariates. Elevation could effectively explain the changes in bioclimatic factors related to temperature and precipitation. All three models could downscale the mean annual temperature data with similar R², RMSE, and bias values. The MWR had the best performance and highest accuracy in downscaling annual precipitation (R² = 0.70; RMSE = 123.44). In general, the two nonparametric models, i.e. MWR and NPMR, can be reliably used for downscaling bioclimatic variables, which have wide applications in species distribution modeling.
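The three evaluation measures listed in the abstract above (R², bias, RMSE) can be computed as in the following sketch. The abstract does not say which definition of R² was used, so the 1 - SS_res / SS_tot form here is an assumption, as are the function name `evaluate_downscaling` and the station values, which are invented for illustration only.

```python
import math


def evaluate_downscaling(observed, predicted):
    """Return R^2, bias and RMSE of predictions against station observations."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    bias = sum(errors) / n                          # mean signed error
    ss_res = sum(e * e for e in errors)             # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot                      # assumed definition of R^2
    return {"r2": r2, "bias": bias, "rmse": rmse}


# Hypothetical values at four weather stations (not data from the study).
station_obs = [10.0, 12.0, 14.0, 16.0]
model_pred = [11.0, 12.0, 13.0, 18.0]
scores = evaluate_downscaling(station_obs, model_pred)
```

For these made-up values the sketch gives R² = 0.7, a bias of 0.5 and an RMSE of about 1.22. A positive bias means the model overestimates on average.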
Abstract: Compared with information on hospitalized patients, outpatient admission information is relatively simple: it includes only the patient's admission time, place of residence, and a few other details. Traditionally, this information has not been mined sufficiently. However, when the admission times and places of residence of a large number of patients are considered together and data mining techniques are applied, regularities that were previously ignored are likely to be found. Drawing on five years of data mining research and admission data from a paediatric department at a large women’s and children’s hospital in China, we used wavelet analysis of hospital admission data to identify important fluctuation patterns in admissions across different scales of cyclical fluctuation. Method: The seasonal distribution of patient numbers was analysed using the Haar wavelet transform, and the level 3 and level 2 wavelets were extracted to fit the data. The distribution function of hospitalized patients was visualized by kernel density estimation. Linear regression and ARIMA (autoregressive integrated moving average) models were used to predict the seasonal number of patients in the future. Results: The data analysis shows that the total surge of inpatients can be decomposed into one mother wavelet and five small wavelets, each representing a different time frequency. In addition, the number of patients decreases exponentially as distance from the hospital increases. Seasonal factors are the most influential time factor in changes in patient numbers. Conclusion: Using wavelet analysis and the improved prediction model, we can forecast the future trend in inpatient numbers and show that factors such as geographic position influence inpatient numbers. Additionally, the concept of data mining based on spatial distribution and spectral analysis could be applied to other aspects of social management.
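The Haar wavelet decomposition named in the Method section above can be sketched in a few lines. This is only an illustration of a multi-level orthonormal Haar transform, not the authors' analysis pipeline; the function names and the eight admission counts are hypothetical, and the signal length is assumed to be divisible by 2 raised to the number of levels.

```python
import math

_SCALE = 1.0 / math.sqrt(2.0)


def haar_step(signal):
    """One Haar level: pairwise scaled sums (approximation) and differences (detail)."""
    approx = [(signal[i] + signal[i + 1]) * _SCALE for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _SCALE for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Return the final approximation and the detail coefficients per level."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []                      # details[0] is the finest level
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose by undoing the levels from coarsest to finest."""
    current = list(approx)
    for detail in reversed(details):
        restored = []
        for a, d in zip(current, detail):
            restored.append((a + d) * _SCALE)
            restored.append((a - d) * _SCALE)
        current = restored
    return current


# Hypothetical admission counts for eight consecutive periods.
admissions = [40, 60, 100, 120, 80, 60, 50, 50]
coarse, detail_levels = haar_decompose(admissions, levels=3)
rebuilt = haar_reconstruct(coarse, detail_levels)
```

Each level halves the resolution, so the detail coefficients at levels 2 and 3 capture progressively slower fluctuations, which is the property a seasonal analysis relies on. Reconstructing from all coefficients returns the original series.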
Abstract: The liquid crystal television spatial light modulator (LCTVSLM) characterized here is usable in optical processing applications, e.g., optical pattern recognition, associative memory, optical computing, correlation detection, and optical data processing systems. The array performance and real-time optical correlation applications are reviewed.
Abstract: The advent of Big Data has led to rapid growth in the use of parallel clustering algorithms that work over distributed computing frameworks such as MPI, MapReduce, and Spark. An important step in any parallel clustering algorithm is the distribution of data among the cluster nodes. This step governs the methodology and performance of the entire algorithm. Researchers typically use a random strategy or a spatial/geometric distribution strategy, such as kd-tree based partitioning or grid-based partitioning, according to the requirements of the algorithm. However, these strategies are generic and are not tailored to any specific parallel clustering algorithm. In this paper, we give a comprehensive literature survey of MPI-based parallel clustering algorithms, with special reference to the specific data distribution strategies they employ. We also propose three new data distribution strategies: Parameterized Dimensional Split, for parallel density-based clustering algorithms such as DBSCAN and OPTICS; Cell-Based Dimensional Split, for dGridSLINK, a grid-based hierarchical clustering algorithm that is efficient for disjoint spatial distributions; and Projection-Based Split, a generic distribution strategy. All of these preserve spatial locality, achieve disjoint partitioning, and ensure good data load balancing. The experimental analysis shows the benefits of using the proposed data distribution strategies for the algorithms they are designed for, and on that basis we give recommendations for their usage.
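To make the generic kd-tree based partitioning mentioned in the abstract above concrete, here is a minimal sketch that splits points at the median along alternating axes. It illustrates the generic strategy only, not any of the three strategies the paper proposes; the function name `kd_partition`, the depth parameter, and the sample points are assumptions, and no MPI communication is shown.

```python
def kd_partition(points, depth_limit, depth=0):
    """Split points into up to 2**depth_limit spatially compact, disjoint parts.

    At each level the points are sorted along one axis (cycling through the
    dimensions) and cut at the median, so the parts stay balanced in size.
    """
    if depth == depth_limit or len(points) <= 1:
        return [list(points)]
    axis = depth % len(points[0])                 # alternate x, y, ...
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    left = kd_partition(ordered[:mid], depth_limit, depth + 1)
    right = kd_partition(ordered[mid:], depth_limit, depth + 1)
    return left + right


# Hypothetical 2-D points to be distributed over four worker nodes.
sample_points = [(1, 1), (2, 5), (3, 2), (4, 8), (6, 3), (7, 7), (8, 1), (9, 6)]
parts = kd_partition(sample_points, depth_limit=2)
```

With eight points and two split levels, each of the four parts holds two points, and every point lands in exactly one part. In a distributed setting each part would be sent to a different node.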
Abstract: Clustering is an unsupervised learning problem: a procedure that partitions data objects into groups. Many algorithms cannot overcome the problems of morphology, overlapping, and a large number of clusters at the same time. Many scientific communities have approached clustering from the perspective of density, which is one of the best methods in clustering. This study proposes a density-based spatial clustering of applications with noise (DBSCAN) algorithm based on selected high-density areas, called automatic fuzzy-DBSCAN (AFD), which works with the initialization of two parameters. Using fuzzy and DBSCAN features, AFD is modeled by selecting high-density areas, and it automatically generates two parameters for merging and separating. The two generated parameters provide a state of sub-cluster rules in the Cartesian coordinate system for the dataset. The model overcomes clustering problems such as morphology, overlapping, and the number of clusters in a dataset simultaneously. In the experiments, all algorithms are run 30 times on eight data sets. Three of them are related to overlapping real datasets, and the rest are morphologic and synthetic datasets. It is demonstrated that the AFD algorithm outperforms other recently developed clustering algorithms.
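For readers unfamiliar with the baseline that AFD builds on, the following is a minimal sketch of classic DBSCAN with fixed parameters. It is not the AFD algorithm from the abstract above: there is no fuzzy component and the two parameters (`eps` and `min_pts`, names assumed here) are supplied by hand rather than generated automatically. The sample points are hypothetical.

```python
import math

NOISE = -1


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx] (including idx itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points, eps, min_pts):
    """Classic DBSCAN: returns a cluster label per point, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE                # may later become a border point
            continue
        cluster += 1                         # i is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster          # border point: join, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also core: keep expanding
                queue.extend(j_neighbors)
    return labels


# Hypothetical data: two tight groups and one isolated point.
demo_points = [(0, 0), (0, 1), (1, 0), (1, 1),
               (10, 10), (10, 11), (11, 10), (11, 11),
               (50, 50)]
demo_labels = dbscan(demo_points, eps=1.5, min_pts=3)
```

On this toy data the two groups receive labels 0 and 1 and the isolated point is marked as noise. The result depends heavily on the choice of `eps` and `min_pts`, which is the sensitivity that automatic parameter selection aims to reduce.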