Abstract: The growth of geo-technologies and the development of methods for spatial data collection have produced large spatial data repositories that require techniques for spatial information extraction in order to transform raw data into useful, previously unknown information. However, because of the high complexity of spatial data mining and the need to understand spatial relationships and their characteristics, efforts have been directed towards improving algorithms to increase performance and the quality of results. Likewise, spatial data mining has been applied to several problems, including environmental management, which is the focus of this paper. The main original contribution of this work is the demonstration of spatial data mining using a novel algorithm with a multi-relational approach, applied to a database of water resources from a region of São Paulo State, Brazil, together with a discussion of the results obtained. Some characteristics involving the location of water resources and the profile of those administering the water exploration were discovered and discussed.
Abstract: The authors designed a spatial data mining system for ore-forming prediction based on the theory and methods of data mining and on spatial database techniques, in combination with the characteristics of geological information data. The system consists of data management, data mining and knowledge discovery, and knowledge representation. It can effectively integrate multi-source geoscience data, such as geology, geochemistry, geophysics and remote sensing (RS). The system digitizes geological information as data layer files consisting of two numerical values and stores these files in the system database. Based on combinations of the characteristics of the geological information, metallogenic prognosis was carried out for an example area in Heilongjiang Province, and the prospect area of a hydrothermal copper deposit was determined.
Abstract: The traditional generalization-based knowledge discovery method is introduced. A new method for mining multilevel spatial association rules based on the cloud model is presented. The cloud model integrates the vagueness and randomness of linguistic terms in a unified way. With these models, spatial and nonspatial attribute values are well generalized at multiple levels, allowing the discovery of strong spatial association rules. Combining the cloud-model-based method with the Apriori algorithm for mining association rules from a spatial database proves effective and flexible.
Funding: Under the auspices of the Special Fund of the Ministry of Land and Resources of China in Public Interest (No. 201511001)
Abstract: Association rule mining methods, an important set of data mining tools, can be used to mine spatial association rules from spatial data. However, their application is limited because the mining results contain a large number of redundant rules. In this paper, a new method named Geo-Filtered Association Rules Mining (GFARM) is proposed to eliminate the redundant rules effectively. GFARM is applied in a case study in which association rules are discovered between building land distribution and potential driving factors in Wuhan, China, from 1995 to 2015. Ten sets of regular sampling grids of different sizes are used to detect the influence of scale on GFARM. Results show that the proposed method can filter 50%–70% of redundant rules. GFARM is also successful in discovering spatial association patterns between building land distribution and driving factors.
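To make the terms in the abstract above concrete, here is a minimal sketch of how support and confidence can be computed for a rule over sampling-grid cells, followed by one generic redundancy filter. This is not the GFARM algorithm, whose geo-filtering step is not described in the abstract. The cell labels (`near_road`, `low_slope`, `built`), the function names, and the redundancy criterion (a rule is dropped when a more general rule with the same consequent has at least the same confidence) are all assumptions chosen for illustration.

```python
def rule_metrics(cells, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    `cells` is a list of sets; each set holds the labels observed in one grid cell.
    """
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n_ante = sum(1 for c in cells if antecedent <= c)
    n_both = sum(1 for c in cells if (antecedent | consequent) <= c)
    support = n_both / len(cells)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


def drop_redundant(rules):
    """Keep only rules not dominated by a more general rule.

    `rules` is a list of (antecedent, consequent, confidence) with frozensets.
    Assumed criterion: a rule is redundant if another rule has the same
    consequent, a strictly smaller antecedent, and confidence at least as high.
    """
    kept = []
    for ante, cons, conf in rules:
        dominated = any(
            o_cons == cons and o_ante < ante and o_conf >= conf
            for o_ante, o_cons, o_conf in rules
        )
        if not dominated:
            kept.append((ante, cons, conf))
    return kept


# Hypothetical grid cells, each described by a set of categorical labels.
cells = [
    {"near_road", "low_slope", "built"},
    {"near_road", "low_slope", "built"},
    {"near_road", "built"},
    {"near_road", "low_slope"},
    {"low_slope"},
    {"far_from_road"},
]

candidates = [
    (frozenset({"near_road"}), frozenset({"built"})),
    (frozenset({"near_road", "low_slope"}), frozenset({"built"})),
]
rules = [(a, c, rule_metrics(cells, a, c)[1]) for a, c in candidates]
filtered = drop_redundant(rules)
print(len(rules), "candidate rules,", len(filtered), "kept")
```

In this toy data the more specific rule adds a condition without raising confidence, so the filter keeps only the general rule.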
Abstract: The integration of remote sensing (RS) with geographical information systems (GIS) is a hot topic in geographical information science. A good database structure is important to this integration: it should support the complete integration of RS with GIS, handle the mismatch between the resolution of remote sensing images and the precision of GIS data, and facilitate knowledge discovery and exploitation. In this paper, a database structure that stores spatial data based on a semantic network is presented. This database structure has several advantages. Firstly, the spatial data are stored as raster data with a spatial index, so image processing can be done directly on the GIS data, which are stored hierarchically according to the distinguishing precision. Secondly, simple objects are aggregated into complex ones. Thirdly, because an indexing tree is used to depict the aggregation relationships and indexing pictures expressed by 2-D strings are used to describe the topological structure of the objects, the concepts of surrounding and region are expressed clearly and the semantic content of the landscape can be illustrated well. All the factors that affect the recognition of the objects are depicted in the factor space, which provides a uniform mathematical framework for the fusion of semantic and non-semantic information. Lastly, the object node, the knowledge node and the indexing node are integrated into one node. This feature enhances the system's capability for knowledge representation, intelligent inference and association. The application shows that this database structure can benefit the interpretation of remote sensing images using GIS information.
Funding: Supported by the National 973 Program of China (No. 2006CB701305, No. 2007CB310804), the National Natural Science Foundation of China (No. 60743001), the Best National Thesis Foundation (No. 2005047), and the National New Century Excellent Talent Foundation (No. NCET-06-0618)
Abstract: Advanced data mining technologies and the large quantities of remotely sensed imagery provide a data mining opportunity with high potential for useful results. Extracting interesting patterns and rules from data sets composed of images and associated ground data can be important in object identification, community planning, resource discovery and other areas. In this paper, a data field is presented to express the observed spatial objects and conduct behavior mining on them. First, the most important aspects of behavior mining and its implications for the future of data mining are discussed, and an ideal framework for a behavior mining system in a network environment is proposed. Second, a model of behavior mining on the observed spatial objects is given, in which the objects are described by the first feature data field and the main feature data field by means of the potential function. Finally, a case study on object identification in public is given and analyzed. The experimental results show that the new model is feasible for behavior mining.
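The abstract above does not give the potential function it uses. As an illustration only, the sketch below evaluates one form that is common in the data field literature, a Gaussian-like kernel in which each object contributes its mass scaled by exp(-(d/sigma)^2), where d is the distance to the evaluation point. The function name, the `sigma` parameter, and the sample objects are assumptions, not details taken from the paper.

```python
import math


def potential(point, objects, sigma=1.0):
    """Potential at `point` produced by a list of (position, mass) objects.

    Assumed kernel: mass * exp(-(distance / sigma) ** 2), summed over objects.
    `sigma` controls how quickly an object's influence decays with distance.
    """
    total = 0.0
    for position, mass in objects:
        dist = math.dist(point, position)
        total += mass * math.exp(-((dist / sigma) ** 2))
    return total


# Hypothetical observed objects: two close together, one far away.
objects = [((0.0, 0.0), 1.0), ((1.0, 0.0), 1.0), ((5.0, 5.0), 1.0)]

between = potential((0.5, 0.0), objects)   # midway between the two close objects
isolated = potential((5.0, 5.0), objects)  # on top of the isolated object
print(between > isolated)
```

Points near dense groups of objects receive a higher potential than points near isolated objects, which is the property such fields use to express how objects are distributed in feature space.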
Abstract: Disaster weather forecasting is becoming increasingly important. In this paper, the trajectories of Mesoscale Convective Systems (MCSs) over the Chinese Tibetan Plateau were automatically tracked using Geostationary Meteorological Satellite (GMS) brightness temperature (Tbb) data from June to August 1998, and the MCSs were classified according to their direction of movement. On this basis, spatial data mining methods are used to study the relationships between MCS trajectories and the values of their environmental physical fields. Results indicate that at the 400 hPa level, the trajectories of MCSs moving across the 105°E boundary are less influenced by water vapor flux divergence, vertical wind velocity, relative humidity and K index. In addition, if the longitudes of the MCS centers of gravity are between 104°E and 105°E, then geopotential height and wind divergence are the two main factors causing movement. On the other hand, at the 500 hPa level, the trajectories of MCSs moving in a north-east direction are mainly influenced by K index and water vapor flux divergence when their centers lie west of 104°E. However, MCSs moving in an east or south-east direction are influenced by only a few correlated factors at this level.
Funding: Supported by a grant from the Science Technology and Innovation Committee of Shenzhen Municipality
Abstract: Smart refueling can reduce costs and lower the possibility of an emergency. Refueling intelligence can only be obtained by mining historical refueling behaviors from big data; however, without devices such as fuel tank cursors and without cooperation from drivers, these behaviors are hard to detect. Thus, detecting refueling behaviors from big data derived from easily accessible trajectories is one of the most efficient ways to retrieve evidence for research on refueling behaviors. In this paper, we describe a complete procedure for detecting refueling behavior in big data derived from freight trajectories. This procedure involves the integration of spatial data mining and machine-learning techniques. The key part of the methodology is a pattern detector that extends the naive Bayes classifier. By drawing on the spatial and temporal characteristics of freight trajectories, refueling behaviors can be identified with high accuracy. Further, we present a refueling prediction and recommendation system to show how our refueling detector can be used practically in big data. Our experiments on real trajectories show that our refueling detector is accurate and that the system performs well.
Funding: Supported by the National Natural Science Foundation of China (NSFC) [grant numbers 41501383, 91438203], the China Postdoctoral Science Foundation [grant number 2014M562006], the Natural Science Foundation of Hubei Province [grant number 2015CFB330], and the Fundamental Research Funds for the Central Universities [grant number 2042016kf0163].
Abstract: The development of global informatization and its integration with industrialization signals that human society has entered the big data era. This article covers seven new characteristics of Geomatics (i.e. ubiquitous sensors, multi-dimensional dynamics, integration via networking, full automation in real time, from sensing to recognition, crowdsourcing and volunteered geographic information, and service-oriented science), and puts forward the corresponding critical technical challenges in the construction of integrated space-air-ground geospatial networks. Through the discussions outlined in this paper, we propose a new development stage of Geomatics entitled 'Connected Geomatics', which is defined as a multi-disciplinary science and technology that uses systematic approaches and integrates methods of spatio-temporal data acquisition, information extraction, network management, knowledge discovery, and spatial sensing and recognition, as well as intelligent location-based services pertaining to any physical objects and human activities on the earth. It is envisioned that the advancement of Geomatics will make a great contribution to sustainable human development.
Abstract: A geodemographic classification aims to describe the most salient characteristics of a small area zonal geography. However, such representations are influenced by the methodological choices made during their construction. Of particular debate are the choice and specification of input variables, with the objective of identifying inputs that add value while keeping the model parsimonious. Within this context, our paper introduces a principal component analysis (PCA)-based automated variable selection methodology that has the objective of identifying candidate inputs to a geodemographic classification from a collection of variables. The proposed methodology is exemplified in the context of variables from the UK 2011 Census, and its output is compared to the Office for National Statistics 2011 Output Area Classification (2011 OAC). Through the implementation of the proposed methodology, the quality of the cluster assignment was improved relative to 2011 OAC, as shown by a lower total within-cluster sum of squares score. Across the UK, more than 70.2% of the Output Areas (OAs) occupied by the newly created classification (i.e. AVS-OAC) outperform the 2011 OAC, with particularly strong performance within Scotland and Wales.
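The quality measure named in the abstract above, the total within-cluster sum of squares, can be sketched as follows: for every cluster, sum the squared Euclidean distances between each member and the cluster mean, then add the results across clusters. Lower totals indicate tighter clusters. The function name and the toy data are assumptions for illustration; they do not reproduce the paper's census variables or its classification.

```python
def total_wcss(points, labels):
    """Total within-cluster sum of squares for a cluster assignment.

    `points` is a list of equal-length numeric tuples and `labels` gives the
    cluster of each point. Each cluster contributes the squared Euclidean
    distances from its members to the cluster mean.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total


# Hypothetical areas described by two standardized variables.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

tight = total_wcss(points, [0, 0, 1, 1])   # groups neighbouring points
loose = total_wcss(points, [0, 1, 0, 1])   # groups distant points
print(tight, loose)
```

Comparing two assignments of the same areas in this way is how one classification can be said to fit better than another, provided both are scored on the same variables.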