Abstract: A co-location pattern is a set of spatial features whose instances frequently appear in a spatial neighborhood. This paper addresses the efficient mining of top-k probabilistic prevalent co-locations over spatially uncertain data sets and makes the following contributions: 1) the concept of top-k probabilistic prevalent co-locations is defined based on a possible world model; 2) a framework for discovering the top-k probabilistic prevalent co-locations is set up; 3) a matrix method is proposed to improve the computation of the prevalence probability of a top-k candidate, and two pruning rules for matrix blocks are given to accelerate the search for exact solutions; 4) a polynomial matrix is developed to further speed up the refinement of top-k candidates; 5) an approximate algorithm with a compensation factor is introduced so that relatively large quantities of data can be processed quickly. The efficiency of the proposed algorithms and the accuracy of the approximate algorithm are evaluated through an extensive set of experiments on both synthetic and real uncertain data sets.
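The prevalence probability that the matrix method is designed to speed up can be illustrated with a naive baseline. The sketch below is an illustration under stated assumptions, not the paper's matrix method: it assumes each instance exists independently with a given probability, treats two instances as neighbors when their Euclidean distance is at most `d`, uses the standard participation index (the minimum participation ratio over the pattern's features) as the prevalence measure, and enumerates every possible world, so its cost grows as 2^n in the number of instances. The names `participation_index`, `prevalence_probability`, and `top_k` are hypothetical.

```python
from itertools import combinations, product
from math import dist

# Hypothetical uncertain data: (feature, id) -> ((x, y), existence probability).
# Instances are assumed to exist independently of one another.


def participation_index(pattern, present, coords, d):
    """Participation index of `pattern` in one possible world.

    A row instance takes one present instance per feature, with every pair
    within Euclidean distance d. PI is the minimum participation ratio.
    """
    by_feature = {f: [i for i in present if i[0] == f] for f in pattern}
    participating = {f: set() for f in pattern}
    for row in product(*(by_feature[f] for f in pattern)):
        if all(dist(coords[a], coords[b]) <= d for a, b in combinations(row, 2)):
            for inst in row:
                participating[inst[0]].add(inst)
    return min(
        len(participating[f]) / len(by_feature[f]) if by_feature[f] else 0.0
        for f in pattern
    )


def prevalence_probability(pattern, instances, d, min_prev):
    """Total probability of the possible worlds where PI(pattern) >= min_prev.

    Enumerates all 2**n worlds, so it is only usable for tiny inputs.
    """
    keys = list(instances)
    coords = {k: instances[k][0] for k in keys}
    total = 0.0
    for mask in product((False, True), repeat=len(keys)):
        world_prob = 1.0
        present = []
        for key, exists in zip(keys, mask):
            p = instances[key][1]
            world_prob *= p if exists else 1.0 - p
            if exists:
                present.append(key)
        if participation_index(pattern, present, coords, d) >= min_prev:
            total += world_prob
    return total


def top_k(candidates, instances, d, min_prev, k):
    """Rank candidate patterns by prevalence probability and keep the k best."""
    scored = [(prevalence_probability(c, instances, d, min_prev), c) for c in candidates]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]
```

For example, with `d = 2`, a threshold of 0.5, an `A` instance at (0, 0) existing with probability 0.9 and a `B` instance at (1, 0) existing with probability 0.8, the pattern (A, B) is prevalent only in worlds containing both instances, so its prevalence probability is 0.9 * 0.8 = 0.72.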
Funding: Supported by the Vilas Associates Competition Award at the University of Wisconsin-Madison (UW-Madison) and the National Science Foundation [grant number 1940091].
Abstract: Current approaches to finding disaster-relevant social media messages use natural language processing methods or machine learning algorithms that rely on text alone. These approaches remain imperfect because of the variability and uncertainty of the language used on social media, and because they ignore the geographic context in which messages are posted. At the same time, a disaster-relevant social media message is highly sensitive to where and when it is posted. However, few studies have explored which spatial features can aid text classification, or the extent to which temporal and especially spatial features help. This paper proposes a geographic context-aware text mining method that incorporates spatial and temporal information derived from social media and authoritative datasets, along with the text itself, to classify disaster-relevant social media posts. This work designed and demonstrated how diverse types of spatial and temporal features can be derived from spatial data and then used to enhance text mining. The accuracy of the enhanced text-mining method was assessed using a deep learning-based method and commonly used machine learning algorithms. The performance of classification models built from various combinations of textual, spatial, and temporal features indicates that the additional spatial and temporal features help improve overall classification accuracy.
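To make the idea of combining feature types concrete, here is a minimal sketch of assembling one post's feature vector from textual, spatial, and temporal parts. It is an assumption, not the paper's feature set: the keyword vocabulary `VOCAB`, the distance from the post to a hypothetical event location, and the hours elapsed since a hypothetical event start are all illustrative choices, and tokenization is a plain lowercase whitespace split. The function names are likewise hypothetical.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

# Hypothetical keyword vocabulary for the text part of the vector.
VOCAB = ["flood", "water", "rescue", "help", "power"]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres, assuming a mean Earth radius of 6371 km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def build_features(text, lat, lon, posted, event_lat, event_lon, event_start):
    """Concatenate textual, spatial and temporal features for one post."""
    tokens = text.lower().split()
    # Textual: keyword counts over the hypothetical vocabulary.
    text_feats = [float(tokens.count(word)) for word in VOCAB]
    # Spatial: distance from the post to the (hypothetical) event location.
    spatial_feats = [haversine_km(lat, lon, event_lat, event_lon)]
    # Temporal: hours elapsed since the (hypothetical) event start.
    temporal_feats = [(posted - event_start).total_seconds() / 3600.0]
    return text_feats + spatial_feats + temporal_feats


# Usage: a post made at the event location 36 hours after the event began.
example = build_features(
    "Flood water rising need rescue",
    30.0, -95.0, datetime(2017, 8, 27, 12, 0),
    30.0, -95.0, datetime(2017, 8, 26, 0, 0),
)
```

Keeping the three groups as separate lists makes it easy to train models on different feature combinations (text only, text plus spatial, and so on), which mirrors the kind of comparison the abstract describes; the resulting vectors could be passed to any classifier.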
Funding: Supported by a grant from the Science Technology and Innovation Committee of Shenzhen Municipality.
Abstract: Smart refueling can reduce costs and lower the likelihood of an emergency. Refueling intelligence can only be obtained by mining historical refueling behaviors from big data; however, without devices such as fuel tank cursors, or cooperation from drivers, these behaviors are hard to detect. Detecting refueling behaviors from big data derived from readily accessible trajectories is therefore one of the most efficient ways to gather evidence for research on refueling behavior. In this paper, we describe a complete procedure for detecting refueling behavior in big data derived from freight trajectories. This procedure integrates spatial data mining and machine-learning techniques. The key part of the methodology is a pattern detector that extends the naive Bayes classifier. By drawing on the spatial and temporal characteristics of freight trajectories, refueling behaviors can be identified with high accuracy. Further, we present a refueling prediction and recommendation system to show how our refueling detector can be used in practice on big data. Our experiments on real trajectories show that our refueling detector is accurate and that the system performs well.
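The naive Bayes idea underlying the pattern detector can be sketched as follows. This is a plain Gaussian naive Bayes classifier written with the Python standard library, not the paper's extended detector; the two stop features used in the usage note (stop duration in minutes and distance in kilometres to the nearest fuel station) and the class labels are hypothetical.

```python
from math import log, pi
from statistics import mean, pvariance


class GaussianNaiveBayes:
    """Plain Gaussian naive Bayes: features are assumed conditionally
    independent given the class, each following a normal distribution."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_priors = {}
        self.stats = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_priors[c] = log(len(rows) / len(X))
            # Per-feature mean and population variance; the small epsilon keeps
            # the variance positive when a feature is constant within a class.
            self.stats[c] = [(mean(col), pvariance(col) + 1e-9) for col in zip(*rows)]
        return self

    def log_posterior(self, x, c):
        """Unnormalized log posterior: log prior plus summed Gaussian log-likelihoods."""
        total = self.log_priors[c]
        for value, (mu, var) in zip(x, self.stats[c]):
            total += -0.5 * log(2 * pi * var) - (value - mu) ** 2 / (2 * var)
        return total

    def predict(self, x):
        """Return the class with the highest unnormalized log posterior."""
        return max(self.classes, key=lambda c: self.log_posterior(x, c))
```

For example, a model fitted on short stops close to a station labeled `"refuel"` and long stops far from one labeled `"other"` would assign a new 13-minute stop 80 m from a station to `"refuel"`. Working in log space avoids numerical underflow when many per-feature likelihoods are multiplied.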
Abstract: A geodemographic classification aims to describe the most salient characteristics of a small-area zonal geography. However, such representations are influenced by the methodological choices made during their construction. Particularly debated are the choice and specification of input variables, where the aim is to identify inputs that add value while keeping the model parsimonious. Within this context, our paper introduces an automated variable selection methodology based on principal component analysis (PCA) that identifies candidate inputs to a geodemographic classification from a collection of variables. The proposed methodology is demonstrated with variables from the UK 2011 Census, and its output is compared to the Office for National Statistics 2011 Output Area Classification (2011 OAC). Applying the proposed methodology improved the quality of the cluster assignment relative to the 2011 OAC, as shown by a lower total within-cluster sum of squares. Across the UK, more than 70.2% of the Output Areas (OAs) in the newly created classification (i.e. AVS-OAC) outperform the 2011 OAC, with particularly strong performance in Scotland and Wales.
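The comparison criterion mentioned above, the total within-cluster sum of squares, can be computed as in the sketch below. It implements only the standard definition (the sum over clusters of squared Euclidean distances from each member to its cluster centroid); it does not reproduce the PCA-based variable selection itself, and the function name `total_wcss` is hypothetical. A lower value means members sit closer to their cluster centroids.

```python
from math import fsum


def total_wcss(points, labels):
    """Total within-cluster sum of squares for a given cluster assignment."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid: per-dimension mean of the cluster's members.
        centroid = [fsum(m[j] for m in members) / len(members) for j in range(dim)]
        # Add squared Euclidean distances from each member to the centroid.
        total += fsum(
            fsum((m[j] - centroid[j]) ** 2 for j in range(dim)) for m in members
        )
    return total
```

For instance, grouping (0, 0) with (2, 0) and (10, 10) with (10, 12) gives a total of 4.0, whereas pairing (0, 0) with (10, 10) and (2, 0) with (10, 12) gives 204.0, so the first assignment is the tighter one.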
Funding: Supported by the National Natural Science Foundation of China (NSFC) [grant numbers 41501383, 91438203], the China Postdoctoral Science Foundation [grant number 2014M562006], the Natural Science Foundation of Hubei Province [grant number 2015CFB330], and the Fundamental Research Funds for the Central Universities [grant number 2042016kf0163].
Abstract: The development of global informatization and its integration with industrialization signal that human society has entered the big data era. This article covers seven new characteristics of Geomatics (i.e. ubiquitous sensors, multi-dimensional dynamics, integration via networking, full automation in real time, from sensing to recognition, crowdsourcing and volunteered geographic information, and service-oriented science), and puts forward the corresponding critical technical challenges in constructing integrated space-air-ground geospatial networks. Through these discussions, we propose a new development stage of Geomatics entitled ‘Connected Geomatics’, defined as a multi-disciplinary science and technology that uses systematic approaches and integrates methods of spatio-temporal data acquisition, information extraction, network management, knowledge discovery, and spatial sensing and recognition, as well as intelligent location-based services, pertaining to any physical objects and human activities on Earth. The advancement of Geomatics is envisioned to make a great contribution to sustainable human development.