Current search engines in most geospatial data portals tend to induce users to focus on one single-data characteristic dimension(e.g.popularity and release date).This approach largely fails to take account of users’m...Current search engines in most geospatial data portals tend to induce users to focus on one single-data characteristic dimension(e.g.popularity and release date).This approach largely fails to take account of users’multidimensional preferences for geospatial data,and hence may likely result in a less than optimal user experience in discovering the most applicable dataset.This study reports a machine learning framework to address the ranking challenge,the fundamental obstacle in geospatial data discovery,by(1)identifying a number of ranking features of geospatial data to represent users’multidimensional preferences by considering semantics,user behavior,spatial similarity,and static dataset metadata attributes;(2)applying a machine learning method to automatically learn a ranking function;and(3)proposing a system architecture to combine existing search-oriented open source software,semantic knowledge base,ranking feature extraction,and machine learning algorithm.Results show that the machine learning approach outperforms other methods,in terms of both precision at K and normalized discounted cumulative gain.As an early attempt of utilizing machine learning to improve the search ranking in the geospatial domain,we expect this work to set an example for further research and open the door towards intelligent geospatial data discovery.展开更多
基金NSF I/UCRC:[Grant Number IIP-1338925]NSF EarthCube:[Grant Number ICER-1540998]NASA AIST Program:[Grant Number NNX15AM85G].
文摘Current search engines in most geospatial data portals tend to induce users to focus on one single-data characteristic dimension(e.g.popularity and release date).This approach largely fails to take account of users’multidimensional preferences for geospatial data,and hence may likely result in a less than optimal user experience in discovering the most applicable dataset.This study reports a machine learning framework to address the ranking challenge,the fundamental obstacle in geospatial data discovery,by(1)identifying a number of ranking features of geospatial data to represent users’multidimensional preferences by considering semantics,user behavior,spatial similarity,and static dataset metadata attributes;(2)applying a machine learning method to automatically learn a ranking function;and(3)proposing a system architecture to combine existing search-oriented open source software,semantic knowledge base,ranking feature extraction,and machine learning algorithm.Results show that the machine learning approach outperforms other methods,in terms of both precision at K and normalized discounted cumulative gain.As an early attempt of utilizing machine learning to improve the search ranking in the geospatial domain,we expect this work to set an example for further research and open the door towards intelligent geospatial data discovery.