摘要
Accurate home location is increasingly important for urban computing. Existing methods either rely on continuous(and expensive) Global Positioning System(GPS) data or suffer from poor accuracy. In particular, the sparse and noisy nature of social media data poses serious challenges in pinpointing where people live at scale. We revisit this research topic and infer home location within 100 m×100 m squares at 70% accuracy for 76% and 71%of active users in New York City and the Bay Area, respectively. To the best of our knowledge, this is the first time home location has been detected at such a fine granularity using sparse and noisy data. Since people spend a large portion of their time at home, our model enables novel applications. As an example, we focus on modeling people's health at scale by linking their home locations with publicly available statistics, such as education disparity. Results in multiple geographic regions demonstrate both the effectiveness and added value of our home localization method and reveal insights that eluded earlier studies. In addition, we are able to discover the real buzz in the communities where people live.
Accurate home location is increasingly important continuous (and expensive) Global Positioning System (GPS) for urban computing. Existing methods either rely on data or suffer from poor accuracy. In particular, the sparse and noisy nature of social media data poses serious challenges in pinpointing where people live at scale. We revisit this research topic and infer home location within 100 mx100 m squares at 70% accuracy for 76% and 71% of active users in New York City and the Bay Area, respectively. To the best of our knowledge, this is the first time home location has been detected at such a fine granularity using sparse and noisy data. Since people spend a large portion of their time at home, our model enables novel applications. As an example, we focus on modeling people's health at scale by linking their home locations with publicly available statistics, such as education disparity. Results in mtdtiple geographic regions demonstrate both the effectiveness and added value of our home localization method and reveal insights that eluded earlier studies. In addition, we are able to discover the real buzz in the communities where people live.
基金
Project supported by the Goergen Institute for Data Science,New York State and the Xerox Foundation