ABSTRACT
Given the steadily increasing amount of geographic information on the Web, there is a strong need for suitable methods in exploratory data analysis that can be used to efficiently describe the characteristics of such large-scale, often noisy datasets. Existing methods in spatial data mining focus primarily on mining patterns that describe spatial proximity relationships, such as co-location patterns or spatial association rules.
In this paper, we present a novel approach to describe the spatial characteristics of geographic information sources composed of instances of geographic features. Using the concept of interaction characteristics of geographic features, similarities in how features are distributed in space can be computed, and interesting patterns of similar features in the datasets can be determined with regard to their geographic semantics (landmark, local, regional, global). To this end, we apply clustering techniques to spatial distance statistics.
We discuss the properties of our method and detail a comprehensive evaluation using publicly available datasets (Flickr, Twitter, OpenStreetMap). We demonstrate the feasibility of identifying groups of geographic features with distinct geographic semantics, which can then be used to select subsets of features for subsequent learning tasks or to compare different datasets.
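The general idea of grouping features by how their instances are distributed in space can be illustrated with a small sketch. It is not the method evaluated in the paper: the abstract does not specify the distance statistic or the clustering algorithm, so the sketch assumes a simple pairwise-distance CDF as the statistic and a threshold-based single-link grouping as the clustering step. The feature names, radii, threshold, and synthetic point sets are all hypothetical.

```python
import math
import random

def pair_distance_cdf(points, radii):
    """Fraction of point pairs lying within each radius (assumed statistic)."""
    n = len(points)
    dists = [math.dist(points[i], points[j])
             for i in range(n) for j in range(i + 1, n)]
    return [sum(d <= r for d in dists) / len(dists) for r in radii]

def curve_distance(a, b):
    """Mean absolute difference between two curves sampled at the same radii."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def group_features(curves, threshold):
    """Single-link grouping: features with curves closer than threshold merge."""
    names = list(curves)
    group = {name: i for i, name in enumerate(names)}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if curve_distance(curves[a], curves[b]) < threshold:
                old, new = group[b], group[a]
                for k in names:
                    if group[k] == old:
                        group[k] = new
    return group

rng = random.Random(0)  # fixed seed so the synthetic data is reproducible

def tight(center, n=60, spread=0.01):
    """Synthetic landmark-like feature: instances concentrated at one place."""
    return [(center[0] + rng.gauss(0, spread), center[1] + rng.gauss(0, spread))
            for _ in range(n)]

def wide(n=60):
    """Synthetic region-like feature: instances spread over the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]

# Hypothetical features; real input would be geotagged instances per feature.
features = {
    "tower": tight((0.2, 0.3)),
    "bridge": tight((0.7, 0.6)),
    "forest": wide(),
    "river": wide(),
}
radii = [0.05, 0.1, 0.2, 0.4, 0.8]
curves = {name: pair_distance_cdf(pts, radii) for name, pts in features.items()}
groups = group_features(curves, threshold=0.3)
print(groups)
```

Features whose instances concentrate at a single location produce a curve that saturates at small radii, while widely spread features rise slowly, so the two kinds end up in different groups even though the concentrated features sit at different places.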
Index Terms
- Exploration and comparison of geographic information sources using distance statistics