DOI: 10.1145/2093973.2094017
research-article

Exploration and comparison of geographic information sources using distance statistics

Published: 01 November 2011

ABSTRACT

Given the steadily increasing amount of geographic information on the Web, there is a strong need for suitable methods in exploratory data analysis that can be used to efficiently describe the characteristics of such large-scale, often noisy datasets. Existing methods in spatial data mining focus primarily on mining patterns describing spatial proximity relationships such as co-location patterns or spatial association rules.

In this paper, we present a novel approach to describe the spatial characteristics of geographic information sources composed of instances of geographic features. Using the concept of interaction characteristics of geographic features, similarities in how features are distributed in space can be computed, and interesting patterns of similar features in the datasets regarding their geographic semantics (landmark, local, regional, global) can be determined. To this end, we apply clustering techniques to spatial distance statistics.
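As a rough illustration of the idea of clustering features by distance statistics, the following minimal sketch summarizes each feature's point pattern by an empirical nearest-neighbour distance function G(r), a standard statistic from point pattern analysis, and groups features whose curves are similar. This is not the method of the paper: the choice of G(r), the curve distance, the threshold, the feature names (`tag_a` to `tag_d`), and the synthetic coordinates are all assumptions made for the example.

```python
import math
import random

def nn_distances(points):
    """Nearest-neighbour distance of every point (brute force, O(n^2))."""
    return [
        min(math.hypot(x1 - x2, y1 - y2)
            for j, (x2, y2) in enumerate(points) if j != i)
        for i, (x1, y1) in enumerate(points)
    ]

def g_function(points, radii):
    """Empirical G(r): share of points with a neighbour within r (no edge correction)."""
    dists = nn_distances(points)
    return [sum(d <= r for d in dists) / len(dists) for r in radii]

def curve_distance(a, b):
    """Mean absolute difference of two curves evaluated at the same radii."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def group_features(curves, threshold):
    """Single-linkage grouping: link two features if their curves differ by less than threshold."""
    names = list(curves)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if curve_distance(curves[a], curves[b]) < threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Synthetic data (assumed): two tightly clustered "landmark-like" features
# and two features scattered over the unit square.
rng = random.Random(0)

def clustered(n, cx, cy, sigma):
    return [(rng.gauss(cx, sigma), rng.gauss(cy, sigma)) for _ in range(n)]

def scattered(n):
    return [(rng.random(), rng.random()) for _ in range(n)]

features = {
    "tag_a": clustered(100, 0.3, 0.3, 0.01),
    "tag_b": clustered(100, 0.7, 0.6, 0.01),
    "tag_c": scattered(100),
    "tag_d": scattered(100),
}
radii = [0.005, 0.01, 0.02, 0.05, 0.1]
curves = {name: g_function(pts, radii) for name, pts in features.items()}
groups = group_features(curves, threshold=0.2)
print(groups)
```

In this toy setting the two concentrated features and the two dispersed features should end up in separate groups, since their G(r) curves rise at very different radii. A real dataset would need geodesic distances, edge correction, and a more careful choice of clustering algorithm.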

We discuss the properties of our method and detail a comprehensive evaluation using publicly available datasets (Flickr, Twitter, OpenStreetMap). We demonstrate the feasibility of identifying groups of geographic features with distinct geographic semantics, which can then be used to select subsets of features for subsequent learning tasks or to compare different datasets.


Published in

GIS '11: Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
November 2011
559 pages
ISBN: 9781450310314
DOI: 10.1145/2093973

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall acceptance rate: 220 of 1,116 submissions, 20%
