DOI: 10.1145/3018661.3018670
research-article

Generating Illustrative Snippets for Open Data on the Web

Published: 02 February 2017

ABSTRACT

To embrace the open data movement, a growing number of datasets have been published on the Web for reuse. When assessing the usefulness of an unfamiliar dataset, users need a way to quickly inspect its contents. To meet this need, we propose to automatically extract an optimal small portion of a dataset, called a snippet, that concisely illustrates the contents of the dataset. We assess the quality of a snippet on three criteria, coverage, familiarity, and cohesion, which are jointly formulated in a new combinatorial optimization problem called the maximum-weight-and-coverage connected graph problem (MwcCG). We give a constant-factor approximation algorithm for this NP-hard problem and evaluate our solution on real-world datasets. Our quantitative analysis and user study show that our approach outperforms a baseline approach.
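
The abstract names the three criteria but does not give their formal definitions, so the following is only a toy sketch of the general idea, not the paper's constant-factor approximation algorithm. It assumes, purely for illustration, that each node carries a numeric weight (standing in for familiarity), that each node covers a set of labels such as classes or properties (standing in for coverage), and that the snippet must be a connected node set (standing in for cohesion). The function name `greedy_connected_snippet`, its parameters, and the additive scoring rule are all hypothetical.

```python
from typing import Dict, Hashable, List, Set


def greedy_connected_snippet(
    adjacency: Dict[Hashable, Set[Hashable]],
    weight: Dict[Hashable, float],
    covers: Dict[Hashable, Set[str]],
    k: int,
) -> List[Hashable]:
    """Naive greedy heuristic (an assumption, not the paper's algorithm).

    Grows a connected set of at most k nodes. At each step it adds the
    neighbouring node with the highest gain, where gain is the node's
    weight plus the number of labels it covers that are not yet covered.
    """
    if k <= 0 or not adjacency:
        return []

    def gain(node: Hashable, covered: Set[str]) -> float:
        return weight.get(node, 0.0) + len(covers.get(node, set()) - covered)

    # Seed with the best single node; sorting makes tie-breaking deterministic.
    start = max(sorted(adjacency, key=repr), key=lambda n: gain(n, set()))
    chosen = [start]
    covered = set(covers.get(start, set()))
    frontier = set(adjacency[start]) - {start}

    # Only nodes adjacent to the current selection are candidates,
    # which keeps the selection connected.
    while len(chosen) < k and frontier:
        best = max(sorted(frontier, key=repr), key=lambda n: gain(n, covered))
        chosen.append(best)
        covered |= covers.get(best, set())
        frontier |= adjacency.get(best, set())
        frontier -= set(chosen)
    return chosen


if __name__ == "__main__":
    # Tiny made-up graph: a path a-b-c-d plus an isolated node e.
    adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}, "e": set()}
    weight = {"a": 1.0, "b": 0.5, "c": 0.2, "d": 3.0, "e": 0.1}
    covers = {"a": {"Person"}, "b": {"Person", "name"}, "c": {"City"},
              "d": {"City"}, "e": {"Person"}}
    print(greedy_connected_snippet(adjacency, weight, covers, k=3))
```

A greedy rule like this carries no approximation guarantee; it is shown only to make the interplay between the weight term, the coverage term, and the connectivity constraint concrete.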

Published in

WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining
February 2017
868 pages
ISBN: 9781450346757
DOI: 10.1145/3018661

Copyright © 2017 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

WSDM '17 paper acceptance rate: 80 of 505 submissions, 16%
Overall acceptance rate: 498 of 2,863 submissions, 17%