ABSTRACT
As part of the open data movement, a growing number of datasets are being published on the Web for reuse. When assessing the usefulness of an unfamiliar dataset, users need a way to quickly inspect its contents. To meet this need, we propose automatically extracting a small, optimal portion of a dataset, called a snippet, that concisely illustrates its contents. We assess snippet quality along three aspects: coverage, familiarity, and cohesion, which we jointly formulate as a new combinatorial optimization problem called the maximum-weight-and-coverage connected graph problem (MwcCG). We give a constant-factor approximation algorithm for this NP-hard problem and evaluate our solution on real-world datasets. Our quantitative analysis and user study show that our approach outperforms a baseline approach.
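To make the flavor of the optimization problem concrete, the following is a minimal sketch of a greedy heuristic for selecting a connected, weighted, label-covering subgraph. It is not the constant-factor approximation algorithm mentioned in the abstract, and it carries no approximation guarantee. The modeling choices are assumptions made for illustration only: familiarity is treated as a per-vertex weight, coverage as the number of distinct labels touched by the chosen vertices, and cohesion as connectivity of the chosen vertices. The function name `greedy_connected_snippet`, the size budget `k`, and the toy data are all hypothetical.

```python
from typing import Dict, Set


def greedy_connected_snippet(
    adj: Dict[str, Set[str]],
    weight: Dict[str, float],
    covers: Dict[str, Set[str]],
    k: int,
) -> Set[str]:
    """Grow a connected vertex set of size <= k by greedy marginal gain.

    Assumed objective (illustrative, not taken from the paper):
    sum of vertex weights + number of distinct labels covered.
    """
    if not adj or k <= 0:
        return set()

    def gain(v: str, covered: Set[str]) -> float:
        # Weight of v plus the number of labels it would newly cover.
        return weight[v] + len(covers[v] - covered)

    # Seed with the single best vertex; sorting makes tie-breaking deterministic.
    seed = max(sorted(adj), key=lambda v: gain(v, set()))
    chosen = {seed}
    covered = set(covers[seed])
    frontier = set(adj[seed]) - chosen

    # Only neighbors of the current set are candidates, so the set stays connected.
    while len(chosen) < k and frontier:
        best = max(sorted(frontier), key=lambda v: gain(v, covered))
        chosen.add(best)
        covered |= covers[best]
        frontier |= adj[best]
        frontier -= chosen
    return chosen


# Toy path graph a - b - c - d with hypothetical weights and labels.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
weight = {"a": 1.0, "b": 0.5, "c": 0.2, "d": 3.0}
covers = {
    "a": {"Person"},
    "b": {"Person", "knows"},
    "c": {"Org"},
    "d": {"Org", "worksFor"},
}

snippet = greedy_connected_snippet(adj, weight, covers, k=2)
print(sorted(snippet))
```

Restricting candidates to the frontier is what enforces connectivity in this sketch; a greedy choice of this kind can miss better solutions that require passing through a low-gain vertex first.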
Index Terms
- Generating Illustrative Snippets for Open Data on the Web