poster

A distributed placement service for graph-structured and tree-structured data

Authors:
Gregory Buehrer, Microsoft, Redmond, CA, USA
Srinivasan Parthasarathy, The Ohio State University, Columbus, OH, USA
Shirish Tatikonda, The Ohio State University, Columbus, OH, USA
PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, January 2010, Pages 355–356
https://doi.org/10.1145/1693453.1693511

Published: 09 January 2010

ABSTRACT

Effective data placement strategies can enhance the performance of data-intensive applications implemented on high-end computing clusters. Such strategies can have a significant impact in localizing the computation, minimizing synchronization (communication) costs, enhancing reliability (via strategic replication policies), and ensuring a balanced workload or enhancing the available bandwidth from massive storage devices (e.g., disk arrays).

Existing work has largely targeted the placement of relatively simple data types or entities (e.g., elements, vectors, sets, and arrays). Here we investigate several hash-based distributed data placement methods targeting tree- and graph-structured data, and develop a locality-enhancing placement service for large cluster systems. Target applications include the placement of a single large graph (e.g., a Web graph), a single large tree (e.g., a large XML file), a forest of graphs or trees (e.g., an XML database), and other specialized graph data types such as bipartite graphs (e.g., query-click graphs) and directed acyclic graphs. We empirically evaluate our service by demonstrating its use in improving mining executions for pattern discovery, nearest neighbor searching, graph computations, and applications that combine link and content analysis.
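
The abstract does not say which hash functions the service uses, so the following is only a minimal sketch of one way a hash-based, locality-enhancing placement could work. It assumes min-wise hashing over a node's neighbor set: nodes with identical adjacency lists always map to the same machine, and nodes with similar lists do so with probability roughly equal to their Jaccard similarity. The names `minhash_signature` and `place`, and the choice of BLAKE2b as the underlying hash, are illustrative assumptions and not the authors' implementation.

```python
import hashlib
from typing import Iterable, Tuple


def _h(seed: int, item: str) -> int:
    """Seeded 64-bit hash of a string (BLAKE2b is an arbitrary choice here)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(neighbors: Iterable[str], num_hashes: int = 1) -> Tuple[int, ...]:
    """Min-wise hash signature of a neighbor set: one minimum per seeded hash."""
    items = set(neighbors)
    if not items:
        raise ValueError("neighbor set must be non-empty")
    return tuple(min(_h(seed, n) for n in items) for seed in range(num_hashes))


def place(neighbors: Iterable[str], num_machines: int, num_hashes: int = 1) -> int:
    """Hypothetical placement rule: machine index derived from the signature.

    With num_hashes=1, two nodes collide with probability close to the
    Jaccard similarity of their neighbor sets; larger values make the
    rule stricter (more hashes must agree before two nodes co-locate).
    """
    signature = minhash_signature(neighbors, num_hashes)
    return _h(-1, repr(signature)) % num_machines


if __name__ == "__main__":
    # Two pages with the same out-links land on the same machine.
    print(place(["a.com", "b.com", "c.com"], num_machines=8))
    print(place(["c.com", "a.com", "b.com"], num_machines=8))
```

A rule of this kind trades load balance for locality: a few very popular neighbor patterns can overload one machine, which a real service would presumably have to address.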

Index Terms

A distributed placement service for graph-structured and tree-structured data
1. Information systems
  1. Information retrieval
    1. Search engine architectures and scalability
      1. Distributed retrieval
      2. Peer-to-peer retrieval
  2. Information storage systems
    1. Storage architectures
      1. Distributed storage

Published in
PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
January 2010
372 pages
ISBN:9781605588773
DOI:10.1145/1693453
General Chairs:
R. Govindarajan, Indian Institute of Science
David Padua, UIUC
Program Chair:
Mary Hall, University of Utah
ACM SIGPLAN Notices Volume 45, Issue 5
PPoPP '10
May 2010
346 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1837853
Copyright © 2010 Copyright held by author(s).
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 January 2010
Author Tags
data placement
distributed computing
structured data
Qualifiers
- poster
Acceptance Rates
Overall Acceptance Rate: 230 of 1,014 submissions, 23%
Article Metrics
- Total Citations: 2
- Total Downloads: 360
- Downloads (Last 12 months): 8
- Downloads (Last 6 weeks): 1