research-article

The Pathologies of Big Data: Scale up your datasets enough and all your apps will come undone. What are the typical problems and where do the bottlenecks generally surface?

Published: 01 July 2009

Abstract

What is "big data" anyway? Gigabytes? Terabytes? Petabytes? A brief personal memory may provide some perspective. In the late 1980s at Columbia University I had the chance to play around with what at the time was a truly enormous "disk": the IBM 3850 MSS (Mass Storage System). The MSS was actually a fully automatic robotic tape library and associated staging disks to make random access, if not exactly instantaneous, at least fully transparent. In Columbia’s configuration, it stored a total of around 100 GB. It was already on its way out by the time I got my hands on it, but in its heyday, the early to mid-1980s, it had been used to support access by social scientists to what was unquestionably "big data" at the time: the entire 1980 U.S. Census database.


    • Published in

      Queue, Volume 7, Issue 6
      Data
      July 2009
      34 pages
      ISSN:1542-7730
      EISSN:1542-7749
      DOI:10.1145/1563821

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States
