Abstract
What is "big data" anyway? Gigabytes? Terabytes? Petabytes? A brief personal memory may provide some perspective. In the late 1980s at Columbia University I had the chance to play around with what at the time was a truly enormous "disk": the IBM 3850 MSS (Mass Storage System). The MSS was actually a fully automatic robotic tape library with associated staging disks that made random access, if not exactly instantaneous, at least fully transparent. In Columbia’s configuration, it stored a total of around 100 GB. It was already on its way out by the time I got my hands on it, but in its heyday, the early to mid-1980s, it had been used to support access by social scientists to what was unquestionably "big data" at the time: the entire 1980 U.S. Census database.
Index Terms
- The Pathologies of Big Data: Scale up your datasets enough and all your apps will come undone. What are the typical problems and where do the bottlenecks generally surface?