ABSTRACT
We describe our experiences teaching MapReduce in a large undergraduate lecture course using public cloud services. Using the cloud, every student could carry out scalability benchmarking assignments on realistic hardware, which would have been impossible otherwise. Over two semesters, more than 500 students took our course. We believe this is the first large-scale demonstration that pay-as-you-go cloud billing is feasible for a large undergraduate course. Modest instructor effort was sufficient to prevent students from overspending. Average per-student cloud expenses were under $45, less than half our available grant funding. Students were enthusiastic about the assignment: 90% said it should be retained in future course offerings.
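For readers new to the programming model the course taught, the following is a minimal single-process Python sketch of the canonical MapReduce word-count example. It is not the course's assignment code, and the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical. On a real cluster, the framework distributes the map and reduce calls across machines and performs the shuffle (grouping by key) over the network; here all three phases run in memory only to show the data flow.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for each lowercased, whitespace-separated token."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all intermediate values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: combine the values for one key by summing them."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    intermediate = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(intermediate)
    return dict(reduce_counts(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["the cloud scales", "The course used the cloud"]
    print(word_count(docs))
```

Because each map call depends only on its own input line and each reduce call only on the values for one key, both phases can run in parallel across many machines. That independence is what makes scalability benchmarking on cloud hardware meaningful.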
Index Terms
- Experiences teaching MapReduce in the cloud