18-845: Internet Services
Carnegie Mellon University, Spring 2021

Syllabus (pdf) | Critiques | Individual Project (IP) | Group Project (GP)

1. Instructor

Prof. David O'Hallaron, droh@andrew.cmu.edu, GHC 7517
Office hours: Mon 5:00-6:00pm EST (or by appt.) via Zoom

TA: Shrivar Bhuwalka, sbhuwalk@andrew.cmu.edu
Office hours: Tue 3:00-4:00pm EST (or by appt.) via Zoom

2. Organization

Class times: Mon and Wed, 3:20-4:40pm, remote and synchronous via Zoom.
Web page: http://www.ece.cmu.edu/~ece845
Class mailing list: 18-845@cs.cmu.edu (After the roster stabilizes)

Canvas: We will not be using Canvas.
Piazza: We will not be using Piazza.
Course directory: /afs/ece/class/ece845

3. Reference material

There is no required textbook for 18-845. The following are standard references for Linux programming and network programming:
  • Michael Kerrisk, The Linux Programming Interface: A Linux and UNIX System Programming Handbook, No Starch Press, 2010.
  • W. Richard Stevens, Bill Fenner, Andrew M. Rudoff Unix Network Programming: The Sockets Networking API, Volume 1 (3rd Edition), Prentice Hall, 2003.
The CS:APP3e text, which is available in the campus bookstore and on permanent reserve in the Engineering library, covers system-level programming topics such as dynamic linking, process control, Unix I/O, the sockets interface, writing Web servers, and application level concurrency and synchronization:

4. Linux cluster resources

  • Andrew cluster: linux.andrew.cmu.edu
    • RHEL, 64-bit, login using your Andrew credentials
  • SCS Gates cluster: ghc{26..86}.ghc.andrew.cmu.edu.
    • RHEL, 64-bit, login using your Andrew credentials
    • Machines ghc{26..46} contain NVIDIA GeForc GTX 1080 GPUs. The Wikipedia entry for GeForce 10 GPUs provides useful information about this model of GPU. They support CUDA compute capability 6.1.
  • ECE cluster: ece{000-031}.ece.local.cmu.edu
    • SuSE, 64 bit, login using your ECE credentials
    • See here for details. Contact help@ece.cmu.edu for help with accounts.

5. Course schedule (final version)

Legend: IP: individual project, GP: group project

Class Date Day Topic Projects Discussion Leader
1 02/01 Mon Intro and welcome Dave O'Hallaron
2 02/03 Wed System design principles IP out Dave O'Hallaron
3 02/08 Mon Server design basics Dave O'Hallaron
4 02/10 Wed Event delivery mechanisms Dave O'Hallaron
5 02/15 Mon Motivating application: Google Search Nevin Zheng
6 02/17 Wed Stragglers: The tail at scale Abha Agrawal
7 02/22 Mon Consistent hashing: Chord Sebastian Montiel
8 02/24 Wed Google file system (GFS)/Colossus IP due 11:59pm Aditya Bhawkar
9 03/1 Mon Data processing: MapReduce Vishesh Goyal
10 03/03 Wed Classical replication: Paxos Shrivar Bhuwalka
11 03/08 Mon Modern replication: Raft Viha Gupta
12 03/10 Wed Modern replication: Aegean Chaoran Zhang
13 03/15 Mon Lock services: Chubby Nevin Zheng
14 03/17 Wed Distributed stores: BigTable [GP abstracts due] Yuxiang Zi
15 03/22 Mon Distributed stores: Spanner Utkarsh Murarka
16 03/24 Wed Distributed stores: memcached Shubham Gupta
17 03/29 Mon Distributed stores: Dynamo Zhongyan Huang
18 03/31 Wed DRAM-based storage: RAMCloud Griffin Della Grotte
19 04/05 Mon No class ---
20 04/07 Wed Datacenter management: Borg [GP oral mid-term reports due, in class] Dave O'Hallaron
21 04/12 Mon Warehouse-scale computing Young Hun Oh
22 04/14 Wed Virtual machines: VMWare Shrivar Bhuwalka
23 04/19 Mon Virtual machines: Xen Carlos Gonzalez
24 04/21 Wed Virtual machines: Live Migration Dave O'Hallaron
25 04/26 Mon No class - GP prep ---
26 04/28 Wed No class - GP prep GP reports due Thur, 4/29/21, 11:59pm ---
27 05/3 Mon No class - GP prep GP reviews due ---
28 05/5 Wed GP Poster session via Zoom all
05/9 Sun GP final reports due Sunday 5/9, 11:59pm ---

6. Detailed course schedule (final version)

Students who are not leading the discussion for a particular class should prepare a single 1-page critique. Unless explictly noted, the critique should cover all papers with a "*".

Bring a hardcopy (no email) of your critique with you to class and give it to the instructor after class. The instructor will grade it and return it to you next class.

Class 1: Welcome and intro

Class 2: System design principles

  • Note: Your critique should list three other examples (not discussed by the authors) of end-to-end arguments in system design.
  • *J. Saltzer, D. Reed, and D. Clark, End-to-End Arguments in System Design, ACM Transactions on Computer Systems, Vol 2, No 4, Nov, 1984. (pdf)

Class 3: Server design: Basics

  • Note: Please write a single critique covering both papers.
  • *V. Pai, P. Druschel, and W. Zwaenepoel, Flash: An efficient and portable Web server, Proceedings of the USENIX 1999 Annual Technical Conference, 1999. (pdf)
  • *Tim Brecht , David Pariag, and Louay Gammo, accept()able Strategies for Improving Web Server Performance, Proceedings of the USENIX 2004 Annual Technical Conference, June, 2004. (pdf)

Class 4: Server design: Event delivery mechanisms

  • *Gaurav Banga, Jeff Mogul and Peter Druschel, A scalable and explicit event delivery mechanism for UNIX, in the Proceedings of the USENIX 1999 Technical Conference, June 1999. (pdf)
  • David Pariag, Tim Brecht, Ashif Harji, Peter Buhr, and Amol Shukla, Comparing the Performance of Web Server Architectures, EuroSys 2007, Lisbon, Portugal, March, 2007. (pdf)

Class 5: Motivating application: Google search

  • Note: Please write a single critique covering both papers.
  • *Sergey Brin and Larry Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine, Seventh International World Wide Web Conference / Computer Networks 30(1-7): 107-117. 1998. (pdf)
  • *Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd, The PageRank Citation Ranking: Bringing Order to the Web, 1998. (pdf)
  • Ian Rogers, The Google Pagerank Algorithm and How It Works, May, 2002. (html)

Class 6: Stragglers: Tail at scale

  • *Jeffrey Dean and Luiz Andre Barroso, The Tail at Scale, in Communications of the ACM, Feb. 2013, (pdf)

Class 7: Consistent hashing: Chord

  • *I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan, Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications, in SIGCOMM'01, Aug. 2001, (pdf)

Class 8: Google file system (GFS)/Colossus

  • Note: Please write a single critique covering both papers.
  • *Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, The Google File System, in Proceedings of the 19th ACM Symposium on Operating Systems Principles, October, 2003. (pdf)
  • *Kirk McKusick and Sean Quinlan, GFS: Evolution on Fast-Forward, CACM, March, 2010. (html)

Class 9: Data processing: MapReduce

  • *J. Dean, and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, in Proceedings of Sixth Symposium on Operating System Design and Implementation, December, 2004. (pdf)
  • J. Summers, The Friendship that Made Google Huge, New Yorker, Dec 3, 2018. Beautiful article about the 20-year friendship between Jeff Dean and Sanjay Ghemawat and the unique pair programming approach they've used to build some of Google's most important systems. (html)

Class 10: Classical replicaton: Paxos

  • *Tushar Chandra, Robert Griesemer, Joshua Redstone, Paxos Made Live - An Engineering Perspective, in ACM Symposium on Principles of Distributed Computing (PODC '07), Aug, 2007. (pdf)
  • Michael Swift, "Paxos, Agreement, Consensus", Lecture notes for CS 739, Spring 2012, Univ of Wisc, A clear and concise description of the algorithm and its behavior under various scenarios (pdf)
  • Angus MacDonald, Paxos by Example, Web post, 2018. (html). Helpful step-by-step example with multiple leaders.
  • Leslie Lamport, Paxos Made Simple, ACM SIGACT News (Distributed Computing Column) 32, 4 (December 2001) 51-58. (pdf)

Class 11: Modern replication: Raft

  • Diego Ongaro and John Ousterhout, In Search of an Understandable Consensus Algorithm, USENIX, 2014. (pdf)

Class 12: Modern replication: Aegean

  • *R. Aksoy and M. Dapritsos, Aegean: Replication beyond the client-server model, in SOSP'19, October, 2019. (pdf)

Class 13: Lock services: Chubby

  • *M. Burrows, The Chubby Lock Service for Loosely-Coupled Distributed Systems, in Proceedings of the Seventh Symposium on Operating System Design and Implementation (OSDI'06), December, 2006. (pdf)

Class 14: Distributed Stores: BigTable

  • *F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber, Bigtable: A Distributed Storage System for Structured Data, in Proceedings of the Seventh Symposium on Operating System Design and Implementation (OSDI'06), December, 2006. (pdf)

Class 15: Distributed stores: Spanner

  • *J. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford, Spanner: Google's Globally-Distributed Database, OSDI'12, 2012, Jay Lepreau Best Paper Award. (pdf)

Class 16: Distributed stores: memcached

  • *R. Nishtala et al, Scaling Memcache at Facebook, NSDI '13. (pdf)

Class 17: Distributed stores: Dynamo

  • *G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramnian, P. Vosshal, and W. Vogels, Dynamo: Amazon's Highly Available Key-value Store, SOSP'07, Oct. 2007, (pdf)

Class 18: DRAM-based storage: RAMCloud

  • *D. Ongaro, S. Rumble, R. Stutsman, J. Ousterhout, and M. Rosenblum, Fast Crash Recovery in RAMCloud, SOSP'11, Oct. 2011 (pdf)

Class 20: Datacenter management

  • *Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, John Wilkes, Large-scale cluster management at Google with Borg, EuroSys 2015, Bordeaux, France. (pdf)

Class 21: Warehouse-scale computing

  • Note: Write a single critique covering Chapters 1-3, and 5.
  • *Luiz Andre Barrosa, Urs Holzle, and Parthasarathy Ranganathan, The Datacenter as a Computer: Designing Warehouse-scale Machines, Third Edition, Morgan & Claypool, 2018. (pdf)

Class 22: Virtual machines: VMWare

  • *K. Adams, and O. Agesen, A Comparison of Software and Hardware Techniques for x86 Virtualization, In Proceedings of the 12th international conference on Architectural support for programming languages and operating systems (ASPLOS'06), 2006. (pdf)
  • *G. Neiger, A. Santoni, F. Leung, D. Rodgers, R. Uhlig, "Intel Virtualization Technology: Hardware Support for Efficient Processor Virtualization", Intel Technology Journal, Aug, 2006. Please skip all discussion of the Itaniums VT-i (pdf)

Class 23: Virtual machines: Xen

  • *P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, A. Warfiel, Xen and the Art of Virtualization, In Proceedings of the 19th ACM Symposium on Operating Systems Principles, October, 2003. (pdf)

Class 24: Virtual machines: Live migration

  • *Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hanseny, Eric July, Christian Limpach, Ian Pratt, Andrew Warfield, Live Migration of Virtual Machines, NSDI '05, 2005. (pdf)

Class 25: No class

Class 26: No class

Class 276: No class

Class 28: GP poster session