18-845: Internet Services
Carnegie Mellon University, Spring 2022

Syllabus (pdf) | Critiques | Individual Project (IP) | Group Project (GP)

1. Instructor

Prof. David O'Hallaron, droh@andrew.cmu.edu, GHC 7517
Office hours: Mon 4:30pm-5:30pm EST (or by appt.) Zoom

TA: Megan Strauss, mstrauss@andrew.cmu.edu
Office hours: Thu 12:00pm-1:00pm EST (or by appt.) via Zoom n

2. Organization

Class times: Mon and Wed, 2:30-3:50pm, WeH 4623

Note: We wil be remote and synchronous the remainder of the semester via Zoom.


Web page: http://www.ece.cmu.edu/~ece845
Class mailing list: 18-845@cs.cmu.edu (After the roster stabilizes)

Canvas: We will not be using Canvas.
Piazza: We will not be using Piazza.
Course directory: /afs/ece/class/ece845

3. Reference material

There is no required textbook for 18-845. The following are standard references for Linux programming and network programming:
  • Michael Kerrisk, The Linux Programming Interface: A Linux and UNIX System Programming Handbook, No Starch Press, 2010.
  • W. Richard Stevens, Bill Fenner, Andrew M. Rudoff Unix Network Programming: The Sockets Networking API, Volume 1 (3rd Edition), Prentice Hall, 2003.
The CS:APP3e text, which is available in the campus bookstore and on permanent reserve in the Engineering library, covers system-level programming topics such as dynamic linking, process control, Unix I/O, the sockets interface, writing Web servers, and application level concurrency and synchronization:

4. Linux cluster resources

  • Andrew cluster: linux.andrew.cmu.edu
    • RHEL, 64-bit, login using your Andrew credentials
  • SCS Gates cluster: ghc{26..86}.ghc.andrew.cmu.edu.
    • RHEL, 64-bit, login using your Andrew credentials
    • Machines ghc{26..46} contain NVIDIA GeForc GTX 1080 GPUs. The Wikipedia entry for GeForce 10 GPUs provides useful information about this model of GPU. They support CUDA compute capability 6.1.
  • ECE cluster: ece{000-031}.ece.local.cmu.edu
    • SuSE, 64 bit, login using your ECE credentials
    • See here for details. Contact help@ece.cmu.edu for help with accounts.

5. Course schedule (final)

Legend: IP: individual project, GP: group project

Class Date Day Topic Projects Discussion Leader
1 01/19 Wed Intro and welcome Dave O'Hallaron
2 01/24 Mon System design principles IP out Dave O'Hallaron
3 01/26 Wed Server design basics Dave O'Hallaron
4 01/31 Mon Event delivery mechanisms Dave O'Hallaron
5 02/02 Wed Motivating application: Google Search Gwyneth Chen
6 02/07 Mon Stragglers: The tail at scale Supreet Deshpande
7 02/09 Wed Consistent hashing: Chord Heng Wang
8 02/14 Mon Google file system (GFS)/Colossus IP due 11:59pm Yuchen Wu
9 02/16 Wed Data processing: MapReduce Ge Song
10 02/21 Mon Classical replication: Paxos Zhengyu Hu, Haoyu Pang
11 02/23 Wed Modern replication: Raft Zhiyu (Scarlett) Song
12 02/28 Mon Modern replication: Aegean Zefeng Wang
13 03/02 Wed Lock services: Chubby Haoyue Zhao
14 03/07 Mon No class - Spring break ---
15 03/09 Wed No class - Spring break ---
16 03/14 Mon Distributed stores: BigTable Funmbi Jaiyeola
17 03/16 Wed Distributed stores: Spanner GP abstracts Haoyu (Gary) Zhang
18 03/21 Mon Distributed stores: memcached Chengkai Li
19 03/23 Wed Distributed stores: Dynamo Yihua (Alex) Pu
20 03/28 Mon DRAM-based storage: RAMCloud Chien-Hung (Daniel) Wang
21 03/30 Wed Datacenter management: Borg Minghao Liu
22 04/04 Mon Warehouse-scale computing [GP oral mid-term reports due, in class] Yuanlin Wu
23 04/06 Wed Virtual machines: VMWare Tianya Chen
24 04/11 Mon Virtual machines: Xen Haoren (Heron) Deng
25 04/13 Wed Virtual machines: Live Migration Weihao (Wesley) Ye
26 04/18 Mon No class - GP prep ---
27 04/20 Wed No class - GP prep GP final reports due Thur 4/21 11:59pm ---
28 04/25 Mon No class - GP prep ---
29 04/27 Wed GP critiques due, 11:59 pm (Poster session cancelled) --
05/1 Sun GP camera-ready reports due Sunday 5/1, 11:59pm ---

6. Detailed course schedule (final)

Students who are not leading the discussion for a particular class should prepare a single 1-page critique. Unless explictly noted, the critique should cover all papers with a "*". Please upload your critique to Gradescope before class.

Class 1: Welcome and intro

Class 2: System design principles

  • Note: Your critique should list three other examples (not discussed by the authors) of end-to-end arguments in system design.
  • *J. Saltzer, D. Reed, and D. Clark, End-to-End Arguments in System Design, ACM Transactions on Computer Systems, Vol 2, No 4, Nov, 1984. (pdf)

Class 3: Server design: Basics

  • Note: Please write a single critique covering both papers.
  • *V. Pai, P. Druschel, and W. Zwaenepoel, Flash: An efficient and portable Web server, Proceedings of the USENIX 1999 Annual Technical Conference, 1999. (pdf)
  • *Tim Brecht , David Pariag, and Louay Gammo, accept()able Strategies for Improving Web Server Performance, Proceedings of the USENIX 2004 Annual Technical Conference, June, 2004. (pdf)

Class 4: Server design: Event delivery mechanisms

  • *Gaurav Banga, Jeff Mogul and Peter Druschel, A scalable and explicit event delivery mechanism for UNIX, in the Proceedings of the USENIX 1999 Technical Conference, June 1999. (pdf)

Class 5: Motivating application: Google search

  • Note: Please write a single critique covering both papers.
  • *Sergey Brin and Larry Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine, Seventh International World Wide Web Conference / Computer Networks 30(1-7): 107-117. 1998. (pdf)
  • *Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd, The PageRank Citation Ranking: Bringing Order to the Web, 1998. (pdf)
  • Ian Rogers, The Google Pagerank Algorithm and How It Works, May, 2002. (html)

Class 6: Stragglers: Tail at scale

  • *Jeffrey Dean and Luiz Andre Barroso, The Tail at Scale, in Communications of the ACM, Feb. 2013, (pdf)

Class 7: Consistent hashing: Chord

  • *I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan, Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications, in SIGCOMM01, Aug. 2001, (pdf)

Class 8: Google file system (GFS)/Colossus

  • Note: Please write a single critique covering both papers.
  • *Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, The Google File System, in Proceedings of the 19th ACM Symposium on Operating Systems Principles, October, 2003. (pdf)
  • *Kirk McKusick and Sean Quinlan, GFS: Evolution on Fast-Forward, CACM, March, 2010. (html)

Class 9: Data processing: MapReduce

  • *J. Dean, and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, in Proceedings of Sixth Symposium on Operating System Design and Implementation, December, 2004. (pdf)
  • J. Summers, The Friendship that Made Google Huge, New Yorker, Dec 3, 2018. Beautiful article about the 20-year friendship between Jeff Dean and Sanjay Ghemawat and the unique pair programming approach they've used to build some of Google's most important systems. (html)

Class 10: Classical replicaton: Paxos

  • *Tushar Chandra, Robert Griesemer, Joshua Redstone, Paxos Made Live - An Engineering Perspective, in ACM Symposium on Principles of Distributed Computing (PODC '07), Aug, 2007. (pdf)
  • Michael Swift, "Paxos, Agreement, Consensus", Lecture notes for CS 739, Spring 2012, Univ of Wisc, A clear and concise description of the algorithm and its behavior under various scenarios (pdf)
  • Angus MacDonald, Paxos by Example, Web post, 2018. (html). Helpful step-by-step example with multiple leaders.
  • Leslie Lamport, Paxos Made Simple, ACM SIGACT News (Distributed Computing Column) 32, 4 (December 2001) 51-58. (pdf)

Class 11: Modern replication: Raft

  • Diego Ongaro and John Ousterhout, In Search of an Understandable Consensus Algorithm, USENIX, 2014. (pdf)

Class 12: Modern replication: Aegean

  • *R. Aksoy and M. Dapritsos, Aegean: Replication beyond the client-server model, in SOSP'19, October, 2019. (pdf)

Class 13: Lock services: Chubby

  • *M. Burrows, The Chubby Lock Service for Loosely-Coupled Distributed Systems, in Proceedings of the Seventh Symposium on Operating System Design and Implementation (OSDI'06), December, 2006. (pdf)

Class 14: No class - Spring break

Class 15: No class - Spring break

Class 16: Distributed Stores: BigTable

  • *F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber, Bigtable: A Distributed Storage System for Structured Data, in Proceedings of the Seventh Symposium on Operating System Design and Implementation (OSDI'06), December, 2006. (pdf)

Class 17: Distributed stores: Spanner

  • *J. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford, Spanner: Google's Globally-Distributed Database, OSDI'12, 2012, Jay Lepreau Best Paper Award. (pdf)

Class 18: Distributed stores: memcached

  • *R. Nishtala et al, Scaling Memcache at Facebook, NSDI '13. (pdf)

Class 19: Distributed stores: Dynamo

  • *G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramnian, P. Vosshal, and W. Vogels, Dynamo: Amazon's Highly Available Key-value Store, SOSP'07, Oct. 2007, (pdf)

Class 20: DRAM-based storage: RAMCloud

  • *D. Ongaro, S. Rumble, R. Stutsman, J. Ousterhout, and M. Rosenblum, Fast Crash Recovery in RAMCloud, SOSP'11, Oct. 2011 (pdf) Stephen M. Rumble, Ankita Kejriwal, and John Ousterhout, Log-structured Memory for DRAM-based Storage, FAST'14. Awarded best paper. (pdf)

Class 21: Datacenter management

  • *Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, John Wilkes, Large-scale cluster management at Google with Borg, EuroSys 2015, Bordeaux, France. (pdf)

Class 22: Warehouse-scale computing

  • Note: Write a single critique covering Chapters 1-3, and 5.
  • *Luiz Andre Barrosa, Urs Holzle, and Parthasarathy Ranganathan, The Datacenter as a Computer: Designing Warehouse-scale Machines, Third Edition, Morgan & Claypool, 2018. (pdf)

Class 23: Virtual machines: VMWare

  • *K. Adams, and O. Agesen, A Comparison of Software and Hardware Techniques for x86 Virtualization, In Proceedings of the 12th international conference on Architectural support for programming languages and operating systems (ASPLOS'06), 2006. (pdf)
  • G. Neiger, A. Santoni, F. Leung, D. Rodgers, R. Uhlig, "Intel Virtualization Technology: Hardware Support for Efficient Processor Virtualization", Intel Technology Journal, Aug, 2006. Please skip all discussion of the Itaniums VT-i (pdf)

Class 24: Virtual machines: Xen

  • *P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, A. Warfiel, Xen and the Art of Virtualization, In Proceedings of the 19th ACM Symposium on Operating Systems Principles, October, 2003. (pdf)

Class 25: Virtual machines: Live migration

  • *Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hanseny, Eric July, Christian Limpach, Ian Pratt, Andrew Warfield, Live Migration of Virtual Machines, NSDI '05, 2005. (pdf)

Class 26: No class

Class 27: No class

Class 28: No class

Class 29: GP poster session