18-845: Internet Services
Carnegie Mellon University, Spring 2022
Syllabus (pdf)
|
Critiques
|
Individual Project (IP)
|
Group Project (GP)
1. Instructor
Prof. David O'Hallaron,
droh@andrew.cmu.edu, GHC 7517
Office hours: Mon 4:30pm-5:30pm EST (or by appt.) Zoom
TA: Megan Strauss, mstrauss@andrew.cmu.edu
Office hours: Thu 12:00pm-1:00pm EST (or by appt.)
via Zoom
n
2. Organization
Class times: Mon and Wed, 2:30-3:50pm, WeH 4623
Note: We wil be remote and synchronous the remainder of the semester via
Zoom.
Web page: http://www.ece.cmu.edu/~ece845
Class mailing list: 18-845@cs.cmu.edu (After the roster stabilizes)
Canvas: We will not be using Canvas.
Piazza: We will not be using Piazza.
Course directory: /afs/ece/class/ece845
3. Reference material
There is no required textbook for 18-845.
The following are standard references for Linux programming and network programming:
-
Michael Kerrisk, The Linux Programming Interface: A Linux and UNIX
System Programming Handbook, No Starch Press, 2010.
- W. Richard Stevens, Bill Fenner, Andrew M. Rudoff Unix Network
Programming: The Sockets Networking API, Volume 1 (3rd Edition),
Prentice Hall, 2003.
The CS:APP3e text, which is
available in the campus bookstore and on permanent reserve in the
Engineering library, covers system-level programming topics such as
dynamic linking, process control, Unix I/O, the sockets interface,
writing Web servers, and application level concurrency and
synchronization:
4. Linux cluster resources
- Andrew cluster: linux.andrew.cmu.edu
- RHEL, 64-bit, login using your Andrew credentials
- SCS Gates cluster: ghc{26..86}.ghc.andrew.cmu.edu.
- RHEL, 64-bit, login using your Andrew credentials
- Machines ghc{26..46} contain NVIDIA GeForc GTX 1080 GPUs. The
Wikipedia entry for GeForce 10 GPUs provides useful information about
this model of GPU. They support CUDA compute capability 6.1.
- ECE cluster: ece{000-031}.ece.local.cmu.edu
- SuSE, 64 bit, login using your ECE credentials
- See here for details. Contact help@ece.cmu.edu for help with accounts.
5. Course schedule (final)
Legend:
IP: individual project,
GP: group project
6. Detailed course schedule (final)
Students who are not leading the discussion for a particular class
should prepare a single 1-page critique. Unless
explictly noted, the critique should cover all papers with a "*".
Please upload your critique to Gradescope before class.
Class 1: Welcome and intro
Class 2: System design principles
- Note: Your critique should list three other examples (not discussed
by the authors) of end-to-end arguments in system design.
- *J. Saltzer, D. Reed, and D. Clark,
End-to-End Arguments in System Design,
ACM Transactions on Computer Systems, Vol 2, No 4, Nov, 1984.
(pdf)
Class 3: Server design: Basics
- Note: Please write a single critique covering both papers.
-
*V. Pai, P. Druschel, and W. Zwaenepoel, Flash: An efficient and portable
Web server, Proceedings of the USENIX 1999 Annual Technical Conference,
1999. (pdf)
-
*Tim Brecht , David Pariag, and Louay Gammo,
accept()able Strategies for Improving Web Server Performance,
Proceedings of the USENIX 2004 Annual Technical Conference, June, 2004.
(pdf)
Class 4: Server design: Event delivery mechanisms
-
*Gaurav Banga, Jeff Mogul and Peter Druschel, A scalable and
explicit event delivery mechanism for UNIX, in the Proceedings of
the USENIX 1999 Technical Conference, June 1999. (pdf)
Class 5: Motivating application: Google search
- Note: Please write a single critique covering both papers.
-
*Sergey Brin and Larry Page,
The Anatomy of a Large-Scale Hypertextual Web Search Engine,
Seventh International World
Wide Web Conference / Computer Networks 30(1-7): 107-117. 1998.
(pdf)
-
*Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd,
The PageRank Citation Ranking: Bringing Order to the Web,
1998.
(pdf)
- Ian Rogers, The Google Pagerank Algorithm and How It Works, May, 2002.
(html)
Class 6: Stragglers: Tail at scale
-
*Jeffrey Dean and Luiz Andre Barroso,
The Tail at Scale, in Communications of the ACM, Feb. 2013,
(pdf)
Class 7: Consistent hashing: Chord
-
*I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan,
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications, in SIGCOMM01, Aug. 2001,
(pdf)
Class 8: Google file system (GFS)/Colossus
- Note: Please write a single critique covering both papers.
-
*Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung,
The Google File System, in Proceedings of the
19th ACM Symposium on Operating Systems Principles,
October, 2003.
(pdf)
-
*Kirk McKusick and Sean Quinlan, GFS: Evolution on Fast-Forward, CACM, March, 2010.
(html)
Class 9: Data processing: MapReduce
-
*J. Dean, and S. Ghemawat, MapReduce: Simplified Data Processing on Large
Clusters, in Proceedings of Sixth Symposium on Operating System Design and
Implementation, December, 2004.
(pdf)
-
J. Summers, The Friendship that Made Google Huge, New Yorker,
Dec 3, 2018. Beautiful article about the 20-year friendship between
Jeff Dean and Sanjay Ghemawat and the unique pair programming approach
they've used to build some of Google's most important systems.
(html)
Class 10: Classical replicaton: Paxos
-
*Tushar Chandra, Robert Griesemer, Joshua Redstone, Paxos Made Live
- An Engineering Perspective, in ACM Symposium on Principles of
Distributed Computing (PODC '07), Aug, 2007.
(pdf)
- Michael Swift, "Paxos, Agreement, Consensus", Lecture notes for CS
739, Spring 2012, Univ of Wisc, A clear and concise description of
the algorithm and its behavior under various scenarios
(pdf)
- Angus MacDonald, Paxos by Example, Web post, 2018.
(html).
Helpful step-by-step example with multiple leaders.
-
Leslie Lamport, Paxos Made Simple, ACM SIGACT News (Distributed
Computing Column) 32, 4 (December 2001) 51-58.
(pdf)
Class 11: Modern replication: Raft
-
Diego Ongaro and John Ousterhout, In Search of an Understandable
Consensus Algorithm, USENIX, 2014.
(pdf)
Class 12: Modern replication: Aegean
-
*R. Aksoy and M. Dapritsos, Aegean: Replication beyond the client-server model,
in SOSP'19, October, 2019.
(pdf)
Class 13: Lock services: Chubby
-
*M. Burrows, The Chubby Lock Service for Loosely-Coupled Distributed Systems, in
Proceedings of the Seventh Symposium on Operating System Design and Implementation (OSDI'06),
December, 2006.
(pdf)
Class 14: No class - Spring break
Class 15: No class - Spring break
Class 16: Distributed Stores: BigTable
-
*F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach, M. Burrows,
T. Chandra, A. Fikes, and R. E. Gruber, Bigtable: A Distributed Storage
System for Structured Data, in Proceedings of the Seventh Symposium on
Operating System Design and Implementation (OSDI'06), December, 2006.
(pdf)
Class 17: Distributed stores: Spanner
- *J. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. Furman,
S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh,
S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle,
S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor,
R. Wang, and D. Woodford,
Spanner: Google's Globally-Distributed Database,
OSDI'12, 2012, Jay Lepreau Best Paper Award.
(pdf)
Class 18: Distributed stores: memcached
-
*R. Nishtala et al,
Scaling Memcache at Facebook, NSDI '13.
(pdf)
Class 19: Distributed stores: Dynamo
-
*G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramnian, P. Vosshal, and W. Vogels,
Dynamo: Amazon's Highly Available Key-value Store,
SOSP'07, Oct. 2007, (pdf)
Class 20: DRAM-based storage: RAMCloud
-
*D. Ongaro, S. Rumble, R. Stutsman, J. Ousterhout, and M. Rosenblum,
Fast Crash Recovery in RAMCloud, SOSP'11, Oct. 2011
(pdf)
Stephen M. Rumble, Ankita Kejriwal, and John Ousterhout,
Log-structured Memory for DRAM-based Storage, FAST'14. Awarded best paper.
(pdf)
Class 21: Datacenter management
-
*Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer,
Eric Tune, John Wilkes,
Large-scale cluster management at Google with Borg,
EuroSys 2015, Bordeaux, France.
(pdf)
Class 22: Warehouse-scale computing
- Note: Write a single critique covering Chapters 1-3, and 5.
-
*Luiz Andre Barrosa, Urs Holzle, and Parthasarathy Ranganathan, The Datacenter as a
Computer: Designing Warehouse-scale Machines, Third Edition, Morgan & Claypool, 2018.
(pdf)
Class 23: Virtual machines: VMWare
-
*K. Adams, and O. Agesen, A Comparison of Software and Hardware
Techniques for x86 Virtualization, In Proceedings of the 12th
international conference on Architectural support for programming
languages and operating systems (ASPLOS'06), 2006.
(pdf)
-
G. Neiger, A. Santoni, F. Leung, D. Rodgers, R. Uhlig, "Intel
Virtualization Technology: Hardware Support for Efficient Processor
Virtualization", Intel Technology Journal, Aug, 2006.
Please skip all discussion of the Itaniums VT-i
(pdf)
Class 24: Virtual machines: Xen
- *P. Barham, B. Dragovic, K. Fraser, S. Hand,
T. Harris, A. Ho, R. Neugebauer, I. Pratt, A. Warfiel,
Xen and the Art of Virtualization, In
Proceedings of the 19th ACM Symposium on Operating Systems Principles,
October, 2003.
(pdf)
Class 25: Virtual machines: Live migration
-
*Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hanseny, Eric
July, Christian Limpach, Ian Pratt, Andrew Warfield, Live Migration of
Virtual Machines, NSDI '05, 2005.
(pdf)
Class 26: No class
Class 27: No class
Class 28: No class
Class 29: GP poster session
|