18-845 Individual Project (IP)

Assigned: Mon, Jan 24, 2022
Due: 11:59pm, Mon, Feb 14, 2022

Note: in the following, $coursedir refers to /afs/ece/class/ece845.

Intro

For this project, you will design and implement your own protocol for serving dynamic Web content. The purpose is to give you some practical context when we study the research issues.

Description

The project has three parts:
  • Part I: Implement a baseline concurrent Web server in C (recommended) or another language of your choice (be careful).

  • Part II: Design an efficient protocol for serving dynamic content, and then implement an optimized version of the baseline concurrent server that uses your protocol.

  • Part III: Evaluate the performance of your baseline and optimized servers, characterizing the performance improvement of your new server.

Part I: Baseline concurrent server

Here are the requirements for the baseline server:
  • Implements HTTP/1.0 GET requests for static and dynamic content.
  • Assumes one connection per request (no persistent connections).
  • Uses the CGI protocol (as implemented by the Tiny server) to serve dynamic content.
  • Serves HTML (.html), image (.gif and .jpg), and text (.txt) files.
  • Accepts a single command-line argument: the port to listen on.
  • Implements concurrency using either threads or I/O multiplexing.

Part II: Optimized concurrent server

The idea here is to improve the performance of your baseline server by replacing the standard CGI protocol with a protocol of your own design. This is entirely open-ended. Anything goes.

There are a number of existing standards for this kind of thing, such as ISAPI (Microsoft), NSAPI (Netscape), and fast-cgi. However, I would encourage you to forget about these and start from first principles. Design something that is simple and fast. A good design is likely to include some combination of dynamic linking, pre-threading, and code caching.

Note: Pre-existing libraries that provide optimized dynamic content, such as FastCGI or equivalent, will not be accepted. Your job is to design and implement such a library.

Part III: Evaluation of baseline and optimized servers

In this part, you will evaluate how well your baseline and optimized servers can serve dynamic content. Some options are to compare request throughputs the performance as measured on the server, and/or latencies as measured from the client. The goal is to convince your TA that your approach is significantly faster than CGI. Be prepared to defend the metrics you use to evaluate performance.

Some tools for performance benchmarking servers, in rough order of quality, based on reports from 18-845 students:

  • Gatling and Seige. Both get favorable comments from 18-845 students.
  • ApacheBench (ab). Note: the ab program is bundled with the Apache Web server. You can download this from the Apache site. A standalone version is available here. Documentation is available here.
  • httperf. Note: This tool hasn't been updated for awhile and tends to fail on Ubuntu and Debian systems. However, students report that it works reliably on Red Hat systems, such as the Andrew Linux machines (linux.andrew.cmu.edu and ghcX.ghc.andrew.cmu.edu machines). Feel free to use these. Another solution is to apply for a free Red Hat AWS micro instance and run httperf there.
  • autobench. Note: this is a wrapper around httperf.

Handin instructions

Tar up the directory containing your solution in a file called "ANDREWID.tar", where ANDREWID is your Andrew login name, and copy it to $coursedir/ip/handin. You have list and insert privileges only in this directory. If you need to hand in twice, put a number after later handins, e.g., ANDREWID-1.tar, ANDREWID-2.tar, and so on.

IP evaluation

Evaluation of the IP will be done by a live demo with your TA. Please arrange your demo time with your TA. The projects are open-ended and so is the evaluation. Here are the rough guidelines:
  • Baseline server (35%). The goal here is just to get it working serving multiple clients concurrently using standard CGI (as implemented by the Tiny web server).
  • Optimized server (35%). The idea here is to come up with a design that attacks the biggest overheads associated with running CGI programs: fork and exec.
  • Evaluation (30%). The idea here is to develop a testing infrastructure and workloads that will allow you to compare the performance of the baseline server against the optimized server. You will be demonstrating this testing infrastructure to your TA. It's your responsibility to come up with a convincing evaluation methodology and testing infrastructure.

Sources of information

  • Students in a graduate class are expected to debug their own programs. The staff is delighted to discuss design issues with you, but please don't ask them to debug your programs.
  • The 15-213 textbook, known as the CS:APP book, contains all of the programming information that you need to complete the project, covering dynamic linking (ch 7.11), process control (ch 8.4), Unix signals (ch 8.5), Unix I/O (ch 10), network programming (ch 11), CGI protocol (ch 11.5), and application-level concurrency and synchronization (ch 12). The E&S library in Wean Hall has multiple copies on reserve.
  • Numerous code examples, including the csapp.c and csapp.h files, and the Tiny Web server, are available from the CS:APP Student Web Site.
  • Refer to the HTTP 1.0 specification for questions about HTTP.
  • Volume 1 of Stevens is also an excellent reference for advanced topics in sockets programming.
  • Consider using tools such as curl, wget, nc (netcat), or telnet as the client to do basic debugging of your server.