18-845 Group Project (GP)

Important dates

  1. Wed, Mar 13 (11:59pm): GP abstracts due
    • Email your abstract to droh@cs.cmu.edu (plain text in email body)

  2. Mon, Apr 1: GP oral status reports due
    • In class

  3. Thu, Apr 18 (11:59pm): GP final reports due
    • Email your completed report (PDF format only) to droh@cs.cmu.edu

  4. Mon, Apr 22 (11:59pm): GP reviews due
    • Email your assigned anonymized critiques (one PDF file per critique) to droh@cs.cmu.edu

  5. Sun, Apr 28 (11:59pm): Final camera-ready GP reports due
    • Email your report (PDF format) to droh@cs.cmu.edu

1. Instructions for Preparing Your GP Abstracts

  • The abstract must contain the following parts:
    • Title
    • Authors
    • One or two paragraphs describing the question(s) you want to answer, and what you will do to answer the question(s).
    • A paragraph describing the expected result (What do you hope to learn? What conclusions do you hope to draw?)

2. Instructions for Delivering Your GP Mid-Term Oral Report

  • Each group will give a brief report during class on their project.

3. Instructions for Preparing and Submitting Your GP Reports

  • Reports are limited to 10 pages (this is a hard limit).
  • Font size must be at least 10pt (but 11pt is even better).
  • Reports must follow the official ACM Proceedings format. Use the 10pt LaTeX template provided here, or the 10pt Word template provided here.
  • Reports should include something like the following sections:
    • Abstract - A paragraph that summarizes the problem and the results.
    • Introduction - Sets the context, describes the problem, and describes your solution.
    • Description - One or more sections that describe the problem and your approach to the solution in detail.
    • Evaluation - A section that quantitatively evaluates your ideas.
    • Related work - Compare and contrast related work. Don't just enumerate.
    • Summary and Conclusions - Summarize what you did and what interesting things you learned from the project.
  • Send your reports to droh@cs.cmu.edu.

4. Instructions for Reviewing Your Classmates' Reports

  • Each report will be formally reviewed by three reviewers: your instructor and two classmates randomly chosen by your instructor. Thus, every student will receive three reviews of their project. The two student reviews should be anonymous.
  • Your instructor will evaluate the quality of your reviews as part of your overall project score.
  • Use the same review template you used for your critiques during the semester.
  • Send each review as a separate PDF file attachment to droh@cs.cmu.edu. Don't forget to remove your name.

Hints for Coming Up with a Topic

If you are currently working on a master's or Ph.D. thesis, we encourage you to pursue a topic that is directly related to your thesis research. It's OK (ideal in fact!) to use the group project as a way to make progress on your thesis.

There are two basic approaches you can use for your group research projects:

  • Develop a new idea or a new twist on an existing idea, and then do enough evaluation to serve as a proof of concept.
  • Do an extensive evaluation of an existing idea that gives you some insight into the advantages or disadvantages of that idea.

Here are examples of some project ideas from previous years. Feel free to use any of these for inspiration:

Here are some other ideas for topics (in no particular order):

  • Evaluation of performance issues in high-speed non-blocking servers. The idea here is to build a high-speed server that never blocks on I/O (such as the Flash server from Rice) and then do an extensive micro-evaluation of its performance in order to understand how much performance gain is possible from such non-blocking servers.

  • Threads vs. events in high-speed servers. We've seen a number of conflicting conclusions in our readings. Which approach is better? Compare and contrast the performance implications of kernel-level threads, cooperatively scheduled user-level threads, and event systems based on select().

  • Monitoring in a non-cooperative environment. Stefan Savage at UCSD has developed a powerful technique for estimating end-to-end bandwidth and packet loss between hosts, where the remote host is not cooperative in the sense that it would be impossible to get an account on the machine (e.g., the Yahoo server). Savage's approach is to exploit the behavior of TCP (which all servers must implement to the specification) to gain information about the effective bandwidth from the server to the client. For this project, you might apply this general idea in some new context, or use Savage's method for estimating packet loss in the context of a larger application. For example, would it be possible to use Savage's technique to build a client-side performance monitoring system that, for a given HTTP transaction, would isolate the network transmission time from the server processing time and determine which is the bottleneck?

  • Attaching geographical locations to IP addresses in the context of a world-wide disaster monitoring system. When natural disasters such as earthquakes occur, it is very difficult to make accurate estimates of the geographical extent and severity of the damage because the communication infrastructure disappears. However, hosts that provide Internet services are always turned on, so the lack of response from those systems contains some information. The idea is to build a system that would sample hosts in earthquake-prone regions on a continual basis. Each sample is a bit vector, one bit per host. Some interesting issues are assigning IP addresses to geographical locations, developing a hierarchical scheme to aggregate response bit vectors, and developing techniques for analyzing the response bit vectors to distinguish transients (e.g., localized power failures or normal host downtime) from real damage.

  • Scalable search engines. Current search engines are not scalable because all of the work is done at the remote server site. As a result, the servers cannot perform much computation when they satisfy a request, typically just a quick lookup in an inverted index. Consequently, single-word queries, which directly index the database on the server, typically work pretty well, but multi-word queries often give poor results. The idea here is to investigate the following question: Can we improve the performance of search engines such as Google by doing some additional work on the client?
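
For the threads-versus-events topic above, it may help to see the event-driven side in miniature. The following is a minimal sketch, not a benchmark-ready server: a single-threaded echo loop that multiplexes clients with select(). The function name run_echo_loop, the max_messages stopping rule, and the 4096-byte read size are assumptions made for illustration only.

```python
import select
import socket


def run_echo_loop(listener, max_messages):
    """Echo data back to clients from one thread, using select() to wait.

    `listener` is a bound, listening TCP socket supplied by the caller.
    The loop stops once at least `max_messages` reads have been echoed,
    so that this sketch terminates (a hypothetical stopping rule).
    """
    listener.setblocking(False)
    clients = set()
    echoed = 0
    while echoed < max_messages:
        # Block (for up to one second) until some socket is readable.
        readable, _, _ = select.select([listener, *clients], [], [], 1.0)
        for sock in readable:
            if sock is listener:
                # A new connection is pending; accept() will not block.
                conn, _addr = listener.accept()
                conn.setblocking(False)
                clients.add(conn)
            else:
                data = sock.recv(4096)
                if data:
                    # Adequate for tiny payloads only. A real server would
                    # queue the write and wait until the socket is writable.
                    sock.sendall(data)
                    echoed += 1
                else:
                    # An empty read means the client closed the connection.
                    clients.discard(sock)
                    sock.close()
    for sock in clients:
        sock.close()
    return echoed
```

A threaded variant would instead hand each accepted connection to its own thread. Measuring the two designs under the same load is the kind of comparison this topic asks for.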
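
For the disaster-monitoring topic above, one possible first step in aggregating response bit vectors can be sketched as follows. This assumes each sample is a list of 0/1 bits (1 meaning the host responded) and that every host has already been assigned a region label. The function names, the 0.5 threshold, and the three-sample window are hypothetical choices for illustration, not part of any existing system.

```python
from collections import defaultdict


def region_down_fraction(sample, host_region):
    """Return, per region, the fraction of hosts that did not respond.

    sample[i] is 1 if host i responded and 0 otherwise;
    host_region[i] is the (assumed) region label of host i.
    """
    total = defaultdict(int)
    down = defaultdict(int)
    for bit, region in zip(sample, host_region):
        total[region] += 1
        if bit == 0:
            down[region] += 1
    return {region: down[region] / total[region] for region in total}


def sustained_outages(samples, host_region, threshold=0.5, min_consecutive=3):
    """Return regions whose down fraction stayed at or above `threshold`
    in each of the last `min_consecutive` samples.

    Requiring several consecutive samples is a crude way to filter out
    transients such as a single missed probe or a brief reboot.
    """
    if len(samples) < min_consecutive:
        return set()
    recent = samples[-min_consecutive:]
    fractions = [region_down_fraction(s, host_region) for s in recent]
    return {
        region
        for region in fractions[0]
        if all(f[region] >= threshold for f in fractions)
    }
```

A hierarchical scheme could apply the same aggregation again, treating each region's fraction as the input to a coarser level (for example, city, then state).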
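
For the scalable search engine topic above, the split between cheap server work and extra client work can be sketched as follows. This is a toy illustration under stated assumptions: the server does one inverted-index lookup per query word and returns the union of the matches, and the client re-ranks those candidates by how many distinct query words each document contains. The function names are hypothetical, and the client is assumed to have the candidate text available (in practice it might only have snippets).

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def server_lookup(index, query):
    """Server-side step: one index lookup per word, results unioned."""
    candidates = set()
    for word in query.lower().split():
        candidates |= index.get(word, set())
    return candidates


def client_rerank(docs, candidates, query):
    """Client-side step: order candidates by the number of distinct
    query words they contain (most first), breaking ties by doc id."""
    words = set(query.lower().split())

    def score(doc_id):
        return len(words & set(docs[doc_id].lower().split()))

    return sorted(candidates, key=lambda doc_id: (-score(doc_id), doc_id))
```

A project along these lines would replace the simple word-count score with something more expensive that the server could not afford to run per request, and then measure whether multi-word result quality improves.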