18-845 Group Project (GP)

Important dates

Wed, Mar 13 (11:59pm): GP abstracts due
- Email your abstract to droh@cs.cmu.edu (plain text in email body)
Mon, Apr 1: GP oral status reports due
- In class
Thu, Apr 18 (11:59pm): GP final reports due
- Email your completed report (PDF format only) to droh@cs.cmu.edu
Mon, Apr 22 (11:59pm): GP reviews due
- Email your assigned anonymized critiques (one PDF file per critique) to droh@cs.cmu.edu
Sun, Apr 28 (11:59pm): Final camera-ready GP reports due
- Email your report (PDF format) to droh@cs.cmu.edu

1. Instructions for Preparing Your GP Abstracts

The abstract must contain the following parts:
- Title
- Authors
- One or two paragraphs describing the question(s) you want to answer, and what you will do to answer the question(s).
- A paragraph describing the expected result (What do you hope to learn? What conclusions do you hope to draw?)

2. Instructions for Delivering Your GP Mid-Term Oral Report

Each group will give a brief report during class on their project.

3. Instructions for Preparing and Submitting Your GP Reports

Reports are limited to 10 pages (this is a hard limit).
Font size must be at least 10pt (but 11pt is even better).
Reports must follow the official ACM Proceedings format. Use the 10pt LaTeX template provided here, or the 10pt Word template provided here.
Reports should include something like the following sections:
- Abstract - A paragraph that summarizes the problem and the results.
- Introduction - Sets the context, describes the problem, and describes your solution.
- Description - One or more sections that describe the problem and your approach to the solution in detail.
- Evaluation - A section that quantitatively evaluates your ideas.
- Related work - Compare and contrast related work. Don't just enumerate.
- Summary and Conclusions - Summarize what you did and what interesting things you learned from the project.
Send your reports to droh@cs.cmu.edu.

4. Instructions for Reviewing Your Classmates' Reports

Each report will be formally reviewed by three reviewers: your instructor and two classmates randomly chosen by your instructor. Thus, every student will receive three reviews of their project. The two student reviews should be anonymous.
Your instructor will evaluate the quality of your reviews as part of your overall project score.
Use the same review template you used for your critiques during the semester.
Send each review as a separate PDF file attachment to droh@cs.cmu.edu. Don't forget to remove your name.

Hints for Coming Up with a Topic

If you are currently working on a master's or Ph.D. thesis, we encourage you to pursue a topic that is directly related to your thesis research. It's OK (ideal in fact!) to use the group project as a way to make progress on your thesis.

There are two basic approaches you can use for your group research projects:

- Develop a new idea or a new twist on an existing idea, and then do enough evaluation to serve as a proof of concept.
- Do an extensive evaluation of an existing idea that gives you some insight into the advantages or disadvantages of that idea.

Here are examples of some project ideas from previous years. Feel free to use any of these for inspiration:

2018
- Murtadha Aljubran and Alex James, Evaluation of Pseudo-Relevance Feedback Using Wikipedia
- Alex Yu, Comparative Study of Containers and Virtual Machines
- Danqi Huang and Hang Gong, Spark Application Evaluation and Performance Boosting
- Nnamdi Stanley Adom, Du Zhang, and Zoe (Zhiyuan) Lin, IT: A Blockchain-Based Lightening Interbank Transaction system
- Shaoxuan Yang and Gayatri Ravi Kamat, Comparative Study of Web Server Frameworks
- Kiran Pandit and Ian Van Stralen, Implementing Intelligent Caching in Clipper: Low latency machine learning
- Karan Dhabalia, NEX.IO: A Scalable Crowd-Sourced Computing Architecture Using Idle Resources
- Shrikant Giridhar and Cindy Zhang, kqueue versus epoll: A Performance Comparison
- Zeyuan Tan and Shuangning Liu, Indexing in Cloud File Systems
- Souptik Sen, Mystique : An adaptive Straggler mitigation technique for Distributed Neural Network training
2017
- Aditi Deshpande and Paanini Navileka, Impact of Size-Based Scheduling on Web Server Performance
- Vamshi Reddy Konagari and Anise Ghorbani, An Experimental Evaluation of Google’s QUIC Protocol
- Christopher Wei, Machine Learning User Web Interactions
- Heron Yang and Kung-Hsien Yu, Utilization-focused Evaluation on Serverless Architectures
- Priya Avhad and Sphoorti Joglekar, Evaluation of Enhancement to Chord Routing Algorithm
- Reid Long, Effect of Language Runtime on Web Server Performance
- Soumyaroop Dutta and Arushi Grove, Priority-based Dynamic Replacement Cache
- Yanjun Lin and Yuchen Deng, Highly-Available, Replicated DNS Server System With Multi-Paxos Implementation
2016
- Anirudh Nambiar and Nihar Joshi, A Proxy Providing Stronger Consistency with Amazon S3
- Adhish Ramkumar and Robert Maratos, The Anatomy of a Small-Scale Hypertextual Web Search Engine
- Barun Halder, Implementation and Evaluation of Raft in C and Go
- Jenna MacCarley, Implementation and Evaluation of the Chord Distributed Hash Table Protocol
- Sudhir Ravi and Suril Dhruv, Analysis of Strong Consistency Models in Distributed Key-Value Stores
- Mohammed Suhail Rehman, geo-replay: Webserver Load Testing with Geographically Distributed Request Patterns
2015
- Anuj Patel, Stream Processing Design and Implementation
- Darsh Shah and Yifan Li, Improving Consistency in SolrCloud
- Debjani Biswas and Mrigesh Kalvani, Moina: A Secure, Auditable, Scalable Instant Messaging System
- Jiayu Liu, Implementing the Raft Consensus Algorithm
- Sean Klein and Advaya Krishna, Building Raft - An Exploration of Test-Driven Development in Go
- Omkar Gawde and Sameera Padhye, Comparing Web Server Architectures: Events vs Threads
- Vinaykumar Bhat and Durga Kamuju, Improving the Deduplication Performance in ZFS
- Yijie Ma, A Cloud Filesystem With Locking Service
2014
- Amod Jaltade and Aditya Jaltade, SRPT Scheduling in Web Servers
- Ashish Kaila, Elango Jagadeesan, and Pratik Shah, Evaluation of Different Strategies for Request Distribution in Cluster-based Web Servers
- Atreyee Maiti, Support for Distributed Transactions with Anti-Caching in Main-Memory Distributed Databases
- Art Chang, Implementation and Evaluation of a High-speed Non-blocking Server
- Tejas Wanjari and Ravi Chandra Bandlamudi, IoTa: A Lean and Efficient Approach to the Internet-of-Things
- Tim Palko, Orb: Scalable collection and real-time analysis of critical system data
- Yuchen Song, RESTful Family-level NAS Service on Android Platform
2013
- Maxim Buevich, Respawn: A Distributed Multi-resolution Time-series Datastore
- Pradyumna Agrawal and Chinmay Kamat, OCache: Onion caching on Web Servers
- Gouri Joshi and Puneet Pruthi, PeerCache - Caching Strategies for a P2P File System (P2PFS)
- Lock Thepdusith, Cluster-Enabled SILT
- Robert Walzer, SmartOffloader: Efficient Cloud Offloading Decisions for Mobile Devices
2012
- Alexander Loria and Piyush Sharma, MapReduce on a Heterogeneous Environment
- Nathan D. Mickulicz, Problem Detection and Diagnosis for Large-Scale Mobile Wireless Video Streaming Applications
- Jipeng Han and Xia Wu, MapReduce Application and Evaluation
- Hsueh-Hao Chang and Chi Zhang, Attaching geographical locations to IP addresses in the context of a world-wide disaster monitoring system
- Chen Wang and Abraham Levkoy, Cloud-Based Video on Demand: Fast, Parallelized VoD Across Multiple Platforms
- Anand Suresh, A Multi-Threaded Event-Driven Web Server/Framework
- Georges Chamcham and Alok Shankar, Threads vs. Events in High Speed Servers
- Shrikant Mether and Prajakta Karandikar, Analyzing Hadoop namenode scalability and availability in multi-namenode configuration.
- Xianzhe Liang and Xuan Zhang, MapReduce: Efficient Graph Algorithms
- Samartha Chandrashekar and Shekhar Suman, Realizing deterministic performance on Linux
- Kai Liu and Yilun Cui, An event-based http Long-polling Server
2011
- Christopher Peplin, Massively Distributed Monitoring
- Vishal Patel, An Evaluation of the Google SPDY Protocol
- Glenn Stroz, An Evaluation of Traffic Shaping Implementations on a Consumer Router
- Joseph Greco, Virtualization of Linux on Hyper-V
2007
- Michael Ho, VM forking in Xen
- Hiroshi Isozaki and Syed Wasif Haider, Implementing DELETE function in a trusted P2P network
- Orathai Sukwong, User-Friendly Process Behavior Monitoring Tool
- Mark Hairgrove, Capriccio for Multiprocessors
- Ajay Surie, Enabling Opportunistic use of Transient Thin-Clients in Internet Suspend/Resume
- Ryan Frishberg and Keetaek Hong, Examining performance characteristics of HTTP persistent connections and pipelining and modifying the server scheduling algorithm to punish bad netizens
- Dan Granahan and Eric Tang, MapReduce: An Investigation of Sorting
- Adrian Ng, Activity Tracing Component: Preprocessing Traces in a Distributed Storage System
- Supriya Kher, Xiaohui Wang, and Shivani Kirubanandan, C Implementation of Google's MapReduce: A Simplified Data Processing On Large Clusters
- Joseph C Laws, Efficient Memory Transport for ISR Systems
2006
- Chutika Udomsinn, Determination of Replication Degree
- Himanshu Khurana and Clive Leung, Evaluating methods of peer-to-peer keyword search
- Benjamin Gilbert, Xen and the Art of On-Demand Mobility
- Sachin Kulkarni and Joseph Mou, ISR Disk Performance with Xen and iSCSI
- Sean O'Loughlin, Attaching geographical locations to IP addresses in the context of a world-wide disaster monitoring system
- Rahul Iyer, Parallel NFS implementation for an Object based Storage Backend
- Theta Maxino, Connecting Embedded and Enterprise Systems
- Hwi (Paul) H. Cheong, Implementation and evaluation of high-speed non-blocking server
- Keisuke Ito, Distributed Annotations Database Performance Measurements
- Abhinav Mishra & Gesly George, Exploring a peer-to-peer protocol for streaming content
2005
- Ian Kalinowski and Woon Ho Jung, Pocket ISR: A Live USB-Bootable Version of Internet Suspend/Resume
- Supiti Buranawatanachoke and Kanat Tangwongsan, A CAS Storage System for ISR
- Tudor Dumitras, Estimating the Confidence in the QoS Guarantees of Internet Services
- Charles Fry, Scalability of Fleet Object Store
- John Bucy, Understanding OpenAFS performance
- Srikant Varadan, Guarav Mehta, and Gautam Kedia, ISR Ballooning Study
- Bruce Kao and Eric Li, A P2P Network for Content Distribution
- Pranav Goel and Rajat Venkatesh, A CAS Storage System for ISR
- Andrew Widdowson, A Live CD-Bootable Version of Internet Suspend/Resume
2004
- Andrew Boyer, ClusterSim: a Flexible E-Commerce Cluster Simulation
- Harvey Vrsalovic, Reconfigurable Metric-Driven Peer-to-Peer Object Serving
- Matthew Brown, A Parametric Study of End System Multicast Trees in Reliable File Transfer
- Debmallo Shayon Ghosh and Philippe M. Wilson, A Comparative Analysis of Peer-to-Peer Reputation Systems
- Francisco Roberto Arevalo, Reconciling Locality and Load Balancing in Clustered Servers
- Rahul Dhar and Boris Jabes, On the Efficiency of Peer-to-Peer Keyword Searching
- Ryan Ungaretti, Alleviating Hotspots with Replication and Redirection
- Rajiv Motwani and Soila Pertet, Caching Dynamic Content
Prior to 2004
- Glenn Judd, Improving 802.11 Access Point Selection: A Preliminary Investigation.
- Hen-I Yang and Anupam Dhanuka, Performance Evaluation of Multiple Fields Matching Scheme.
- Gautaum Garg and Gene Soo, Space-Time Codes.
- Punitha Manavalan and Michael Wagner, Robot Telemetry Manager.
- Li-Chiou Chen & Xia Chen, Evaluating Methods of Defending Distributed Denial of Service Attacks.
- Pratish Halady, Rahul Mangharam and Vishal Soni, Location Based Wireless Network Services.
- Nitin Gupta and Sandhya Gupta, QoS in Web-Servers.
- Aravind Pavuluri and Saumitra Das, An Active Architecture for User-Profile Based Dynamic Web Caching.
- Vijay Pandurangan and Mehmet Bakkaloglu, PASISizing the Web.
- Arif Ulaugac and Nawaportn Wisitpongphan, Micro-Evaluation of the Flash Server.
- David Oleszkiewicz and Ed Neto, Distributed Anonymous Information Retrieval.
- Blake Scholl, Distributed Computation of Performance-Aware Webmaps with HTTP Proxies.
- Thomas Madden and Christopher Palow, Denial of Service Detector (DoSD).
- Shaheen Gandhi and Alan Wang, Effects of Latency on Game State Prediction Methods.
- Asad Samar, An Implementation of Capture Resilient Devices.

Here are some other ideas for topics (in no particular order):

Evaluation of performance issues in high-speed non-blocking servers. The idea here is to build a high-speed server that never blocks on I/O (such as the Flash server from Rice) and then do an extensive micro-evaluation of its performance in order to understand how much performance gain is possible from such non-blocking servers.
Threads vs events in high speed servers. We've seen a number of conflicting conclusions in our readings. Which approach is better? Compare and contrast the performance implications of kernel-level threads, cooperatively scheduled user-level threads, and event systems based on select().
Monitoring in a non-cooperative environment. Stefan Savage at UCSD has developed a powerful technique for estimating end-to-end bandwidths and packet loss between hosts, where the remote host is not cooperative in the sense that it would be impossible to get an account on the machine (e.g., the Yahoo server). Savage's approach is to exploit the behavior of TCP (which all servers must implement to the specification) to gain information about the effective bandwidth from the server to the client. For this project, you might apply this general idea in some new context, or use Savage's method for estimating packet loss in the context of a larger application. For example, would it be possible to use Savage's technique to build a client-side performance monitoring system that, for a given HTTP transaction, would isolate the network transmission time from the server processing time and determine which is the bottleneck?
Attaching geographical locations to IP addresses in the context of a world-wide disaster monitoring system. When natural disasters such as earthquakes occur, it is very difficult to make accurate estimates of the geographical extent and severity of the damage because the communication infrastructure disappears. However, hosts that provide Internet services are always turned on, so the lack of response from those systems contains some information. The idea is to build a system that would sample hosts in earthquake-prone regions on a continual basis. Each sample is a bit vector, one bit per host. Some interesting issues are assigning IP addresses to geographical locations, developing a hierarchical scheme to aggregate response bit vectors, and developing analysis techniques of the response bit vectors to distinguish transients (e.g., localized power failures or normal host downtime) from real damage.
Scalable search engines. Current search engines are not scalable because all of the work is done at the remote server site. As a result, the servers are not able to perform much computation when they satisfy a request, typically just a quick lookup in an inverted index. Single-word queries, which directly index the database on the server, therefore typically work pretty well, but multiple-word queries often give poor results. The idea here is to investigate the following question: Can we improve the performance of search engines such as Google by doing some additional work on the client?
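
To make the scalable search engine question a little more concrete, here is a minimal Python sketch of one possible split of work. It is only an illustration under stated assumptions, not a description of how any real search engine works: the tiny document set and the function names (build_inverted_index, server_lookup, client_multiword_query) are hypothetical. The "server" only performs single-word lookups in an inverted index, and the "client" combines the posting lists for a multiple-word query by intersecting them locally.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def server_lookup(index, word):
    """Hypothetical server side: one cheap lookup of a single posting list."""
    return index.get(word.lower(), set())


def client_multiword_query(index, query):
    """Hypothetical client side: fetch one posting list per word, intersect locally."""
    postings = [server_lookup(index, w) for w in query.split()]
    if not postings:
        return set()
    return set.intersection(*postings)


# Toy document collection (made up for this example).
docs = {
    1: "non-blocking web server design",
    2: "web cache replacement policy",
    3: "event driven web server",
}
index = build_inverted_index(docs)
single = server_lookup(index, "web")
multi = client_multiword_query(index, "web server")
print(sorted(single), sorted(multi))
```

A project along these lines would replace the simple intersection with whatever extra client-side work is being evaluated (for example, re-ranking), and would measure the cost of shipping posting lists to the client.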