18-845 Group Project (GP)
Important dates
- Wed, Mar 13 (11:59pm): GP abstracts due
- Email your abstract to droh@cs.cmu.edu (plain text in email body)
- Mon, Apr 1: GP oral status reports due, in class
- Thu, Apr 18 (11:59pm): GP final reports due
- Email your completed report (pdf format only) to droh@cs.cmu.edu
- Mon, Apr 22 (11:59pm): GP reviews due
- Email your assigned anonymized critiques (one PDF file per critique) to droh@cs.cmu.edu
- Sun, Apr 28 (11:59pm): Final camera-ready GP reports due
- Email your report (pdf format) to droh@cs.cmu.edu
1. Instructions for Preparing Your GP Abstracts
The abstract must contain the following parts:
- Title
- Authors
- One or two paragraphs describing the question(s) you want to
answer, and what you will do to answer the question(s).
- A paragraph describing the expected result (What do you hope to learn? What
conclusions do you hope to draw?)
2. Instructions for Delivering Your GP Mid-Term Oral Report
- Each group will give a brief report during class on their project.
3. Instructions for Preparing and Submitting Your GP Reports
- Reports are limited to 10 pages (this is a hard limit).
- Font size must be at least 10pt (but 11pt is even better).
- Reports must follow the official ACM Proceedings format. Use the
10pt LaTeX template provided here, or the 10pt Word template provided here.
- Reports should include something like the following sections:
- Abstract - A paragraph that summarizes the problem and the results.
- Introduction - Sets the context, describes the problem, and describes your solution.
- Description - One or more sections that
describe the problem and your approach to the solution in detail.
- Evaluation - A section that quantitatively evaluates your
ideas.
- Related work - Compare and contrast related work. Don't just enumerate.
- Summary and Conclusions - Summarize what you did and what interesting
things you learned from the project.
- Send your reports to droh@cs.cmu.edu.
4. Instructions for Reviewing Your Classmates' Reports
- Each report will be formally reviewed by three reviewers:
your instructor and two classmates randomly chosen by your
instructor. Thus, every student will receive three reviews of
their project. The two student reviews should be anonymous.
- Your instructor will evaluate the quality of your reviews as part of your
overall project score.
- Use the same review template you used
for your critiques during the semester.
- Send each review as a separate PDF file attachment to
droh@cs.cmu.edu. Don't forget to remove your name.
Hints for Coming Up with a Topic
If you are currently working on a master's or Ph.D. thesis,
we encourage you to pursue a topic that is directly related to
your thesis research. It's OK (ideal in fact!) to use the group
project as a way to make progress on your thesis.
There are two basic approaches you can use for your group research projects:
- Develop a new idea or a new twist on an existing idea, and then do
enough evaluation to serve as a proof of concept.
- Do an extensive evaluation of an existing idea that gives you
some insight into the advantages or disadvantages of that idea.
Here are examples of some project ideas from previous years. Feel free
to use any of these for inspiration:
- 2018
- Murtadha Aljubran and Alex James,
Evaluation of Pseudo-Relevance Feedback Using Wikipedia
- Alex Yu,
Comparative Study of Containers and Virtual Machines
- Danqi Huang and Hang Gong,
Spark Application Evaluation and Performance Boosting
- Nnamdi Stanley Adom, Du Zhang, and Zoe (Zhiyuan) Lin,
IT: A Blockchain-Based Lightening Interbank Transaction system
- Shaoxuan Yang and Gayatri Ravi Kamat,
Comparative Study of Web Server Frameworks
- Kiran Pandit and Ian Van Stralen,
Implementing Intelligent Caching in Clipper: Low latency machine learning
- Karan Dhabalia,
NEX.IO: A Scalable Crowd-Sourced Computing Architecture Using Idle Resources
- Shrikant Giridhar and Cindy Zhang,
kqueue versus epoll: A Performance Comparison
- Zeyuan Tan and Shuangning Liu,
Indexing in Cloud File Systems
- Souptik Sen,
Mystique : An adaptive Straggler mitigation technique for Distributed Neural Network training
- 2017
- 2016
- 2015
- Anuj Patel,
Stream Processing Design and Implementation
- Darsh Shah and Yifan Li,
Improving Consistency in SolrCloud
- Debjani Biswas and Mrigesh Kalvani,
Moina: A Secure, Auditable, Scalable Instant Messaging System
- Jiayu Liu,
Implementing the Raft Consensus Algorithm
- Sean Klein and Advaya Krishna,
Building Raft - An Exploration of Test-Driven Development in Go
- Omkar Gawde and Sameera Padhye,
Comparing Web Server Architectures: Events vs Threads
- Vinaykumar Bhat and Durga Kamuju,
Improving the Deduplication Performance in ZFS
- Yijie Ma,
A Cloud Filesystem With Locking Service
- 2014
- 2013
- 2012
- Alexander Loria and Piyush Sharma,
MapReduce on a Heterogeneous Environment
- Nathan D. Mickulicz,
Problem Detection and Diagnosis for Large-Scale Mobile
Wireless Video Streaming Applications
- Jipeng Han and Xia Wu,
MapReduce Application and Evaluation
- Hsueh-Hao Chang and Chi Zhang,
Attaching geographical locations to IP addresses in the context of a
world-wide disaster monitoring system
- Chen Wang and Abraham Levkoy,
Cloud-Based Video on Demand: Fast, Parallelized VoD Across Multiple
Platforms
- Anand Suresh,
A Multi-Threaded Event-Driven Web Server/Framework
- Georges Chamcham and Alok Shankar,
Threads vs. Events in High Speed Servers
- Shrikant Mether and Prajakta Karandikar,
Analyzing Hadoop namenode scalability and availability in
multi-namenode configuration.
- Xianzhe Liang and Xuan Zhang,
MapReduce: Efficient Graph Algorithms
- Samartha Chandrashekar and Shekhar Suman,
Realizing deterministic performance on Linux
- Kai Liu and Yilun Cui,
An event-based http Long-polling Server
- 2011
- 2007
- Michael Ho,
VM forking in Xen
- Hiroshi Isozaki and Syed Wasif Haider,
Implementing DELETE function in a trusted P2P network
- Orathai Sukwong,
User-Friendly Process Behavior Monitoring Tool
- Mark Hairgrove,
Capriccio for Multiprocessors
- Ajay Surie,
Enabling Opportunistic use of Transient Thin-Clients in Internet Suspend/Resume
- Ryan Frishberg and Keetaek Hong,
Examining performance characteristics of HTTP persistent connections and
pipelining and modifying the server scheduling algorithm to punish bad netizens
- Dan Granahan and Eric Tang,
MapReduce: An Investigation of Sorting
- Adrian Ng,
Activity Tracing Component: Preprocessing Traces in a Distributed Storage System
- Supriya Kher, Xiaohui Wang, and Shivani Kirubanandan,
C Implementation of Google's MapReduce: A Simplified Data Processing On Large Clusters
- Joseph C Laws,
Efficient Memory Transport for ISR Systems
- 2006
- Chutika Udomsinn,
Determination of Replication Degree
- Himanshu Khurana and Clive Leung,
Evaluating methods of peer-to-peer keyword search
- Benjamin Gilbert,
Xen and the Art of On-Demand Mobility
- Sachin Kulkarni and Joseph Mou,
ISR Disk Performance with Xen and iSCSI
- Sean O'Loughlin,
Attaching geographical locations to IP addresses in the context of a
world-wide disaster monitoring system
- Rahul Iyer,
Parallel NFS implementation for an Object based Storage Backend
- Theta Maxino,
Connecting Embedded and Enterprise Systems
- Hwi (Paul) H. Cheong,
Implementation and evaluation of high-speed non-blocking server
- Keisuke Ito,
Distributed Annotations Database Performance Measurements
- Abhinav Mishra & Gesly George,
Exploring a peer-to-peer protocol for streaming content
- 2005
- Ian Kalinowski and Woon Ho Jung,
Pocket ISR: A Live USB-Bootable Version of Internet Suspend/Resume
- Supiti Buranawatanachoke and Kanat Tangwongsan,
A CAS Storage System for ISR
- Tudor Dumitras,
Estimating the Confidence in the QoS Guarantees of Internet Services
- Charles Fry,
Scalability of Fleet Object Store
- John Bucy,
Understanding OpenAFS performance
- Srikant Varadan, Guarav Mehta, and Gautam Kedia,
ISR Ballooning Study
- Bruce Kao and Eric Li,
A P2P Network for Content Distribution
- Pranav Goel and Rajat Venkatesh,
A CAS Storage System for ISR
- Andrew Widdowson,
A Live CD-Bootable Version of Internet Suspend/Resume
- 2004
- Prior to 2004
- Glenn Judd,
Improving 802.11 Access Point Selection: A Preliminary Investigation.
- Hen-I Yang and Anupam Dhanuka,
Performance Evaluation of Multiple Fields Matching Scheme.
- Gautaum Garg and Gene Soo,
Space-Time Codes.
- Punitha Manavalan and Michael Wagner,
Robot Telemetry Manager.
- Li-Chiou Chen & Xia Chen,
Evaluating Methods of Defending Distributed Denial of Service
Attacks.
- Pratish Halady, Rahul Mangharam and Vishal Soni,
Location Based Wireless Network Services.
- Nitin Gupta and Sandhya Gupta,
QoS in Web-Servers.
- Aravind Pavuluri and Saumitra Das,
An Active Architecture for User-Profile Based Dynamic Web Caching.
- Vijay Pandurangan and Mehmet Bakkaloglu,
PASISizing the Web.
- Arif Ulaugac and Nawaportn Wisitpongphan,
Micro-Evaluation of the Flash Server.
- David Oleszkiewicz and Ed Neto,
Distributed Anonymous Information Retrieval.
- Blake Scholl,
Distributed Computation of Performance-Aware Webmaps with HTTP Proxies.
- Thomas Madden and Christopher Palow,
Denial of Service Detector (DoSD).
- Shaheen Gandhi and Alan Wang,
Effects of Latency on Game State Prediction Methods.
- Asad Samar,
An Implementation of Capture Resilient Devices.
Here are some other ideas for topics (in no particular order):
- Evaluation of performance issues in high-speed non-blocking servers.
The idea here is to build a high-speed server that never blocks on I/O (such
as the Flash server from Rice) and then do extensive micro-evaluation
of its performance in order to understand the extent of the performance
gain that is possible from such non-blocking servers.
- Threads vs events in high speed servers. We've seen a
number of conflicting conclusions in our readings. Which approach is
better? Compare and contrast the performance implications of
kernel-level threads, cooperatively scheduled user-level threads, and
event systems based on select().
- Monitoring in a non-cooperative environment. Stefan
Savage at UCSD has developed a powerful technique for estimating
end-to-end bandwidths and packet-loss between hosts, where the remote
host is not cooperative in the sense that it would be impossible to
get an account on the machine (e.g., the Yahoo server). Savage's
approach is to exploit the behavior of TCP (which all servers must
implement to the specification) to gain information about the
effective bandwidth from the server to the client. For this project,
you might apply this general idea in some new context, or use Savage's
method for estimating packet loss in the context of a larger
application. For example, would it be possible to use Savage's
technique to build a client-side performance monitoring system that,
for a given HTTP transaction, would isolate the network transmission
time from the server processing time and determine which is the
bottleneck?
- Attaching geographical locations to IP addresses in the
context of a world-wide disaster monitoring system. When natural
disasters such as earthquakes occur, it is very difficult to make
accurate estimates of the geographical extent and severity of the
damage because the communication infrastructure disappears. However,
hosts that provide Internet services are always turned on, so the lack
of response from those systems contains some information. The idea is
to build a system that would sample hosts in earthquake prone regions
on a continual basis. Each sample is a bit vector, one bit per host.
Some interesting issues are assigning IP addresses to geographical
locations, developing a hierarchical scheme to aggregate response bit
vectors, and developing techniques for analyzing the response bit
vectors to distinguish transients (e.g., localized power failures or
normal host downtime) from real damage.
- Scalable search engines. Current search engines are not
scalable because all of the work is done at the remote server site. As
a result, the servers are not able to perform much computation when
they satisfy a request, typically just a quick lookup of an inverted index.
Single-word queries, which directly index the database on the server,
therefore typically work pretty well, but multiple-word queries often
give poor results. The idea here is to investigate the following
question: Can we improve the performance of search engines
such as Google by doing some additional work on the client?
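
As a rough illustration of the disaster-monitoring idea above (one bit per host per sample, aggregated by location, with transients filtered out), here is a minimal Python sketch. It is only one possible starting point, not a prescribed design. The host-to-region mapping, the 0.5 response-rate threshold, the three-sample window, and the function names are all assumptions made for this example. It also aggregates at a single level (region) rather than a full hierarchy.

```python
from collections import defaultdict


def aggregate_by_region(sample, host_region):
    """Return {region: fraction of hosts that responded} for one sample.

    sample:      list of 0/1 bits, one per host (1 = host responded).
    host_region: list of region labels, one per host (hypothetical mapping
                 from IP address to geographical location).
    """
    up = defaultdict(int)
    total = defaultdict(int)
    for bit, region in zip(sample, host_region):
        total[region] += 1
        up[region] += bit
    return {region: up[region] / total[region] for region in total}


def persistent_outages(samples, host_region, threshold=0.5, min_consecutive=3):
    """Return the regions whose response rate stays below `threshold` for at
    least `min_consecutive` samples in a row.

    Requiring consecutive low samples is a crude way to separate transients
    (a short power failure, normal downtime) from longer-lasting damage.
    """
    streak = defaultdict(int)
    flagged = set()
    for sample in samples:
        for region, rate in aggregate_by_region(sample, host_region).items():
            if rate < threshold:
                streak[region] += 1
                if streak[region] >= min_consecutive:
                    flagged.add(region)
            else:
                streak[region] = 0  # region recovered; treat as a transient
    return flagged


# Example: six hosts in two made-up regions, six consecutive samples.
host_region = ["A", "A", "A", "B", "B", "B"]
samples = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 0],  # region B starts failing
    [1, 1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 0],  # third low sample in a row for B
    [0, 0, 1, 1, 1, 1],  # region A dips for a single sample only
    [1, 1, 1, 1, 1, 1],
]
print(persistent_outages(samples, host_region))
```

In this made-up data, region B is flagged because its response rate stays low for three samples in a row, while the one-sample dip in region A is treated as a transient. A real project would need to justify the threshold and window empirically and replace the flat region labels with a genuine hierarchy.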