18-746 Storage Systems (Fall 2018): FAQ

FAQs will be posted here. Keep checking this page.

General

Gen.Q1: The project does not compile and I cannot run any tests.

Please follow the directions in the “Using Amazon Web Services” section to spin up an AWS instance with the development environment.
The project setup should work within the AWS instance that we provided for the project, and you should have no expectation that it will work anywhere else. If you are trying to run it anywhere else, it probably won’t work… and, even if it does, you must ensure your code works in the AWS instance as the instructors will use it to grade your work.

Gen.Q2: Why don't we use Piazza for 746?

Some classes use Piazza. Based on our experience, we choose to rely instead on a staff-wide mailing list, which lets us manage the consistency and accuracy of the answers that students get. We combine it with this FAQ page, which lets us curate the set of questions and answers that get promoted to the attention of the entire class. We have found that this combination allows us to assist 746 students more efficiently.

Gen.Q3: What is the Autolab website for 746?

The Autolab website is autolab.pdl.cmu.edu.

Gen.Q4: Do we get any grace days? How many?

You have a total of three grace days (i.e., unpenalized late days) for the entire semester. Each checkpoint has a due date and an end date. Grace days can be used to avoid late penalties for submissions past the due date. No submissions are accepted after the end date.

Gen.Q5: How many submissions do we have per checkpoint? Autolab shows infinite submissions remaining.

You are allowed 25 submissions per checkpoint. Any additional submission after the first 25 submissions will incur a penalty of 10% of that checkpoint grade. Autolab shows infinite submissions because we do not impose a hard limit, but rather have a penalty per submission. You can easily check the number of submissions you have already made by scrolling down the checkpoint-submission page, as well as looking at your handin history via 'View Handin History' on the left panel.

Gen.Q6: Will we be graded on code quality for the checkpoints? Do we have a code style guideline?

You should always keep your code in good style, which is a general requirement of CMU courses (and of all jobs in the real world). The style for checkpoints will be inspected and graded after the entire project is completed. In general, you should follow the 15213 guidelines for code style.

Exceptions to the 15213 guidelines:

You can leave print/debug statements in checkpoints 1 and 2 of the projects, as we understand they can be useful in the later stages. They should be removed for checkpoint 3.
We won’t enforce adherence to the 15213 guidelines for checkpoint 1 of the myFTL project, as this FAQ item was added after the submission deadline. For checkpoint 1 of the myFTL project only, we won't follow a strict checklist to grade your code, and you should not worry too much about code style as long as you keep your code modular, clean (for example, remove unused code blocks) and well-documented. Note, however, that poor code quality will significantly reduce your ability to debug and our ability to help in extreme circumstances.

Gen.Q7: What is the password for the user account in the AWS instance? I want to install a package.

We do not provide sudo permission for students in the project VM. There are ways to install packages without root privileges if needed. If you insist on using third-party tools or libraries in your code, you assume the risk of your code not working when it is submitted to Autolab. Please make sure it passes the tests on Autolab, so that you are not surprised by your score.

Gen.Q8: How can we find out more about a particular aspect of the lab?

Have you carefully read the handout? :)

Gen.Q9: I can’t access the course materials on the web page (“Access forbidden”).

Course materials are only accessible within the CMU network. You need to use CMU’s VPN services when accessing from off-campus.

Using Amazon Web Services (AWS)

Read the following guide before starting on any project development, and make sure your AWS account is working.
You must ensure your code works in the AWS instance as the instructors will use it to grade your work.

aws.Q1: Setup and Billing

Create an AWS account here if you don't have one already. Make sure you use your Andrew email address when you create your AWS account.
Once you have created your AWS account, please go ahead and join AWS Educate. To do that, click here and select "Join AWS Educate today", then "Apply for AWS Educate for Students". Join AWS Educate only if you have not done so previously for a different class. Designate that you are a student, and fill out the form that comes up.
Note: Some of you may also be taking the 15-619 (Cloud Computing) class concurrently, and your Andrew email might already be tied to it. If that’s the case, use a different email address (e.g., @cs.cmu.edu, @cmu.edu, @gmail) to create a new AWS account. You don't need to do so if you have taken 15-619 in a previous semester.
If your account is new, it comes with 750 hours/month (for 12 months) of free instance time. You need to use the "Free-Tier" instances in order to take advantage of that promotion.
If you need AWS credits for the course, enter your Andrew ID in this Google spreadsheet. If you are registered with the class we will be emailing you a $50 voucher. You can start using AWS without the voucher, since Amazon will not bill you until the end of the month.
To monitor your billing and usage, sign in to your AWS account, click on the drop-down menu with your account name and click on "Billing & Cost Management".

Warning: Make sure to protect your AWS credentials! Do not share or expose them to others online. If your AWS account is compromised and unauthorized charges are made to your credit card, we will be unable to help.

aws.Q2: Starting your AWS instance

Sign into your AWS account.
Click on EC2 under "Compute Services" to be taken to the EC2 Dashboard.
On the right corner of the top bar, click the second drop-down menu from the right to select the datacenter you will be using. Select "N. Virginia".
In the sidebar on the left, click on "AMIs" under the "Images" group.
Click on the drop-down menu under "Launch" and select "Public images".
Use the bar to the right of the drop-down menu to search for '746-update10-devCopy'. Double-check that the image owner is '169965024155'. This is the VM image for the course. Use the refresh button at the top right if the image doesn't show up right away.
Select the AMI, and click "Launch".
Make sure the "t2.micro" instance type is selected (which is also Free-tier eligible, if you are using a new account that benefits from that promotion).
Click "Review and Launch".
On the next screen click "Launch".
In the first drop-down menu select "Proceed without a key pair", and check the checkbox. To access the instance, you will need the 746-student.pem key from the course website. (If you get an "access forbidden" message, make sure that you are on campus and using CMU-secure wifi, or that you are connected to the CMU VPN.)
Click "Launch Instances", and on the next screen click "View Instances"
In the EC2 Dashboard Instances screen wait for your VM instance state to transition to "running", and the VM status checks to indicate "2/2 checks passed".
Select your instance and check the 'Public DNS' column. Make a note of the machine's FQDN, which will be of the form X.compute-Y.amazonaws.com.
From your terminal, run the following ssh command to connect to your instance:

ssh -Y -i "path/to/746-student.pem" student@X.compute-Y.amazonaws.com
- Make sure the path passed to the -i option points to the 746-student.pem key pair you downloaded from the course website.
- Before connecting, run "chmod 400 746-student.pem" so that your key pair file carries the right permissions.

aws.Q3: Securing your AWS instance

It is a good idea to change the key you use to login to your AWS instance, and stop using the 746-student.pem which the AWS template comes configured with. To do so, we provide a simple script: make_secret_key.sh which will generate a new key and use it to replace 746-student.pem on your instance. Note that if you terminate your instance and create it again from the template image, you will have to run this script again.

aws.Q4: Terminating your AWS instance

When you are done using your VM, you can stop it or terminate it. You can stop your VM by running the shutdown command from within Ubuntu, or using Actions → Instance State → Stop.

Be warned that a stopped VM still consumes resources, which leads to usage charges for EBS storage (and you may run out of credits faster). Instead of stopping your VM, make sure to copy out your source code when you're done, and then terminate your instance. You can do that via Actions → Instance State → Terminate.

Lab 1: myFTL

myFTL.Q1: Do we need to make the in-memory data durable?

No. You can assume that the contents of DRAM are made non-volatile by some mechanism that we aren’t asking you to implement.

myFTL.Q2: Which GC policy do we need to implement?

You need to implement all four policies described in the handout (section 5.2). The configuration object passed to the constructor can be used to get the current GC policy (Appendix A.3.1).

myFTL.Q3: What is the age of a block for the LRU and LFS cost-benefit policies?

The age of a block is how long it has been since you last modified that block. In other words, age of a block = current timestamp - timestamp of the most recently written page of that block.
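
As a rough illustration of the formula above, here is a minimal C sketch. The struct block_info type, its last_write_ts field, and both function names are hypothetical and are not part of the project code; the sketch also assumes that timestamps never decrease and that an LRU-style policy prefers the block with the largest age.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block bookkeeping; the real myFTL structures may differ. */
struct block_info {
    uint64_t last_write_ts; /* timestamp of the most recently written page */
};

/* age = current timestamp - timestamp of the most recently written page */
uint64_t block_age(const struct block_info *b, uint64_t now)
{
    return now - b->last_write_ts;
}

/* Index of the block with the largest age (assumed LRU-style choice). */
size_t oldest_block(const struct block_info *blocks, size_t n, uint64_t now)
{
    assert(n > 0); /* the caller must pass at least one block */
    size_t oldest = 0;
    for (size_t i = 1; i < n; i++) {
        if (block_age(&blocks[i], now) > block_age(&blocks[oldest], now))
            oldest = i;
    }
    return oldest;
}
```

Updating last_write_ts on every page write to a block keeps the age computation a single subtraction at garbage-collection time.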

Lab 2: CloudFS

CloudFS.Q1: What kind of standalone databases can we use?

Use of third-party software in your CloudFS implementation is not allowed. You have flexibility in choosing how you organize and store the metadata for your CloudFS. One option is to manage it as a simple database for which you write the storage and access functions that you need.

CloudFS.Q2: Should I port CloudFS to C++? If yes, how?

It depends. If you have a good command of the STL and are much more comfortable coding in C++ than in C, then it might be a good idea to port the project to C++. Otherwise, it might be better to stick with C. Porting the project to C++ should be fairly straightforward if you are good with C++. If you are not, then you might end up spending a lot of time doing that. TAs are not going to help you port the code to C++.

A few pointers (not necessarily an exhaustive list of things to do) to port the code to C++: A reasonable start would be to first get the codebase to compile with g++ and then modify the code to make it more C++-ish if you wish. A potential direction would be to:

rename all the .c files to .cpp files
update the Makefile and fix the compilation errors
change the compiler to g++ and patterns from "%.c" to "%.cpp" and add appropriate compiler flags in the Makefile
the variable statusG in cloudapi.cpp should be of type S3Status (static S3Status statusG = (S3Status) 0)
cast filler to void * in the call to S3_get_object in cloudapi.cpp
Based on the version of C++ you are using, you might have to initialize struct fuse_operations cloudfs_operations in the main function in cloudfs.cpp

CloudFS.Q3: lost+found directory causing tests to fail?

lost+found is a directory created by the Ext4 formatting tools. In fact, a lost+found directory exists whenever you format a filesystem as any of the Ext filesystems (it may even exist on your own machines!). Our scripts require you to ignore lost+found, i.e. correctness requires you to *not* list lost+found as one of the existing folders when an 'ls' is performed on the FUSE mountpoint. The changes required to ignore lost+found must be made entirely in your FUSE implementation. Please do not modify the scripts you are provided, as doing that might result in autograding problems on Autolab.
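
One way to express this filter is a small predicate that your directory-listing code consults for each entry before reporting it. The sketch below is only an illustration: the function name is_hidden_entry is hypothetical, and it assumes dir_path is the path relative to the FUSE mountpoint (so "/" is the mountpoint itself) and that only the top-level lost+found needs to be hidden.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Returns true when an entry should be left out of a directory listing.
 * dir_path: directory being listed, relative to the mountpoint (assumption).
 * name:     entry name as read from the underlying directory.
 */
bool is_hidden_entry(const char *dir_path, const char *name)
{
    return strcmp(dir_path, "/") == 0 && strcmp(name, "lost+found") == 0;
}
```

In a readdir handler, entries for which this returns true would simply be skipped rather than added to the listing.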

CloudFS.Q4: Why do I get permission denied?

Please make sure you pass the correct absolute file path to every system call you invoke. For example, if the parameter is '/home/student/mnt/fuse/a' (or the relative path './a'), you should transform it into '/home/student/mnt/ssd/a'.

Also, you should not assume that mount locations are identical across different environments. Specifically, the CloudFS binary takes its mount locations as command-line arguments. Do not hard-code path names in your implementation.
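
The transformation described above could be sketched as follows. This is a minimal illustration, not the required implementation: the names to_ssd_path and under_root are hypothetical, the two roots are assumed to come from the command-line arguments and to be given without a trailing slash, and relative paths are assumed to be relative to the mountpoint.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if path equals root or lies below it (root has no trailing slash). */
static bool under_root(const char *path, const char *root)
{
    size_t len = strlen(root);
    return strncmp(path, root, len) == 0 &&
           (path[len] == '/' || path[len] == '\0');
}

/*
 * Writes the SSD-side path for `path` into `out` (capacity `out_len`).
 * Returns 0 on success, -1 if the result does not fit in `out`.
 */
int to_ssd_path(const char *ssd_root, const char *fuse_root,
                const char *path, char *out, size_t out_len)
{
    const char *rel = path;
    int n;

    if (under_root(path, ssd_root)) {
        /* Already an SSD path: copy it unchanged. */
        n = snprintf(out, out_len, "%s", path);
        return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
    }
    if (under_root(path, fuse_root))
        rel = path + strlen(fuse_root);   /* strip the FUSE mount prefix */
    else if (strncmp(path, "./", 2) == 0)
        rel = path + 1;                   /* "./a" becomes "/a" */

    while (*rel == '/')                   /* emit exactly one separator */
        rel++;

    n = snprintf(out, out_len, "%s/%s", ssd_root, rel);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Because both roots are parameters, the same code works when the mount locations differ between your VM and the grading environment.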

CloudFS.Q5: Why do I pass all local tests but get a timeout error on autolab?

First, please check CloudFS.Q4 carefully.

Second, as the writeup says, a CloudFS instance should store all file metadata and some file data on the local Solid State Disk (SSD). For every function you implement, the given path parameter may be either a relative path or an absolute one. If it is an absolute path that does not belong to the SSD, you should transform it to the correct SSD path. Otherwise, the incorrect path may cause some system calls to hang and eventually trigger the timeout error on Autolab.

CloudFS.Q6: Hash Header File?

To maintain hash tables in a CloudFS implemented in C, you can use uthash.h. If you are using C++, the STL provides associative containers such as std::unordered_map (a hash table) and std::map.

For usage refer to: http://troydhanson.github.io/uthash.
Students are allowed to use only uthash.h from this link.

One student asked and was granted permission to use this file. Others can do the same. The decision to use this file is entirely the responsibility of the student.

CloudFS.Q7: What is "CloudFS Checkpoint X Experimental" assessment in the Autolab?

We have added a new assessment on Autolab called "Cloudfs Checkpoint 1 Experimental (FUSE)". (We will add one for checkpoint 2 when checkpoint 3 is released.) You can use it to test your code from the previous checkpoint. We are putting this up in case you want to solidify your code before launching into a new checkpoint. Submitting your code here will not affect your final grade for the previous checkpoint.

Note: your implementation for the new checkpoint will not be backward compatible with some of the old checkpoint tests.

CloudFS.Q8: How much persistence do we need to implement?

By persistence, we mean that CloudFS correctly retains its state across an unmount and re-mount. Your implementation should rebuild its state and continue working once it is unmounted and mounted again later.
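
As one possible illustration of rebuilding state, the sketch below writes a small metadata table to a local file and reads it back. Everything here is an assumption made for the example: the struct file_meta layout, the function names save_meta and load_meta, and the choice of a flat binary file. Your own design may store state quite differently.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical metadata record; real CloudFS state will likely hold more. */
struct file_meta {
    char     name[64];
    uint64_t size;
    int      in_cloud; /* 1 if the file data lives in the cloud */
};

/* Writes n records to `file`. Returns 0 on success, -1 on failure. */
int save_meta(const char *file, const struct file_meta *m, size_t n)
{
    FILE *fp = fopen(file, "wb");
    if (fp == NULL)
        return -1;
    int ok = fwrite(&n, sizeof n, 1, fp) == 1 &&
             fwrite(m, sizeof *m, n, fp) == n;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Reads at most `cap` records. Returns the record count, or -1 on failure. */
long load_meta(const char *file, struct file_meta *m, size_t cap)
{
    FILE *fp = fopen(file, "rb");
    if (fp == NULL)
        return -1;
    size_t n = 0;
    int ok = fread(&n, sizeof n, 1, fp) == 1 && n <= cap &&
             fread(m, sizeof *m, n, fp) == n;
    fclose(fp);
    return ok ? (long)n : -1;
}
```

Saving when the filesystem is unmounted and loading when it is mounted again would restore the table. Note that raw struct dumps like this are only readable by a binary built with the same struct layout.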

CloudFS.Q9: Checkpoint 3 write up has due dates in 2017. What is going on?

We updated the write up in the late afternoon of Nov 16th. Please refer to the new document on Autolab.

CloudFS.Q10: Do we need to support --no-dedup mode for checkpoint 3?

Yes, you do. It should behave similarly to your checkpoint 1 code (no dedup + no cache).

CloudFS.Q11: How can we handle the "device busy" error when unmounting CloudFS?

Many students have run into a "device busy" error when unmounting. The cause is that your implementation still holds open file handles that have not been closed by the time of the unmount operation. Note that "handle" here covers both regular files and directories. We therefore recommend implementing the ".releasedir" callback in your file operations to close every opened directory. Additionally, we recommend using "lsof" to look up which files and directories are still open.

Last updated: 2018-11-27 16:40:07 -0500