Team Status Report for 5/8

This week Christy sorted out cloud deployment and got our site online at www.acapella2021.com. There are still things to debug, however, namely serving the site over HTTPS, as certain functions in the Web Audio API don't seem to work over plain HTTP. Unfortunately, since those functions are central to recording and monitoring, this also means we haven't been able to test our latency over the web yet.
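
The reason HTTPS matters here: microphone capture is gated behind the browser's "secure context" rules, so navigator.mediaDevices is simply undefined over plain HTTP (localhost excepted). A minimal sketch of the check involved, in browser-side TypeScript:

    // getUserMedia only exists in secure contexts (HTTPS or localhost),
    // which is why recording breaks when the site is served over HTTP.
    async function getMic(): Promise<MediaStream> {
      if (!window.isSecureContext || !navigator.mediaDevices) {
        throw new Error("Recording requires HTTPS: mediaDevices is unavailable.");
      }
      return navigator.mediaDevices.getUserMedia({ audio: true });
    }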

Additionally, because of some changes in the upload process on our webpage, we are updating our backend syncing to run on all the files in a group together instead of one at a time. Another remaining task is giving users the ability to download all their tracks. This could easily be implemented by letting users download each track individually; however, we would also like to offer the option of mixing all the tracks down and exporting them as one file.
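
To illustrate the mixdown option, here is a rough sketch of how it could work in the browser with an OfflineAudioContext (the function name and the assumption of already-decoded tracks are ours, not final code):

    // Render all decoded tracks into a single stereo buffer offline,
    // which can then be encoded and offered as one download.
    async function mixdown(tracks: AudioBuffer[]): Promise<AudioBuffer> {
      const length = Math.max(...tracks.map((t) => t.length));
      const ctx = new OfflineAudioContext(2, length, tracks[0].sampleRate);
      for (const track of tracks) {
        const src = ctx.createBufferSource();
        src.buffer = track;            // each track plays from time 0
        src.connect(ctx.destination);  // summing happens at the destination
        src.start(0);
      }
      return ctx.startRendering();     // resolves with the mixed buffer
    }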

In the next two days, we will be finishing up these tasks and working on our final poster and video due Monday night.

Team Status Report for 5/1

This week, our group worked on the final presentation and on some finishing touches and adjustments to make sure the individual parts will work together. In our presentation, we updated our schedule and system diagrams and explained several features of our site in more detail.

Cloud deployment is the last thing we need to do. We ran into a couple of problems trying to deploy with AWS. First, there was a database error when trying to load the Python library librosa. There don't seem to be any resources we can consult to fix this issue, so instead we will rewrite our code with another library, essentia, which has a similar onset detection function needed for syncing the tracks up.

In the following week, we will hopefully be able to test for latency online, with users in different locations. We will also be getting survey responses about the UI and performance from other people, and filming the parts needed for our final video.

Team Status Report for 4/24

This week we resolved the issue of uploading audio in a format that is accessible from our backend. There were some problems getting our initial recorder to record in a format decodable by our backend, so we changed our implementation to initialize two recorders: one for monitoring and one for capturing audio. While having two recorders may seem like one too many, it should actually make it easier to reduce our monitoring latency, since we can lower the sample rate of the recorder used for monitoring without affecting the sample rate of the actual recording.
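
Roughly, the idea looks like this (the constraint values are illustrative, and browsers may ignore a requested sample rate, so this is a sketch rather than our final code):

    // Two captures of the same microphone: a low-rate stream for
    // low-latency monitoring and a full-rate stream for the real take.
    async function initRecorders() {
      const monitorStream = await navigator.mediaDevices.getUserMedia({
        audio: { sampleRate: 16000 },  // smaller packets for monitoring
      });
      const captureStream = await navigator.mediaDevices.getUserMedia({
        audio: { sampleRate: 48000 },  // full quality for the recording
      });
      const capture = new MediaRecorder(captureStream, {
        mimeType: "audio/webm;codecs=opus",
      });
      return { monitorStream, capture };
    }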

Additionally, we implemented more of the track UI and set up our database, where the uploaded files are stored for the different groups. With this in place, we can now sync up the audio files based on the timing information we send with the click track, which let us integrate some of our individual parts and fix the bugs that cropped up.
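
The sync step itself reduces to simple arithmetic once the timing information is known. A hypothetical sketch (the field names and data model here are ours, for illustration only) of trimming a take so it lines up with the click track:

    // Each take knows when the click track started and when the first
    // onset was detected; the difference tells us how much to trim.
    interface Take {
      samples: Float32Array;  // decoded mono audio
      sampleRate: number;
      clickStartSec: number;  // click-track start time we sent along
      onsetSec: number;       // first onset detected by the backend
    }

    function trimToClick(take: Take): Float32Array {
      const offsetSamples = Math.round(
        (take.onsetSec - take.clickStartSec) * take.sampleRate
      );
      return take.samples.subarray(Math.max(0, offsetSamples));
    }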

We are behind schedule, as most of what we have left requires cloud deployment, which has not been done. Since we can only test on our local machines right now, the monitoring latency is in the single-digit milliseconds, but this may not hold across multiple remote clients. If it does not, we will have to implement some of the buffers and filters described in our Design Review Document.

Team Status Report for 4/10

What’s Working?

This week, we have made progress on file uploads, as well as measurements of latency and packet loss. Per Jackson's status report, the latency and packet loss measurements are promising, but they are hard to interpret until cloud deployment is working. Locally, communicating between browsers, latency is around 5 ms and packet loss is almost constantly 0%. These numbers are likely just the overhead from WebRTC, and will get considerably worse once network speed becomes a factor. Still, it's helpful to have these measurements working, so they can be interpreted as soon as cloud deployment does work. Only then can we really say whether we need other means to improve latency (e.g. downsampling the audio being sent).
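
For reference, here is roughly how such numbers can be read out of WebRTC (a sketch using the standard getStats() API; exact field availability varies by browser):

    // Poll the peer connection's stats and pull round-trip time from the
    // nominated ICE candidate pair; packet loss comes from similar reports.
    async function measureRtt(pc: RTCPeerConnection): Promise<number | null> {
      let rtt: number | null = null;
      const stats = await pc.getStats();
      stats.forEach((report) => {
        if (report.type === "candidate-pair" && report.nominated) {
          rtt = report.currentRoundTripTime ?? null;  // in seconds
        }
      });
      return rtt;  // one-way latency is roughly rtt / 2
    }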

Risks & Management

Our biggest risk right now is not having enough time to finish everything we planned. With cloud deployment not yet started and no DAW interface, we likely will not be able to get both working perfectly by the end of the class. Since our project's main purpose is to facilitate musical collaboration, cloud deployment is absolutely essential. Therefore, any time allocated for the DAW UI may have to be reallocated to cloud deployment until that's working. We will all likely have to work on these tasks, replacing the initial division of labor proposed in the design review.

Another risk is exporting the recordings into a readable file format for processing and saving. We previously had not considered compatibility across browsers; the common file formats (.wav, .mp3) are not natively supported for recording, so we would have to do this conversion ourselves. If conversion doesn't work, there is a different recording implementation with a built-in function for converting to .wav files. However, integrating it could break some of the features we already have and would likely cost us more labor.
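
If we do end up converting ourselves, the format is manageable: a .wav file is just raw PCM behind a 44-byte header. A sketch of a minimal mono 16-bit encoder (not yet integrated into our app):

    // Wrap Float32 samples in a minimal 16-bit PCM WAV container.
    function encodeWav(samples: Float32Array, sampleRate: number): Blob {
      const buf = new ArrayBuffer(44 + samples.length * 2);
      const view = new DataView(buf);
      const writeStr = (off: number, s: string) => {
        for (let i = 0; i < s.length; i++) view.setUint8(off + i, s.charCodeAt(i));
      };
      writeStr(0, "RIFF");
      view.setUint32(4, 36 + samples.length * 2, true);  // file size - 8
      writeStr(8, "WAVE");
      writeStr(12, "fmt ");
      view.setUint32(16, 16, true);          // fmt chunk length
      view.setUint16(20, 1, true);           // PCM format
      view.setUint16(22, 1, true);           // mono
      view.setUint32(24, sampleRate, true);
      view.setUint32(28, sampleRate * 2, true);  // byte rate
      view.setUint16(32, 2, true);           // block align
      view.setUint16(34, 16, true);          // bits per sample
      writeStr(36, "data");
      view.setUint32(40, samples.length * 2, true);
      for (let i = 0; i < samples.length; i++) {
        const s = Math.max(-1, Math.min(1, samples[i]));  // clamp to [-1, 1]
        view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
      }
      return new Blob([buf], { type: "audio/wav" });
    }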

Changes to System Design & Schedule

As of now, no system design changes to report.

While we haven’t finalized any of the changes mentioned above, we will likely have significant changes this week when we make our updated Gantt chart for the interim demo on Wednesday. One option would be to put the rest of our focus on cloud deployment and getting monitoring latency down as much as possible between two remote clients. This way, even if the DAW UI isn’t finished, we will still have a low-latency audio chat site for musicians to practice, and hopefully record as well, even if editing is not a possibility.

Since cloud deployment and the DAW UI are not complete, we will have to cut into our slack time even more. Luckily, we planned for this, and we have the slack time available.

Team Status Report for 4/3

Progress

This week, we finally have a working low-latency audio chat web app which allows you to mix people’s volumes and record to a click track! Ivy’s click track is now integrated, Christy’s UI is coming together, and Jackson’s peer-to-peer audio monitoring is working now with multiple connected users. For details on how different components work, see our individual status reports.

Risks

The risk of audio monitoring not working at all is behind us, but there is still a risk that the latency will be too high for practical use. To manage this, we first have to perform the latency tests outlined in our design review. As described in Jackson's status report, audio is sent over WebRTC connections, which use UDP, so there really isn't a faster way to send it. If the latency is still too high, the only solution would be to decrease the amount of data being sent. In the context of audio, this could mean sending audio at a lower sample rate or lower bit depth to shrink the packets as much as possible. Still, no matter how little data we send, some latency is unavoidable.
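
One concrete knob for this is sketched below: WebRTC lets us cap the encoder's bitrate on the outgoing audio sender. The 32 kbps figure is illustrative, not something we have tested:

    // Cap the Opus encoder's bitrate on every outgoing audio track.
    async function capAudioBitrate(pc: RTCPeerConnection, bps = 32_000) {
      for (const sender of pc.getSenders()) {
        if (sender.track?.kind !== "audio") continue;
        const params = sender.getParameters();
        if (!params.encodings?.length) continue;  // some browsers populate lazily
        params.encodings[0].maxBitrate = bps;
        await sender.setParameters(params);
      }
    }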

Another significant risk is that we may not be able to build a user interface that looks like a professional DAW. While Christy is responsible for the UI (per the design review), we may have to all work together on it to mitigate this risk, since this is likely a more important aspect than additional audio effects or file uploads to the server.

Schedule

We are slightly behind schedule, but not enough to warrant a completely new one. Our tasks remain the same, just with slightly less time to do them. We planned for this though, so we can just take a week out of our slack time.

Significant Design Changes

There were no big changes to our architecture this week, but some big changes to the monitoring implementation are detailed in Jackson's status report. Without multiple ConnectionPair objects, peer-to-peer connections between more than two people would not be possible. The small trade-off is memory in the browser, but since our app is really only intended for four users at a time, this cost is insignificant compared to the benefits.
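
For a sense of what this looks like, here is an illustrative sketch of the mesh (the real ConnectionPair class is in Jackson's report; this is a simplified stand-in):

    // One RTCPeerConnection per remote peer, keyed by peer id. With 4
    // users, each browser holds at most 3 of these, so memory cost is tiny.
    const peers = new Map<string, RTCPeerConnection>();

    function connectTo(peerId: string, localAudio: MediaStream): RTCPeerConnection {
      const pc = new RTCPeerConnection();
      localAudio.getTracks().forEach((t) => pc.addTrack(t, localAudio));
      pc.ontrack = (e) => {
        const audio = new Audio();       // play whatever this peer sends
        audio.srcObject = e.streams[0];
        void audio.play();
      };
      peers.set(peerId, pc);
      return pc;  // offer/answer signaling happens over the websocket
    }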

Team Status Report for 3/27

This week, we’ve made a lot of progress on the monitoring aspect of our project. As outlined in Jackson’s status report, websocket connections between each user and the server are working, as well as WebRTC peer-to-peer connections. Though audio is not sent from browser to browser quite yet, simple text messages work just fine, and sending audio will be completed by next week. In addition, the click track is working, and the user interface has seen big improvements.

There are a few major changes to our design this week:

  1. To allow multiple asynchronous websocket connections to be open at once, we had to add a Redis server. This runs alongside the Django server and communicates with it and its database over port 6379. This was based on the tutorial in the Channels documentation, though that tutorial uses Docker to run the Redis server, while I just ran Redis in a terminal. This change doesn't have much of a trade-off; it's simply a necessary addition that lets asynchronous websocket channels access the database.
  2. We have decided to use WebRTC for peer-to-peer connections. In our design review, we planned to use TCP, since it gives a virtually 0% packet loss rate. But TCP's cost is large when latency matters as much as it does with music, making it impractical. WebRTC is made for real-time communication and designed particularly for media streaming between users. It uses UDP to send data with the lowest possible latency, and it's built into the user's browser. The only real cost is that we now have to worry about packet loss. But for something like music, where timing is so critical, we've decided that meeting the latency requirement (<100 ms) is far more important than the packet loss requirement (<5%). A code sketch of this trade-off follows this list.
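
The latency-over-reliability choice even shows up directly in the WebRTC API. As a sketch, a data channel configured like this behaves like UDP, dropping late packets instead of stalling on them (the channel label is arbitrary):

    // An unordered channel with zero retransmits trades reliability for
    // latency, which is the right trade for live audio.
    const pc = new RTCPeerConnection();
    const channel = pc.createDataChannel("audio", {
      ordered: false,     // don't wait for out-of-order packets
      maxRetransmits: 0,  // a lost packet is simply dropped
    });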

Since the WebRTC connection is working, monitoring is no longer at risk. However, we do have a number of other big risks. As I see it, our biggest risk right now is that none of us really knows how to create the DAW user interface we showcased in our design review. Visualizing audio as you're recording, and being able to edit the way you can in industry DAWs like Pro Tools, Audacity, or Ableton, is going to be a challenge. To mitigate this risk, we will need to put extra work into the UI in the coming weeks; if this fails, we can still expect to have a good low-latency rehearsal tool for musicians, even without the editing functionality.

Team Status Report for 3/20

This week, we have been working on implementing the main page and the group page. The main page is where a user decides to create a new group or join an existing one; Jackson worked on both the front end and back end of this page. We decided to make each group accessible by a private room key, which is a UUID that the user can get from the URL. Only people who know the room key can access the group and start recording together.
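
Generating such a key is a one-liner in the browser; a sketch (the URL path is illustrative, not our final route):

    // A UUID room key doubles as an unguessable invite link.
    const roomKey = crypto.randomUUID();  // e.g. "3b241101-e2bb-4255-..."
    const inviteUrl = `https://www.acapella2021.com/group/${roomKey}`;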

Once people join the room, concurrent recording comes into play: everyone in the group records synchronously. We are still working on implementing the UI and back end of this page.

Team Status Report for 3/13

This week, we began working on the basic functionality of our website, implementing the recording function, some basic UI, and the click generator.

Because we know little about networking, our biggest concern is still implementing the real-time monitoring. A good question regarding this was raised during our Design Review presentation, where someone asked about alternatives to our socket solution if it does not meet our latency requirement. This is a valid concern we had not considered; even though the other examples we tried out could reduce latency below 100 ms, we might be limited by websockets and unable to do the same. One alternative is requiring the use of Ethernet, which might speed up a user's connection enough to meet this requirement, but we are not sure that alone would be enough.

Team Status Report for 3/6

Our biggest risk remains that sending audio over a socket connection may either not work or not lower latency enough to be used in a recording setting. To manage this risk, we are focusing most of our research efforts on sockets (Jackson and Christy) and synchronization (Ivy). As a contingency plan, our app can still work without the real-time monitoring using a standard HTML form data upload, but it will be significantly less interesting this way.

In our research, we found that other real-time audio communication tools aiming for minimal latency use peer-to-peer connections, instead of or in addition to a web server. This makes sense: routing through a server adds an extra hop, which increases the time it takes for data to arrive. Since a peer-to-peer connection seems to be the only way to get latency as low as we need it, we decided on a slightly different architecture for the app. This is detailed in our design review, but the basic idea is that audio is sent from performer to performer over a socket connection, and the recorded audio is only sent to the server when one of the musicians hits a "save" button on the project.
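
The "save" step is then an ordinary upload rather than a streaming problem. A sketch, assuming a hypothetical upload endpoint on the server:

    // POST the finished recording to the server only when a musician saves.
    async function saveTrack(blob: Blob, roomKey: string): Promise<void> {
      const form = new FormData();
      form.append("track", blob, "take.webm");
      await fetch(`/groups/${roomKey}/upload/`, { method: "POST", body: form });
    }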

Because of this small change in plans, we have a new schedule and Gantt chart, which can be found in our design review slides. The high-level change is that we need more time to work on peer-to-peer communication.

Team Status Report for 2/27

Since last week, we have gained familiarity with audio processing on the web, and we know how we're going to implement all in-browser operations. The biggest remaining challenge is streaming the audio data to and from the server. The most significant risk is that we will not be able to do this in real time, and thus the recording musicians will not be able to hear each other. As we said last week, this is something we would love to have, but if we can't figure it out, there are other solutions (e.g. uploading the audio after the whole track is recorded, or asynchronously but in larger chunks than would work for real time). We need to decide on this early, as it will affect many other facets of our project going forward.

We haven't made any big changes in our implementation, but we can now be more specific about how things are going to get done. For example, the click track can be created using an audio buffer similar to the white noise synthesizer in Jackson's status report. This way, the clicks can be played to recording musicians the entire time they are recording. Better yet, no server storage is needed for this: a buffer containing a click sound can be looped at a given tempo (e.g. every quarter note), so only a small buffer is needed.
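
A sketch of that idea using the Web Audio API (the 5 ms click length and the decaying noise burst are our illustrative choices):

    // Build a buffer exactly one beat long with a short click at the
    // start, then loop it; one loop pass equals one beat, and nothing
    // needs to be stored on the server.
    function startClickTrack(ctx: AudioContext, bpm: number): AudioBufferSourceNode {
      const beatLen = Math.round((60 / bpm) * ctx.sampleRate);
      const buffer = ctx.createBuffer(1, beatLen, ctx.sampleRate);
      const data = buffer.getChannelData(0);
      const clickLen = Math.round(0.005 * ctx.sampleRate);  // 5 ms click
      for (let i = 0; i < clickLen; i++) {
        data[i] = (Math.random() * 2 - 1) * (1 - i / clickLen);  // decaying noise
      }
      const src = ctx.createBufferSource();
      src.buffer = buffer;
      src.loop = true;
      src.connect(ctx.destination);
      src.start();
      return src;  // call .stop() on it to end the click track
    }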

Since there aren’t any big changes to report, our initial schedule is still valid. That said, it is possible we will get the in-browser recording and click track aspects to work much sooner than anticipated, in which case we’ll have far more time to work on the challenges mentioned above (uploading and downloading audio data chunks in real-time). Research into websockets is still needed, and remains a critical part of our desired implementation.