Team Status Report for 4/24

This week we resolved the issue of uploading audio in a format that is accessible from our backend. There were some problems getting our initial recorder to record in a format decodable by our backend, so we changed our implementation to initialize two recorders: one for monitoring and one for capturing audio. While having two recorders may seem like one too many, it may make it easier for us to reduce our monitoring latency, since we can lower the sample rate of the monitoring recorder without affecting the sample rate of the actual recording.
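As a rough sketch of what this looks like in the browser (the sample rates and options below are illustrative, not our final values), two recorders can be created from the same microphone with different requested sample rates:

```javascript
// Hypothetical sketch: two recorders from the same device, requested at different
// sample rates via getUserMedia constraints (browsers treat these as "ideal" values
// and may ignore them, so actual rates should be checked at runtime).
async function initRecorders() {
  // Lower-rate stream for the monitoring path, to keep per-packet size small.
  const monitorStream = await navigator.mediaDevices.getUserMedia({
    audio: { sampleRate: 16000 },
  });
  const monitorRecorder = new MediaRecorder(monitorStream);

  // Full-rate stream for the actual take that gets uploaded.
  const captureStream = await navigator.mediaDevices.getUserMedia({
    audio: { sampleRate: 48000 },
  });
  const captureRecorder = new MediaRecorder(captureStream, { mimeType: 'audio/webm' });

  return { monitorRecorder, captureRecorder };
}
```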

Additionally, we implemented more of the track UI and set up our database, where the uploaded files will be stored for the different groups. With this, we can now sync up the audio in the files based on the timing information we send with the click track. With that done, we were able to integrate some of our individual parts together and fix some of the bugs that cropped up.
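To make the sync step concrete, here is a rough sketch of one way it could work (the track fields and offsets are illustrative assumptions, not our exact schema): each uploaded file is scheduled against a shared Web Audio clock, offset by the click-track timing information stored with it.

```javascript
// Hedged sketch: play all uploaded tracks aligned to a shared timeline.
// Each track is assumed to store the click-track offset at which recording began.
async function playAligned(audioCtx, tracks) {
  const startAt = audioCtx.currentTime + 0.1; // small scheduling headroom
  for (const { url, startOffsetSec } of tracks) {
    const response = await fetch(url);
    const buffer = await audioCtx.decodeAudioData(await response.arrayBuffer());
    const source = audioCtx.createBufferSource();
    source.buffer = buffer;
    source.connect(audioCtx.destination);
    source.start(startAt + startOffsetSec); // align to the shared click-track timeline
  }
}
```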

We are behind schedule, as most of what we have left requires cloud deployment, which has not been done yet. Since we can only test on our local machines right now, the monitoring latency is in the single-digit millisecond range, but this may not hold across multiple remote clients. If it does not, we will have to implement some of the buffers and filters described in our Design Review Document.


Jackson’s Status Report for 4/24

This week I did some debugging and integration with my team members’ contributions, as well as the ethics assignment and discussion. My duties (per our planned division of labor in the design review: all monitoring, websockets, peer-to-peer connections, recording and playback in the browser, and the corresponding server-side views) are wrapping up, and everything I have left (driving latency and packet loss below our thresholds for viability) depends on cloud deployment. That is not a task assigned to me, but our original Gantt chart had it scheduled to be done by 3/11, so I may need to take it on as well. Cloud deployment is absolutely necessary for our project, so if it’s not started very soon, I’ll plan to start on it before next week, though I’ve never done it successfully, so it may be difficult alone. I am now on schedule, but the group is still behind (especially with regard to cloud deployment), so I will try to pick up some of these tasks where I’m able.

Ivy and Christy have been working on uploading recorded audio to the Django server using a separate URL (/add_track/<room_code>). One consequence of this method is that the main group page establishes its websocket connections using the /group/<room_code> URL, so on every upload, the websocket connections I’ve been working on are broken. I tried a few approaches to fix this bug. First, I tried to change the websocket URL to just include the room code. This wasn’t working for some time, and I couldn’t figure out why (I spent more time than I’d like to admit on this), but I finally came across a Stack Overflow post which essentially said that Channels will only work if the websocket URL corresponds to the page URL. This approach wasn’t going to work unless I switched from Channels to another websocket/server integration library, and Channels has worked well for everything else so far, so I needed to find a new solution.

I was finally able to fix this bug with a very simple solution: redirect the add_track URL to the corresponding group URL immediately after storing the track in the database. As a result, websocket connections with the server are not actually maintained during the upload, but they are instantly and automatically reestablished after the upload. This solution is also cleaner, since it ensures that the URL shown on the group page is always the /group/<room_code> URL.

This does, however, break the peer-to-peer connections, since the JavaScript gets reloaded with each new page, but the user can reestablish the peer-to-peer connection with a button click. This isn’t hard to do by any means, but it may be annoying for users. I believe Ivy is looking into asynchronous uploads, which shouldn’t involve a page reload, in which case no further integration is needed. If not, I will look into better solutions this week that won’t break the peer-to-peer connections.

Another separate bug I’ve noticed and not been able to fix yet is the way the server keeps track of which users are online/offline at any given moment. As it’s currently implemented, a user is marked online when they create their account or log in, and marked offline again when they log out. An obvious bug here is that users can (and often will) close the site without logging out. But Django doesn’t have a good way of knowing when this happens. The solution may end up being pretty complicated, and the one I have started on is sending websocket messages to all users in your group at regular intervals, just to signal “I’m online.” This seems easy enough, until you realize that the channel layer which the websockets operate on cannot change the database (this is to prevent issues with synchronization, since the database is changed synchronously and websockets are inherently asynchronous). I think we will have to remove the online/offline bit from the database entirely, and just keep track of that on each user’s browser separately. I will have this done by next week as a deliverable.
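A minimal sketch of the browser-side approach I have in mind (the interval lengths and message fields are placeholder assumptions): each client periodically broadcasts a heartbeat over the group websocket, and every client marks a peer offline once heartbeats stop arriving, with no database writes involved.

```javascript
// Hedged sketch of browser-side presence tracking; no database writes involved.
const HEARTBEAT_MS = 5000;       // how often to announce "I'm online"
const OFFLINE_AFTER_MS = 15000;  // mark a peer offline after this much silence
const lastSeen = {};             // username -> timestamp of their last heartbeat

function startPresence(socket, myUsername) {
  // Periodically tell the rest of the group that we're still here.
  setInterval(() => {
    socket.send(JSON.stringify({ type: 'heartbeat', username: myUsername }));
  }, HEARTBEAT_MS);

  // Record heartbeats from other group members as they arrive.
  socket.addEventListener('message', (event) => {
    const msg = JSON.parse(event.data);
    if (msg.type === 'heartbeat') {
      lastSeen[msg.username] = Date.now();
    }
  });
}

function isOnline(username) {
  const seen = lastSeen[username];
  return seen !== undefined && Date.now() - seen < OFFLINE_AFTER_MS;
}
```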

On a different note, I’ve been thinking a lot about the Langdon Winner article as it relates to our project. In my mind, our project is a solution to the problem of having to find a physical meeting place to practice and make recorded demos. An online meeting room is more convenient, especially now when so many ensembles and bands have members from all over the world, and perhaps more obviously it’s the safer option in a pandemic. But if a technology similar to our project were to become widely used and completely replace physical practice spaces for musicians, some problems could occur. Mainly, our project requires a very good internet connection, so wide adoption of our project could cause a large upward transfer of wealth to internet providers. Secondly, practice spaces often have things that musicians cannot keep in their living spaces (either due to cost or space constraints), like grand pianos, expensive microphones, sound treatment, etc. Without a physical practice space, musicians may be expected to have all of these things in their homes, which could again be more of a financial burden on the user than a convenience. At worst, this could drive people with less money or less space away from collaborative music entirely. Though I don’t think our work for an undergraduate class is going to cause any of this to happen, it has been interesting to think about as someone who is very fascinated by ethics and politics.

Christy’s Status Report for 4/10

This week, I have been working on improving the UI for the track editor once the user uploads their audio file to the track. I had trouble displaying the waveform for the audio file because of an issue with the format of our audio file. I will continue to work on improving the UI.

I used Bootstrap to style our webpage UI. The buttons and dropdown bars previously used generic HTML styling, so I attached customized CSS to the HTML elements to improve their appearance.

I struggled the most with the select dropdowns for the click generator. I tried to use Bootstrap’s select-picker to make our select dropdowns look nicer. However, our main Bootstrap version (Bootstrap 4) was not compatible with the select-picker library, so I decided not to use select-picker for the dropdowns and instead wrote customized CSS for them.

I am planning on deploying our implementation to the cloud before the interim demo.

Ivy’s Status Report for 4/10

This week, I worked on uploading the recorded audio in a file format recognized by Python. Chrome does not support recording in .wav format (in fact, the only format that seems to be supported across all browsers is webm), so we have to handle the conversion ourselves. My attempts involved trying to write the audio data into a .wav file in the back end (which just resulted in noise), and trying to convert the recorded audio blob into a .wav file before passing it to the server.

After some research, I found a tutorial which shows how to write our own WAV header before uploading the audio to the server. Since webm does work with PCM encoding, prepending a WAV header to our recorded audio seems to be the right way to go. However, after trying it, I’m still getting errors when reading the file in the back end. I think the problem is that we need to specify our sample rate and bit depth before recording, and I am currently looking into how to set that up.
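To make the idea concrete, here is a rough sketch of the header-writing step, assuming 16-bit PCM samples and a known sample rate and channel count (the tutorial’s actual code may differ):

```javascript
// Hedged sketch: build a 44-byte RIFF/WAVE header for raw 16-bit PCM samples
// and prepend it to the audio data before upload. `samples` is assumed to be
// an Int16Array taken from the recorder.
function encodeWav(samples, sampleRate, numChannels) {
  const bytesPerSample = 2; // 16-bit PCM
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeString = (offset, str) => {
    for (let i = 0; i < str.length; i++) view.setUint8(offset + i, str.charCodeAt(i));
  };

  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);          // file size minus the first 8 bytes
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);                    // fmt chunk size
  view.setUint16(20, 1, true);                     // audio format: 1 = PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * numChannels * bytesPerSample, true); // byte rate
  view.setUint16(32, numChannels * bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                    // bits per sample
  writeString(36, 'data');
  view.setUint32(40, dataSize, true);

  // PCM samples follow the header, little-endian.
  for (let i = 0; i < samples.length; i++) {
    view.setInt16(44 + i * bytesPerSample, samples[i], true);
  }
  return new Blob([view], { type: 'audio/wav' });
}
```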

Though I have the post-upload syncing done, not being able to get the audio in a readable format makes the entire functionality moot. I am behind right now, but I hope to get all this figured out before Monday, so we can update our Gantt chart and get ready for the demo.

Team Status Report for 4/10

What’s Working?

This week, we made progress on file uploads, as well as measurements of latency and packet loss. Per Jackson’s status report, the latency and packet loss measurements are promising, but they are hard to interpret without cloud deployment working. Locally, communicating between browsers, latency is around 5ms, and packet loss is almost constantly 0%. These numbers likely reflect just the overhead from WebRTC and will get a lot worse once network speed becomes a factor. Still, it’s helpful to have these measurements working, so they can be interpreted as soon as cloud deployment does work. Only then can we really say whether we need other means to improve latency (e.g. downsampling the audio being sent).

Risks & Management

Our biggest risk right now is not having enough time to finish everything we planned. With cloud deployment not yet started, and no DAW interface, we likely will not be able to get them both working perfectly by the end of the class. Since our project’s main purpose is to facilitate musical collaboration, cloud deployment is absolutely essential. Therefore, any time allocated for the DAW UI may have to be reallocated to cloud deployment until that’s working. We will likely have to all work on these tasks, replacing our initial division of labor proposed in the design review.

Another risk is exporting the recordings into a readable file format for processing and saving. We previously had not considered compatibility with different browsers; the common file formats (.wav, .mp3) are not natively supported for recording, so we’d have to do this conversion ourselves. If the conversion doesn’t work, there is a different recording implementation with a built-in function to convert to .wav files. However, integrating it could break some of the features we already have and would likely cost us more labor.

Changes to System Design & Schedule

As of now, no system design changes to report.

While we haven’t finalized any of the changes mentioned above, we will likely have significant changes this week when we make our updated Gantt chart for the interim demo on Wednesday. One option would be to put the rest of our focus on cloud deployment and getting monitoring latency down as much as possible between two remote clients. This way, even if the DAW UI isn’t finished, we will still have a low-latency audio chat site for musicians to practice, and hopefully record as well, even if editing is not a possibility.

Since cloud deployment and the DAW UI are not complete, we will have to cut into our slack time even more. Luckily, we planned for this, and we have the slack time available.

Jackson’s Status Report for 4/10

This week, my task was measuring and possibly improving latency and packet loss rate.

Latency

Measuring latency, like every other part of this project so far, is a much more complex task than I initially thought. For one, the term “latency” is a bit ambiguous. There are multiple different measurements this could mean:

  • “End-to-end delay” (E2E) or “one-way delay”
    This is the time it takes from the moment data is sent by one client to the moment that data is received by another. Since this relies on perfect synchronization between the clocks of each client, this could be difficult to measure.
  • “Round-trip time” (RTT)
    This is the time it takes from the moment data is sent by one client to the moment that data is received back by the same client. This measurement only makes sense if the remote computer is returning the data they receive.

These measurements are certainly not meaningless, but in an audio system, neither one is all that great. To explain this, it’ll be helpful to first go over the signal path again. This is what it looks like when client A is sending audio to client B (very simplified for the purposes of this discussion):

  1. Client A makes a sound.
  2. The sound is sampled and converted to a digital signal.
  3. The browser converts this signal to a MediaStreamTrack object.
  4. WebRTC groups samples into packets.
  5. These packets are sent to client B.
  6. On client B’s computer, WebRTC collects these packets and places them into a jitter buffer.
  7. Once the jitter buffer has accumulated enough samples to make smooth playback possible, the samples in the jitter buffer are played back, allowing client B to finally hear client A’s sound.

Steps 1-5 take place on client A’s computer, and steps 6-7 take place on client B’s computer.

The first thing to notice is that this communication is one-way, so RTT doesn’t make sense here. E2E delay must be approximated. We can do this as described in the design review: client A sends a timestamp to client B, and client B compares this timestamp to its current time. The difference between these times is a good approximation of E2E delay, provided the clocks of the two computers are synchronized very closely. Luckily, JavaScript provides a function, performance.now(), which gives you the elapsed time in milliseconds since the time origin. Time origins can be synchronized by using performance.timing.navigationStart. So we can get the E2E delay pretty easily.
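Roughly, the measurement looks like this (the message format and the use of a WebRTC data channel here are assumptions for illustration, not our exact code):

```javascript
// Hedged sketch of the timestamp approach: the sender attaches its clock reading,
// and the receiver compares it to its own clock.
function sendLatencyProbe(dataChannel) {
  // Absolute wall-clock time: time origin plus elapsed ms since the origin.
  const sentAt = performance.timing.navigationStart + performance.now();
  dataChannel.send(JSON.stringify({ type: 'latencyProbe', sentAt }));
}

function onLatencyProbe(msg) {
  const receivedAt = performance.timing.navigationStart + performance.now();
  const e2eDelayMs = receivedAt - msg.sentAt; // approximate one-way (E2E) delay
  console.log(`Approximate E2E delay: ${e2eDelayMs.toFixed(1)} ms`);
}
```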

But as you can see, E2E delay only measures the time from step 5 to step 6. Intuitively, we want to measure the time from step 1 to step 7, that is, the amount of time from the moment a sound is made by client A to the moment that sound is heard by client B. This is the real challenge. The browser doesn’t even have access to the time from step 1 to step 3, since these happen in hardware or in the OS, so those are out of the picture. Steps 3 to 5 are done by WebRTC with no reliable access to the internals, since these are implemented differently on every browser, and poorly documented at that. As mentioned, we can approximate step 5 to step 6, the E2E delay. All that’s left is step 6 to step 7, and luckily, WebRTC gives this to us through its (also poorly documented) getStats API. After some poking around, I was able to find the size of the jitter buffer in seconds. This time can be added to the E2E delay, giving us the time from step 5 to step 7, and that is probably the best we can do.
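Here is roughly how the jitter buffer size can be pulled out of getStats (a sketch using the spec’s inbound-rtp field names; browser support for these fields varies):

```javascript
// Hedged sketch: read the receive-side jitter buffer delay from WebRTC's getStats API.
async function getJitterBufferSeconds(peerConnection) {
  const stats = await peerConnection.getStats();
  let delay = 0;
  stats.forEach((report) => {
    if (report.type === 'inbound-rtp' && report.kind === 'audio') {
      // Average seconds each emitted sample spent waiting in the jitter buffer.
      if (report.jitterBufferEmittedCount > 0) {
        delay = report.jitterBufferDelay / report.jitterBufferEmittedCount;
      }
    }
  });
  return delay;
}
```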

So is the latency good enough? Well, maybe. On my local computer, communicating only from one browser to another, latency times calculated in this way are around 5ms, which is very good (as a reminder, our minimum viable product requires latency below 100ms). But this isn’t very useful without first deploying to the cloud and testing between different machines. Cloud deployment is not something I am tasked with, so I’m waiting on my teammates to do this. Per our initial Gantt chart, this should have happened weekly for the past few weeks. As soon as it does, we can interpret the latency test I implemented this week.

Packet Loss Rate

Unlike latency, the packet loss rate is actually very simple to get. The number of packets sent and the number of packets dropped can both be found using the WebRTC getStats API, the same way I got the size of the jitter buffer. Interpretation of this again depends on cloud deployment, but locally the packet loss rate is almost 0%, and I never measured more than 5% (our threshold for the minimum viable product).
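A sketch of that calculation, again using the spec’s inbound-rtp field names:

```javascript
// Hedged sketch: compute packet loss rate from the receive-side RTP statistics.
async function getPacketLossRate(peerConnection) {
  const stats = await peerConnection.getStats();
  let lost = 0;
  let received = 0;
  stats.forEach((report) => {
    if (report.type === 'inbound-rtp' && report.kind === 'audio') {
      lost += report.packetsLost;
      received += report.packetsReceived;
    }
  });
  const total = lost + received;
  return total > 0 ? lost / total : 0; // fraction of packets lost
}
```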

Schedule and Deliverables

As with last week, I am pretty clearly behind, despite working well over 12 hours again this week. At this point, we have used most of our slack time, and it may be smart to think about shrinking our minimum viable product requirements, particularly dealing with the UI. If we don’t have a DAW-like UI by the end of the course, we will at the very least have a low-latency multi-peer audio chatting app which can be used for practicing musicians.

For next week, I will work on polishing what we have before the interim demo, completing the ethics assignment, and possibly UI and cloud deployment if needed. Most of the remainder of our project relies on cloud deployment getting done, and a better UI, so it’s hard to pick a better next step.

Team Status Report for 4/3

Progress

This week, we finally have a working low-latency audio chat web app which allows you to mix people’s volumes and record to a click track! Ivy’s click track is now integrated, Christy’s UI is coming together, and Jackson’s peer-to-peer audio monitoring is working now with multiple connected users. For details on how different components work, see our individual status reports.

Risks

The risk that audio monitoring won’t work at all is gone, but there is still a risk that the latency will be too high for practical use. To manage this, we first have to perform the latency tests outlined in our design review. As described in Jackson’s status report, audio is sent over WebRTC connections, which use UDP, so there really isn’t a faster way to send audio. If the latency is still too high, the only solution would be to decrease the amount of data being sent. In the context of audio, this could mean sending audio at a lower sample rate or lower bit depth to shrink the packets as much as possible. Still, no matter how little data we’re sending, some latency is unavoidable.

Another significant risk is that we may not be able to build a user interface that looks like a professional DAW. While Christy is responsible for the UI (per the design review), we may have to all work together on it to mitigate this risk, since this is likely a more important aspect than additional audio effects or file uploads to the server.

Schedule

We are slightly behind schedule, but not enough to warrant a completely new one. Our tasks remain the same, just with slightly less time to do them. We planned for this though, so we can just take a week out of our slack time.

Significant Design Changes

There were no big changes to our architecture this week, but some big changes to the monitoring implementation are detailed in Jackson’s status report. Without multiple ConnectionPair objects, peer-to-peer connections between more than 2 people would not be possible. The small trade-off is memory in the browser, but since our app is really only intended for 4 users at a time, this trade-off is completely insignificant compared to the benefits.

Jackson’s Status Report for 4/3

This week, I worked a lot more on monitoring. By the end of last week, I was able to send peer-to-peer text messages between two unique users. This week I extended that to allow for audio streams to be sent, and to work with an arbitrary number of users. These changes required a complete restructuring of all the monitoring code I wrote last week and the week before.

As a reminder, I’m using the WebRTC (real-time communication) API which comes built-in on your browser. WebRTC uses objects called RTCPeerConnection to communicate. I explained this in detail last week, so I won’t get into it here. The basic idea is that each connection requires both a local RTCPeerConnection object and a remote RTCPeerConnection object. If WebRTC interests you, there are many tutorials online, but this one has been by far the most helpful to me.

Connections with Multiple Users:

Last week, I created one local RTCPeerConnection object and an arbitrary number of remote RTCPeerConnection objects. This made intuitive sense, since I want to broadcast only one audio stream while receiving an arbitrary number of audio streams from other users. However, this week I learned that each local connection can only be associated with ONE remote user. To get around this, I created an object class called ConnectionPair. Its constructor is simple (a sketch appears after the list below) and will help me explain its use.

Each peer-to-peer relationship is represented by one of these ConnectionPair objects. Whenever a user chooses to initiate a connection with another user, a ConnectionPair is created with 4 things:

  1. A new local connection object (RTCPeerConnection)
  2. A new remote connection object (RTCPeerConnection)
  3. A gain node object (GainNode from the Web Audio API) to control how loud you hear the other user
  4. The other user’s username (a string), which serves as a unique identifier for the ConnectionPair. You can then retrieve any ConnectionPair you’d like using another function I wrote, getConnection(<username>), whose meaning is self-evident.
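Since the original code isn’t shown here, the following is a hypothetical reconstruction of what the constructor and getConnection() might look like (everything besides the ConnectionPair and getConnection names is an assumption):

```javascript
// Hypothetical reconstruction of the ConnectionPair idea.
class ConnectionPair {
  constructor(username, audioCtx) {
    this.username = username;                        // unique identifier for this pair
    this.localConnection = new RTCPeerConnection();  // sends our audio to this user
    this.remoteConnection = new RTCPeerConnection(); // receives this user's audio
    this.gainNode = audioCtx.createGain();           // per-user monitor volume
  }
}

const connectionPairs = [];

// Retrieve the ConnectionPair associated with a given username, if any.
function getConnection(username) {
  return connectionPairs.find((pair) => pair.username === username);
}
```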

So why is this useful? Well, now there can be an arbitrary number of ConnectionPair objects, and thus you can establish peer-to-peer connections with an arbitrary number of users!

Sending Audio Over a WebRTC Connection:

Since the WebRTC API and the web media API are both browser built-ins, they actually work very nicely with each other.

To access data from a user’s microphone or camera, the web media API provides the function navigator.mediaDevices.getUserMedia(). This function gives you a MediaStream object, which contains one or more MediaStreamTrack objects (audio and/or video tracks). Each of these tracks can be added to a local WebRTC connection with RTCPeerConnection.addTrack(<track>).
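A minimal send-side sketch (function and variable names are illustrative, not our actual code):

```javascript
// Capture the mic and add its audio tracks to the local connection of a pair.
async function sendAudioTo(pair) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  stream.getAudioTracks().forEach((track) => {
    pair.localConnection.addTrack(track, stream);
  });
}
```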

And finally, to receive audio on your remote RTCPeerConnection, you can just use the event handler RTCPeerConnection.ontrack, which is triggered every time the remote connection receives a track. This track can be routed through any type of AudioNode (for example, a GainNode, which simply changes the volume of an incoming signal) and on to your browser’s audio output.
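And a matching receive-side sketch (again with illustrative names), routing the incoming track through the per-user GainNode:

```javascript
// Route an incoming remote track through a per-user GainNode to the speakers.
function setupReceiver(pair, audioCtx) {
  pair.remoteConnection.ontrack = (event) => {
    const source = audioCtx.createMediaStreamSource(event.streams[0]);
    source.connect(pair.gainNode);                // per-user volume control
    pair.gainNode.connect(audioCtx.destination);  // out to the audio output
  };
}
```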

User Interface Improvements:

Finally, I added a little bit to the UI just to test the functionality mentioned above. On the group page, here is how the users are now displayed:

In this group, there are 3 members:

  1. Jackson (the user whose screen you’re looking at in this screenshot)
  2. TestUserFname (a test user, who is currently offline)
  3. OtherTestUserFname (another test user, who is currently online)

Next to every online user who isn’t you, I added a button which says “send my audio.” This button initiates a peer-to-peer connection with the user it appears next to. Once the connection is opened, the audio from your microphone is sent to that user.

Lastly, next to each online user, I added a slider for that person’s volume, allowing users to create their own monitor mixes.

We are still a bit behind schedule, since I thought I would have peer-to-peer audio monitoring working a few weeks ago, but I hope it’s clear that I’ve been working very hard, well over 12 hours/week to get it working! The biggest thing I can do to get us back on schedule is continue working overtime. As I said last week though, I think the hardest part is behind us.

With monitoring finally working, I can move onto testing and really driving the latency down. For next week, I will implement the latency and packet loss tests described in our design review and work on getting the numbers to meet our requirements (<100ms latency and <5% packet loss rate).

Christy’s Status Report for 3/27

Throughout this week, I finished implementing waveform generation for the recorded audio. I referred to the code on this website: https://www.cssscript.com/audio-visualizer-with-html5-audio-element/.

Currently, the visualizer generates the waveform as the sound plays back. Once playback stops, the waveform disappears. I am figuring out ways to keep the waveform displayed after playback ends; to do so, I need to capture each frame of the waveform, connect them all, and display the result.
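For context, the general approach (sketched below; the referenced tutorial’s details may differ) uses an AnalyserNode and redraws a canvas on every animation frame, which is why the waveform only exists while audio is playing:

```javascript
// Hedged sketch of a live waveform visualizer driven by an AnalyserNode.
function drawWaveform(audioElement, canvas) {
  const audioCtx = new AudioContext();
  const source = audioCtx.createMediaElementSource(audioElement);
  const analyser = audioCtx.createAnalyser();
  source.connect(analyser);
  analyser.connect(audioCtx.destination); // keep the audio audible

  const data = new Uint8Array(analyser.frequencyBinCount);
  const ctx = canvas.getContext('2d');

  (function render() {
    requestAnimationFrame(render);
    analyser.getByteTimeDomainData(data);         // current time-domain snapshot
    ctx.clearRect(0, 0, canvas.width, canvas.height);
    ctx.beginPath();
    for (let i = 0; i < data.length; i++) {
      const x = (i / data.length) * canvas.width;
      const y = (data[i] / 255) * canvas.height;
      i === 0 ? ctx.moveTo(x, y) : ctx.lineTo(x, y);
    }
    ctx.stroke();
  })();
}
```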

The next thing I am working on is propagating the visualization to other users once a user decides to upload their audio. To do so, I need to use jQuery and AJAX to propagate the upload.


Ivy’s Status Report for 3/27

I am almost finished with the audio upload to the server right now. This past couple of weeks, I realized the implementation discussed in my previous status report was impractical, as I was creating and playing the click track on the server rather than on the actual webpage. To fix this, I had to rewrite my code in JavaScript, using the Web Audio API to create the metronome clicks. Unfortunately, I was unable to replicate in JavaScript the clock I had created in Python, and instead resorted to recursive setTimeout calls for the intervals between the clicks. But this implementation creates inevitable delay, which would cause successive ticks to drift further and further from the ‘correct’ timing. To fix this, I decrease the interval of every other tick, to make up for time if the previous tick arrived a few ms late. I don’t like this solution too much, as it only fixes the delay after it happens rather than addressing it head-on. But for the range of tempos we’re aiming for, the drift doesn’t seem too severe. If we have more time at the end, I will look for another, more accurate solution.
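A sketch of the drift-correcting idea (this variant adjusts every tick rather than every other tick, and the names and values are illustrative):

```javascript
// Hedged sketch: a self-correcting setTimeout metronome. Each tick measures how
// late it fired and shortens the next delay by that amount.
function startClickTrack(bpm, playClick) {
  const interval = 60000 / bpm;               // ms between clicks
  let expected = performance.now() + interval;

  function tick() {
    const drift = performance.now() - expected; // how late this tick fired
    playClick();                                 // play the metronome sound (e.g. via Web Audio)
    expected += interval;
    // Schedule the next tick early to compensate for the measured drift.
    setTimeout(tick, Math.max(0, interval - drift));
  }

  setTimeout(tick, interval);
}
```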

I think our group’s biggest concern, now that Jackson has figured out how to implement monitoring, is the UI. I don’t have much experience with HTML outside of basic social media layouts, and our proposed plan for it is much more involved than just a static webpage with some buttons.