Jackson’s Status Report for 5/8

This week, I started working on the final video. I wrote a script and took some "b-roll" screen recordings of the app. I have not edited anything together yet, but I have all day tomorrow set aside for that. The script is mostly a clearer restatement of what is in my status reports about setting up monitoring with WebRTC. It's a challenge to explain how all of that works without going well over time, but I did my best to keep it as concise as possible. We'll have the video up by Monday night, along with the poster. Speaking of the poster, I've also begun working on that; we already have our graphics from the final presentation.

At this point, with only days left before the demo and cloud deployment still not entirely working, it will be very difficult to debug anything that breaks during deployment, and probably impossible to implement the latency improvement strategies I've discussed in previous status reports. With this in mind, our focus in this last week has shifted to the required class assignments rather than the "nice-to-haves" for the project itself. So for my deliverables this week, I will edit the video and do my share of the poster and final paper.

Jackson’s Status Report for 5/1

This week, I had again planned to test latency and packet loss using the tests I implemented a few weeks ago. The results look very good when testing locally, but they won't be meaningful until we can run them between different computers over the internet, which depends on cloud deployment. Christy has begun working on deployment, but since it isn't done yet, I don't know which parts of our project will "break" as a result of it. The plan on our initial schedule was for Ivy and me to make incremental changes to the app during the week, and for Christy to deploy every weekend, beginning on 3/11. We've completed about as much as we can before deployment. Because of this, I'm a bit behind schedule, and I'll catch up as soon as cloud deployment is done.

The code isn't the only part of the project that needed work this past week, though, so I spent a significant amount of time on the final presentation slides. I wrote simplified explanations of the more complicated portions of the project that I worked on: WebSocket signalling, establishing peer-to-peer connections, sending audio over those connections, and tests for determining end-to-end latency and packet loss rate. The real challenge was conveying all of that information as simply as possible for the presentation format. As you know, I've written very detailed explanations in my status reports, and there's a lot more to it than I could fit in a few PowerPoint slides.

I also made a new Gantt chart showing the schedule and division of labor as they actually happened, and our plan for the last couple weeks of the semester. This is also in the final presentation slides.

For next week, I hope to make significant progress on the poster, video, and final report. Also, if cloud deployment finishes up, I can do any last-minute debugging, determine the actual end-to-end latency and packet loss rate, and do my best to improve them with what little time we have left. But my main priority going forward will be the necessary class assignments.

Jackson’s Status Report for 4/24

This week I did some debugging and integration with my team members' contributions, as well as the ethics assignment and discussion. My duties (per our planned division of labor in the design review: all monitoring, websockets, peer-to-peer connections, recording and playback in the browser, and the corresponding server-side views) are wrapping up, and everything I have left (driving latency and packet loss below our thresholds for viability) depends on cloud deployment. That is not a task assigned to me, but our original Gantt chart had it scheduled to be done by 3/11, so I may need to take it on as well. Cloud deployment is absolutely necessary for our project, so if it isn't started very soon, I plan to start on it before next week, though I've never done it successfully, so it may be difficult alone. I am now on schedule, but the group is still behind (especially with regard to cloud deployment), so I will try to pick up some of these tasks where I'm able.

Ivy and Christy have been working on uploading recorded audio to the Django server using a separate URL (/add_track/<room_code>). The problem is that the main group page establishes its websocket connections using the /group/<room_code> URL, so every upload, which navigates away from that page, breaks the websocket connections I've been working on. I tried a few approaches to fix this bug. First, I tried changing the websocket URL to just include the room code. This wasn't working for some time, and I couldn't figure out why (I spent more time than I'd like to admit on this), but I finally came across a Stack Overflow post which essentially said channels will only work if the websocket URL corresponds to the page URL. That approach wasn't going to work unless I switched from channels to another websocket/server integration library, and channels has worked well for everything else so far, so I needed to find a new solution.

I was finally able to fix this bug with a very simple solution: redirect the add_track URL to the corresponding group URL immediately after storing the track in the database. As a result, websocket connections with the server are not actually maintained during the upload, but they are instantly and automatically reestablished afterward. This solution is also cleaner, since it ensures that the URL shown on the group page is always the /group/<room_code> URL.

This does, however, break the peer-to-peer connections, since the JavaScript is reloaded with each new page; the user can reestablish the peer-to-peer connection with a button click. This isn't hard to do by any means, but it may be annoying for users. I believe Ivy is looking into asynchronous uploads, which shouldn't involve a page reload, in which case no further integration is needed. If not, I will look into better solutions this week that won't break the peer-to-peer connections.

A separate bug I've noticed but haven't been able to fix yet involves the way the server keeps track of which users are online or offline at any given moment. As currently implemented, a user is marked online when they create their account or log in, and marked offline again when they log out. The obvious problem is that users can (and often will) close the site without logging out, and Django doesn't have a good way of knowing when this happens. The solution may end up being fairly complicated; the one I have started on is sending websocket messages to everyone in your group at regular intervals, just to signal "I'm online." This seems easy enough, until you realize that the channel layer the websockets operate on cannot change the database (this prevents synchronization issues, since the database is changed synchronously and websockets are inherently asynchronous). I think we will have to remove the online/offline bit from the database entirely and just keep track of it in each user's browser separately. I will have this done by next week as a deliverable.
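A minimal sketch of how this heartbeat might look in the browser (the message shape, interval, and timeout below are placeholders, not final choices):

  // Each client periodically announces "I'm online" over its websocket,
  // and every browser tracks who it has heard from recently instead of
  // relying on a database flag.
  const HEARTBEAT_INTERVAL_MS = 5000;
  const OFFLINE_TIMEOUT_MS = 15000;
  const lastSeen = new Map(); // username -> timestamp of last heartbeat

  function startHeartbeat(socket, myUsername) {
    setInterval(() => {
      socket.send(JSON.stringify({ type: "heartbeat", from: myUsername }));
    }, HEARTBEAT_INTERVAL_MS);
  }

  function handleHeartbeat(event) {
    const msg = JSON.parse(event.data);
    if (msg.type === "heartbeat") {
      lastSeen.set(msg.from, Date.now());
    }
  }

  function isOnline(username) {
    const seen = lastSeen.get(username);
    return seen !== undefined && Date.now() - seen < OFFLINE_TIMEOUT_MS;
  }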

On a different note, I've been thinking a lot about the Langdon Winner article as it relates to our project. In my mind, our project is a solution to the problem of having to find a physical meeting place to practice and make recorded demos. An online meeting room is more convenient, especially now when so many ensembles and bands have members from all over the world, and perhaps more obviously it's the safer option in a pandemic. But if a technology similar to our project were to become widely used and completely replace physical practice spaces for musicians, some problems could occur. Mainly, our project requires a very good internet connection, so wide adoption could cause a large upward transfer of wealth to internet providers. Secondly, practice spaces often have things that musicians cannot keep in their living spaces (due to cost or space constraints), like grand pianos, expensive microphones, sound treatment, etc. Without a physical practice space, musicians may be expected to have all of these things in their homes, which could again be more of a financial burden on the user than a convenience. At worst, this could drive people with less money or less space away from collaborative music entirely. Though I don't think our work for an undergraduate class is going to cause any of this to happen, it has been interesting to think about as someone who is fascinated by ethics and politics.

Team Status Report for 4/10

What’s Working?

This week, we have made progress on file uploads, as well as measurements of latency and packet loss. Per Jackson's status report, the latency and packet loss measurements are promising, but they can't be interpreted meaningfully until cloud deployment is working. Locally, communicating between browsers, latency is around 5ms, and packet loss is almost constantly 0%. These numbers likely reflect just the overhead from WebRTC and will get a lot worse once network speed becomes a factor. Still, it's helpful to have these measurements working, so they can be interpreted as soon as cloud deployment does work. Only then can we really say whether we need other means to improve latency (e.g. downsampling the audio being sent).

Risks & Management

Our biggest risk right now is not having enough time to finish everything we planned. With cloud deployment not yet started, and no DAW interface, we likely will not be able to get them both working perfectly by the end of the class. Since our project’s main purpose is to facilitate musical collaboration, cloud deployment is absolutely essential. Therefore, any time allocated for the DAW UI may have to be reallocated to cloud deployment until that’s working. We will likely have to all work on these tasks, replacing our initial division of labor proposed in the design review.

Another risk is exporting the recordings into a readable file format for processing and saving. We had not previously considered compatibility across different browsers; the common file formats (.wav, .mp3) are not natively supported, so we'd have to do this conversion ourselves. If conversion doesn't work, there is a different recording implementation with a built-in function for converting to .wav files. However, integrating it could break some of the features we already have and would cost us more labor.

Changes to System Design & Schedule

As of now, no system design changes to report.

While we haven’t finalized any of the changes mentioned above, we will likely have significant changes this week when we make our updated Gantt chart for the interim demo on Wednesday. One option would be to put the rest of our focus on cloud deployment and getting monitoring latency down as much as possible between two remote clients. This way, even if the DAW UI isn’t finished, we will still have a low-latency audio chat site for musicians to practice, and hopefully record as well, even if editing is not a possibility.

Since cloud deployment and the DAW UI are not complete, we will have to cut into our slack time even more. Luckily, we planned for this, and we have the slack time available.

Jackson’s Status Report for 4/10

This week, my task was measuring and possibly improving latency and packet loss rate.

Latency

Measuring latency, like every other part of this project so far, is a much more complex task than I initially thought. For one, the term "latency" is a bit ambiguous. There are multiple measurements it could refer to:

  • “End-to-end delay” (E2E) or “one-way delay”
    This is the time it takes from the moment data is sent by one client to the moment that data is received by another. Since this relies on perfect synchronization between the clocks of each client, this could be difficult to measure.
  • “Round-trip time” (RTT)
    This is the time it takes from the moment data is sent by one client to the moment that data is received back by the same client. This measurement only makes sense if the remote computer is returning the data they receive.

These measurements are certainly not meaningless, but in an audio system, neither one is all that great. To explain this, it’ll be helpful to first go over the signal path again. This is what it looks like when client A is sending audio to client B (very simplified for the purposes of this discussion):

  1. Client A makes a sound.
  2. The sound is sampled and converted to a digital signal.
  3. The browser converts this signal to a MediaStreamTrack object.
  4. WebRTC groups samples into packets.
  5. These packets are sent to client B.
  6. On client B’s computer, WebRTC collects these packets and places them into a jitter buffer.
  7. Once the jitter buffer has accumulated a sufficient number of samples to make smooth playback possible, the samples in the jitter buffer are played back, allowing client B to finally hear client A's sound.

Steps 1-5 take place on client A’s computer, and steps 6-7 take place on client B’s computer.

The first thing to notice is that this communication is one-way, so RTT doesn't make sense here. E2E delay must be approximated. We can do this as described in the design review: client A sends a timestamp to client B, and client B compares this timestamp to their current time. The difference between these times is a good approximation of E2E delay, provided the clocks of the two computers are synchronized very closely. Luckily, JavaScript provides a function, performance.now(), which gives you the elapsed time in milliseconds since the time origin, and time origins can be synchronized using performance.timing.navigationStart. So we get the E2E delay pretty easily.
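For illustration, a rough sketch of this timestamp exchange over a WebRTC data channel (the message format and probe interval are made up for the example, and it assumes the two time origins are closely synchronized as described above):

  // Client A: periodically send the current high-resolution time.
  function startLatencyProbe(dataChannel, intervalMs = 1000) {
    return setInterval(() => {
      dataChannel.send(JSON.stringify({ type: "latency-probe", sentAt: performance.now() }));
    }, intervalMs);
  }

  // Client B: compare the received timestamp to the local clock.
  function handleLatencyProbe(event) {
    const msg = JSON.parse(event.data);
    if (msg.type === "latency-probe") {
      const e2eDelayMs = performance.now() - msg.sentAt;
      console.log(`Approximate E2E delay: ${e2eDelayMs.toFixed(1)} ms`);
    }
  }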

But as you can see, E2E delay only measures the time from step 5 to step 6. Intuitively, we want to measure the time from step 1 to step 7: the amount of time from the moment a sound is made by client A to the moment that sound is heard by client B. This is the real challenge. The browser doesn't even have access to the time from step 1 to step 3, since these steps happen in hardware or in the OS, so they are out of the picture. Steps 3 to 5 are done by WebRTC with no reliable access to the internals, since these are implemented differently on every browser, and poorly documented at that. As mentioned, we can approximate step 5 to step 6, the E2E delay. All that's left is step 6 to step 7, and luckily, WebRTC gives this to us through its (also poorly documented) getStats API. After some poking around, I was able to find the size of the jitter buffer in seconds. This time can be added to the E2E delay, giving us the time from step 5 to step 7, which is probably the best we can do.
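As a rough sketch, pulling that number out of getStats can look something like this (the exact report fields are browser-dependent, so treat jitterBufferDelay and jitterBufferEmittedCount as an assumption about what the browser exposes):

  // Estimate the average jitter buffer delay (in seconds) for incoming audio.
  async function getJitterBufferDelaySeconds(peerConnection) {
    const stats = await peerConnection.getStats();
    let delaySeconds = 0;
    stats.forEach((report) => {
      if (report.type === "inbound-rtp" && report.kind === "audio" &&
          report.jitterBufferEmittedCount > 0) {
        // jitterBufferDelay is a running total over all emitted samples,
        // so divide by the count to get an average per-sample delay.
        delaySeconds = report.jitterBufferDelay / report.jitterBufferEmittedCount;
      }
    });
    return delaySeconds;
  }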

So is the latency good enough? Well, maybe. On my local computer, communicating only from one browser to another, latency times calculated in this way are around 5ms, which is very good (as a reminder, our minimum viable product requires latency below 100ms). But this isn’t very useful without first deploying to the cloud and testing between different machines. Cloud deployment is not something I am tasked with, so I’m waiting on my teammates to do this. Per our initial Gantt chart, this should have happened weekly for the past few weeks. As soon as it does, we can interpret the latency test I implemented this week.

Packet Loss Rate

Unlike latency, packet loss rate is actually very simple to get. The number of packets sent and the number of packets lost can both be found using the WebRTC getStats API, the same way I got the size of the jitter buffer. Interpretation is again dependent on cloud deployment, but locally the packet loss rate is almost 0%, and I never measured more than 5% (our threshold for the minimum viable product).
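A sketch of that calculation (again assuming the standard inbound-rtp fields are available on the browsers we test with):

  // Compute the packet loss rate for incoming audio from WebRTC stats.
  async function getPacketLossRate(peerConnection) {
    const stats = await peerConnection.getStats();
    let received = 0;
    let lost = 0;
    stats.forEach((report) => {
      if (report.type === "inbound-rtp" && report.kind === "audio") {
        received = report.packetsReceived;
        lost = report.packetsLost;
      }
    });
    const total = received + lost;
    return total > 0 ? lost / total : 0;
  }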

Schedule and Deliverables

As with last week, I am pretty clearly behind, despite again working well over 12 hours this week. At this point, we have used most of our slack time, and it may be smart to think about shrinking our minimum viable product requirements, particularly those dealing with the UI. If we don't have a DAW-like UI by the end of the course, we will at the very least have a low-latency, multi-peer audio chat app that practicing musicians can use.

For next week, I will work on polishing what we have before the interim demo, completing the ethics assignment, and possibly UI and cloud deployment if needed. Most of the remainder of our project relies on cloud deployment getting done, and a better UI, so it’s hard to pick a better next step.

Jackson’s Status Report for 4/3

This week, I worked a lot more on monitoring. By the end of last week, I was able to send peer-to-peer text messages between two unique users. This week I extended that to allow for audio streams to be sent, and to work with an arbitrary number of users. These changes required a complete restructuring of all the monitoring code I wrote last week and the week before.

As a reminder, I'm using the WebRTC (Web Real-Time Communication) API, which comes built into your browser. WebRTC uses objects called RTCPeerConnection to communicate. I explained this in detail last week, so I won't get into it here. The basic idea is that each connection requires both a local RTCPeerConnection object and a remote RTCPeerConnection object. If WebRTC interests you, there are many tutorials online, but this one has been by far the most helpful to me.

Connections with Multiple Users:

Last week, I created one local RTCPeerConnection object and an arbitrary number of remote RTCPeerConnection objects. This made intuitive sense, since I want to broadcast only one audio stream while receiving an arbitrary number of audio streams from other users. However, this week I learned that each local connection can only be associated with ONE remote user. To get around this, I created an object class called ConnectionPair. The constructor is simple and will help me explain its use.
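Roughly, the idea looks like this (the names below are illustrative rather than the exact code):

  // One ConnectionPair per remote user: a local connection for sending my
  // audio, a remote connection for receiving theirs, a GainNode for their
  // volume, and their username as a unique identifier.
  const connectionPairs = [];

  class ConnectionPair {
    constructor(username, audioContext, rtcConfig) {
      this.username = username;                                 // unique identifier
      this.localConnection = new RTCPeerConnection(rtcConfig);  // sends my audio
      this.remoteConnection = new RTCPeerConnection(rtcConfig); // receives their audio
      this.gainNode = audioContext.createGain();                // per-user volume
      connectionPairs.push(this);
    }
  }

  // Retrieve the pair for a given user (described below).
  function getConnection(username) {
    return connectionPairs.find((pair) => pair.username === username);
  }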

Each peer-to-peer relationship is represented by one of these ConnectionPair objects. Whenever a user chooses to initiate a connection with another user, a ConnectionPair is created with 4 things:

  1. A new local connection object (RTCPeerConnection)
  2. A new remote connection object (RTCPeerConnection)
  3. A gain node object (GainNode, from the Web Audio API) to control how loud you hear the other user
  4. The other user’s username (string) which serves as a unique identifier for the ConnectionPair. You can then retrieve any ConnectionPair you’d like by using another function I wrote, getConnection(<username>), whose meaning is self-evident.

So why is this useful? Well, now there can be an arbitrary amount of ConnectionPair objects, and thus you can establish peer-to-peer connections with an arbitrary amount of users!

Sending Audio Over a WebRTC Connection:

Since the WebRTC API and the web media API are both browser built-ins, they actually work very nicely with each other.

To access data from a user's microphone or camera, the web media API provides the function navigator.mediaDevices.getUserMedia(). This function gives you a MediaStream object, which contains one or more MediaStreamTrack objects (audio and/or video tracks). Each of these tracks can be added to a local WebRTC connection with RTCPeerConnection.addTrack(<track>).
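A minimal sketch of the sending side, assuming a localConnection already exists for the peer you want to send to:

  // Capture microphone audio and add it to a local RTCPeerConnection.
  async function sendMicrophoneAudio(localConnection) {
    // Ask the browser for an audio-only MediaStream from the default input.
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    // Add each audio track so it gets sent to the connected peer.
    stream.getAudioTracks().forEach((track) => {
      localConnection.addTrack(track, stream);
    });
  }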

And finally, to receive audio on your remote RTCPeerConnection, you can just use the event handler RTCPeerConnection.ontrack, which fires every time the remote connection receives a track. The track can then be routed through any type of AudioNode (for example, a GainNode, which simply changes the volume of an incoming signal) and out to your browser's audio output.
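And a minimal sketch of the receiving side, routing the incoming track through the pair's GainNode (the names follow the ConnectionPair idea above and are illustrative):

  // Play incoming audio through a per-user GainNode so each user's volume
  // can be adjusted independently.
  function playRemoteAudio(remoteConnection, audioContext, gainNode) {
    remoteConnection.ontrack = (event) => {
      // Route the track: source -> gain (volume control) -> speakers.
      const remoteStream = new MediaStream([event.track]);
      const source = audioContext.createMediaStreamSource(remoteStream);
      source.connect(gainNode);
      gainNode.connect(audioContext.destination);
    };
  }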

User Interface Improvements:

Finally, I added a little bit to the UI just to test the functionality mentioned above. On the group page, here is how the users are now displayed:

In this group, there are 3 members:

  1. Jackson (the user whose screen you’re looking at in this screenshot)
  2. TestUserFname (a test user, who is currently offline)
  3. OtherTestUserFname (another test user, who is currently online)

Next to every online user who isn’t you, I added a button which says “send my audio.” This button initiates a peer-to-peer connection with the user it appears next to. Once the connection is opened, the audio from your microphone is sent to that user.

Lastly, next to each online user, I added a slider for that person’s volume, allowing users to create their own monitor mixes.
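Wiring a slider to a GainNode is simple; here is a sketch (the 0-100 slider range and element handling are illustrative):

  // Map a 0-100 range input to a gain of 0.0-1.0 for that user's audio.
  function bindVolumeSlider(sliderElement, gainNode) {
    sliderElement.addEventListener("input", () => {
      gainNode.gain.value = Number(sliderElement.value) / 100;
    });
  }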

We are still a bit behind schedule, since I thought I would have peer-to-peer audio monitoring working a few weeks ago, but I hope it's clear that I've been working very hard (well over 12 hours/week) to get it working! The biggest thing I can do to get us back on schedule is to continue working overtime. As I said last week, though, I think the hardest part is behind us.

With monitoring finally working, I can move onto testing and really driving the latency down. For next week, I will implement the latency and packet loss tests described in our design review and work on getting the numbers to meet our requirements (<100ms latency and <5% packet loss rate).

Team Status Report for 3/27

This week, we’ve made a lot of progress on the monitoring aspect of our project. As outlined in Jackson’s status report, websocket connections between each user and the server are working, as well as WebRTC peer-to-peer connections. Though audio is not sent from browser to browser quite yet, simple text messages work just fine, and sending audio will be completed by next week. In addition, the click track is working, and the user interface has seen big improvements.

There are a few major changes to our design this week:

  1. To allow multiple asynchronous websocket connections to be open at once, we had to add a Redis server. It runs alongside the Django server and communicates with the Django server and its database over port 6379. This was based on the tutorial in the channels documentation, though that tutorial uses Docker to run the Redis server, while I just run the Redis server in a terminal. This change doesn't have much of a trade-off; it's simply a necessary addition to allow asynchronous websocket channels to access the database.
  2. We have decided to use WebRTC for peer-to-peer connections. In our design review, we planned to use TCP, since it gives a virtually 0% packet loss rate. But the cost of TCP is significant when latency is as much of an issue as it is with music, making it impractical here. WebRTC is designed for real-time communication, and particularly for media streaming between users. It uses UDP to send data with the lowest possible latency, and it's built into the user's browser. The only real cost is that we now have to worry about packet loss. But for something like music, where timing is so critical, we've decided that meeting the latency requirement (<100ms) is far more important than the packet loss requirement (<5%).

Since the WebRTC connection is working, we no longer have to worry about not having monitoring. However, we do have a number of other big risks. As I see it, our biggest risk right now is that none of us really know how to create the DAW user interface we showcased in our design review. Visualizing audio as you’re recording and being able to edit the way you can in industry DAWs like Pro Tools/Audacity/Ableton/etc. is going to be a challenge. To mitigate this risk, we will need to put extra work into the UI in the coming weeks, and if this fails, we can still expect to have a good low-latency rehearsal tool for musicians even without the editing functionality.


Jackson’s Status Report for 3/27

I worked a lot on many different parts of the project in the past two weeks, and since there was no status report due last week, I’ll write about both weeks. Forgive me, it’s a long one!

Last week, I worked a lot on the server code, setting up Django models for groups, tracks, and recordings. A logged-in user can create a group and add other users to the group, so all group members can edit/record with their group. In each group, members will be able to add audio tracks, which can consist of multiple recordings. To help with group creation, I made a simple homepage:

The forms check the validity of the user's inputs and display error messages if necessary. For example, if you don't enter a group name, you get this message at the top of the screen:

After you create a group, the server creates a group model for you, and sends you to the corresponding group page that I made:

It looks a little rough, but it's mainly for testing features at the moment. Eventually, this will contain the DAW interface from our design review. But this screen is enough to explain what I've been working on this week:

At the top, the online users are displayed, along with a "Monitor" button. Clicking this button will eventually allow you to hear the other users in real time. Though that hasn't been fully implemented yet, I think I got it most of the way there. So here's what actually happens right now:

  1. Before the button is even clicked, as soon as the page loads, a websocket connection is opened with the Django (ASGI) server. Simultaneously, your browser creates an SDP (Session Description Protocol) "offer" to connect with other users, which contains all of your computer's public-facing IP address/port combinations (aka ICE candidates) that other computers can use to connect to you. This is needed so that peer-to-peer connections can be established, since your private IP address/port cannot be connected to as-is (for security reasons).
  2. When the button is clicked, your SDP offer from step 1 gets sent to the server over the websocket connection, and the server echoes the offer to every other online user in the group via their own websocket connections.
  3. When you receive an SDP offer from another user via your websocket connection, your browser generates an SDP “answer,” which is very similar to the SDP offer from step 1. The answer is then sent automatically back to the server via your websocket connection, and then forwarded to the user who sent you the offer via their websocket connection.
  4. When you receive an SDP answer back from a user you've requested to connect with, a peer-to-peer connection is finally established! I chose the WebRTC protocol, which is essentially a very low-latency way to send data using UDP, intended for media streaming (a rough sketch of the browser side of this handshake follows below).
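The websocket message shapes in this sketch are made up for illustration, and it assumes a local RTCPeerConnection for sending and a remote one for receiving:

  // Step 2: send my SDP offer to the server, which echoes it to the group.
  async function sendOffer(socket, localConnection, myUsername) {
    const offer = await localConnection.createOffer();
    await localConnection.setLocalDescription(offer);
    socket.send(JSON.stringify({ type: "offer", from: myUsername, sdp: offer }));
  }

  // Steps 3 and 4: respond to offers with answers, and accept answers.
  async function handleSignal(socket, localConnection, remoteConnection, myUsername, event) {
    const msg = JSON.parse(event.data);
    if (msg.type === "offer") {
      // An offer arrived: generate an SDP answer and send it back.
      await remoteConnection.setRemoteDescription(msg.sdp);
      const answer = await remoteConnection.createAnswer();
      await remoteConnection.setLocalDescription(answer);
      socket.send(JSON.stringify({ type: "answer", from: myUsername, to: msg.from, sdp: answer }));
    } else if (msg.type === "answer") {
      // My offer was answered: the peer-to-peer connection can now open.
      await localConnection.setRemoteDescription(msg.sdp);
    }
  }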

Right now, handshaking works perfectly, connections can be established successfully between 2 users, and you can send string messages from one browser to the other. To ensure that it works, I connected two different browsers (using the monitor button) and then shut down the server completely. Even after the server gets shut down, the browsers can still send messages back and forth over the WebRTC data channel! All that remains on the monitoring front is sending an audio stream rather than text messages, and connecting more than 2 users at once. These two things are the deliverables I plan to have completed by next week’s status report.

This probably goes without saying, but I didn't have any idea how websockets, handshaking, or peer-to-peer connections worked 2 weeks ago. I've learned a lot from many online tutorials, I've made a number of practice Django apps, and I've been working well over 12 hours/week, probably closer to 24. In spite of this, I am a little bit behind, because I didn't know just how complicated sending audio over a peer-to-peer connection would be. To catch up, I'll continue working a lot on this, since it's probably the main functionality of our capstone project and has to get done. The recording aspect will either have to be pushed back, or I will need help from my other group members. I'm a bit more familiar with how recording works, though, and I don't think it's quite as difficult as the monitoring aspect.

Jackson’s Status Report for 3/13

This week I gave the design review presentation, and worked on code for recording, storing, and playing back audio on the web.

The design review process included writing detailed notes for myself with talking points for each slide, and after receiving the feedback, I've begun thinking about what to add to our design review paper, which is a main deliverable for this coming week. In particular, we were told that our requirements are not use-case driven. When we drafted our requirements, I thought we just had to detail what the final product for the class would need in order to be considered successful, and I think we did that. However, after receiving our feedback, it seems I may have misunderstood what exactly counts as a requirement. I'm still not quite sure what makes a requirement use-case driven, so this will be a point to discuss in our meetings this week.

For our actual project, I finished code this week which uses the browser's built-in media APIs to grab audio from any audio input on the computer (it even works with my USB interface!), record it in chunks using a JavaScript MediaRecorder object, store the recording at its own URL, and play it back to the user using an HTMLMediaElement object. To accomplish this, I created an FSM-like structure to switch between three modes: STOPPED, RECORDING, and PLAYING.
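A condensed sketch of that flow (reduced to a single mode variable, with element and variable names as placeholders):

  // Three modes, as in the FSM: STOPPED, RECORDING, PLAYING.
  const Mode = { STOPPED: "stopped", RECORDING: "recording", PLAYING: "playing" };
  let mode = Mode.STOPPED;
  let recorder = null;
  let chunks = [];

  async function startRecording() {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    recorder = new MediaRecorder(stream);
    chunks = [];
    recorder.ondataavailable = (e) => chunks.push(e.data); // record in chunks
    recorder.start();
    mode = Mode.RECORDING;
  }

  function stopRecordingAndPlay(audioElement) {
    recorder.onstop = () => {
      // Combine the chunks, give the recording its own URL, and play it back.
      const blob = new Blob(chunks, { type: recorder.mimeType });
      audioElement.src = URL.createObjectURL(blob);
      audioElement.play();
      mode = Mode.PLAYING;
    };
    recorder.stop();
  }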

My biggest deliverables for this week were the design review and a working recording interface, and since those are both done, I am on schedule. My deliverables for next week will be the design review paper and monitoring over websockets.

Team Status Report for 3/6

Our biggest risk remains that sending audio over a socket connection may either not work or not lower latency enough to be used in a recording setting. To manage this risk, we are focusing most of our research efforts on sockets (Jackson and Christy) and synchronization (Ivy). As a contingency plan, our app can still work without the real-time monitoring using a standard HTML form data upload, but it will be significantly less interesting this way.

In our research, we found that other real-time audio communication tools aiming for minimal latency use peer-to-peer connections, instead of or in addition to a web server. This makes sense, since going through a server increases the number of transactions, which in turn increases the time it takes for data to be sent. Since a peer-to-peer connection seems to be the only way to get latency as low as we need it, we decided on a slightly different architecture for the app. This is detailed in our design review, but the basic idea is that audio is sent from performer to performer over a socket connection, and the recorded audio is only sent to the server when one of the musicians hits a "save" button on the project.

Because of this small change in plans, we have a new schedule and Gantt chart, which can be found in our design review slides. The high-level change is that we need more time to work on peer-to-peer communication.