Jackson’s Status Report for 5/8

This week, I started working on the final video. I wrote a script and took some “b-roll” screen recordings of the app. I have not yet edited anything together, but I have all day tomorrow set aside for that. The script is mostly a clearer restatement of what is in my status reports about setting up monitoring with WebRTC. It’s a challenge to explain how all of that works without going well over time, but I did my best to keep it concise. We’ll have the video up by Monday night, along with the poster. Speaking of the poster, I’ve begun working on that as well; we already have our graphics from the final presentation.

At this point, with only days left before the demo and cloud deployment still not entirely working, it will be very difficult to debug anything that breaks during deployment, and probably impossible to implement the latency improvement strategies I’ve talked about in previous status reports. With this in mind, our focus in this last week has shifted to the required assignments for the class rather than the nice-to-haves for the project itself. So for my deliverables this week, I will edit the video and do my share of the poster and final paper.

Jackson’s Status Report for 5/1

This week, I had again planned to test latency and packet loss using the tests I implemented a few weeks ago. The results look really good when testing locally, but they won’t be meaningful until we can run the tests between different computers over the internet, which depends on cloud deployment. Christy has begun working on deployment, but since it isn’t done yet, I don’t know what parts of our project will “break” as a result of it. The plan on our initial schedule was for Ivy and me to make incremental changes to the app during the week, and for Christy to deploy every weekend, beginning on 3/11. We’ve completed about as much as we can before deployment. Because of this, I’m a bit behind schedule, and I’ll try to catch up as soon as cloud deployment is done.

The code isn’t the only part of the project that needed work this past week, though, so I spent a significant amount of time on the final presentation slides. I wrote simplified explanations of the more complicated portions of the project that I worked on: WebSocket signaling, establishing peer-to-peer connections, sending audio over those peer-to-peer connections, and tests for determining end-to-end latency and packet loss rate. A real challenge was conveying all of that information as simply as possible for the presentation format. As you know, I’ve written out very detailed explanations in my status reports, and there’s a lot more to it than I could fit in a few PowerPoint slides.

I also made a new Gantt chart showing the schedule and division of labor as they actually happened, and our plan for the last couple weeks of the semester. This is also in the final presentation slides.

For next week, I hope to make significant progress on the poster, video, and final report. Also, if cloud deployment finishes up, I can do any last-minute debugging, determine the actual end-to-end latency and packet loss rate, and do my best to improve them with what little time we have left. But my main priority going forward will be the necessary class assignments.

Jackson’s Status Report for 4/24

This week I did some debugging and integration with my team members’ contributions, as well as the ethics assignment and discussion. My duties per our planned division of labor in the design review (all monitoring, websockets, peer-to-peer connections, recording and playback in the browser, and the corresponding server-side views) are wrapping up, and everything I have left (driving latency and packet loss below our thresholds for viability) depends on cloud deployment. That is not a task assigned to me, but our original Gantt chart had it scheduled to be done by 3/11, so I may need to take it on as well. Cloud deployment is absolutely necessary for our project, so if it isn’t started very soon, I plan to start on it before next week, though I’ve never done it successfully, so it may be difficult alone. I am now on schedule, but the group is still behind (especially with regard to cloud deployment), so I will try to pick up some of these tasks where I’m able.

Ivy and Christy have been working on uploading recorded audio to the Django server using a separate URL (/add_track/<room_code>). The main group page establishes its websocket connections using the /group/<room_code> URL, so every upload (which navigates away to the add_track URL) breaks the websocket connections I’ve been working on. I tried a few approaches to fix this bug. First, I tried to change the websocket URL to include just the room code. This wasn’t working for some time, and I couldn’t figure out why (I spent more time than I’d like to admit on this), but I finally came across a Stack Overflow post which essentially said that channels will only work if the websocket URL corresponds to the page URL. That approach wasn’t going to work unless I switched from channels to another websocket/server integration library, and channels has worked well for everything else so far, so I needed to find a new solution.

I finally fixed this bug with a very simple solution: redirect the add_track URL to the corresponding group URL immediately after storing the track in the database. As a result, websocket connections with the server are not actually maintained during the upload, but they are instantly and automatically reestablished after the upload. This solution is also cleaner, since it ensures that the URL shown on the group page is always the /group/<room_code> URL.

This does, however, break the peer-to-peer connections, since the JavaScript gets reloaded with each new page, but the user can reestablish the peer-to-peer connection with a button click. This isn’t hard to do by any means, but it may be annoying for users. I believe Ivy is looking into asynchronous uploads, which shouldn’t involve a page reload, in which case no further integration is needed. If not, I will look into better solutions this week that won’t break the peer-to-peer connections.

Another bug I’ve noticed but haven’t been able to fix yet is the way the server keeps track of which users are online or offline at any given moment. As it’s currently implemented, a user is marked online when they create their account or log in, and marked offline again when they log out. An obvious problem is that users can (and often will) close the site without logging out, and Django doesn’t have a good way of knowing when this happens. The solution may end up being fairly complicated. The one I have started on is sending websocket messages to all users in your group at regular intervals, just to signal “I’m online.” This seems easy enough, until you realize that the channel layer the websockets operate on cannot change the database (this is to prevent synchronization issues, since the database is changed synchronously and websockets are inherently asynchronous). I think we will have to remove the online/offline bit from the database entirely and just keep track of it in each user’s browser separately. I will have this done by next week as a deliverable.
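To make the heartbeat idea concrete, here’s a minimal sketch of what the browser side could look like (groupSocket and myUsername are placeholder names, not our actual code):

```javascript
// Minimal heartbeat sketch: announce presence over the group websocket and
// track everyone else's last heartbeat entirely in the browser.
const HEARTBEAT_MS = 5000;
const lastSeen = new Map(); // username -> time we last heard from them

// Periodically tell the rest of the group "I'm online."
setInterval(() => {
  groupSocket.send(JSON.stringify({ type: "heartbeat", username: myUsername }));
}, HEARTBEAT_MS);

// Record when each heartbeat arrives.
groupSocket.addEventListener("message", (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "heartbeat") {
    lastSeen.set(msg.username, performance.now());
  }
});

// A user counts as online if we've heard from them recently.
function isOnline(username) {
  const t = lastSeen.get(username);
  return t !== undefined && performance.now() - t < 3 * HEARTBEAT_MS;
}
```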

On a different note, I’ve been thinking a lot about the Langdon Winner article as it relates to our project. In my mind, our project is a solution to the problem of having to find a physical meeting place to practice and make recorded demos. An online meeting room is more convenient, especially now that so many ensembles and bands have members from all over the world, and perhaps more obviously, it’s the safer option in a pandemic. But if a technology similar to our project were to become widely used and completely replace physical practice spaces for musicians, some problems could occur. Mainly, our project requires a very good internet connection, so wide adoption of our project could cause a large upward transfer of wealth to internet providers. Secondly, practice spaces often have things that musicians cannot keep in their living spaces (due to cost or space constraints), like grand pianos, expensive microphones, sound treatment, etc. Without a physical practice space, musicians may be expected to have all of these things in their homes, which could again be more of a financial burden on the user than a convenience. At worst, this could drive people with less money or less space away from collaborative music entirely. Though I don’t think our work for an undergraduate class is going to cause any of this to happen, it has been interesting to think about as someone who is very fascinated by ethics and politics.

Jackson’s Status Report for 4/10

This week, my task was measuring and possibly improving latency and packet loss rate.

Latency

Measuring latency, like every other part of this project so far, is a much more complex task than I initially thought. For one, the term “latency” is a bit ambiguous; there are multiple different measurements it could refer to:

  • “End-to-end delay” (E2E) or “one-way delay”
    This is the time it takes from the moment data is sent by one client to the moment that data is received by another. Since this relies on perfect synchronization between the clocks of each client, this could be difficult to measure.
  • “Round-trip time” (RTT)
    This is the time it takes from the moment data is sent by one client to the moment that data is received back by the same client. This measurement only makes sense if the remote computer is returning the data they receive.

These measurements are certainly not meaningless, but in an audio system, neither one is quite what we want. To explain why, it’ll be helpful to first go over the signal path again. This is what it looks like when client A is sending audio to client B (very simplified for the purposes of this discussion):

  1. Client A makes a sound.
  2. The sound is sampled and converted to a digital signal.
  3. The browser converts this signal to a MediaStreamTrack object.
  4. WebRTC groups samples into packets.
  5. These packets are sent to client B.
  6. On client B’s computer, WebRTC collects these packets and places them into a jitter buffer.
  7. Once the jitter buffer has accumulated a sufficient number of samples to make smooth playback possible, the samples in the jitter buffer are played back, allowing client B to finally hear client A’s sound.

Steps 1-5 take place on client A’s computer, and steps 6-7 take place on client B’s computer.

The first thing to notice is that this communication is one-way, so RTT doesn’t make sense here. E2E delay must be approximated. We can do this as described in the design review: client A sends a timestamp to client B, and client B compares this timestamp to its current time. The difference between these times is a good approximation of E2E delay, provided the clocks of the two computers are synchronized very closely. Luckily, JavaScript provides a function, performance.now(), which gives you the elapsed time in milliseconds since the time origin. Time origins can be synchronized by using performance.timing.navigationStart. This gives us the E2E delay pretty easily.
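As a rough sketch, the measurement looks something like this (dataChannel is a placeholder for the WebRTC data channel between the two clients, and the result is only meaningful if the time origins really are synchronized):

```javascript
// Client A: send an absolute timestamp (time origin + elapsed time).
dataChannel.send(JSON.stringify({
  type: "latency-probe",
  sentAt: performance.timing.navigationStart + performance.now(),
}));

// Client B: compare the received timestamp to its own current time.
dataChannel.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "latency-probe") {
    const now = performance.timing.navigationStart + performance.now();
    const e2eDelayMs = now - msg.sentAt; // approximate one-way delay
    console.log(`Approximate E2E delay: ${e2eDelayMs.toFixed(1)} ms`);
  }
};
```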

But as you can see, E2E delay only measures the time from step 5 to step 6. Intuitively, we want to measure the time from step 1 to step 7, that is, the amount of time from the moment a sound is made by client A to the moment that sound is heard by client B. This is the real challenge. The browser doesn’t even have access to the time from step 1 to step 3, since these happen in hardware or in the OS, so those are out of the picture. Steps 3 to 5 are done by WebRTC with no reliable access to the internals, since these are implemented differently on every browser, and poorly documented at that. As mentioned, we can approximate step 5 to step 6, the E2E delay. All that’s left is step 6 to step 7, and luckily, WebRTC gives this to us through its (also poorly documented) getStats API. After some poking around, I was able to find the size of the jitter buffer in seconds. This time can be added to the E2E delay, giving us the time from step 5 to step 7, and this is probably the best we can do.
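Here’s roughly how the jitter buffer size can be pulled out of getStats (pc is a placeholder for the remote RTCPeerConnection, and the exact stats fields can vary a bit between browsers):

```javascript
// Estimate the average jitter buffer delay (in seconds) for incoming audio.
async function getJitterBufferSeconds(pc) {
  const stats = await pc.getStats();
  let delay = 0;
  stats.forEach((report) => {
    if (report.type === "inbound-rtp" && report.kind === "audio") {
      // jitterBufferDelay is a running total over all emitted samples, so
      // dividing by jitterBufferEmittedCount gives an average per sample.
      if (report.jitterBufferEmittedCount > 0) {
        delay = report.jitterBufferDelay / report.jitterBufferEmittedCount;
      }
    }
  });
  return delay;
}
```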

So is the latency good enough? Well, maybe. On my local computer, communicating only from one browser to another, latency times calculated in this way are around 5ms, which is very good (as a reminder, our minimum viable product requires latency below 100ms). But this isn’t very useful without first deploying to the cloud and testing between different machines. Cloud deployment is not something I am tasked with, so I’m waiting on my teammates to do this. Per our initial Gantt chart, this should have happened weekly for the past few weeks. As soon as it does, we can interpret the latency test I implemented this week.

Packet Loss Rate

Unlike latency, packet loss rate is actually very simple to get. The number of packets sent and the number of packets dropped can both be found using the WebRTC getStats API, the same way I got the size of the jitter buffer. Interpretation of this again depends on cloud deployment, but locally the packet loss rate is almost 0%, and I never measured more than 5% (our threshold for the minimum viable product).
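A sketch of that calculation, using the same placeholder pc for the remote RTCPeerConnection (here I approximate the loss rate from the receiver’s packetsLost and packetsReceived counters):

```javascript
// Compute the fraction of incoming audio packets that were lost.
async function getPacketLossRate(pc) {
  const stats = await pc.getStats();
  let lost = 0;
  let received = 0;
  stats.forEach((report) => {
    if (report.type === "inbound-rtp" && report.kind === "audio") {
      lost += report.packetsLost;
      received += report.packetsReceived;
    }
  });
  const total = lost + received;
  return total > 0 ? lost / total : 0;
}
```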

Schedule and Deliverables

As with last week, I am pretty clearly behind, despite working well over 12 hours again this week. At this point, we have used most of our slack time, and it may be smart to think about shrinking our minimum viable product requirements, particularly those dealing with the UI. If we don’t have a DAW-like UI by the end of the course, we will at the very least have a low-latency multi-peer audio chatting app which can be used by practicing musicians.

For next week, I will work on polishing what we have before the interim demo, completing the ethics assignment, and possibly UI and cloud deployment if needed. Most of the remainder of our project relies on cloud deployment getting done, and a better UI, so it’s hard to pick a better next step.

Jackson’s Status Report for 4/3

This week, I worked a lot more on monitoring. By the end of last week, I was able to send peer-to-peer text messages between two unique users. This week I extended that to allow for audio streams to be sent, and to work with an arbitrary number of users. These changes required a complete restructuring of all the monitoring code I wrote last week and the week before.

As a reminder, I’m using the WebRTC (Web Real-Time Communication) API, which comes built into your browser. WebRTC uses objects called RTCPeerConnection to communicate. I explained this in detail last week, so I won’t get into it here. The basic idea is that each connection requires both a local RTCPeerConnection object and a remote RTCPeerConnection object. If WebRTC interests you, there are many tutorials online, but this one has been by far the most helpful to me.

Connections with Multiple Users:

Last week, I created one local RTCPeerConnection object and an arbitrary number of remote RTCPeerConnection objects. This made intuitive sense, since I want to broadcast only one audio stream while receiving an arbitrary number of audio streams from other users. However, this week I learned that each local connection can only be associated with ONE remote user. To get around this, I created a class called ConnectionPair. The constructor is simple and will help me explain its use:
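In simplified form, it looks something like this (a sketch with illustrative names, not the exact code):

```javascript
// One ConnectionPair per peer-to-peer relationship.
class ConnectionPair {
  constructor(username, audioContext) {
    this.username = username;                        // unique ID for this pair
    this.localConnection = new RTCPeerConnection();  // sends my audio out
    this.remoteConnection = new RTCPeerConnection(); // receives their audio
    this.gainNode = audioContext.createGain();       // their volume in my monitor mix
  }
}

// All pairs live in one array, so any number of peers can be connected.
const connectionPairs = [];

function getConnection(username) {
  return connectionPairs.find((pair) => pair.username === username);
}
```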

Each peer-to-peer relationship is represented by one of these ConnectionPair objects. Whenever a user chooses to initiate a connection with another user, a ConnectionPair is created with 4 things:

  1. A new local connection object (RTCPeerConnection)
  2. A new remote connection object (RTCPeerConnection)
  3. A gain node object (GainNode from the Web Audio API) to control how loud you hear the other user
  4. The other user’s username (string) which serves as a unique identifier for the ConnectionPair. You can then retrieve any ConnectionPair you’d like by using another function I wrote, getConnection(<username>), whose meaning is self-evident.

So why is this useful? Well, now there can be an arbitrary number of ConnectionPair objects, and thus you can establish peer-to-peer connections with an arbitrary number of users!

Sending Audio Over a WebRTC Connection:

Since the WebRTC API and the web media API are both browser built-ins, they actually work very nicely with each other.

To access data from a user’s microphone or camera, the web media API provides you with a function: navigator.mediaDevices.getUserMedia(). This function gives you (a promise that resolves to) a MediaStream object, which contains one or more MediaStreamTrack objects (audio and/or video tracks). Each of these tracks can be added to a local WebRTC connection using RTCPeerConnection.addTrack(<track>).

And finally, to receive audio on your remote RTCPeerConnection, you can just use the event handler RTCPeerConnection.ontrack, which is triggered every time the remote connection receives a track. This track can then be routed through any type of AudioNode (for example, a GainNode, which simply changes the volume of an incoming signal) and out to your browser’s audio output.
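Putting those pieces together, a simplified sketch of both directions might look like this (localPc, remotePc, gainNode, and audioContext are placeholders for one ConnectionPair’s members):

```javascript
// Send: capture the microphone and add its track(s) to the local connection.
async function sendMyAudio(localPc) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  stream.getAudioTracks().forEach((track) => localPc.addTrack(track, stream));
}

// Receive: when a track arrives, route it through the user's volume control
// (a GainNode) and out to the speakers.
function receiveAudio(remotePc, gainNode, audioContext) {
  remotePc.ontrack = (event) => {
    const remoteStream = new MediaStream([event.track]);
    const source = audioContext.createMediaStreamSource(remoteStream);
    source.connect(gainNode).connect(audioContext.destination);
  };
}
```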

User Interface Improvements:

Finally, I added a little bit to the UI just to test the functionality mentioned above. On the group page, here is how the users are now displayed:

In this group, there are 3 members:

  1. Jackson (the user whose screen you’re looking at in this screenshot)
  2. TestUserFname (a test user, who is currently offline)
  3. OtherTestUserFname (another test user, who is currently online)

Next to every online user who isn’t you, I added a button which says “send my audio.” This button initiates a peer-to-peer connection with the user it appears next to. Once the connection is opened, the audio from your microphone is sent to that user.

Lastly, next to each online user, I added a slider for that person’s volume, allowing users to create their own monitor mixes.
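Wiring a slider to that user’s GainNode only takes a couple of lines; here’s the rough idea (the element id and the username variable are made-up examples):

```javascript
// Each user's volume slider just sets the gain on that user's ConnectionPair.
const slider = document.querySelector(`#volume-${username}`);
slider.addEventListener("input", () => {
  getConnection(username).gainNode.gain.value = parseFloat(slider.value);
});
```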

We are still a bit behind schedule, since I thought I would have peer-to-peer audio monitoring working a few weeks ago, but I hope it’s clear that I’ve been working very hard, well over 12 hours a week, to get it working! The biggest thing I can do to get us back on schedule is to continue working overtime. As I said last week though, I think the hardest part is behind us.

With monitoring finally working, I can move onto testing and really driving the latency down. For next week, I will implement the latency and packet loss tests described in our design review and work on getting the numbers to meet our requirements (<100ms latency and <5% packet loss rate).

Jackson’s Status Report for 3/27

I worked a lot on many different parts of the project in the past two weeks, and since there was no status report due last week, I’ll write about both weeks. Forgive me, it’s a long one!

Last week, I worked a lot on the server code, setting up Django models for groups, tracks, and recordings. A logged-in user can create a group and add other users to the group, so all group members can edit/record with their group. In each group, members will be able to add audio tracks, which can consist of multiple recordings. To help with group creation, I made a simple homepage:

The forms check the validity of the user’s inputs and display error messages if necessary. For example, if you don’t enter a group name, you get this message at the top of the screen:

After you create a group, the server creates a group model for you, and sends you to the corresponding group page that I made:

It looks a little rough right now, but it’s mainly for testing features at the moment. Eventually, this will contain the DAW interface we had in our design review. But this screen is enough to explain what I’ve been working on this week:

At the top, the online users are displayed, along with a “Monitor” button. Clicking this button will eventually allow you to hear the other users in real time. That hasn’t been fully implemented yet, but I think I got it most of the way there. Here’s what actually happens right now:

  1. Before the button is even clicked, as soon as the page is loaded, a websocket connection is opened with the Django (ASGI) server. Simultaneously, your browser creates an SDP (Session Description Protocol) “offer” to connect with other users, which contains all of your computer’s public-facing IP address/port combinations (aka ICE candidates) that other computers can use to connect to you. This is needed so that peer-to-peer connections can be established, since your private IP address/port cannot be connected to as-is (for security reasons).
  2. When the button is clicked, your SDP offer from step 1 gets sent to the server over the websocket connection, and the server echoes this offer to every other online user in the group, via their own websocket connections.
  3. When you receive an SDP offer from another user via your websocket connection, your browser generates an SDP “answer,” which is very similar to the SDP offer from step 1. The answer is then sent automatically back to the server via your websocket connection, and then forwarded to the user who sent you the offer via their websocket connection.
  4. When you receive an SDP answer back from a user you’ve requested to connect with, a peer-to-peer connection is finally established! I chose the WebRTC protocol, which is essentially a very low latency way to send data using UDP, intended for media streaming.
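In code, the handshaking in steps 2 through 4 boils down to the standard WebRTC offer/answer calls. Here’s a simplified sketch (signalingSocket, localPc, and remotePc are placeholders; the real code also waits for ICE candidates to be gathered and repeats this for every peer):

```javascript
// Step 2: create an offer and send it to the server over the websocket.
async function sendOffer(localPc, signalingSocket) {
  const offer = await localPc.createOffer();
  await localPc.setLocalDescription(offer);
  signalingSocket.send(JSON.stringify({ type: "offer", sdp: offer }));
}

// Step 3: when an offer arrives, answer it and send the answer back.
async function handleOffer(remotePc, signalingSocket, offerMsg) {
  await remotePc.setRemoteDescription(new RTCSessionDescription(offerMsg.sdp));
  const answer = await remotePc.createAnswer();
  await remotePc.setLocalDescription(answer);
  signalingSocket.send(JSON.stringify({ type: "answer", sdp: answer }));
}

// Step 4: when the answer comes back, the peer-to-peer connection can open.
async function handleAnswer(localPc, answerMsg) {
  await localPc.setRemoteDescription(new RTCSessionDescription(answerMsg.sdp));
}
```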

Right now, handshaking works perfectly, connections can be established successfully between 2 users, and you can send string messages from one browser to the other. To ensure that it works, I connected two different browsers (using the monitor button) and then shut down the server completely. Even after the server gets shut down, the browsers can still send messages back and forth over the WebRTC data channel! All that remains on the monitoring front is sending an audio stream rather than text messages, and connecting more than 2 users at once. These two things are the deliverables I plan to have completed by next week’s status report.

This probably goes without saying, but I didn’t have any idea how websockets, handshaking, or peer-to-peer connections worked 2 weeks ago. I’ve learned quite a lot from many online tutorials, I’ve made a number of practice Django apps, and I’ve been working well over 12 hours/week, probably closer to 24. In spite of this, I am a little bit behind, because I didn’t know just how complicated sending audio over a peer-to-peer connection would be. To catch up, I’ll continue working a lot on this, since it’s probably the main functionality of our capstone project and has to get done. The recording aspect will either have to be pushed back, or I will need help from my other group members, though I’m a bit more familiar with how that works, and I don’t think it’s quite as difficult as the monitoring aspect.

Jackson’s Status Report for 3/13

This week I gave the design review presentation, and worked on code for recording, storing, and playing back audio on the web.

The design review process included writing detailed notes for myself with talking points for each slide, and after receiving the feedback, I’ve begun thinking about ideas for what to add to our design review paper, which is a main deliverable for this coming week. In particular, we were told that our requirements are not use case driven. When we drafted our requirements, I thought we just had to detail what the final product for the class would need in order to be considered successful, and I think we did that. However, after receiving our feedback, it seems I may have had a bit of a misunderstanding about what exactly counts as a requirement. I’m still not quite sure what exactly makes a requirement use case driven, so this will be a point to talk about in our meetings this week.

For our actual project, I finished code this week which uses the built-in Web Audio API to grab audio from any audio input on the computer (it even works with my USB interface!), record it in chunks using a JavaScript MediaRecorder object, store the recording with its own URL, and play it back to the user using an HTMLMediaElement object. To accomplish this I created an FSM-like structure to switch between three modes: STOPPED, RECORDING, and PLAYING.
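The core of that FSM looks roughly like this (a simplified sketch; audioElement is a placeholder for the playback element, and the real code handles more states and edge cases):

```javascript
let state = "STOPPED"; // STOPPED | RECORDING | PLAYING
let recorder = null;
let chunks = [];

// STOPPED -> RECORDING: capture the selected input and record it in chunks.
async function startRecording() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  chunks = [];
  recorder = new MediaRecorder(stream);
  recorder.ondataavailable = (e) => chunks.push(e.data);
  recorder.start(250); // emit a chunk every 250 ms
  state = "RECORDING";
}

// RECORDING -> PLAYING -> STOPPED: build a blob, give it a URL, and play it.
function stopAndPlay(audioElement) {
  recorder.onstop = () => {
    const blob = new Blob(chunks, { type: recorder.mimeType });
    audioElement.src = URL.createObjectURL(blob); // the recording gets its own URL
    audioElement.play();
    state = "PLAYING";
    audioElement.onended = () => { state = "STOPPED"; };
  };
  recorder.stop();
}
```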

My biggest deliverables for this week were the design review and a working recording interface, and since those are both done, I am on schedule. My deliverables for next week will be the design review paper and monitoring over websockets.

Jackson’s Status Report for 3/6

This week, I spent a lot of time looking into websockets and how they can be integrated with Django. I updated the server on our git repo to work with the “channels” library, a Python websockets interface for use with Django. This required changing the files to behave as an Asynchronous Server Gateway Interface (ASGI) application, rather than the default Web Server Gateway Interface (WSGI). The advantage this provides is that the musicians using our application can receive audio data from the server without having to send requests out at the same time. As a result, the latency can be lowered quite a bit.

Additionally, I worked pretty hard on our design review presentation (to be uploaded 3/7), which included a lot more research on the technologies we plan to use. In addition to research on websockets, I looked specifically at existing technology that does what we plan to do. One example is an application called SoundJack, which is essentially an app for voice calling with minimal latency. While it doesn’t deal with recording or editing at all, Ivy and I were able to talk to each other on SoundJack with latency around 60ms, far lower than we thought was possible. It does this by sending tiny packets (512 samples by default) over a peer-to-peer architecture.

We are still on schedule to finish in time. Per the updated Gantt chart, my main deliverable this week is a functional audio recording and playback interface on the web.

Jackson’s Status Report for 2/27

This week, I familiarized myself with the Web Audio API. This is crucial to our project since it deals with in-browser audio processing, and none of us have experience with that sort of thing. Though I’ve done a lot of audio DSP, I’ve never used the web APIs meant to do that. What I’ve learned is that it’s actually quite simple, and fairly similar to Nyquist, though a lot less elegant. Audio elements such as sound sources, oscillators, filters, gain controls, etc. are all built into your browser already, and all you have to do is route them all properly.

Following a number of different tutorials, I made a few small in-browser audio applications. I ran them on a Django server I created, with a single view pointing to a URL that loads an HTML file. This HTML file loads some JavaScript, which is where I did most of my work. This post contains a few code snippets of interest, which make up a small portion of the code I wrote this week.

First, I created a small synthesizer that generates white noise on a button press using a simple for loop:
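Roughly, the idea is the following (a simplified sketch, not the exact snippet; the button id is a made-up example):

```javascript
const audioCtx = new AudioContext();
const seconds = 2;
const buffer = audioCtx.createBuffer(1, audioCtx.sampleRate * seconds, audioCtx.sampleRate);
const channelData = buffer.getChannelData(0);

// White noise: fill the buffer with random samples between -1 and 1.
for (let i = 0; i < channelData.length; i++) {
  channelData[i] = Math.random() * 2 - 1;
}

// Route buffer -> gain -> speakers, starting playback on a button press.
const gain = audioCtx.createGain();
gain.gain.value = 0.25;

document.querySelector("#noise-button").addEventListener("click", () => {
  const source = audioCtx.createBufferSource();
  source.buffer = buffer;
  source.connect(gain).connect(audioCtx.destination);
  source.start();
});
```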

The channelData array belongs to an audio buffer called “buffer”, which is routed through a gain node before being sent to your computer’s audio output. Generating a click track (which will be part of our final product) can be done in a similar way.

The white noise can be filtered in real time by inserting a biquad filter in between the buffer and the gain node. The syntax for this is very simple as well:
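Continuing the sketch above, the only change is to the routing inside the click handler:

```javascript
// Insert a lowpass biquad filter between the buffer source and the gain node.
const filter = audioCtx.createBiquadFilter();
filter.type = "lowpass";
filter.frequency.value = 1000; // example cutoff frequency in Hz

// Instead of source -> gain -> destination:
source.connect(filter).connect(gain).connect(audioCtx.destination);
```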

With some Web Audio API basics down, I moved on to the main thing our application needs to do in-browser: audio recording and playback. I created a small recorder which saves CD-quality audio in chunks in real time; after a set amount of time, it stops recording, creates a blob containing the recorded audio, and plays the audio back to the user. This recording interface was my deliverable this week, so I would say we are still on schedule.

For next week, I will integrate this recorder with the Django server we’re using for the final product, and hopefully get the recorded audio sent to the server. More than likely, I will write a new one from scratch, with the information I’ve learned this week.

Jackson’s Status Report for 2/20

This week, I worked on the Project Proposal slides, I set up the WordPress site with the required pages and formatting, and I worked on familiarizing myself with some technologies we plan to use for our project.

Specifically, I found a JavaScript audio processing interface called Web Audio API, which we plan to use to handle all in-browser audio operations. This includes recording, basic DSP, and some UI components as well like displaying waveforms and spectrograms. I’ve followed a few tutorials on the Web Audio API, since I’m fairly familiar with audio DSP, but not as much with web programming.

In addition to the Web Audio API, I’ve also started experimenting with Python audio processing libraries from this tutorial, which will help with any necessary audio manipulation on the server side. Since the main challenges involved in our project are timing-related, server-side audio processing will likely not be as important as in-browser manipulation, but some basic processing will of course be necessary.

Right now we are on schedule, though the project is still in the design stage. We need to familiarize ourselves with the specific technologies (like web audio programming in my case) before we can reasonably plan out exactly how the project will be done, and I have made good progress with that.

In the next week, I hope to have some kind of working audio recording app in-browser using the Web Audio API, which can be converted into a Django app for our final project. We also will have received feedback on our project proposal, so we will likely have a more concrete idea of exactly what our project needs to do, and we’ll make any necessary adjustments.