Team Weekly Status Report for 2/11

The most significant risk for our project right now is the timeline. We have set a strict goal of completing most of the research and outlining for the signal processing within the next week (two weeks total, including this week). While we feel this is a realistic goal, the process has proven very time-consuming, and we nearly missed this week's research deadline. The difficulties can compound when we implement the research in the back end of our web app, because we will have to format the outputs in a way the computer can process, not just the charts and visual representations we have been working with so far. Some elements of our signal processing design may also require further work when we reach the coding stage, as Python has limited signal processing libraries compared to MATLAB, where most of our research is done. For example, after determining the parameters of the STFT of the test signal in MATLAB, Aditya had to work out a different set of parameters when working with the SciPy library's stft() method.

As we have only just completed the design process and are currently on track with our implementation schedule, we have not required any changes to the existing system design.

Below are some pictures from the frequency-detection algorithm Alejandro found in MATLAB. The first image shows audio of a piano playing the C scale, and the second shows a sustained C note. In both, the x-axis is the time domain and the y-axis is the frequency domain.

Our project includes considerations for education and economics. We realize that many people would like access to a free, easy-to-use music transcriber. Most transcribers on the market come in the form of applications that require subscriptions, whereas anyone could use our web app for free. It would be especially useful for teachers who, for example, want to show students the transcription of a specific song they are playing in class. It would also give people an efficient tool for transcribing short monophonic audio clips, rather than having to transcribe them by hand. Finally, it increases access to music, especially for those facing time, financial, or other barriers. This may be especially helpful to students and teachers in low-income communities, where arts and music programs are often the first to be cut.

Aditya’s Status Report for 2/11

This week I worked on determining how to use the short-time Fourier transform (STFT) to depict an audio signal in both the time and frequency domains at the same time. I recorded several basic audio samples of myself playing the piano using my iPhone: one was a C note held for a couple of seconds, another was a C scale ascending through five notes. I started in MATLAB, as it is the system best suited to this kind of signal processing and plotting. I wanted to determine how to isolate only the relevant frequencies in a signal and find at what points in time they reach a relevant magnitude.

I used this method from the SciPy library to generate the STFT:

from scipy.signal import stft

f, t, Zxx = stft(time_domain_sig, fs=sample_rate, window='hann', nperseg=sixteenth, noverlap=sixteenth // 8)

The key difficulty here was determining the parameters of the STFT. The function asks us to pick a window shape, as well as the size of the window and how much the sliding window should overlap. I first tried a rectangular window sized to one second of the audio clip, but I found that a smaller Hann window worked better to account for the signal's constantly changing magnitude. I settled on a window size of 1/2 the sample rate: since I held each note of the scale for about one second, this applies two Fourier-transform windows to each note. I then translated this code into a Python script and wrote a method that takes a file name as input and generates the STFT, sketched below.
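A minimal sketch of what that method might look like, assuming the recordings are saved as WAV files; the function name read_and_stft and the stereo-to-mono step are placeholders of mine, not settled code:

import numpy as np
from scipy.io import wavfile
from scipy.signal import stft

def read_and_stft(filename):
    # wavfile.read returns (sample_rate, samples)
    sample_rate, sig = wavfile.read(filename)
    if sig.ndim > 1:
        sig = sig.mean(axis=1)  # collapse stereo to mono
    # Hann window of 1/2 the sample rate, i.e. half a second, as discussed above
    sixteenth = sample_rate // 2
    return stft(sig, fs=sample_rate, window='hann',
                nperseg=sixteenth, noverlap=sixteenth // 8)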

This code resulted in the following graph:

You can see that the earlier pulses align more accurately with the start of each note than the later ones, suggesting a smaller window is needed to get accurate DFTs. The cost of this accuracy is a slower, more redundant process for obtaining the frequency-domain representation of the signal.
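One quick way to check that accuracy numerically, rather than by eye, is to pull the dominant frequency out of each STFT frame. A short sketch, reusing f, t, and Zxx from the stft() call above:

import numpy as np

# For each time frame (column of Zxx), find the frequency bin
# with the largest magnitude.
peak_freqs = f[np.abs(Zxx).argmax(axis=0)]
for time, freq in zip(t, peak_freqs):
    print(f"{time:.2f} s: {freq:.1f} Hz")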

I am on schedule for researching the frequency-domain representations of the audio signals, as my goal was to have a proper back-end representation of the signal's magnitude at relevant frequencies. My plan for next week is to fine-tune this data to be more accurate and to write code that detects which frequencies correspond to which musical notes and generates a dictionary-like representation of each note, its magnitude, and its length. My deliverable will be the Python code that outputs an easily comprehensible list of notes in text form.
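As a starting point for that note-detection code, here is a sketch of the standard mapping from a frequency to the nearest equal-tempered note; the function name and output format are placeholders, not the final design:

import numpy as np

NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def freq_to_note(freq):
    # Nearest MIDI note number, using A4 = 440 Hz = MIDI 69
    midi = int(round(69 + 12 * np.log2(freq / 440.0)))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

# e.g. freq_to_note(261.6) returns 'C4' (middle C)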