Aditya’s Status Report 4/8

This week I started working out how to format information within the note data structures so that it’s always readable by Vexflow. I wrote functions in the back-end that go through the output of the integrator of the pitch and rhythm processors and modify the duration of each element so that it is always a multiple of 4. This is because our STFT was designed such that 4 windows correspond to the length of an eighth note in sheet music.
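A minimal sketch of this quantization step, assuming durations arrive as integer counts of STFT windows and that rounding to the nearest multiple of 4 is acceptable. The rounding rule and the function names are illustrative assumptions, not the actual back-end code:

```python
UNITS_PER_EIGHTH = 4  # from the report: 4 STFT windows span one eighth note


def quantize_duration(raw_units):
    """Round a raw duration to the nearest multiple of 4, never below one eighth note."""
    # Adding half a step before integer division rounds to nearest; ties round up.
    steps = (raw_units + UNITS_PER_EIGHTH // 2) // UNITS_PER_EIGHTH
    return max(1, steps) * UNITS_PER_EIGHTH


def quantize_all(raw_durations):
    """Quantize every duration in a list (hypothetical helper)."""
    return [quantize_duration(d) for d in raw_durations]
```

With this rule, a raw duration of 23 windows becomes 24 units, and anything shorter than 2 windows is still kept as a single eighth note rather than dropped.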

I then needed to make sure each duration is a power of 2. That’s because each subsequent note type (eighth, quarter, half, etc.) is 2 times the duration of the previous type. I iterated through the data, and if any value was not a power of 2, I split it into smaller notes. For example, a note 24 units long would be split into notes of length 16 and 8. A tie was then added, which in music notation indicates that the durations of two notes are combined.
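The splitting rule can be sketched as a greedy decomposition into descending powers of 2. This is an illustration under the assumption that durations are already positive multiples of 4; the function name and the (length, tied_to_next) pair format are hypothetical:

```python
def split_into_tied_notes(duration, smallest=4):
    """Split a duration into descending powers of two.

    Returns (length, tied_to_next) pairs; every piece except the last is tied.
    """
    if duration <= 0 or duration % smallest != 0:
        raise ValueError("duration must be a positive multiple of the smallest unit")
    pieces = []
    remaining = duration
    while remaining > 0:
        # Largest power of two that fits in what is left.
        piece = 1 << (remaining.bit_length() - 1)
        pieces.append(piece)
        remaining -= piece
    return [(p, i < len(pieces) - 1) for i, p in enumerate(pieces)]
```

For a 24-unit note this yields a 16-unit piece tied to an 8-unit piece, matching the example above. Because the input is a multiple of 4, no piece is ever shorter than an eighth note.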

Here is a demonstration of ties in the music:

As you can see, the first measure has been properly split so that the duration of the measure doesn’t exceed 4 beats (or 32 units in the back end). One remaining bug is the tie connecting the notes at the end of Row 1 and the start of Row 2. This week I plan to fix this by implementing dotted notes, which are another way to handle non-power-of-2 note durations and which don’t produce this graphical error.
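One way to keep every measure at 32 units is to cut any note that crosses a barline and tie the two halves. The sketch below is an assumption about how that could look, not the code that produced the image; the function name and output format are hypothetical:

```python
UNITS_PER_MEASURE = 32  # 4 beats in the back end's units, per the report


def split_at_barlines(durations, units_per_measure=UNITS_PER_MEASURE):
    """Group durations into measures, splitting any note that crosses a barline.

    Each measure is a list of (length, tied_to_next) pairs.
    """
    measures, current, filled = [], [], 0
    for duration in durations:
        remaining = duration
        while remaining > 0:
            space = units_per_measure - filled
            piece = min(remaining, space)
            remaining -= piece
            # A piece is tied when the rest of the same note continues in the next measure.
            current.append((piece, remaining > 0))
            filled += piece
            if filled == units_per_measure:
                measures.append(current)
                current, filled = [], 0
    if current:
        measures.append(current)  # final, possibly incomplete measure
    return measures
```

Pieces produced this way are not guaranteed to be powers of 2 (a 24-unit piece can survive intact), so they would still need the power-of-2 split or a dotted-note representation afterwards.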


To check the results of the formatting algorithms, I manually generated several small note lists meant to represent the output of the integrator, then ran them through the formatting methods. For example, a list containing only a note of duration 12 would output a new list of length 2, with notes of duration 8 and 4.

In the future, I plan to implement tests that cover data of varying tempos and time signatures, such as 3/4 and 6/8. I want the testing code to be robust and able to handle all these time signatures the same way, so that the same function can generate notes in 6/8 as in 3/4, with only the function’s parameters changing.
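As a sketch of how one function could serve several time signatures, the measure length in back-end units can be derived from the signature itself. This assumes the same scale as above (an eighth note is 4 units, so a whole note is 32); the function name is hypothetical:

```python
UNITS_PER_WHOLE_NOTE = 32  # an eighth note is 4 units, so a whole note is 32


def units_per_measure(beats, beat_value):
    """Measure length in back-end units for a time signature beats/beat_value."""
    if beat_value not in (1, 2, 4, 8):
        raise ValueError("unsupported beat value")
    return beats * UNITS_PER_WHOLE_NOTE // beat_value
```

Under these assumptions 4/4 gives 32 units, while 3/4 and 6/8 both give 24 units. Those two would differ only in how notes are grouped within the measure, which this sketch does not address.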

Aditya’s Status Report for 4/1

This week I worked on using the Vexflow API to transcribe the output of the pitch and rhythm sub-processors. A lot of this involved translating the data from our own Note design structure to fit into the StaveNote class provided by Vexflow. I also had to determine a robust method of dividing the processor outputs into smaller sets of data, because Vexflow is implemented in a stave-by-stave method, meaning you only draw 4 beats at a time before rendering that section of the sheet music. There’s a lot of in-between calculation here as I need to determine whether or not the 4 beats have been completed within a number of notes ranging from 1 to 8. Getting just one calculation error means the whole chart will be off.

By next week I hope to have the rhythm processor fully integrated, as I’m falling behind on that front due to the learning curve of the Vexflow API. I’ve figured out the library now, so things should be smoother from here. I also hope to be able to use a proper song, such as “Twinkle Twinkle Little Star”, as input instead of brief sequences of notes.

Team Status report for 4/1

Design Changes:

We discovered some needed changes to our design in the process of implementing it. We noticed that even signals with little dead air before the music begins can have a lot of needless rests at the beginning of the transcription. To account for this, we truncated the audio signals to remove any silent audio from before the user starts playing their instrument. We also added a feature that gives the user an option of selecting the tempo at which the audio is played, as we found that attempting to automatically detect the tempo was incredibly complex and unreliable. However, to keep our target users’ limited means in mind, we keep this feature optional because many will not have access to a metronome to ensure they stay on tempo.
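A minimal sketch of the truncation step, assuming samples are normalized to the range -1 to 1 and that a simple amplitude threshold is enough to find where playing starts. The threshold value and the function name are illustrative assumptions:

```python
def trim_leading_silence(samples, threshold=0.02):
    """Drop samples before the first one whose magnitude exceeds the threshold."""
    for index, sample in enumerate(samples):
        if abs(sample) > threshold:
            return samples[index:]
    return samples[:0]  # the whole signal is silent
```

Only the leading silence is removed. Quiet samples after the first loud one are kept, so rests inside the piece are unaffected.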


The largest risk currently posed to our project is the error present in our calculations of the duration of each note and its placement within the sheet music. We find ourselves having to modify the note durations calculated by the rhythm processor in order to have the data fit into Vexflow’s API. This leaves a lot of room for error in the output; for example, 3 eighth-notes could be transcribed as 3 quarter-notes due to compounding changes in each note’s length in the process of sending information from back-end to front-end.

Another problem is that the transcription of a very short piece tends to result in a very long output, resulting in a file that people won’t be able to read conveniently as you can’t scroll down a computer screen while playing an instrument.


Our current status is that we are able to display the pitch of each note with very high accuracy, and we are able to accurately detect and transcribe rests, but rhythm is not yet handled: every note is currently treated as an eighth note.

(Image of transcription being displayed)


Aditya’s Status report for 3/25

My schedule changed this week as after we built the Note design structures, we realized we would have to implement the integrator step of the project in the front-end instead of the back-end, as discussed in our team status report. Because of this, I couldn’t use the output of the integrator in my Vexflow code like I’d originally planned.

Instead, I worked on determining how long each transcription will need to be based on the number of notes in each audio clip. Vexflow’s library is set up so that you have to manually instantiate each measure of the musical piece before adding notes to it. So, I wrote code that takes the length of the input audio and, using the baseline time signature of 4/4, determines how many measures (defined as Staves in the Vexflow API) are required in the transcription.

Once the number of Staves was known, I could set up a 2-D array of locations to track which measure goes where on the output PDF. I chose to always have 4 Staves in each row, so the 2-D array is an N-by-4 matrix, where 4*N is the total number of Staves. If the number of Staves isn’t divisible by 4, “padding” Staves are added at the end so the output still looks neat.
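The layout logic can be sketched in Python, although the real implementation lives in the front-end. The tempo parameter, the use of None for padding Staves, and all function names are assumptions made for illustration:

```python
import math

STAVES_PER_ROW = 4
BEATS_PER_STAVE = 4  # 4/4 baseline


def staves_needed(duration_seconds, tempo_bpm):
    """Number of 4-beat Staves needed to cover the audio, at least one."""
    total_beats = duration_seconds * tempo_bpm / 60
    return max(1, math.ceil(total_beats / BEATS_PER_STAVE))


def build_grid(stave_count):
    """Lay out Stave indices in rows of 4; None marks a padding Stave."""
    rows = math.ceil(stave_count / STAVES_PER_ROW)
    grid = []
    for r in range(rows):
        row = []
        for c in range(STAVES_PER_ROW):
            index = r * STAVES_PER_ROW + c
            row.append(index if index < stave_count else None)
        grid.append(row)
    return grid


def position(stave_index):
    """Zero-based (row, column) of a Stave in the grid."""
    return divmod(stave_index, STAVES_PER_ROW)
```

For example, 6 Staves produce two rows, with the last two slots of the second row filled by padding.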

Once the Staves are instantiated, I iterate through the piece and write each note (only the pitches currently, as the integrator is incomplete). The row and column of the desired Stave are determined based on the index of the Note in the list received from the back-end; for example, the 5th item in the list will correspond to the 2nd row, 1st Stave of the transcription.

Aditya Status Report March 18

This week I started implementing Vexflow in the front-end of our web-app. My team already had the format of the HTTP Response containing the note information planned out, so I was able to work on this concurrently with my teammates working on the back-end integration of the frequency and rhythm processors.

I used the API found at to implement this. I experimented with downloading the library manually or via npm, but I decided the simplest way would be to access the library via a <script> tag including a link to the unpkg source code provided in the Vexflow tutorial.

The plan is for the back-end to send a list of Note objects containing the pitch and duration data for each note. I started with just the pitch information, automatically assigning each note a duration of a quarter-note to minimize the errors. This was successful, but I found that I ran into an obstacle because Vexflow requires us to manually create each new “Stave” aka a set of 4 beats within a signal. Because of that, the app is currently limited to only transcribing 4 notes at a time. My plan for next week is to write a program to determine how many Staves will be needed based on how many beats are in the input signal.

I also managed to fix a major bug with the pitch processor, where the math code used to determine which note was played often caused rounding errors resulting in an output that was one half-step off of the desired note; for example, reading a D note as a D#.

Aditya’s Status Report 2/25

My work this week primarily focused on the software area of our project. I worked within the Django app to design a method of outputting a list of notes detected within a time signal.

The initial method involved examining one segment of N frames at a time, where N is one-quarter of the sample rate. I copied the samples in the segment to their own array, then ran the scipy method fft() on it. However, this posed a problem because the different size of the segment resulted in a less accurate output in the frequency domain.

I changed my process. Instead of creating an array the size of only one segment, I copied the entire time-signal and multiplied it by a box window, so that every sample is 0 outside of the desired time range. This gave the same frequency accuracy as analyzing the entire signal at once. The code below is the output for

One major error that’s put me behind schedule is that the size of the window results in capturing portions of the signal with differing frequencies. This results in the calculated fundamental frequency being a note between the two actual notes played: for example, an F# is detected at the border between the F and G notes of the C scale. I plan to make up for this by attempting to use Alejandro’s progress with the rhythm processor. I will try to determine how the pulse-detection algorithm can be applied to the frequency processor to prevent it from calculating the FFT at the points in the signal where a new pulse is detected. As integrating the processors was already part of the plan for next week, it won’t be a serious deviation from the schedule.

>>> from audio_to_freq import *
>>> getNoteList('')
Note detected at [261.41826923]
Note detected at [261.41826923]
Note detected at [261.41826923]
Note detected at [261.41826923]
Note detected at [261.41826923]
Note detected at [293.35508242]
Note detected at [293.35508242]
Note detected at [293.35508242]
Note detected at [330.57177198]
Note detected at [327.60989011]
Note detected at [330.44299451]
Note detected at [329.92788462]
Note detected at [349.37328297]
Note detected at [349.37328297]
Note detected at [348.98695055]
Note detected at [349.75961538]
Note detected at [392.51373626]
Note detected at [391.48351648]
Note detected at [392.64251374]
Note detected at [391.35473901]
Note detected at [392.12740385]
['C', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'F', 'E', 'F', 'F', 'F#', 'F#', 'F', 'F#', 'G#', 'G', 'G#', 'G', 'G#']

This is an example of how the signal is modified to detect the frequency at a given point. The note detected is a middle C.
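The box-window approach described above can be sketched as follows. This is an illustration on a synthetic signal, and it swaps in numpy's rfft for the scipy fft() call mentioned in the report; the function name and the test tone are assumptions:

```python
import numpy as np


def dominant_frequency(signal, sample_rate, start, stop):
    """Peak frequency (Hz) of signal[start:stop], using a full-length box window."""
    window = np.zeros(len(signal))
    window[start:stop] = 1.0  # box window: 1 inside the segment, 0 elsewhere
    spectrum = np.abs(np.fft.rfft(signal * window))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]


# Synthetic check: half a second of 440 Hz followed by half a second of 880 Hz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.where(t < 0.5, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 880 * t))
quarter = sample_rate // 4  # segment length of one-quarter of the sample rate
```

Because the transform is taken over the full signal length, the frequency bins stay as finely spaced as they would be for the whole recording, even though only one segment contributes energy.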

Aditya Status Report for 2/18

This week I developed code to detect the notes within a frequency array. The code isolates a relevant portion of the array and determines the average frequency of the segment. It then applies a mathematical formula that determines the note associated with a frequency based on the reference note, which in our case is A4 = 440 Hz.
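A common form of this formula counts semitones from the reference note using a base-2 logarithm. The sketch below assumes equal temperament and rounds to the nearest semitone; the function name, the sharp-only note names, and the octave suffix are illustrative choices rather than the project's actual code:

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
A4_HZ = 440.0


def frequency_to_note(freq_hz):
    """Name of the equal-tempered note nearest to freq_hz, e.g. 'C4'."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Signed distance from A4 in semitones, rounded to the nearest whole semitone.
    semitones_from_a4 = round(12 * math.log2(freq_hz / A4_HZ))
    midi_number = 69 + semitones_from_a4  # A4 is MIDI note 69
    octave = midi_number // 12 - 1
    return f"{NOTE_NAMES[midi_number % 12]}{octave}"
```

Rounding to the nearest semitone, rather than truncating, is what keeps a slightly flat or sharp reading from landing on the neighbouring note.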

I also built the file hierarchy of a Django app and began the process of integrating our starter code into it. I designed a structure to hold information for a Note object, to build a database of notes for the web app to access when it comes to sending the information to the front-end.

My plan for next week is to design and implement the process of applying the discrete Fourier transform to get the frequency arrays that will be processed by the detectNotes() method. This will give me a sequence of Notes, each one-quarter beat long, which will be integrated with the rhythm processor to determine where each Note should begin. After these have been integrated, the Note objects will be built.

Aditya’s Status Report for 2/11

This week I worked on determining how to use the Short-time Fourier transform to depict an audio signal in both the time and frequency domains at the same time. I recorded several basic audio samples of myself playing the piano on my iPhone; one was a C note held for a couple of seconds, another was a C scale ascending for five notes. I started by using MATLAB, as that is the system most designed for this kind of signal processing and plotting. I wanted to determine how to isolate only the relevant frequencies in a signal and determine at what point in time they are at a relevant magnitude.

I used this method from the scipy library to generate the STFT:

from scipy.signal import stft
f, t, Zxx = stft(time_domain_sig, fs=sample_rate, window='hann', nperseg=sixteenth, noverlap=sixteenth // 8)

The key difficulty here was determining the parameters of the STFT. The function asks us to pick a window shape, as well as the size of the window and how much the sliding window should overlap. I attempted to use a rectangular window with a size corresponding to one second of the audio clip, but I realized that a smaller Hann window worked better to account for the signal’s constantly changing magnitude. I also assigned a window size of 1/2 the sample rate, because I played each note of the scale for about 1 second, meaning I would have 2 windows applying the Fourier transform to each note. I translated this code to a Python script and wrote a method that takes a file name as input and generates the STFT.

This code resulted in the following graph:

You can see the earlier pulses are more accurate to the start of the pulse than the later ones, suggesting a smaller window is needed to get accurate DFTs. The cost of this accuracy is a slower and more redundant process of obtaining the frequency-domain representation of the signal.
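To reason about window size, it helps to compute what a given nperseg implies. A window of nperseg samples spans nperseg / sample_rate seconds, and its DFT bins are spaced sample_rate / nperseg Hz apart. The helper below is a small sketch of that arithmetic; the function name and the 44100 Hz sample rate used in the example are assumptions, since the report does not state the recording's sample rate:

```python
def stft_resolution(sample_rate, nperseg):
    """Return (window length in seconds, frequency bin spacing in Hz)."""
    return nperseg / sample_rate, sample_rate / nperseg
```

At an assumed 44100 Hz, a window of half the sample rate (22050 samples) lasts 0.5 s with 2 Hz bins, while a 4410-sample window lasts 0.1 s with 10 Hz bins. Shorter windows locate pulse onsets more precisely but leave less margin for telling apart neighbouring notes, which near middle C are only about 15 Hz apart.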

I am on schedule for researching the frequency domain representations of the audio signals, because my goal was to have a proper back-end representation of the signal’s magnitude at relevant frequencies. My plan for the next week is to fine-tune this data to be more accurate, write code to detect which frequencies correspond to which musical note, and generate a dictionary-like representation of each note, its magnitude, and its length. My deliverable will be the Python code which outputs an easily comprehensible list of notes in text form.