Team status report, 2022/12/10

Firstly, we finished the slides for our final presentation. This was also a good opportunity to discuss and plan the work needed to get us ready for the final demo.

This week we began preparing for our demo. We have started integrating the different parts of our project, and we came up with a plan to collect data from users over the next few days and during the demo itself, which will be a great opportunity to do so since many people there will interact with the project.

We have also started to write the final report. We are in the process of generating graphs and gathering old meeting notes that will feed into it.


Angela’s status report, 2022/12/10

At the beginning of the week, my team and I worked on our PowerPoint for the final presentation. As I was the one who would present, I wrote an outline of the points I wished to make during my presentation and made sure to include what my teammates wanted said.

This week, I spent my time parallelizing a timing bottleneck in the audio processing module: the part that parses the audio file and computes the sliding DFT for each key. I used Python’s threading library to facilitate this, and I am going to test different numbers of threads on our AWS instances to find the count that gives the best speedup. This was a very good learning opportunity for me, as I was not previously aware that one could enable parallelism in a high-level language like Python.
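
As a rough illustration, below is a minimal sketch of how the per-key work could be split across a thread pool. The helper names (dft_bin_magnitude, analyze_keys) are hypothetical stand-ins for our actual audio-processing functions, not the real code; the useful point is that NumPy releases the GIL inside its heavy computations, so threads can give a real speedup here.

```python
# Minimal sketch (not our production code): farm the per-key sliding DFT work
# out to a thread pool. dft_bin_magnitude() is a hypothetical stand-in for the
# real per-key analysis.
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def dft_bin_magnitude(samples: np.ndarray, freq: float, sample_rate: int) -> float:
    """Magnitude of a single DFT bin at `freq`, i.e. one key's contribution."""
    n = np.arange(len(samples))
    basis = np.exp(-2j * np.pi * freq * n / sample_rate)
    return float(np.abs(np.dot(samples, basis)))


def analyze_keys(samples, key_freqs, sample_rate, n_threads=4):
    # Each key is independent, so the work splits cleanly across workers,
    # and np.dot releases the GIL so the threads actually run in parallel.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(dft_bin_magnitude, samples, f, sample_rate)
                   for f in key_freqs]
        return [fut.result() for fut in futures]
```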

I implemented some new decay calculation methods in my code. Unfortunately, this caused some bugs that made my module non-functional. I hope to resolve these by the demo.

I also spent time helping John Martins debug the virtual piano. There was a bug where only the first element of the array was being passed to the end of the pipeline. Fortunately, this was resolved, and the piano can now play more than one key at once.

Angela’s status report, 2022/11/19

This week, my teammates and I decided to fully commit to the virtual piano. For my implementation of the note scheduler, this required small adjustments. Firstly, I removed the limitation of 5 discrete volume levels, as well as the 5 N minimum-force requirement, since we no longer have solenoids. This will allow us to reach a wider range of volumes, both by extending the floor of the lowest volume and by allowing more granularity.

Furthermore, I’ve started reading documentation in preparation for writing another part of the project. My teammates and I discussed further testing and presentation methods for our final product, and we’ve decided to use speech recognition both as a testing method and as a way to present our work. We plan to run speech recognition on both the initial input and the final output as a way to measure fidelity. We will also use speech-to-text modules to create captioning when presenting our product, to make it easier to recognize what the piano is “saying”. I’ve examined the Uberi module, which seems appropriate for our project. An alternative is wav2letter, which is written in C++ and offers lower latency. I will discuss latency with my teammates at our next meeting to determine where the bottleneck is.
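
As a rough sketch of the fidelity test, assuming we go with the Uberi SpeechRecognition package (the file names and the choice of Google’s free web recognizer are placeholders, not final decisions):

```python
# Sketch of the fidelity check: transcribe both the original recording and the
# piano's output, then compare the transcripts. File names and the choice of
# recognizer (Google's free web API) are placeholders.
import speech_recognition as sr


def transcribe(wav_path: str) -> str:
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)   # read the whole file
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return ""                           # recognizer could not understand it


if __name__ == "__main__":
    print("input :", transcribe("input_speech.wav"))
    print("output:", transcribe("piano_output.wav"))
```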

Angela’s status report, 2022/11/11

At the beginning of this week, I helped my teammates prepare for the demo. We debugged an issue where the audio file was being saved as a .webm file instead of a .wav file. This gave me an opportunity to read a lot of Django documentation, which I’m sure will be helpful when it comes to writing glue code between further parts of the system in the upcoming weeks.
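
For reference, one way such a conversion could be handled server-side, assuming ffmpeg is available via pydub (the paths are placeholders, and this is an illustration rather than necessarily the exact fix we used):

```python
# Illustrative conversion of an uploaded .webm recording to .wav using pydub
# (which wraps ffmpeg). Paths are placeholders.
from pydub import AudioSegment

audio = AudioSegment.from_file("recording.webm", format="webm")
audio.export("recording.wav", format="wav")
```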

After discussion with the professors during the Wednesday demo, and with my teammates afterwards, I began to reconsider the way I was implementing volume in the key presses. Initially, I had made some assumptions:

1. Since 5 N was the minimum force for the sound to be heard, any increment less than 5 N would not result in a noticeable difference in sound.

2. Discrete levels of volume were “good enough” for the purposes of our project.

Upon further consideration, I realized that this was unnecessarily limiting the dynamic volume range of our system. Since we are using PWM to control the solenoid force, we can command any force level from 0 to 100% of the maximum (0 to 25 N), and because we need at least 5 N to sound a key, this gives us a usable range of 5 to 25 N. I also adjusted the volume level calculations to better reflect the relationship between force and volume. The audio processing output gives us amplitude in pascals (newtons per square metre), and since the distance is a constant, the volume parameter is linear in newtons. Previously, I had mistakenly assumed that the volume was in decibels and had implemented a logarithmic relationship between the two.
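
A minimal sketch of the revised mapping, assuming the amplitude has already been normalized to the range [0, 1] (the constants and function names are placeholders, not our final code):

```python
# Sketch of the linear amplitude-to-force mapping described above.
# Amplitudes are assumed pre-normalized to [0, 1]; names are placeholders.
MIN_FORCE_N = 5.0    # quietest press that still sounds the key
MAX_FORCE_N = 25.0   # force at 100% PWM duty cycle


def amplitude_to_force(normalized_amplitude: float) -> float:
    """Linear map from normalized amplitude to solenoid force in newtons."""
    a = min(max(normalized_amplitude, 0.0), 1.0)          # clamp to [0, 1]
    return MIN_FORCE_N + a * (MAX_FORCE_N - MIN_FORCE_N)  # 5 N .. 25 N


def force_to_duty_cycle(force_n: float) -> float:
    """PWM duty cycle (0..1) that requests the given force."""
    return force_n / MAX_FORCE_N
```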

Team status report, 2022/11/11

At the beginning of this week we worked on our glue code to get ready for the demo. We have now connected the webapp input module and the signal processing module: the system can record user speech, start processing it, and display graphs that represent the processing steps. In the upcoming week, we will complete the glue code for the note scheduler and connect its output to the Raspberry Pi.

On Monday, we met with Byron to discuss plans for future weeks as well as testing. We have started to plan tests for different parameters, both quantitative and qualitative.

We ran into a problem with the resolution of our audio processing filtering. The frequency resolution of our data, which is set by the FFT’s sampling rate and window length, is a bit coarse: the FFT gives us bins in steps of 14 Hz. However, the spacing between piano key frequencies grows as frequency increases, and at the lowest bass notes adjacent keys are less than 14 Hz apart. As a result, in the lower frequencies some keys have no frequencies mapping to them.

We have discussed this problem with Prof. Sullivan, who suggested a Hamming window. This approach would also allow us to lower our sample rate and reduce the size of the dataset we have to work with.
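
For reference, here is a small sketch of how the bin spacing falls out of the sample rate and window length, with a Hamming window applied before the FFT (the numbers are illustrative, not our final parameters):

```python
# Illustration of FFT bin spacing and a Hamming window; the sample rate and
# window length below are examples, not our final parameters.
import numpy as np

sample_rate = 44100        # Hz
window_len = 3150          # samples -> bin spacing = 44100 / 3150 = 14 Hz
print("bin spacing:", sample_rate / window_len, "Hz")

t = np.arange(window_len) / sample_rate
frame = np.sin(2 * np.pi * 55.0 * t)           # example tone: A1, 55 Hz
windowed = frame * np.hamming(window_len)      # taper to reduce spectral leakage
spectrum = np.abs(np.fft.rfft(windowed))
freqs = np.fft.rfftfreq(window_len, d=1.0 / sample_rate)
print("peak near:", freqs[np.argmax(spectrum)], "Hz")
```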


Angela’s status report, 2022/11/05

This week I continued to work on the note scheduling module. Last week I completed all the main functions, but I was unhappy with the state of the syllable and phoneme recognition. (Note: a phoneme is an atomic unit of speech, such as a vowel sound or a consonant sound in English).

Phoneme recognition is important for our project as it allows us to know when to lift or press piano keys that have already been pressed, and when to keep them pressed to sustain the sound. This allows for fluid speech-like sounds, as opposed to a stutter.

First, I read about how speech recognition systems handle syllable detection. I learned that it is often done through volume amplitude: when someone speaks, the volume dips between syllables. I discussed using this method with my team, but we realized that it would fail to account for phonemes. For example, the words “flies” and “bear” are both monosyllabic, but each requires multiple phonemes.

I’ve now implemented two different methods for phoneme differentiation.

Method 1. Each frequency at each time interval has its volume compared to its volume in the previous time interval. If it’s louder by a certain threshold, it is pressed again. If it’s the same volume or slightly quieter, it’s held. If it’s much quieter or becomes silent, the key is either lifted and re-pressed with a lower volume or just lifted.

Method 2. At each time interval, the frequencies and their amplitudes are abstracted into a vector. We calculate the multidimensional difference between the vectors at different time intervals. If the difference is larger than a threshold, it will be judged to be a new phoneme and keys will be pressed again.
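
A rough sketch of Method 2, assuming each time interval is represented as a vector of per-key volumes (the threshold is a placeholder we still need to tune):

```python
# Sketch of Method 2: treat each time interval's per-key volumes as a vector
# and flag a new phoneme when consecutive vectors differ by more than a
# threshold. The threshold is a placeholder to be tuned on real recordings.
import numpy as np


def phoneme_boundaries(frames: np.ndarray, threshold: float) -> list[int]:
    """frames has shape (n_intervals, n_keys); returns the indices where a
    new phoneme is judged to begin (and keys get pressed again)."""
    boundaries = [0]                 # the first frame always starts a phoneme
    for i in range(1, len(frames)):
        # Euclidean ("multidimensional") distance between consecutive frames.
        if np.linalg.norm(frames[i] - frames[i - 1]) > threshold:
            boundaries.append(i)
    return boundaries
```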

In the upcoming weeks we will implement the code that turns the key scheduling module’s output into sounds, and we will test both of these methods, as well as any others we think of, on volunteers to determine which best differentiates phonemes.

Angela’s status report, 2022/10/29

This week, I completed writing the preliminary code for the note scheduler. This module will parse the output of the signal processing module and convert it into discrete keypresses on the piano. It is written in Python, and I have committed the code to GitHub for my teammates to review. While writing it, two things stood out to me.

Firstly, the way we’re currently separating syllables is a bit clumsy: we only look at the differences between volumes for each time period. I believe that a more elegant solution is to use something like a k-dimensional difference in volume between one note and the next (where each dimension is a frequency on the piano). We could also employ machine learning for this. A k-nearest-neighbours algorithm could work well with these k-dimensional volume differences, and we could also use a neural network since there are so many inputs. I initially proposed a decision tree, but I decided there were too many keys/frequencies for that.

Secondly, during my work, I made note of the spatial and temporal locality of the list accesses. Should we decide to convert our project to C to improve timing, I will make sure to take advantage of these to optimize the runtime. There is also much potential for multithreading since we do the same operations on different entries in the list.


Angela’s status report, 2022/10/22

During the beginning of the week, I worked with John to write Arduino sketches and test our solenoids to ensure that a physical implementation would be feasible and practical. We were concerned firstly with verifying that the shift registers worked and secondly with making sure they were able to send bits to the solenoids. First, we tested the shift registers with LEDs: we wrote test cases, passed them through an Arduino to the shift registers, and finally out to the LEDs. We struggled with the timing of the bits. Though it was apparent that the bits were indeed reaching the LEDs, their timing was off and some bits “stayed” for too long; i.e., the LEDs did not turn off in time. I deduced that this was a timing issue: we were only pulsing the signals to the shift registers, and the pulses were too short to be registered consistently, hence the unintended behaviour. We added small delays between enabling and disabling the signals, and the LEDs behaved as intended.

On the software side, I began to think about classes with which to store the output. I wrote some class definitions and preliminary class functions for working with the time:frequency format of the data we will be dealing with.
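
As an illustration of the direction (the class and attribute names here are placeholders rather than the final definitions):

```python
# Illustrative sketch of container classes for the time:frequency data;
# class and attribute names are placeholders, not our final definitions.
from dataclasses import dataclass, field


@dataclass
class FrequencyFrame:
    """Volumes of each tracked frequency during one time interval."""
    time_s: float                                               # interval start, seconds
    volumes: dict[float, float] = field(default_factory=dict)   # frequency (Hz) -> volume


@dataclass
class ProcessedAudio:
    """The full time:frequency output handed to the note scheduler."""
    interval_s: float                                           # spacing between frames, seconds
    frames: list[FrequencyFrame] = field(default_factory=list)

    def volume_at(self, frame_idx: int, freq: float) -> float:
        return self.frames[frame_idx].volumes.get(freq, 0.0)
```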

Angela’s Status Report, 2022/10/08

At the beginning of the week, I helped prepare the slides and studied the different aspects of the project in preparation for questions. Even though I’m not directly in charge of things like the audio processing, it was a good idea to re-familiarize myself with the concepts, both in anticipation of questions and so that I could work on the project collaboratively.

I also began the coding process. I have now outlined all the functions of the note scheduling module, along with the inputs, outputs, preconditions and postconditions for each. This was possible because Marco and I discussed the format of the audio processing module’s output last week. I have also written pseudo-code for some of the functions.
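
As an example of the outline format (the particular function, types, and conditions below are illustrative placeholders, not the interface we agreed on):

```python
# Example of the outline style: signature plus documented inputs, outputs,
# preconditions and postconditions. The details here are placeholders.
def schedule_notes(frames: list[dict[float, float]], interval_s: float) -> list[tuple[float, float, bool]]:
    """Convert time:frequency volume frames into key press/release events.

    Inputs:  frames     - one dict per time interval, mapping frequency (Hz) to volume
             interval_s - spacing between consecutive frames, in seconds
    Output:  list of (time_s, frequency_hz, pressed) events
    Precondition:  every dict contains only frequencies of actual piano keys
    Postcondition: events are sorted by time_s, and every press is eventually
                   followed by a release of the same frequency
    """
    raise NotImplementedError  # pseudo-code stage: body still to be written
```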

I expressed interest earlier on in helping with some of the audio processing, especially with writing our custom FFT function. I thought it would be an interesting problem to work on, as the FFT reduces the Fourier transform’s time complexity from O(n^2) to O(n log n). As we have access to AWS resources, we would also be able to further speed up the process with parallelism. I will investigate later whether this is necessary for near-real-time speech-to-piano. It’s possible that the bottleneck will be elsewhere, but if it is in the audio processing, it’s good to know we have options for improving the timing there.
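
For reference, a compact radix-2 Cooley-Tukey sketch of the idea, assuming the input length is a power of two (a textbook version, not our final implementation):

```python
# Compact radix-2 Cooley-Tukey FFT, assuming len(x) is a power of two.
# This is a reference sketch of the O(n log n) idea, not our final code.
import cmath


def fft(x: list[complex]) -> list[complex]:
    n = len(x)
    if n == 1:
        return list(x)
    evens = fft(x[0::2])                  # transform the even-indexed half
    odds = fft(x[1::2])                   # transform the odd-indexed half
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odds[k]
        out[k] = evens[k] + twiddle
        out[k + n // 2] = evens[k] - twiddle
    return out
```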

Angela’s Status Report, 2022/10/01

This week I met with my teammates to talk about the specifics of implementing our project. We also received our first parts and are excited to start working on our proof-of-concept for the physical piano interface soon.

Something I thought a lot about, as I am in charge of scheduling the keys, is how often to play them. We know from piano manufacturer Yamaha that a piano’s inner mechanism allows each key to be pressed up to 15 times in one second. This is limited by an element known as the hammer, which strikes the strings of the piano to create sound; it takes approximately 1/15 of a second to leave the string after each strike.

Even though we will be working on a digital piano and not an acoustic one, digital pianos are made to imitate acoustic pianos (when on the “piano” setting). Therefore, we can assume that a digital piano can also be played approximately 15 times per second. I first decided that I would schedule the keys at 15 Hz. Upon further consideration, I realized that human speech produces phonemes at a far slower rate than 15 per second. I also realized that re-striking a key at 15 Hz for as long as its frequency is present would result in a “stutter”: instead of “Hello” in 2 syllables, we would hear many syllables. This would render the speech both inaccurate and unintelligible.

I decided that the keys should be scheduled as follows: at each time period, we compare the status of each frequency to its status at the previous time period. These statuses are encoded as booleans: 0 if the frequency is not heard, and 1 if it is. If a frequency goes from 0 to 1, we press the key. If it goes from 1 to 0, we release the key. Otherwise, keys retain their former position. This should result in acceptable fidelity to human speech. The length of the time period has not been decided yet; we will experiment with different update rates to determine which best evokes speech.
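
A minimal sketch of that rule, assuming each time period is represented as the set of frequencies currently heard (the names and event format are placeholders):

```python
# Minimal sketch of the press/hold/release rule described above. Each time
# period is represented as the set of frequencies currently heard; names and
# the event format are placeholders.
def schedule(periods: list[set[float]]) -> list[tuple[int, float, str]]:
    """Return (period_index, frequency_hz, action) events, action being "press" or "release"."""
    events = []
    previous: set[float] = set()
    for i, current in enumerate(periods):
        for freq in current - previous:      # 0 -> 1: press the key
            events.append((i, freq, "press"))
        for freq in previous - current:      # 1 -> 0: release the key
            events.append((i, freq, "release"))
        previous = current                   # 1 -> 1 or 0 -> 0: hold as-is
    return events
```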