Marco’s Status Report 10/1/2022

This week I met with the team to finish discussing the design choices that needed to be made regarding our individual contributions. These are some of the questions we answered regarding my work and the project at large:

Where is everything going to be hosted?

We have three options:

  • A large portion of the processing and scheduling can be hosted on AWS
    • The remote server communicates with a Raspberry Pi that sends power to the solenoids that press the keys
    • Pro: We gain access to more compute power
    • Con: Communication between the user’s computer and the Raspberry Pi is bottlenecked by the internet upload and download speeds of each device
  • All of the computation is hosted on a Raspberry Pi
    • Pro: We are no longer bottlenecked by transmission rates between devices
    • Con: We lose the compute power necessary to host everything
  • All of the computation is hosted on a Jetson Nano
    • Pro: We are no longer bottlenecked by transmission rates between devices
    • Pro: We gain access to more compute power

Were we completely set on implementing the physical device, we would build everything on a Jetson Nano. However, if our proof-of-concept experiment doesn’t go well and we pivot towards a virtual piano, the need for speed goes away and AWS becomes the best option. With this in mind, we’ve chosen the AWS model: it provides the compute power necessary to host the majority of the processes, and it gives us room to move those processes onto a Jetson if the physical interface does eventually get built.

How are the frequencies for each sample going to be gathered?

  • The audio file contains information in the time domain
  • We need to sample the audio file for some time in order to collect the frequencies that make up the sounds played within that time
    • If that window is too short, we get an inaccurate picture of which frequencies make up that sound
    • If the window is too long, we won’t process the incoming sounds fast enough to feel responsive to the user
    • A natural compromise arises from the sampling rate of the audio file and the play rate of the piano keys. An audio file is sampled at 44.1 kHz (i.e., one sample every ~0.0227 ms) and a piano key can be pressed at most 15 times per second (i.e., once every 66.67 ms); check out Angela’s post for more information on how we arrived at that number! At those rates, there are 2,940 audio samples between the moments we can play a key. This window is large and within our timing constraints, which is exactly what we were looking for!
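The window arithmetic above can be checked in a few lines of plain Python (the constant names are just for illustration, not project code):

```python
# Samples available between consecutive key presses, using the
# numbers from the post: 44.1 kHz audio, at most 15 presses/second.
SAMPLE_RATE_HZ = 44_100        # standard audio sample rate
MAX_PRESSES_PER_SEC = 15       # fastest repeat rate of a piano key

sample_period_ms = 1_000 / SAMPLE_RATE_HZ        # time between audio samples
press_period_ms = 1_000 / MAX_PRESSES_PER_SEC    # time between key presses
samples_per_window = SAMPLE_RATE_HZ // MAX_PRESSES_PER_SEC

print(round(sample_period_ms, 4))   # 0.0227
print(round(press_period_ms, 2))    # 66.67
print(samples_per_window)           # 2940
```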

Are we ‘punching holes’ into the audio or ‘blurring’ it out around the frequencies of the keys?

  • 5 kHz is typically the highest frequency considered in speech-perception research, and 80 Hz is typically the lowest fundamental frequency of the adult voice [1]. That’s a range of 5000 − 80 = 4920 Hz that the human voice can occupy. With only 69 keys (i.e., 69 distinct frequencies), if we simply filtered out the energy at those 69 frequencies from the speech input, we’d at best capture 69/4920 ≈ 1.4% of the frequencies that make up the input. This is what I refer to as ‘punching holes’ through the input
  • Instead, we’ll use the 69 distinct frequencies to collect an average of the nearby frequencies at each value; this is what I refer to as ‘blurring’ the input. By blurring the input instead of punching holes in it, we collect more information about the frequencies that make up the incoming speech.
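A minimal sketch of the two approaches, using NumPy and an FFT over one analysis window. The function name, the 30 Hz blur width, and the toy 440 Hz input are illustrative assumptions, not our final implementation:

```python
import numpy as np

def key_energies(signal, sample_rate, key_freqs, blur_hz=0.0):
    """Estimate the energy near each piano-key frequency.

    blur_hz == 0  -> 'punching holes': take only the single FFT bin
                     nearest each key frequency.
    blur_hz  > 0  -> 'blurring': average the magnitudes of all bins
                     within +/- blur_hz of each key frequency.
    """
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)

    energies = []
    for f in key_freqs:
        if blur_hz == 0.0:
            idx = np.argmin(np.abs(freqs - f))       # nearest single bin
            energies.append(spectrum[idx])
        else:
            band = (freqs >= f - blur_hz) & (freqs <= f + blur_hz)
            energies.append(spectrum[band].mean())   # average over the band
    return np.array(energies)

# Toy example: a pure 440 Hz tone in one 2940-sample window.
sr = 44_100
t = np.arange(2940) / sr
tone = np.sin(2 * np.pi * 440 * t)
print(key_energies(tone, sr, [440.0], blur_hz=30.0))
```

With 2,940 samples the FFT bins are 15 Hz apart, so 440 Hz falls between bins; blurring over a small band around each key frequency is what makes the estimate robust to that mismatch.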

References

  1. Monson, B. B., Hunter, E. J., Lotto, A. J., & Story, B. H. (2014). The perceptual significance of high-frequency energy in the human voice. Frontiers in Psychology. Retrieved October 1, 2022, from https://www.frontiersin.org/articles/10.3389/fpsyg.2014.00587/full

Team Status Report for 9/24

This week we finished preparing our Gantt chart, which, following the feedback we received from our proposal presentation, was refactored to include less work within the first two weeks of development. John went out to the piano rooms in the Hall of Arts to gather measurements of the piano we’re planning to work on. Below are some sample images of the measurements he took.

The team met on Friday to discuss the feedback from our presentation. One of the questions in our feedback asked, “Why are you interested in using all 88 keys if you’re replicating speech?” This was an interesting question; initially, we had assumed we’d need all 88 keys. However, Angela and John noticed that in the Mark Rober video we presented in class, the leftmost quarter of the keys on the piano is never played! To play music on the piano we might indeed need all 88 keys, but as it turns out, adult speech covers a much smaller frequency range.

The voiced speech of a typical adult male has a fundamental frequency from 85 to 155 Hz, and that of a typical adult female from 165 to 255 Hz [1]. From the recording of Marco’s voice, we can also see that the frequencies that make up his voice lie in a much smaller range than the 88 keys of a piano cover. So while the original question was “Do we need all 88 keys?”, the answer might be no! That is exciting because it means we may be able to look into smaller keyboards (which are also cheaper!).
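As a quick sanity check on how few keys that range covers, we can use the standard formula for an 88-key piano, key number = 12·log2(f/440) + 49, with A4 = 440 Hz as key 49 (applying it to the 85–255 Hz speech range is our own back-of-the-envelope exercise):

```python
import math

def piano_key(freq_hz):
    # Standard 88-key numbering: A4 = 440 Hz is key 49.
    return 12 * math.log2(freq_hz / 440.0) + 49

# Fundamental-frequency range of typical adult speech [1]: 85-255 Hz.
low = piano_key(85)    # ~20.5, between E2 (key 20) and F2 (key 21)
high = piano_key(255)  # ~39.5, between B3 (key 39) and C4 (key 40)
print(round(low, 1), round(high, 1))
```

So the fundamentals of adult speech span only about keys 20 through 40, roughly a quarter of the 88-key range, which is consistent with the idea that a smaller keyboard could suffice.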

This question will be investigated and hopefully answered by the end of this following week.

References:

1. Baken, R. J. (2000). Clinical Measurement of Speech and Voice, 2nd Edition. London: Taylor and Francis Ltd. (pp. 177), ISBN 1-5659-3869-0.

Marco’s Status Report for 9/24

This week, with John’s help, I prepared our proposal slides for Wednesday’s presentation; we met on Tuesday to rehearse. After the feedback we received, the team and I met to reshape our project timeline on the Gantt chart. We’ve been contacting suppliers for the solenoids and have started gathering quotes. We’re planning to buy a small batch of parts (5 solenoids, some MOSFETs, etc.) in order to build the proof-of-concept physical interface.

Going into next week I will be making some concept drawings for the frame, and building a prototype of the circuit that drives the solenoids. Tomorrow, we will be meeting again to discuss any implementation details we have questions about before we work on our individual parts.