Status Report #9: 11/23 (Cyrus)

  • Met with Professor Stern on Wednesday to try to fix issues with our MFCC + DTW code.
  • The librosa library we were using in Python does not expose enough parameters to tune the algorithm for our specific application, so after a second meeting with Professor Stern we decided to switch to MATLAB.
  • Verifying our algorithm requires spectrograms, and we are currently having trouble reproducing spectrograms in MATLAB for audio samples whose spectrograms are already known (a sketch of the computation we are trying to match is below).
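
For reference, a minimal Python/librosa sketch of the kind of spectrogram computation we are trying to reproduce in MATLAB. The file name and the STFT parameters (n_fft, hop_length, window) here are placeholders, not our exact settings; mismatched STFT parameters are the usual reason two tools produce different-looking spectrograms.

    import numpy as np
    import librosa

    # Load a reference clip (path is a placeholder).
    y, sr = librosa.load("reference.wav", sr=None, mono=True)

    # STFT parameters; these must match on both sides for the spectrograms to agree.
    n_fft = 1024
    hop_length = 256

    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length, window="hann"))
    S_db = librosa.amplitude_to_db(S, ref=np.max)  # log-magnitude spectrogram in dB

    print(S_db.shape)  # (1 + n_fft // 2, num_frames)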

 

Status Report #8: 11/16 (Cyrus)

  • Worked on using DTW to map the time series of one signal to the time series of another signal.
  • Wrote DTW code that returns a path with this mapping, but we are unsure how to use this path to actually compare two signals and decide whether they match (a sketch of one possible scoring approach follows this list). Hence, we decided to meet with Professor Stern to figure out what conclusions we can draw from DTW, and how it actually solves some of the issues we experienced with our initial MFCC implementation.
  • We also started testing in different environments and realised that ambient noise might be a problem, so we are thinking of using adaptive filters to remove the noise.
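
One possible way to turn the DTW path into a single match score (this is an assumption, not what our code currently does): compute MFCC sequences for both recordings, run DTW, and use the accumulated cost normalized by the path length as a distance. The file names and the decision threshold below are placeholders.

    import numpy as np
    import librosa

    def dtw_distance(path_a, path_b, sr=16000, n_mfcc=13):
        """Lower score = better match (threshold is hypothetical)."""
        y_a, _ = librosa.load(path_a, sr=sr, mono=True)
        y_b, _ = librosa.load(path_b, sr=sr, mono=True)

        # MFCC sequences: one n_mfcc-dimensional vector per frame.
        mfcc_a = librosa.feature.mfcc(y=y_a, sr=sr, n_mfcc=n_mfcc)
        mfcc_b = librosa.feature.mfcc(y=y_b, sr=sr, n_mfcc=n_mfcc)

        # D is the accumulated-cost matrix, wp is the warping path (index pairs).
        D, wp = librosa.sequence.dtw(X=mfcc_a, Y=mfcc_b, metric="euclidean")

        # Normalize total cost by path length so scores are comparable
        # across recordings of different durations.
        return D[-1, -1] / len(wp)

    score = dtw_distance("recording_1.wav", "recording_2.wav")
    print("match" if score < 50.0 else "no match")  # threshold is a placeholder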

Status Report #6: 11/2 (Cyrus)

  • Had to miss the meeting with Professor Stern due to an onsite interview. Ramped up on dynamic time warping on my own, since Professor Stern suggested it at the meeting I missed.
  • Worked with Spencer and Eugene to create the demo for the upcoming week. The demo uses a prediction model based on MFCC coefficients.
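
For context, a minimal sketch of the kind of MFCC-based matcher the demo's prediction model is built around; the actual demo differs, and summarizing a clip by its mean MFCC vector, the file names, and the threshold here are all assumptions for illustration.

    import numpy as np
    import librosa

    def mfcc_template(path, sr=16000, n_mfcc=13):
        """Summarize a clip as the mean of its MFCC frames (placeholder feature)."""
        y, _ = librosa.load(path, sr=sr, mono=True)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        return mfcc.mean(axis=1)

    # Build a template from a known wake-word recording, then score a new clip
    # by Euclidean distance to the template (file names and threshold are placeholders).
    template = mfcc_template("wake_word_reference.wav")
    candidate = mfcc_template("incoming_clip.wav")
    distance = np.linalg.norm(template - candidate)
    print("predicted: wake word" if distance < 40.0 else "predicted: other")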

Status Report #5: 10/26 (Cyrus)

  • Looked into audio transcription using C++, and this made us realise that audio transcription is I/O bound. This confirmed our suspicion that a signal-processing-based approach was the only way to move forward.
  • Set up the time-sync infrastructure with Eugene and fixed numerous bugs across the two Python scripts; also read through some of the PyAudio source code to understand why some of our programs weren’t behaving as expected.
  • With a better understanding of PyAudio, Eugene and I were able to reduce the lower-bound latency further, to around 100 ms (see the sketch after this list).
  • Looked into MFCC coefficients with Spencer, but we were unable to come up with an accurate way of comparing these coefficients across two different recordings. We are meeting with Professor Stern on Monday to get clarity on this.
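
A rough sketch of the kind of low-latency PyAudio callback stream we experimented with; the buffer size, sample rate, and echo-style callback below are assumptions for illustration, not our exact scripts. The main lever on latency is the small frames_per_buffer in callback mode.

    import time
    import pyaudio

    RATE = 16000
    FRAMES_PER_BUFFER = 128  # small buffer keeps per-block latency low (placeholder value)

    def callback(in_data, frame_count, time_info, status):
        # Echo input straight to output; the real scripts would run detection /
        # response logic here instead of echoing.
        return (in_data, pyaudio.paContinue)

    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16,
                    channels=1,
                    rate=RATE,
                    input=True,
                    output=True,
                    frames_per_buffer=FRAMES_PER_BUFFER,
                    stream_callback=callback)

    stream.start_stream()
    try:
        while stream.is_active():
            time.sleep(0.1)
    finally:
        stream.stop_stream()
        stream.close()
        p.terminate()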

Status Report #4: 10/19 (Cyrus)

  • Set up a venv to handle the speech recognition module.
  • Looked at Spencer’s code for audio-to-text conversion for potential improvements and optimizations.
  • Looked into compiled Python as a way to improve performance over interpreted Python. Minimal difference in performance, which hints that the program is I/O bound (see the timing sketch after this list).
  • Next steps: looking to replicate this in C++ to enhance performance. Spencer and I are diverging at this point to try two different approaches and see which one works. My approach should be sufficient if audio-to-text conversion is compute-bound; otherwise, signal processing might be required to reduce the dependence on I/O.
  • Looking to use TensorFlow.
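
A rough sketch of the timing check that points to I/O: assuming the speech recognition module in question is the SpeechRecognition package (an assumption), most wall-clock time goes to waiting on the microphone and on the recognition network call, which Python-level optimization (compiled or interpreted) cannot reduce. The phrase limit and recognizer choice below are placeholders.

    import time
    import speech_recognition as sr  # assuming the SpeechRecognition package

    recognizer = sr.Recognizer()

    with sr.Microphone() as source:
        t0 = time.perf_counter()
        audio = recognizer.listen(source, phrase_time_limit=3)  # waiting on the mic (I/O)
        t1 = time.perf_counter()

    text = recognizer.recognize_google(audio)                   # network round trip (I/O)
    t2 = time.perf_counter()

    print(f"capture: {t1 - t0:.2f} s, recognition: {t2 - t1:.2f} s -> {text!r}")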

Status Report #1: 9/28 (Cyrus)

  • Carried out experiments on how to “jam” the wake word on Siri, since we did not have a Google Home/Alexa yet. Tests were successful with human voices. However, playing a recording of the jamming voice in a loop seemed to give only around a 50% success rate. (Done with Spencer.)
  • Redefined the problem as a latency problem: how do we obfuscate the wake word effectively? We need to hit the “s” sound at the same time as Siri.
  • Latency testing for the program that Eugene wrote using Python and PyAudio: good results. It detects input and plays a predefined output very quickly, without a neural net in the middle. This establishes that what we are doing is possible. (Done with Spencer.)
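
A rough sketch of the kind of detect-and-respond loop described above; the chunk size, RMS threshold, and noise-burst response are placeholders, and Eugene's actual program differs. The idea is simply to read small blocks from the microphone and play a predefined output the moment the input level crosses a threshold.

    import numpy as np
    import pyaudio

    RATE = 16000
    CHUNK = 256          # small read size keeps detection latency low
    THRESHOLD = 500      # RMS energy threshold; placeholder, needs tuning per microphone

    p = pyaudio.PyAudio()
    mic = p.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)
    speaker = p.open(format=pyaudio.paInt16, channels=1, rate=RATE, output=True)

    # Predefined response: a quarter-second noise burst standing in for the jamming audio.
    burst = (np.random.uniform(-0.3, 0.3, RATE // 4) * 32767).astype(np.int16).tobytes()

    try:
        while True:  # Ctrl-C to stop
            frame = np.frombuffer(mic.read(CHUNK), dtype=np.int16)
            rms = np.sqrt(np.mean(frame.astype(np.float64) ** 2))
            if rms > THRESHOLD:
                speaker.write(burst)  # play the response as soon as sound is detected
    finally:
        mic.close()
        speaker.close()
        p.terminate()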