Status Report #11: (12/7) Eugene

  • Finished up our final design report
  • Tested the directional microphone, which doesn’t have the driver configurations needed to let us read in and output sound
  • Tested the HomePod, which is much more resilient to our exploit. We chalk this up to a much more robust microphone system designed to parse out sound more effectively than a low-power, single-mic, always-on system.

Status Report #9: (11/23) Eugene

  • After meeting with Stern on Wednesday, I spent time trying to apply DTW to MFCC coefficients to identify how to map between two signals. Initial results proved unfruitful.
  • Following Spencer’s and Cyrus’ meeting with Stern on Friday, I tried to re-implement our code in MATLAB for its more robust tooling and better support and documentation.
  • Aside from MATLAB installation errors, we are currently blocked on interpreting spectrogram results, so we will need to meet with Stern again to resolve this.
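The DTW-over-MFCC idea above can be sketched with a minimal NumPy implementation of the classic dynamic-time-warping recurrence. The MFCC frames here are synthetic placeholders (random 13-dimensional vectors), not real audio, and the function name is illustrative rather than our actual code:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of
    feature frames (e.g. MFCC vectors), shapes (n, d) and (m, d)."""
    n, m = len(a), len(b)
    # Pairwise Euclidean distances between every frame of a and b.
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # Accumulated-cost matrix filled by the standard DTW recurrence.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # insertion
                acc[i, j - 1],      # deletion
                acc[i - 1, j - 1],  # match
            )
    return acc[n, m]

# Synthetic example: the second sequence is a time-stretched copy of
# the first, so DTW aligns every frame exactly despite the length gap.
rng = np.random.default_rng(0)
seq = rng.normal(size=(20, 13))          # 20 frames of 13 MFCCs
stretched = np.repeat(seq, 2, axis=0)    # same content, twice as long
print(dtw_distance(seq, stretched))      # 0.0: every frame matches exactly
```

This is what makes DTW attractive for wake-word comparison: unlike frame-by-frame MSE, it tolerates two utterances of “Hey Siri” being spoken at different speeds.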

Status Report #8: (11/16) Eugene

  • Wrote a version of the jammer demo that averaged MFCC samples to compare.
  • Ran benchmark tests with DTW, but we’re currently blocked on understanding how to use it correctly. We plan to meet with Stern to figure out its use cases.
  • Initial tests don’t yield better performance than comparing against multiple samples individually. I’m going to investigate adding more samples to smooth out the average and see how that changes the results.
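The averaging approach from the first bullet can be sketched as follows; this is a hedged illustration with synthetic NumPy arrays standing in for recorded “Hey Siri” MFCCs (all trimmed to the same frame count), and the function names are placeholders, not our actual code:

```python
import numpy as np

def average_template(samples):
    """Average several MFCC matrices (all shape (frames, coeffs))
    into a single comparison template."""
    return np.mean(np.stack(samples), axis=0)

def mse(a, b):
    """Mean squared error between two equally shaped MFCC matrices."""
    return float(np.mean((a - b) ** 2))

# Synthetic stand-ins: noisy copies of one underlying utterance.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 13))
samples = [base + rng.normal(scale=0.1, size=base.shape) for _ in range(5)]

template = average_template(samples)
probe = base + rng.normal(scale=0.1, size=base.shape)

# Averaging cancels some per-sample noise, so the template should sit
# closer to a new utterance than any single noisy sample does.
print(mse(probe, template), mse(probe, samples[0]))
```

The limitation we observed fits this picture: averaging only helps when the samples are already time-aligned, which is exactly the gap DTW is meant to fill.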

Status Report #7: (11/9) Eugene

  • Configured project demo, tweaking the exploit’s volume output and delay for recognition.
  • Looking into normalizing signals to help factor out volume when recognizing wake words.
  • Further research into what to do with MFCCs: correlation is pretty low for audio sample analysis.
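The volume-normalization idea can be sketched with simple RMS normalization; this is a minimal illustration (the function name and target level are assumptions, not our actual code), using a sine wave in place of real recordings:

```python
import numpy as np

def rms_normalize(signal, target_rms=0.1):
    """Scale a waveform so its RMS level matches target_rms,
    factoring out overall volume before feature comparison."""
    rms = np.sqrt(np.mean(signal ** 2))
    if rms == 0:
        return signal  # silent input: nothing to scale
    return signal * (target_rms / rms)

# A quiet and a loud copy of the same waveform normalize to
# (numerically) identical signals.
t = np.linspace(0, 1, 16000)
quiet = 0.01 * np.sin(2 * np.pi * 440 * t)
loud = 0.8 * np.sin(2 * np.pi * 440 * t)
print(np.allclose(rms_normalize(quiet), rms_normalize(loud)))  # True
```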

Status Report #6: 11/2 (Eugene)

  • Met with Professor Stern to talk about the motivation behind and applications of MFCCs in speech detection. As of now, mean squared error is not a good indication of correlation between two audio samples, so he recommended that we look into dynamic time warping. Vyas told us that this might extend past the scope of our project in terms of capturing every possible utterance of “Hey Siri”, but it might be useful if MFCCs continue to prove unhelpful.
  • Worked on designing our in-lab demo end-to-end. Investigating the use of bash scripting to handle time synchronization, because research into system time sync through Python has proved unfruitful.

Status Report #5: 10/26 (Eugene)

  • Ran timing tests using NTP and Python scripts across machines to identify lower-bound latency. On average, lower-bound response times hover around 100ms, which gives us up to 200ms to react.
  • Based on Spencer’s research into MFCCs, we know that we can construct these coefficients in 5ms, which gives us a lot more time to react.
  • I scheduled a meeting with Prof. Stern to discuss the use of MFCCs. In the meantime, based on research identifying the significance of the first 13 coefficients, I tried to calculate the mean squared error between the MFCCs of two audio samples. Correlation is limited and not entirely informative. I hope to learn more in our meeting with Stern on Monday.
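The first-13-coefficients comparison can be sketched as follows. This is a hedged illustration: the synthetic 40-coefficient matrices stand in for real MFCC output, and `mfcc_mse` is a name chosen for this sketch, not our actual code:

```python
import numpy as np

def mfcc_mse(mfcc_a, mfcc_b, n_coeffs=13):
    """MSE between two MFCC matrices (shape (frames, coeffs)),
    keeping only the first n_coeffs coefficients, which carry
    most of the spectral-envelope information."""
    a = mfcc_a[:, :n_coeffs]
    b = mfcc_b[:, :n_coeffs]
    if a.shape != b.shape:
        raise ValueError("samples must have the same frame count")
    return float(np.mean((a - b) ** 2))

# Synthetic 40-coefficient MFCC matrices for two utterances.
rng = np.random.default_rng(0)
utterance_a = rng.normal(size=(60, 40))
utterance_b = utterance_a + rng.normal(scale=0.2, size=(60, 40))

print(mfcc_mse(utterance_a, utterance_a))  # 0.0 for identical samples
print(mfcc_mse(utterance_a, utterance_b))
```

One caveat this exposes: frame-wise MSE requires both samples to have identical frame counts and alignment, which is part of why the correlation we measured was uninformative.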

Status Report #4: 10/19 (Eugene)

  • Helped Cyrus and Spencer get up to speed with the venv setup, as most of the development thus far has been local on my machine.
  • Wrote the second part of the timing code, using PyAudio to timestamp the emission of sound.
  • Investigated NTP solutions for time synchronization across the laptops. Looking into using a bash script to explicitly set machine times before executing the script.
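The emission-timestamping idea can be sketched independently of PyAudio: wrap the playback call with monotonic-clock timestamps, which are immune to NTP clock adjustments mid-measurement. Here `play_fn` is a stand-in for whatever actually writes audio to the output stream (a PyAudio stream write in our case); a sleep simulates the clip duration, and the names are illustrative, not our actual code:

```python
import time

def timed_emission(play_fn):
    """Run a playback callable and return (start, end) timestamps
    from the monotonic clock."""
    start = time.monotonic()
    play_fn()          # e.g. stream.write(frames) with PyAudio
    end = time.monotonic()
    return start, end

# Stand-in playback: sleep for roughly the clip's duration.
start, end = timed_emission(lambda: time.sleep(0.05))
print(f"emission took {end - start:.3f}s")
```

Pairing monotonic durations with an NTP-synchronized wall clock at script start is one way to compare emission times across machines.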