Status Report #10: (11/30) Spencer

  • Met with Prof. Stern about how to proceed with our attack
  • Rewrote Matlab version of attack code
  • Finished various dtw + mfcc implementations
    • Compared constantly polling in 30 ms frames vs. triggering only above a given threshold.
    • Incorporating DTW gives a much lower false-positive rate: music and some unrelated speech no longer trigger the system automatically, which is far better than our midpoint demo.
    • Runs a bit too slowly in practice to jam an iPhone. This could be because my hardware is older than Eugene’s, since the attack seems to work on his computer.
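The two trigger strategies above can be sketched as follows. This is a minimal numpy illustration, not our actual attack code: the function name, frame length handling, and threshold value are our own assumptions.

```python
import numpy as np

def frame_energy_trigger(audio, sample_rate=16000, frame_ms=30, threshold=0.1):
    """Split audio into fixed-length frames (constant polling) and report
    which frames exceed an RMS energy threshold (threshold triggering).

    Returns the indices of triggering frames; an empty list means the
    threshold never fired, so downstream DTW matching is skipped."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(audio) // frame_len
    triggered = []
    for i in range(n_frames):
        frame = audio[i * frame_len:(i + 1) * frame_len]
        rms = np.sqrt(np.mean(frame ** 2))
        if rms >= threshold:
            triggered.append(i)
    return triggered
```

In the threshold variant, the (more expensive) MFCC + DTW comparison would only run on the frames this gate lets through, instead of on every 30 ms frame.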

Status Report #9: (11/23) Spencer

  • Met with Prof. Stern multiple times to try resolving the issues we had with our pipeline.
  • Verifying our results from the Librosa MFCC + FastDTW chain is difficult because the code is opaque.
  • We switched to Matlab to build a more transparent, verifiable, and well-supported pipeline.
  • We are currently having trouble producing spectrograms that confirm the result we care about: the warped data should have a spectrogram similar to that of the reference sample, with the utterances in the same locations.
  • Will continue to explore fixes with this pipeline.
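The check we want the spectrograms to pass can be illustrated outside of Matlab. The pure-numpy sketch below (helper names and FFT parameters are our own, not from the pipeline) computes magnitude spectrograms and a frame-wise distance that should be small when the warped audio and the reference have their utterances in the same time locations.

```python
import numpy as np

def magnitude_spectrogram(audio, n_fft=512, hop=256):
    """Magnitude spectrogram from a Hann-windowed STFT (pure numpy)."""
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(audio) - n_fft + 1, hop):
        seg = audio[start:start + n_fft] * window
        frames.append(np.abs(np.fft.rfft(seg)))
    return np.array(frames).T  # shape: (freq_bins, time_frames)

def spectrogram_distance(a, b):
    """Mean per-frame distance between two equal-shaped spectrograms.
    Small when the two signals have matching content at matching times."""
    assert a.shape == b.shape
    return float(np.mean(np.linalg.norm(a - b, axis=0)))
```

A well-warped sample should score close to the reference, while a time-shifted or unwarped one should score noticeably worse.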

Status Report #8: (11/16) Spencer

  • Implemented a version of the code combining DTW with MFCC features. It doesn’t seem to be effective, so we need to determine whether this is an implementation bug or whether the approach shouldn’t work the way we currently have it designed.
  • Did further reading on DTW to understand why this method is not working as expected.
  • Thinking about adding an adaptive filter (https://pypi.org/project/adaptfilt/) to potentially improve performance.
  • Will talk to Prof. Stern next week re: dtw and figure out how to proceed.
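As a reference point for the adaptive-filter idea, here is a hand-rolled LMS filter in numpy. The adaptfilt package provides routines in this family; the function name, signature, and parameter choices below are our own illustration, not adaptfilt's API.

```python
import numpy as np

def lms_filter(u, d, M=16, step=0.05):
    """Basic least-mean-squares adaptive filter: adapt M tap weights w so
    that the filtered input u tracks the desired signal d.

    Returns (output y, error e, final weights w)."""
    w = np.zeros(M)
    y = np.zeros(len(u))
    e = np.zeros(len(u))
    for n in range(M - 1, len(u)):
        x = u[n - M + 1:n + 1][::-1]  # [u[n], u[n-1], ..., u[n-M+1]]
        y[n] = np.dot(w, x)           # current filter output
        e[n] = d[n] - y[n]            # instantaneous error
        w = w + step * e[n] * x       # gradient-descent weight update
    return y, e, w
```

For our use case, the hope is that such a filter could cancel predictable background components before the MFCC + DTW stage, improving the match quality.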

Status Report #7: (11/9) Spencer

  • Explored viability of using dynamic time warping for more accurate MFCC prediction.
  • Read the FastDTW paper: it computes a good approximation of exact DTW in O(n) time and space, rather than the O(n^2) time and space of the full algorithm. https://pdfs.semanticscholar.org/05a2/0cde15e172fc82f32774dd0cf4fe5827cad2.pdf
  • Exploring integration of FastDTW module on python: https://pypi.org/project/fastdtw/
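For reference, the exact O(n^2) dynamic program that FastDTW approximates looks like this. This is a minimal numpy sketch over 1-D sequences; in our pipeline the inputs would be MFCC frame sequences with a per-frame distance instead of absolute difference.

```python
import numpy as np

def dtw_distance(x, y):
    """Exact dynamic time warping distance between two 1-D sequences via
    the classic dynamic program: O(n*m) time and space."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Best of: insertion, deletion, or match of the two frames.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```

The PyPI fastdtw module exposes the approximate version behind a similar interface (roughly `distance, path = fastdtw(x, y)`), which is what we are looking to integrate.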

Status Report #5: (10/26) Spencer

  • Since audio transcription is far too slow, investigated a signal-processing-based approach to speed up the system.
  • Researched MFCCs and their significance for speech recognition.
  • Ran tests to check speed of MFCC library (librosa).
  • Worked on integrating librosa with the audio input from previous weeks. Added timing code: librosa can process an audio chunk from the previous system in 0.005 sec, which is good news for us.
  • Next steps: talk to Prof. Stern about MFCCs and the best way to recognize matching speech; integrate a simple end-to-end system for the in-lab demo.
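The per-chunk timing measurement can be sketched with a small harness like the one below. The feature function here is a stand-in log-magnitude FFT just to exercise the harness, not librosa's actual MFCC call; the function names and chunk size are our own assumptions.

```python
import time
import numpy as np

def time_per_chunk(process, chunk, n_runs=50):
    """Average wall-clock seconds for `process` to handle one audio chunk."""
    start = time.perf_counter()
    for _ in range(n_runs):
        process(chunk)
    return (time.perf_counter() - start) / n_runs

def toy_features(chunk):
    # Stand-in for the real MFCC computation (e.g. librosa.feature.mfcc):
    # a log-magnitude FFT of the chunk.
    return np.log1p(np.abs(np.fft.rfft(chunk)))

# One 30 ms frame at 16 kHz, filled with noise for the benchmark.
chunk = np.random.default_rng(0).standard_normal(480)
elapsed = time_per_chunk(toy_features, chunk)
```

The key question the measurement answers: is the per-chunk processing time well under the chunk's own duration (30 ms), so the system can keep up in real time?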

Status Report #4: (10/19) Spencer

  • Set up a venv to handle the speech recognition module.
  • Created a basic audio -> text proof-of-concept pipeline using the speech recognition module in Python.
  • Measured compiled vs. interpreted Python and found no noticeable performance difference. The pipeline itself performs poorly, consistently taking > 1 second to run.
  • Next steps: investigate signal-processing techniques to improve the response time of the basic pipeline. For example, matching on MFCCs may be faster than full audio-to-text transcription.
  • Possible library to look at: (https://github.com/MycroftAI/sonopy)