Status Report #5: 10/26 (Eugene)

– Ran timing tests using NTP and Python scripts across machines to identify the lower-bound latency. On average, lower-bound response times hover around 100 ms, which gives us up to 200 ms to react (a rough sketch of the measurement follows).
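
A minimal sketch of the cross-machine measurement, assuming clocks are already NTP-synced and assuming a simple UDP exchange of timestamps; the host, port, and message format here are hypothetical placeholders, not our actual scripts:

    import socket
    import time

    HOST, PORT = "192.168.1.10", 5005  # hypothetical receiver address

    def sender():
        # Timestamp immediately before sending; clocks assumed NTP-synced.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.sendto(repr(time.time()).encode(), (HOST, PORT))

    def receiver():
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("", PORT))
        data, _ = sock.recvfrom(1024)
        t_recv = time.time()
        t_send = float(data.decode())
        print(f"one-way latency: {(t_recv - t_send) * 1000:.1f} ms")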

– Based on Spencer’s research into MFCC, we know that we can compute these coefficients in about 5 ms, which gives us considerably more time to react.

– I scheduled a meeting with Prof. Stern to discuss the use of MFCC. In the meantime, based on research identifying the significance of the first 13 coefficients, I computed the mean squared error between the MFCCs of two audio samples (a sketch of the comparison follows). The correlation was weak and not especially informative. I hope to learn more in our meeting with Stern on Monday.
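
A hedged sketch of that comparison using librosa, keeping only the first 13 coefficients; the filenames are hypothetical placeholders:

    import librosa
    import numpy as np

    # Load both clips; the second is resampled to match the first.
    y1, sr1 = librosa.load("sample_a.wav", sr=None)
    y2, sr2 = librosa.load("sample_b.wav", sr=sr1)

    # First 13 MFCCs per frame, shape (13, n_frames).
    m1 = librosa.feature.mfcc(y=y1, sr=sr1, n_mfcc=13)
    m2 = librosa.feature.mfcc(y=y2, sr=sr2, n_mfcc=13)

    # Truncate to the shorter clip so frames align one-to-one.
    n = min(m1.shape[1], m2.shape[1])
    mse = np.mean((m1[:, :n] - m2[:, :n]) ** 2)
    print(f"MSE over first 13 MFCCs: {mse:.3f}")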

Status Report #5: 10/26 (Spencer)

  • Since audio transcription is very slow, investigated a signal-processing-based approach to speed up the system.
  • Researched MFCC and its significance with respect to speech recognition.
  • Ran tests to check the speed of the MFCC library (librosa).
  • Worked on integrating librosa with the audio input from previous weeks. Added timing code: librosa can process an audio chunk from the previous system in 0.005 sec (see the timing sketch after this list), which is good news for us.
  • Next steps: talk to Prof. Stern about MFCC and the best way to recognize matching speech; integrate a simple end-to-end system for an in-lab demo.
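
A minimal version of the timing check; the chunk length and sample rate below are assumptions for illustration, not our measured configuration:

    import time
    import numpy as np
    import librosa

    SR = 16000  # assumed sample rate
    chunk = np.random.randn(SR // 2).astype(np.float32)  # dummy 0.5 s chunk

    t0 = time.perf_counter()
    mfcc = librosa.feature.mfcc(y=chunk, sr=SR, n_mfcc=13)
    t1 = time.perf_counter()
    print(f"MFCC of {chunk.size / SR:.2f} s chunk: {(t1 - t0) * 1000:.2f} ms")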

Status Report #4: 10/19 (Cyrus)

  • Set up a venv to handle the speech recognition module.
  • Looked at Spencer’s code for audio-to-text conversion for potential improvements and optimizations.
  • Looked into compiled Python as a way to improve performance over interpreted Python. There was minimal difference in performance, which hints that the program is I/O-bound (see the profiling sketch after this list).
  • Next steps: looking to replicate this in C++ to enhance performance. Spencer and I are diverging at this point to try two different approaches and see which one works. My approach should be sufficient if audio-to-text is compute-bound; otherwise, signal processing might be required to reduce the dependence on I/O.
  • Looking into using TensorFlow.
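
One quick way to test the I/O-bound hypothesis is to compare wall-clock time against CPU time: if wall time far exceeds CPU time, the pipeline is mostly waiting on I/O rather than computing. A sketch, where transcribe and audio_chunk are hypothetical stand-ins for the actual audio-to-text call and its input:

    import time

    def profile(fn, *args):
        # perf_counter measures wall time; process_time counts only CPU work.
        w0, c0 = time.perf_counter(), time.process_time()
        result = fn(*args)
        w1, c1 = time.perf_counter(), time.process_time()
        print(f"wall: {w1 - w0:.3f} s, cpu: {c1 - c0:.3f} s")
        return result

    # Usage (hypothetical): text = profile(transcribe, audio_chunk)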

Status Report #4: 10/19 (Eugene)

  • Helped Cyrus and Spencer get up to speed with the venv setup, as most of the development thus far has been local to my machine.
  • Wrote the second part of the timing code, using PyAudio to timestamp the emission of sound (sketch after this list).
  • Investigated NTP solutions for time synchronization across the laptops. Looking into using a bash script to explicitly set the machine clocks before executing the test script.
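
A sketch of the emission-timestamp idea, assuming PyAudio and a generated 440 Hz test tone (the tone parameters are placeholders), with clocks assumed to be NTP-synced beforehand:

    import time
    import numpy as np
    import pyaudio

    RATE = 44100
    t = np.arange(RATE // 2) / RATE  # 0.5 s test tone
    tone = (0.3 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)

    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE, output=True)

    t_emit = time.time()          # timestamp immediately before playback
    stream.write(tone.tobytes())  # blocking write: sound is emitted here
    print(f"emitted at {t_emit:.6f} (epoch time, NTP-synced)")

    stream.stop_stream()
    stream.close()
    pa.terminate()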

Status Report #4: 10/19 (Spencer)

  • Set up a venv to handle the speech recognition module.
  • Created a basic audio-to-text proof-of-concept pipeline using the speech_recognition module in Python (see the sketch after this list).
  • Measured the performance of compiled vs. interpreted Python and found no noticeable difference. Performance of this pipeline is quite poor: it consistently takes more than 1 second to run.
  • Next steps: investigating ways to use signal processing techniques to improve the performance and response time of the basic pipeline. Ex: using MFCCs may be faster than audio-to-text.
  • Possible library to look at: https://github.com/MycroftAI/sonopy
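
A sketch of the proof-of-concept pipeline using the speech_recognition package; the input filename is a hypothetical placeholder. Note that recognize_google calls Google’s web API over the network, which may be part of why the pipeline appears I/O-bound:

    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.AudioFile("clip.wav") as source:
        audio = r.record(source)  # read the entire file

    try:
        # Each run pays network latency on top of any compute cost.
        print(r.recognize_google(audio))
    except sr.UnknownValueError:
        print("could not understand audio")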