Status Report #5: 10/26 (Eugene)

– Ran timing tests across machines, using NTP for clock synchronization and Python scripts for timestamping, to identify the lower bound on latency. On average, the lower-bound response time is around 100 ms, which leaves us up to 200 ms to react.

– Based on Spencer’s research into MFCC (mel-frequency cepstral coefficients), we know that we can compute these coefficients in about 5 ms, which leaves us much more time to react.

– I scheduled a meeting with Prof. Stern to discuss the use of MFCC. In the meantime, based on research identifying the significance of the first 13 coefficients, I tried calculating the mean squared error between the MFCCs of two audio samples. The correlation is limited and not entirely informative. I hope to learn more in our meeting with Prof. Stern on Monday.
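
A minimal sketch of the MSE comparison described above, assuming the MFCCs come from librosa and that the longer sample is truncated to the length of the shorter one before comparing frame by frame. The helper names `load_mfcc` and `mfcc_mse` and the truncation choice are illustrative assumptions, not necessarily what was used in these tests.

```python
def load_mfcc(path, n_mfcc=13):
    """Return the MFCCs of an audio file as a list of frames (hypothetical helper)."""
    import librosa  # imported lazily so mfcc_mse works without librosa installed

    samples, sample_rate = librosa.load(path, sr=None)  # keep the native sample rate
    # librosa returns shape (n_mfcc, n_frames); transpose to one row per frame
    coeffs = librosa.feature.mfcc(y=samples, sr=sample_rate, n_mfcc=n_mfcc)
    return coeffs.T.tolist()


def mfcc_mse(frames_a, frames_b):
    """Mean squared error between two MFCC sequences, frame by frame.

    Assumption: the longer sequence is truncated to the length of the shorter one.
    """
    n_frames = min(len(frames_a), len(frames_b))
    if n_frames == 0:
        raise ValueError("both samples need at least one MFCC frame")
    total, count = 0.0, 0
    for frame_a, frame_b in zip(frames_a[:n_frames], frames_b[:n_frames]):
        for a, b in zip(frame_a, frame_b):
            total += (a - b) ** 2
            count += 1
    return total / count
```

With two recordings on disk (the file names here are placeholders), the comparison would be `mfcc_mse(load_mfcc("a.wav"), load_mfcc("b.wav"))`. Plain truncation does not align the two utterances in time, which may be one reason a raw MSE is not very informative.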

Status Report #5: 10/26 (Spencer)

  • Since audio transcription is very slow, investigated a signal-processing-based approach to speed up the system.
  • Researched MFCC and its significance for speech recognition.
  • Ran tests to check speed of MFCC library (librosa).
  • Integrated librosa with the audio input code from previous weeks. Added timing code: librosa can process an audio chunk from the previous system in 0.005 sec, which is good news for us.
  • Next steps: talk to Prof. Stern about MFCC and the best way to recognize matching speech; integrate a simple end-to-end system for the in-lab demo.
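
The timing measurement above could look roughly like the following sketch. `time_call` and `time_mfcc` are hypothetical helpers written for illustration; the chunk is assumed to be a 1-D float NumPy array, and 13 coefficients is an assumption carried over from the research mentioned in these reports.

```python
import time


def time_call(fn, *args, repeats=10, **kwargs):
    """Return the mean wall-clock seconds per call of fn over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args, **kwargs)
    return (time.perf_counter() - start) / repeats


def time_mfcc(chunk, sample_rate, repeats=10):
    """Time MFCC extraction on one audio chunk (assumed 1-D float NumPy array)."""
    import librosa  # lazy import: only needed for this measurement

    # Wrapped in a lambda so the keyword arguments go to librosa, not time_call
    return time_call(
        lambda: librosa.feature.mfcc(y=chunk, sr=sample_rate, n_mfcc=13),
        repeats=repeats,
    )
```

Averaging over several runs smooths out scheduling noise. The first call may include one-time setup cost, so it can be worth running one untimed warm-up call before measuring.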

Status Report #4: 10/19 (Cyrus)

  • Set up a venv to handle the speech recognition module.
  • Looked at Spencer’s code for audio-to-text conversion for potential improvements and optimizations.
  • Looked into compiled Python as a way to improve performance over interpreted Python. Minimal difference in performance (which hints that the program is I/O bound).
  • Next steps: looking to replicate this in C++ to improve performance. Spencer and I are diverging at this point to try two different approaches and see which one works. My approach should be sufficient if audio-to-text is compute-bound; otherwise, signal processing might be required to reduce the dependence on I/O.
  • Looking to use TensorFlow.
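
One rough way to check the I/O-bound hypothesis above is to compare CPU time against wall-clock time for a single run. This is only a sketch: `cpu_fraction` is a hypothetical helper, and it was not part of the tests reported here.

```python
import time


def cpu_fraction(fn, *args, **kwargs):
    """Run fn once and return (wall_seconds, cpu_seconds / wall_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time used by this process only
    fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, (cpu / wall if wall > 0 else 0.0)
```

A fraction near 1 suggests the call is compute-bound, in which case a C++ port could help. A fraction near 0 suggests the process mostly waits (disk, network, or an external service), in which case faster code in this process is unlikely to change much.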

Status Report #4: 10/19 (Eugene)

  • Helped Cyrus and Spencer get up to speed with venv setup as most of the development thus far has been local on my machine.
  • Wrote the second part of the timing code, using PyAudio to timestamp the emission of sound.
  • Investigated NTP solutions for time synchronization between laptops. Looking into using a bash script to explicitly set machine times before running the timing script.
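
As an alternative to setting the machine clocks directly, the clock offset can be estimated and applied to timestamps in software. The sketch below uses the standard NTP offset and delay formulas on four timestamps from one request/response exchange. The function names are hypothetical, and how the four timestamps are collected is left open.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one exchange.

    t0: client send time   (client clock)
    t1: server receive time (server clock)
    t2: server send time    (server clock)
    t3: client receive time (client clock)
    Offset is server clock minus client clock, assuming symmetric network delay.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


def to_client_time(server_timestamp, offset):
    """Convert a timestamp taken on the server clock to the client clock."""
    return server_timestamp - offset
```

For example, with times in milliseconds, `ntp_offset_and_delay(1000, 6100, 6100, 1200)` describes a server clock running 5000 ms ahead with a 200 ms round trip. A sound-emission timestamp from one laptop can then be converted with `to_client_time` before subtracting it from a detection timestamp on the other.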

Status Report #4: 10/19 (Spencer)

  • Set up a venv to handle the speech recognition module.
  • Created a basic audio -> text proof-of-concept pipeline using the speech recognition module in Python.
  • Measured performance of compiled vs. interpreted Python and found no noticeable difference. This pipeline performs poorly: it consistently takes more than 1 second to run.
  • Next steps: investigate ways to use signal processing techniques to improve the performance/response time of the basic pipeline. Ex: using MFCCs may be faster than audio to text.
  • Possible library to look at: (