Status Report #8: (11/16) Cyrus

  • Worked on using DTW to map the time series of one signal onto the time series of another.
  • Wrote DTW code that returns a warping path with this mapping, but we are unsure how to use this path to actually compare two signals and decide whether they match (one candidate approach is sketched after this list). Hence, we decided to meet with Stern to figure out what conclusions we can draw from DTW, and how it actually solves some of the issues we ran into with our initial MFCC implementation.
  • We also started testing in different environments and realised that ambient noise might be a problem, so we are thinking of using adaptive filters to remove it.
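  A minimal sketch of one way we could turn the DTW output into a yes/no comparison, assuming the fastdtw package Spencer has been looking at; the signals_match name and the threshold value are placeholders, and the threshold would need empirical tuning:

      from fastdtw import fastdtw

      def signals_match(sig_a, sig_b, threshold=0.1):
          # fastdtw returns (total_cost, path); with 1-D inputs and no
          # dist argument it uses the absolute difference between
          # aligned samples.
          distance, path = fastdtw(sig_a, sig_b)
          # Normalising by the path length keeps the score comparable
          # across recordings of different durations.
          score = distance / len(path)
          return score < threshold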

Status Report #8: (11/16) Eugene

  • Wrote a version of the jammer demo that averages MFCCs across reference samples before comparing (a sketch of the idea follows this list).
  • Ran benchmark tests with DTW, but we’re currently blocked on understanding how to use DTW. We plan on meeting with Stern to figure out its use cases.
  • Initial tests don’t yield better performance than comparing against multiple samples individually. I’m going to investigate adding more samples to the average and see how that changes the results.
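  A rough sketch of the averaging idea, assuming the python_speech_features package as the MFCC backend and mean squared error as the metric; the fixed frame count used to line the matrices up is an assumption, not our settled approach:

      import numpy as np
      from python_speech_features import mfcc  # assumed MFCC backend

      def average_mfcc(recordings, rate=16000, num_frames=100):
          # Average the MFCC matrices of several reference recordings,
          # truncating/padding each to a fixed frame count so they line up.
          feats = []
          for sig in recordings:
              m = mfcc(sig, samplerate=rate)[:num_frames]  # (frames, 13)
              if m.shape[0] < num_frames:
                  m = np.pad(m, ((0, num_frames - m.shape[0]), (0, 0)))
              feats.append(m)
          return np.mean(feats, axis=0)

      def mfcc_mse(a, b):
          # Mean squared error between two equally sized MFCC matrices.
          return float(np.mean((a - b) ** 2))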

Status Report #8: (11/16) Spencer

  • Implemented a version of the code that mixes DTW with MFCCs. It doesn’t seem to be effective, so I need to figure out whether this is an implementation bug or whether it theoretically shouldn’t work the way we currently have it implemented.
  • Did further reading on DTW to try to understand why this method is not working as expected.
  • Thinking about adding an adaptive filter (https://pypi.org/project/adaptfilt/) to potentially improve performance; a sketch of how that might look follows this list.
  • Will talk to Prof. Stern next week re: DTW and figure out how to proceed.
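  A minimal sketch of how adaptfilt might slot in for ambient-noise removal, based on its documented NLMS interface (y, e, w = adaptfilt.nlms(u, d, M, step)); the denoise helper, tap count, and step size are all placeholders to tune:

      import adaptfilt

      def denoise(primary, noise_ref, taps=64, step=0.1):
          # primary:   microphone signal containing speech + ambient noise
          # noise_ref: reference signal correlated with the noise (e.g.
          #            from a second microphone pointed away from the speaker)
          # nlms returns (filter output y, error e, final coefficients w);
          # the error e = primary - y is the noise-cancelled speech estimate.
          y, e, w = adaptfilt.nlms(noise_ref, primary, taps, step)
          return e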

Status Report #7: (11/9) Eugene

  • Configured the project demo, tweaking the exploit’s output volume and delay for recognition.
  • Looking into normalization of signals to help factor out volume when recognizing wake words (see the sketch after this list).
  • Further research into what to do with MFCCs: correlation is pretty low for audio sample analysis: https://www.researchgate.net/post/Why_we_take_only_12-13_MFCC_coefficients_in_feature_extraction
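  A simple sketch of the kind of normalization being considered: scale every recording to a common RMS level before feature extraction, so loudness drops out of the comparison (the target level is an arbitrary placeholder):

      import numpy as np

      def rms_normalize(signal, target_rms=0.1):
          # Scale a signal to a fixed RMS level so loudness differences
          # between recordings don't dominate the MFCC comparison.
          rms = np.sqrt(np.mean(signal.astype(np.float64) ** 2))
          return signal if rms == 0 else signal * (target_rms / rms)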

Status Report #7: (11/9) Spencer

  • Explored the viability of using dynamic time warping for more accurate MFCC prediction.
  • Read the FastDTW paper: it computes a good approximation of DTW in O(n) time and space, rather than the O(n^2) time and space of exact DTW. https://pdfs.semanticscholar.org/05a2/0cde15e172fc82f32774dd0cf4fe5827cad2.pdf
  • Exploring integration of the FastDTW module in Python (https://pypi.org/project/fastdtw/); minimal usage is sketched below.
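  Minimal usage of the fastdtw module, mirroring its documented example; for us, x and y would be MFCC frame sequences rather than these toy vectors:

      import numpy as np
      from scipy.spatial.distance import euclidean
      from fastdtw import fastdtw

      x = np.array([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]])
      y = np.array([[2, 2], [3, 3], [4, 4]])

      distance, path = fastdtw(x, y, dist=euclidean)
      print(distance)  # total alignment cost
      print(path)      # list of (i, j) index pairs mapping x onto y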

Status Report #6: (11/2) Eugene

  • Met with Professor Stern to talk about the motivation behind, and applications of, MFCCs for speech detection. As of now, mean square error is not a reliable indicator of similarity between two audio samples, so he recommended that we look into dynamic time warping. Vyas told us that this might extend past the scope of our project in terms of capturing every possible utterance of “Hey Siri”, but it might be useful if MFCCs continue to prove unhelpful.
  • Worked on designing our in-lab demo end-to-end. Investigating the use of bash scripting to handle time synchronization, because research into system time sync through Python has come up unfruitful.

Status Report #6: (11/2) Cyrus

  • Had to miss the meeting with Professor Stern due to an onsite interview, so I ramped up on dynamic time warping (his suggestion) on my own.
  • Worked with Spencer and Eugene to create the demo for the upcoming week. The demo uses a prediction model based on MFCC coefficients; a hypothetical sketch of the decision rule follows.
Status Report #5: (10/26) Cyrus

  • Looked into audio transcription using C++, which made us realise that audio transcription is I/O bound. This confirmed our suspicion that a signal-processing-based approach was the only way to move forward.
  • Set up the time sync infrastructure with Eugene, fixed numerous bugs across the two Python scripts, and read through some of the PyAudio source code to understand why some of our programs weren’t working as expected.
  • With a better understanding of PyAudio, Eugene and I were able to reduce the lower-bound latency further, to around 100 ms (a sketch of the capture setup follows this list).
  • Looked into MFCC coefficients with Spencer, but we were unable to come up with an accurate way of comparing these coefficients across two different recordings. We are meeting with Professor Stern on Monday to get clarity on this.
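  A minimal sketch of the low-latency PyAudio capture setup; the small frames_per_buffer is the main knob behind the latency gains, though the specific numbers here are placeholders rather than our final settings:

      import pyaudio

      CHUNK = 256  # small buffer => lower capture latency (placeholder value)

      pa = pyaudio.PyAudio()
      stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                       input=True, frames_per_buffer=CHUNK)

      data = stream.read(CHUNK)  # one buffer of raw 16-bit samples

      stream.stop_stream()
      stream.close()
      pa.terminate()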