Status Report #1: 9/28 (Group Report)

  • Talked to Prof. Vyas about how to reframe the problem, since the problem space we specified is much larger than we can handle in a semester. He was concerned that even our normal goals would be quite difficult and suggested we narrow the problem to attacking either the wake words or a few selected query phrases.
  • How can we reduce the latency between voice detection and audio playback? It seems to take slightly longer than the 8 ms observed in our timing code. We are looking into how the audio buffers affect latency: we tried lowering the sample rate so the audio buffer would fill more quickly, but it did not seem to make a difference.
  • Risk management: Prof. Vyas suggested we consider jamming one or two specific commands instead of the wake word as a backup. This might be a good alternative if the latency is too high for the current version of the problem, because we would not need to generate the jamming input until the user starts speaking after the wake word, which gives us more time.
  • Updated schedule: we broke the project into three phases to reflect the revised scope.
    • First phase: Determining jamming inputs (research phase)
      • Defining sample voice inputs and generating voice recordings
      • Reducing latency after audio is detected
      • Setting up various black-box systems
      • Testing sample inputs on Siri/Google Home/Alexa
    • Second phase: Wake word detection
      • Building a model for wake word detection
      • Training the model to recognize the wake word
      • Generating noise after the wake word is detected
      • Detecting when the user has stopped speaking
    • Third phase: Timing optimization / generalization of attack
      • Setting up timing infrastructure for testing the attack
      • Investigating a model to predict the time delay between wake phrase
      • Building a model for wake phrase length prediction
      • Training/testing the model for wake phrase length prediction
      • Integration
      • Performance tuning
      • Obfuscation from the user
  • Next week: we need better metrics for how often the voice-activated systems correctly interpret queries when no interference is attempted.
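
On the latency question: a buffer of N frames at sample rate f_s takes N / f_s seconds to fill, so a rough floor on detection-to-playback latency can be estimated from the buffer settings alone. This is a minimal sketch, not our timing code; the buffer sizes and sample rates are hypothetical, driver and hardware latency are ignored, and the 8 ms figure is plugged in as the processing term only for illustration (we have not confirmed that our timing code measures processing alone).

```python
"""Back-of-the-envelope audio buffering latency (a rough lower bound only)."""


def buffer_fill_ms(frames_per_buffer: int, sample_rate_hz: int) -> float:
    """Milliseconds needed to capture (or play out) one buffer of audio."""
    return 1000.0 * frames_per_buffer / sample_rate_hz


def detect_to_playback_floor_ms(
    input_frames: int,
    output_frames: int,
    sample_rate_hz: int,
    processing_ms: float = 0.0,
) -> float:
    """Rough floor on detection-to-playback latency.

    One full input buffer must arrive before detection can run, and with typical
    double buffering new output waits roughly one buffer behind what is already
    playing. Driver and hardware latency are NOT included (assumption).
    """
    return (
        buffer_fill_ms(input_frames, sample_rate_hz)
        + processing_ms
        + buffer_fill_ms(output_frames, sample_rate_hz)
    )


if __name__ == "__main__":
    # Hypothetical settings; substitute the buffer size and rates actually in use.
    for rate in (44100, 16000):
        for frames in (1024, 256):
            floor = detect_to_playback_floor_ms(frames, frames, rate, processing_ms=8.0)
            print(f"{rate} Hz, {frames} frames/buffer: >= {floor:.1f} ms")
```

One thing this makes visible: if the buffer size is fixed in frames, lowering the sample rate makes each buffer take longer to fill (1024 frames is 64 ms at 16 kHz versus about 23 ms at 44.1 kHz), so reducing frames per buffer is the more direct lever. If the timing code only measures the processing step, the buffer terms would not show up in the 8 ms figure, which could account for some of the extra delay we perceive.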

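For next week's baseline metric, one option is to log each query spoken with no jamming as a trial and report a success rate per device and per phrase, together with a confidence interval, since we will likely have only a handful of trials per phrase. This is a minimal sketch assuming that logging scheme; the `Trial` record and its field names are hypothetical, and no real results are included.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One spoken query with no jamming active (hypothetical record layout)."""
    device: str                  # e.g. "Alexa", "Google Home", "Siri"
    phrase: str                  # the query phrase that was spoken
    interpreted_correctly: bool  # did the assistant act on the intended query?


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("need at least one trial")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


def baseline_accuracy(trials):
    """Group trials by (device, phrase); return n, success rate and 95% interval."""
    counts = defaultdict(lambda: [0, 0])  # (device, phrase) -> [correct, total]
    for t in trials:
        c = counts[(t.device, t.phrase)]
        c[0] += int(t.interpreted_correctly)
        c[1] += 1
    report = {}
    for key, (correct, total) in counts.items():
        report[key] = {
            "n": total,
            "rate": correct / total,
            "ci95": wilson_interval(correct, total),
        }
    return report
```

The Wilson interval is used here instead of the simple p ± z·sqrt(p(1 − p)/n) interval because the simple one collapses to zero width at 0/n or n/n successes, which is common when each phrase is only tried a few times.
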