Status Report #2: 10/5 (Eugene)

  • Analyzed the basic audio I/O demo from last week to identify ways to decrease the latency between hearing a sound and responding
  • Analyzed query waveforms to find the best volume-based cues for the program to recognize that a query has started (a rough sketch of this approach is below)
  • Set up Alexa and Google Home for testing (an hour with Computing Services got nowhere; I learned later that Google Home can’t be set up with G Suite, and no one had said otherwise)
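
A rough sketch of the volume-profile analysis mentioned above, assuming a 16-bit mono .WAV recording of a sample query (the filename below is just a placeholder). It prints the RMS level of each short window so you can see where the query’s volume rises above the background noise:

```python
import wave
import numpy as np

WINDOW_MS = 20  # analysis window length in milliseconds

def rms_profile(path):
    """Return the RMS level of each WINDOW_MS-long window of a 16-bit mono WAV."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        samples = np.frombuffer(wf.readframes(wf.getnframes()), dtype=np.int16)
    frames_per_window = int(rate * WINDOW_MS / 1000)
    n_windows = len(samples) // frames_per_window
    windows = samples[: n_windows * frames_per_window].reshape(n_windows, -1)
    return np.sqrt((windows.astype(np.float64) ** 2).mean(axis=1))

if __name__ == "__main__":
    # "hey_siri_sample.wav" is a placeholder name for a recorded query
    for i, level in enumerate(rms_profile("hey_siri_sample.wav")):
        print(f"{i * WINDOW_MS:5d} ms  RMS {level:8.1f}")
```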

Status Report #1: 9/28 (Eugene)

This week, we treated the smart speakers as black-box systems and worked in an experimental phase. After some experimentation in lab, Spencer and I decided to investigate further by meeting with Professor Stern, an expert on voice recognition systems. He directed us to an article published by Apple explaining the underlying mechanics of Hey Siri, which he cited as evidence that obfuscating mature black-box systems, backed by server-side processing that trillion-dollar companies have honed over the last decade, would be difficult to accomplish within our relatively limited solution space.

After meeting with Spencer and Cyrus, we decided to pivot the focus and challenge of the exploit to low-latency responses. One new solution/exploit that we are now considering is building an NLP system that can react to “hey Siri”, “OK Google”, and other wake-words as fast as possible. To verify the feasibility of a system like this, I wrote baseline code using PyAudio to listen for noise and play music as soon as possible. I’ve linked it to a GitHub repo, and you can clone the repo and try it with any .WAV file (unfortunately, the one I’ve been testing with is a copyrighted song, and I don’t want to get arrested for this project).
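
For reference, a minimal sketch of what that baseline does (not the exact code in the repo), assuming PyAudio and NumPy are installed; the threshold, chunk size, and "test.wav" filename are placeholders that need tuning:

```python
import wave
import numpy as np
import pyaudio

THRESHOLD = 1500   # peak amplitude that counts as "noise"; needs tuning per mic
CHUNK = 1024       # frames read per loop; smaller chunks give lower latency
RATE = 44100

p = pyaudio.PyAudio()
mic = p.open(format=pyaudio.paInt16, channels=1, rate=RATE,
             input=True, frames_per_buffer=CHUNK)

# Block until the microphone picks up anything louder than the threshold
while True:
    frame = np.frombuffer(mic.read(CHUNK), dtype=np.int16)
    if np.abs(frame.astype(np.int32)).max() >= THRESHOLD:
        break
mic.stop_stream()

# Then play the .WAV file as quickly as possible
wf = wave.open("test.wav", "rb")   # placeholder filename
out = p.open(format=p.get_format_from_width(wf.getsampwidth()),
             channels=wf.getnchannels(), rate=wf.getframerate(), output=True)
data = wf.readframes(CHUNK)
while data:
    out.write(data)
    data = wf.readframes(CHUNK)

out.close()
mic.close()
p.terminate()
```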

Next week, I hope to continue identifying jamming signals using a more methodical, experimental approach to better quantify how the success rate of a given signal depends on loudness, distance from the person and from the speaker, and other metrics. I also want to use Apple’s machine learning paper as a foundation for our NLP system to identify wake-words. Finally, I hope to meet with our advisors and professors to better hone the project details, feasibility, and specification.


Project Introduction & Summary

This is the blog page for Team B2’s capstone project. This semester, our team will build a computer program designed to output a signal when someone speaks to a smart speaker, obfuscating their command. While many smart speaker exploits exist, they rely on unrealistic setups, such as multiple speakers in one room or playing loud ultrasonic signals that can penetrate walls. Our exploit attempts to prevent smart speaker interaction on commodity hardware, thus preventing access to a home’s IoT network. By the end of the semester, we hope to have created a low-footprint background process that runs maliciously on a nearby computer, listens for audio, and obfuscates commands when someone speaks nearby.
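
As a rough sketch of the intended behavior (not a final implementation), the process would look something like the loop below, with white noise standing in for whatever obfuscation signal we eventually design; the threshold and burst length are placeholder values:

```python
import numpy as np
import pyaudio

RATE = 16000
CHUNK = 512
THRESHOLD = 2000      # assumed speech level; needs per-room calibration
BURST_SECONDS = 1.0   # how long to jam once speech is detected

p = pyaudio.PyAudio()
mic = p.open(format=pyaudio.paInt16, channels=1, rate=RATE,
             input=True, frames_per_buffer=CHUNK)
spk = p.open(format=pyaudio.paInt16, channels=1, rate=RATE, output=True)

# Pre-compute a burst of white noise to stand in for the real obfuscation signal
noise = (np.random.uniform(-0.3, 0.3, int(RATE * BURST_SECONDS))
         * 32767).astype(np.int16).tobytes()

try:
    while True:
        frame = np.frombuffer(mic.read(CHUNK, exception_on_overflow=False),
                              dtype=np.int16)
        if np.abs(frame.astype(np.int32)).max() >= THRESHOLD:
            spk.write(noise)   # obfuscate whatever command follows
finally:
    mic.close()
    spk.close()
    p.terminate()
```

Note that in this naive form the burst itself will re-trigger the threshold, so a real version would need a cooldown or echo suppression, which is part of what we plan to work out this semester.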