April 2019 – Team A2: Project LAKE – Logging of Acoustic Keyboard Emanations

Kevin

Accomplishments

- The new PCBs and parts came in this week. We assembled and tested all of them, and there appear to be no major issues with any of the assembled boards. We were able to flash all of them successfully. Additionally, we can power them from batteries and charge the batteries in the manner intended. Both microphones are functioning properly and we are receiving good audio quality.
- There does appear to be a minor issue with the battery management unit, however: on two of the boards, when powered only through the 5V pin, the output voltage sometimes drops below 3.3V, causing the ESP32 to brown out. The issue is likely caused by imperfect solder joints, but it is not severe enough to warrant rework.
- I tested the current draw of the board in the different power modes. The board pulls about 120mA in normal use and 0.7mA in deep sleep mode.

Upcoming Work

- Next week we will be giving our final presentation, which James will deliver. We are finishing up the slides and practicing the talk.
- We will mostly be focused on getting ready for the demo and practicing how everything will work when run live.

James

Accomplishments

- This week, I worked on adapting the machine learning algorithms to labeled data, as we were unsuccessful in fully eliminating dropped data.
- By using labeled data, I was able to achieve a leave-one-out cross-validation accuracy of 16%, with a sample size of 1107.
- Using this trained classifier, we were able to retrieve unlabeled 10-character random passwords with fairly decent accuracy. We met our original requirement of guessing 80% of 10-character random passwords in 75 tries or fewer. We also achieved a successful guess rate of 50% in 5 tries or fewer, and 40% in 1 try.
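For readers unfamiliar with the metric, leave-one-out cross-validation holds out each sample once, trains on the rest, and reports the fraction of held-out samples classified correctly. The sketch below is only an illustration: the 1-nearest-neighbor classifier and the toy 2-D features are assumptions, not the classifier or data used in the project.

```python
import math

def nearest_label(train, query):
    """Return the label of the training point closest to `query` (1-NN, assumed classifier)."""
    best = min(train, key=lambda item: math.dist(item[0], query))
    return best[1]

def leave_one_out_accuracy(samples):
    """samples: list of (feature_vector, label). Each sample is held out exactly once."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]  # train on everything except sample i
        if nearest_label(rest, features) == label:
            correct += 1
    return correct / len(samples)

# Made-up features for two keys; not project data.
toy = [((0.0, 0.1), "a"), ((0.1, 0.0), "a"), ((0.2, 0.1), "a"),
       ((1.0, 1.1), "b"), ((1.1, 1.0), "b"), ((0.9, 1.0), "b")]
accuracy = leave_one_out_accuracy(toy)
```

With 1107 samples this means 1107 separate train/test rounds, which is affordable for a simple classifier.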

Upcoming work

In this next week, we will need to work toward preparing for the demo. I will further tune the parameters of the classifier to attempt to improve accuracy. I will also construct a sound-proofing box to reduce the loud background noise we expect in the gym during the final demo.

Ronit

Accomplishments

The ESP32 is still randomly dropping large chunks of data. As such, we have been unable to collect accurate TDoA data.
I tried switching to the auxiliary peripheral clock, an external, more accurate oscillator on the ESP32; however, the issue still persists.
For a time we thought the issue might be that the DMA buffers were filling up, so we tried increasing the DMA buffer size and switching to UDP. However, this did not fix the issue.
We have so far been unable to ascertain the reason for the dropped data.
I finally tested the ESP32’s current draw in deep sleep mode. It comes to about 0.7mA, which is very small compared to its nominal current draw of 0.1-0.3A during normal data collection and transmission.
At this point we have to rely on frequency features alone. I worked with James to collect a fresh, clean set of data. With leave-one-out cross-validation, we were able to get about a 20-30% error rate.
We generated some confusion matrices and used a breadth-first approach to generate the top 75 possible guesses for the password.
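As an illustration of how a ranked list of guesses can be produced from per-character probabilities, here is a minimal sketch. It uses a best-first search with a priority queue rather than the breadth-first approach mentioned above, and the function name `top_guesses` and the toy probabilities are hypothetical. It assumes each position has its own candidate distribution (for example, a row of a confusion matrix) with strictly positive probabilities.

```python
import heapq
import math

def top_guesses(position_probs, n):
    """position_probs: one dict {char: probability} per password position.
    Returns up to n (password, probability) pairs, most likely first."""
    # Rank the candidates for each position from most to least likely.
    ranked = [sorted(p.items(), key=lambda kv: -kv[1]) for p in position_probs]

    def log_prob(idx):
        return sum(math.log(ranked[pos][i][1]) for pos, i in enumerate(idx))

    start = (0,) * len(ranked)  # best candidate at every position
    heap = [(-log_prob(start), start)]
    seen = {start}
    results = []
    while heap and len(results) < n:
        neg_lp, idx = heapq.heappop(heap)
        word = "".join(ranked[pos][i][0] for pos, i in enumerate(idx))
        results.append((word, math.exp(-neg_lp)))
        # Successors: step one position down to its next-best candidate.
        for pos in range(len(idx)):
            if idx[pos] + 1 < len(ranked[pos]):
                nxt = idx[:pos] + (idx[pos] + 1,) + idx[pos + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-log_prob(nxt), nxt))
    return results

# Made-up two-character example; not project data.
guesses = top_guesses([{"a": 0.7, "s": 0.3}, {"b": 0.6, "v": 0.4}], 3)
```

Calling it with `n=75` on ten per-position distributions would give a 75-guess list comparable in spirit to the one described above.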

Upcoming work

I will continue to investigate the dropped data, but my major focus will be on integration and getting a working demo.

Team Status

Accomplishments

- This week, the team worked closely together to assemble the final revision of the PCB. Three working boards were assembled.
- Different options were explored to eliminate dropped packets to aid in TDoA localization and clustering. However, because we were unable to eliminate them, we have decided to move forward with labeled data.
- We have achieved good accuracy in classification using labeled data and have met our original requirement for password accuracy.

Upcoming work

In the next week, we will need to focus on preparing for the final demo.

Kevin

Accomplishments

- This week I reviewed the last PCB revision. I placed the order for the PCB through PCBWay, same as the previous boards. I also ordered more parts so we have enough to build out 3 of the new PCBs. This should be our last order. We won’t have enough time to buy new parts later, so I ordered extras of some components that could easily get damaged or lost.
- I worked on improving the keystroke detection using the delta method from earlier. I was able to tune the parameters to get a slightly improved result. However, we switched to using a thresholding method as it seems to perform better.
- I worked with James on collecting longer samples of data. In order to test our clustering performance, we collected about 30 samples from each key on the keyboard. We then experimented with different clustering techniques and features. K-means with Euclidean distance gave us the best results.
- While collecting the data, we used two boards so that we could also experiment with TDoA data. The TDoA appeared to be working until about one third of the way into the audio clip. At that point the two recordings of each keystroke drifted to 85ms apart, which does not make sense. Samples may be getting added or dropped during transmission.
- We believe using cepstral features and TDoA will give us decent clustering results.
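For reference, K-means with Euclidean distance can be sketched in a few lines. This is a generic version of Lloyd's algorithm, not our Matlab code; the explicit initial centroids and the toy 2-D points are assumptions made to keep the example deterministic.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm with Euclidean distance.
    `centroids` is the initial guess, given as a list of tuples."""
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return assignments, centroids

# Made-up feature vectors for two well-separated keys; not project data.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, [(0, 0), (10, 10)])
```

In practice each point would be a cepstral feature vector (optionally with a TDoA value appended), and the number of clusters would match the number of keys being separated.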

Upcoming Work

- Next week I will be focusing on supporting the training and clustering effort. This will involve collecting data and tuning parameters.
- The new PCB should arrive this week, so once that arrives I will be putting it together and verifying its functionality.

James

Accomplishments

- This week, we worked on collecting long samples of 3-way TDoA data. TDoA between the boards was found to have a high degree of separation for non-adjacent keys.
- However, when moving to a much longer audio recording (6 minutes), we found that the audio signals from the two microphones were becoming misaligned, with the same keystroke appearing on one microphone over 80ms before the other. This should not be possible, as it would require a path-length difference of about 27 meters at the speed of sound in air. We suspect that one or both of the sensor packages are dropping samples.
- We found a faster, more noise-resistant method of solving the substitution cipher problem, using quadgram probability data from http://practicalcryptography.com/. This method was able to decipher a 5500-word ciphertext within 10 minutes. Noise and unknown word boundaries had minimal effect.
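The sanity check on the 80ms misalignment is simple arithmetic, sketched below. The speed of sound (343 m/s) and the 3ft board spacing are assumptions; the spacing is borrowed from the two-board setup described in an earlier report.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at roughly room temperature

def path_difference_m(delay_s):
    """Extra distance sound must travel to arrive `delay_s` later."""
    return delay_s * SPEED_OF_SOUND_M_PER_S

def max_plausible_delay_s(mic_spacing_m):
    """Largest TDoA two microphones can see: the source lies on the line through both."""
    return mic_spacing_m / SPEED_OF_SOUND_M_PER_S

implied_distance_m = path_difference_m(0.080)  # 80 ms misalignment
limit_s = max_plausible_delay_s(0.9144)        # boards about 3 ft apart (assumed)
```

Under these assumptions the observed delay implies about 27.4 m of extra path, while the physical limit for the setup is under 3 ms, which is why dropped samples are the more likely explanation.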

Upcoming work

I will need to fine tune the clustering parameters to improve clustering accuracy.

Ronit

Accomplishments

In order to increase the resolution of the TDoA data, we increased the sampling rate from 40kHz to 60kHz. There is an average separation of about 1cm between each key switch on the keyboard. With a higher sampling rate, the number of samples of delay separating adjacent keys increases.
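To see why the sampling rate matters here, the path-length difference represented by one sample of delay can be computed directly. The sketch assumes a speed of sound of 343 m/s.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for air at room temperature

def path_resolution_mm(sample_rate_hz):
    """Path-length difference corresponding to a one-sample time difference."""
    return SPEED_OF_SOUND_M_PER_S / sample_rate_hz * 1000.0

res_40k = path_resolution_mm(40_000)  # about 8.6 mm per sample
res_60k = path_resolution_mm(60_000)  # about 5.7 mm per sample
```

At 40kHz a single sample already spans most of the roughly 1cm key spacing, so the finer step at 60kHz leaves more room to tell neighboring keys apart.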
We were able to get good separation of keystrokes using TDoA and cepstral features.
We tried using 3-way TDoA; however, we were not able to collect good data. The ESP32 dev board with the external mic was not properly impedance matched.
We found an efficient means of solving the substitution cipher using n-grams. We are now able to crack substitution ciphers in less than 10 minutes for passages under 400 words.
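The general idea behind n-gram scoring can be sketched as a hill climb: score a candidate decryption by summing quadgram log-probabilities, swap two letters of the key, and keep the swap only if the score improves. This is a simplified illustration, not our solver. The tiny training corpus, the unseen-quadgram penalty, and the single run without random restarts are all assumptions; a real run would load a full quadgram table and restart many times.

```python
import math
import random
import string
from collections import Counter

ALPHABET = string.ascii_lowercase

def quadgram_model(corpus):
    """Build quadgram log10-probabilities from a training corpus (letters only)."""
    letters = "".join(ch for ch in corpus.lower() if ch in ALPHABET)
    counts = Counter(letters[i:i + 4] for i in range(len(letters) - 3))
    total = sum(counts.values())
    floor = math.log10(0.01 / total)  # assumed penalty for unseen quadgrams
    return {q: math.log10(c / total) for q, c in counts.items()}, floor

def fitness(text, model, floor):
    """Higher is more English-like under the quadgram model."""
    return sum(model.get(text[i:i + 4], floor) for i in range(len(text) - 3))

def decipher(ciphertext, key):
    """key[i] is the ciphertext letter that stands for ALPHABET[i]."""
    return ciphertext.translate(str.maketrans(key, ALPHABET))

def hill_climb(ciphertext, model, floor, steps=2000, seed=0):
    rng = random.Random(seed)
    key = list(ALPHABET)
    rng.shuffle(key)
    best = fitness(decipher(ciphertext, "".join(key)), model, floor)
    for _ in range(steps):
        a, b = rng.sample(range(26), 2)
        key[a], key[b] = key[b], key[a]      # try swapping two key letters
        score = fitness(decipher(ciphertext, "".join(key)), model, floor)
        if score > best:
            best = score                     # keep the improvement
        else:
            key[a], key[b] = key[b], key[a]  # undo the swap
    return "".join(key), best

# Toy setup; a single short run like this is not guaranteed to recover the key.
corpus = "the quick brown fox jumps over the lazy dog and then the other dog " * 20
model, floor = quadgram_model(corpus)
plain = "thequickbrownfoxjumpsoverthelazydog"
secret = "qwertyuiopasdfghjklzxcvbnm"
cipher = plain.translate(str.maketrans(ALPHABET, secret))
found_key, score = hill_climb(cipher, model, floor)
```

Because the fitness function looks only at runs of four letters, it does not need word boundaries, which is consistent with the observation that unknown boundaries had little effect.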

Upcoming work

We need to collect 3-way TDoA data to get better separation, so we need to build a third PCB.
We need to start integration.

Team Status

Accomplishments

- This week we placed the order for our final PCB.
- We found a way to efficiently decode the substitution cipher.
- We were able to get good clustering using cepstral and TDoA data.

Upcoming work

In the following weeks, we will need to divert more attention to the machine learning aspects of the project, as well as fine-tune many of the signal processing algorithms to make them more robust and effective.

Status Report 8

Kevin

Accomplishments

This week I finished the placement and routing of the new PCB. It is 2”x1.5” which should allow for portability as well as easy integration into any package size. While routing, I used many pours this time to connect power between the two power management pins and from external power. This should help to produce cleaner power, leading to better signal integrity.
The top layer is a ground pour, similar to the last revision, to make routing much simpler. This time, however, the bottom layer is a 3.3V pour, which allowed me to route power much more easily to the devices that needed it.
I also continued researching the Power Level Difference method of noise reduction. I found several descriptive papers from which I can possibly recreate the method if we deem it necessary.

Upcoming Work

Next week I will be working on integrating our signal processing code together to make it much easier to use, and make it a complete system. Ronit, James and I will be spending most of our time working on tuning our design and code to get the results we need.
I will also be placing the order for our last PCB and any parts we need.

James

Accomplishments

This week, I rewrote much of the Matlab code to allow the different components (filtering, keystroke detection, clustering, etc.) to fit and interface together. This will allow us to test the full system and see how it performs at the current stage of development.
I optimized the keystroke separation and feature extraction algorithms to run much faster, allowing us to process 10-minute recordings within 30 seconds.
Lastly, I modified the current data receiving server and TDoA algorithm to accept timestamps taken from an NTP server, allowing for much more accurate determination of the start time of each audio clip.

Upcoming work

This upcoming week, we will begin to fully integrate the different components of the system. We will need to work toward fine tuning the signal processing and machine learning portions of the project.

Ronit

Accomplishments

We had a bug whereby the ESP32 would collect data via DMA even when the device was not connected to the laptop. As a result, the recordings from the two mics did not start at the same time, and we were unable to collect accurate TDoA data.
The fix was to make the mic sleep before collecting data and to clear the buffers via a soft reset before reconnecting to the laptop.
To further improve the TDoA data, the nodes now synchronize their clocks using the Network Time Protocol (NTP). The laptop then tells them to start collecting data at some time in the future, which ensures that the TDoA data is recorded over the same period of time.
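The scheduled-start idea can be sketched as follows. This is a plain Python illustration of the timing logic only, not the ESP32 firmware; the function names, the 2-second lead time, and the example timestamps are hypothetical, and it assumes both node clocks have already been synchronized via NTP.

```python
def schedule_start(server_now_s, lead_time_s=2.0):
    """Laptop side: pick a shared start instant slightly in the future."""
    return server_now_s + lead_time_s

def wait_before_recording(start_at_s, node_now_s):
    """Node side: seconds to wait until the shared start instant (never negative)."""
    return max(0.0, start_at_s - node_now_s)

# The start command reaches the two nodes at different times over the network.
start_at = schedule_start(100.0)
node_a_received, node_b_received = 100.010, 100.250
wait_a = wait_before_recording(start_at, node_a_received)
wait_b = wait_before_recording(start_at, node_b_received)
begin_a = node_a_received + wait_a
begin_b = node_b_received + wait_b
```

Both nodes begin at the same instant as long as the lead time exceeds the worst-case network delay; any remaining error comes from the accuracy of the clock synchronization itself.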
Machine learning is still a challenge. I explored the naive Bayes approach further, as per Professor Mai’s advice.
We are also looking at existing substitution cipher solvers to see if we can leverage them.

Upcoming work

The next week will be more machine learning. We need to find an efficient method of cracking substitution ciphers.

Team Status

Accomplishments

This week, we completed most of the work for the final revision of our PCB. We have also begun finalizing the software running on our ESP32, adding NTP synchronization and removing bugs such as failing to clear buffers between clips.
We are in the process of organizing the code in order to allow for integration of the individual components of the system.

Upcoming work

In the following weeks, we will need to divert much more attention to the machine learning aspects of the project, as well as fine-tune many of the signal processing algorithms to make them more robust and effective.

Changes to schedule description

There are no major changes to the schedule.

Status Report 7

Kevin

Accomplishments

- This week I began the final revision of our PCB, named Snorlax.
- This revision has all of the header pins for I/O removed as well as all the header pins for the Vesper wake up mic removed. We decided to power the Vesper at 3.3V since this is within its capabilities and it allows us to remove a linear regulator and related peripherals. I fixed the issue from our previous board by connecting the feedback pin on the buck/boost converter to its Vout pin. Lastly I removed all the LEDs except the one for main power to save space and energy.
- Right now I am aiming to have the design fit on a 2”x1.5” board, which is smaller than our previous 2”x2.5” board. Reducing the size is important because it increases portability.
- I have also been researching ways to reduce non-stationary noise. I have mainly been looking into a technique called Power Level Difference (PLD). PLD is based on having at least two microphones. The general idea is that the signal we want to record is closer to the two microphones than the sources of background noise are. This means a close audio source will produce a perceptible difference in power level between the two microphones, while background sources will arrive at roughly the same power level at both.
- The papers I am reading take the PLD concept and create Wiener filters based on the difference in power levels between the two signals.
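A heavily simplified sketch of the PLD idea is shown below. It is not the Wiener-filter formulation from the papers: it works on whole time-domain frames rather than per frequency bin, and the normalization, the gain mapping, and the 0.2/0.8 thresholds are assumptions chosen for illustration.

```python
def frame_power(frame):
    """Mean power of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)

def normalized_pld(frame_near, frame_far, eps=1e-12):
    """Power level difference between two mics, scaled to roughly [-1, 1].
    Near 1: source much closer to the first mic. Near 0: similar power at both."""
    p1, p2 = frame_power(frame_near), frame_power(frame_far)
    return (p1 - p2) / (p1 + p2 + eps)

def pld_gain(pld, low=0.2, high=0.8):
    """Map PLD to a gain in [0, 1]; thresholds are assumed tuning values."""
    if pld <= low:
        return 0.0  # both mics see similar power: treat as background noise
    if pld >= high:
        return 1.0  # strong level difference: treat as the nearby source
    return (pld - low) / (high - low)

# Made-up frames: a nearby keystroke versus diffuse background noise.
keystroke_gain = pld_gain(normalized_pld([1, -1, 1, -1], [0.1, -0.1, 0.1, -0.1]))
background_gain = pld_gain(normalized_pld([0.5, -0.5, 0.5, -0.5], [0.5, -0.5, 0.5, -0.5]))
```

The published methods apply a similar decision per frequency bin and use it to estimate the noise spectrum for a Wiener filter, which handles noise that overlaps the keystroke in time.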

Upcoming Work

- Next week I will be finishing the final PCB and reviewing the design. This needs to be completed in order to have the PCB ready for the final demo.
- I will also start looking into how to implement the PLD noise reduction algorithm if I have time after completing the PCB. This task is secondary because we can demo our project with a less noisy background if necessary. However, the better the noise reduction, the more widely applicable the final device will be.

Ronit

Accomplishments

- I worked with James to support multiple clients (listening devices) in the network stack.
- I worked on reducing the power consumption of the PCB by having the board go into deep sleep mode. In this state, the Wi-Fi antenna is powered down and the oscillator is turned off. Unfortunately, this also means that the GPIOs are powered off.
- The mode pin on the Vesper microphone has to be set high in order for it to output the digital wakeup signal.
- This means that the PCB has to have a pull-up resistor on the mode pin so that it stays high even when the processor goes to sleep.
- I tried optimizing the naive Bayes approach for decoding the substitution cipher. It now terminates in around 1 hour 30 minutes for texts of ~500 words, but this time seems to go up linearly with the amount of noise. I will have to test more.

Upcoming work

The focus at this stage is purely on machine learning. I have some small tasks left in the esp32, but that should be manageable.

James

Accomplishments

- I made a few more modifications to the server responsible for receiving sound data from the sensor devices. Now, the connection can be terminated from the server side.
- We collected some recordings using two of our sensor boards placed at about 3ft apart, with the keyboard placed in the middle. Unfortunately, we discovered that data left in the buffer would pollute future transmissions. The image below shows this occurring. The data before the red line from microphone 1 corresponded to the previous recording. Everything after was the new recording. We will need to correct this by clearing all DMA and TCP buffers at the end of a transmission.
- Using the data above, I also began working on extracting TDoA data from individual keystrokes. Each keystroke found in one recording is matched to a keystroke within a 40 ms window in the second recording. The two keystrokes are then cross correlated to determine the TDoA. The time differences are visualized in the plot below for each keystroke.
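The cross-correlation step can be sketched as a brute-force search over lags for one pair of matched keystroke clips. This is a plain Python illustration rather than our Matlab code; the toy pulses and the 40kHz sampling rate used for the conversion to seconds are assumptions.

```python
def tdoa_samples(ref, other, max_lag):
    """Lag (in samples) at which `other` best matches `ref`.
    A positive result means the keystroke arrives later in `other`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, x in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                score += x * other[j]  # correlation at this lag
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Made-up pulses: the same keystroke shape, delayed by 3 samples on the second mic.
mic1 = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
mic2 = [0, 0, 0, 0, 0, 1, 3, 1, 0, 0]
lag = tdoa_samples(mic1, mic2, max_lag=5)
SAMPLE_RATE_HZ = 40_000  # assumed sampling rate
tdoa_s = lag / SAMPLE_RATE_HZ
```

Limiting `max_lag` plays the same role as the 40 ms matching window: it keeps a keystroke from being correlated against an unrelated one elsewhere in the recording.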

Upcoming work

- In the next week, I will try to help Ronit ensure that buffers are completely cleared after each transaction.
- I will begin incorporating TDoA data into clustering the keystrokes as an additional feature. Notably, there is currently a risk that the sampling rates are not exactly the same between the two boards. This would cause drift in the TDoA values. We may need to use the APLL clock to produce a more accurate clock signal if we find that this is an issue.

Team Status

Accomplishments

- The major accomplishments this week were to begin work on the final PCB design. Many of the debug features were removed, making for a smaller design. All of the final bugs have been worked out regarding the Vesper microphone and the power supply.
- We have begun work on incorporating TDoA data, including modifying the existing data collection server, and processing data collected on multiple microphones in Matlab.
- Lastly, we have worked on lowering power consumption of the device by incorporating the Vesper mic and sleeping.

Upcoming work

In the upcoming weeks, we will need to focus on fully refining the signal processing and machine learning portions of the project. We currently have many individual parts, but have not integrated each component together.

Changes to schedule description

We are still currently behind schedule. We will be making use of much of the slack time we originally allotted to work toward completing the project on time.