Shilika’s Status Update for 10/23/20

This week, I built off of the signal processing work we did in the previous weeks to create the output of the signal processing algorithm. The process after reading the original input file is as follows:

  1. We first apply a pre-emphasis filter to the audio input:
    1. To do this, we use the equation y(t) = x(t) - alpha*x(t-1). The alpha value is a predetermined filter coefficient, usually 0.95 or 0.97.
    2. By doing so, we improve the signal-to-noise ratio by amplifying the high frequencies, which typically have smaller magnitudes than the low frequencies.
  2. We then frame the updated signal:
    1. Framing is useful because a speech signal is constantly changing over time. If we took a single Fourier transform over the whole signal, we would lose the variations through time.
    2. Thus, by taking the Fourier transform of adjacent frames with overlap, we preserve as much of the original signal as possible.
    3. We are using 20 millisecond frames with a 10 millisecond stride, i.e. 50% overlap.
  3. Next, we apply a Hamming window to each frame:
    1. A Hamming window reduces the spectral leakage that occurs when performing a Fourier transform on a finite frame.
    2. Applying it takes a single line of code in Python.
  4. Fourier Transform and Power Spectrum:
    1. We can now take the Fourier transform of each windowed frame and compute its power spectrum, which lets us distinguish different audio data from each other. A rough sketch of the full pipeline appears after this list.
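
For concreteness, here is a minimal Python/numpy sketch of the four steps above. The file name, FFT size, and padding details are illustrative assumptions rather than our final implementation:

    import numpy as np
    from scipy.io import wavfile

    # Sketch of the pipeline above. "recording.wav" is a placeholder, and we
    # assume a mono recording that is longer than one frame.
    sample_rate, sig = wavfile.read("recording.wav")

    # 1. Pre-emphasis: y(t) = x(t) - alpha * x(t-1)
    alpha = 0.97
    emphasized = np.append(sig[0], sig[1:] - alpha * sig[:-1])

    # 2. Framing: 20 ms frames with a 10 ms stride (50% overlap)
    frame_len = int(round(0.020 * sample_rate))
    frame_step = int(round(0.010 * sample_rate))
    num_frames = 1 + int(np.ceil((len(emphasized) - frame_len) / frame_step))

    # Zero-pad so the last frame is full, then gather the frames
    pad_len = (num_frames - 1) * frame_step + frame_len
    padded = np.append(emphasized, np.zeros(pad_len - len(emphasized)))
    starts = np.arange(num_frames) * frame_step
    frames = np.stack([padded[s:s + frame_len] for s in starts])

    # 3. Hamming window: the "single line of code" mentioned above
    frames = frames * np.hamming(frame_len)

    # 4. FFT and power spectrum: P = |FFT(frame)|^2 / NFFT
    NFFT = 512
    magnitude = np.abs(np.fft.rfft(frames, NFFT))
    power_spectrum = (magnitude ** 2) / NFFT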

The output will continue to be modified and enhanced to make our algorithm better, but we now have something to input into our neural network. I began looking into filter banks and MFCCs, two techniques that transform the data to better match how the human ear perceives sound. I will continue this next week and, if time allows, help the team with the neural network algorithm.
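
As a first glimpse of that work, the mel scale maps frequency in Hz to perceived pitch, and mel filter bank center frequencies are spaced evenly on it. A small sketch of the standard conversion formula (general background, not our project code):

    import numpy as np

    # Standard Hz <-> mel conversion used when building mel filter banks.
    def hz_to_mel(f_hz):
        return 2595.0 * np.log10(1.0 + f_hz / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # Example: 10 filter center frequencies evenly spaced on the mel scale,
    # assuming an 8 kHz upper bound
    centers_hz = mel_to_hz(np.linspace(hz_to_mel(0), hz_to_mel(8000), 10))
    print(np.round(centers_hz))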

Team Status Update for 10/16/20

This week, the team continued researching and implementing their respective parts, particularly the implementation portion. A change we made to the facial detection part was in the initial set-up phase. In our proposal presentation, we stated that we wanted to have the initial set-up computed within 5 seconds. However, after testing the program, it turned out that 5 seconds was too short a time, especially if the user is not used to the system. We increased this time to 10 seconds.

Jessica worked on implementing the off-center detection and initial setup phase for the eye detection portion of the facial detection part. When a user's eyes wander around, which constitutes subpar eye contact, for up to 5 seconds, iRecruit will alert the user that their eyes are not centered. The frame of reference for being centered is measured through moments in OpenCV, which calculate the centroid of each iris/pupil image. The center coordinates are calculated for each eye detection, and the average of all the center coordinates is taken to produce the reference center coordinates (X and Y). If the user's eyes differ from this reference center beyond an allowed range, the user is alerted. She also started testing the eye detection portion, and will continue doing this next week. She will also start looking into the screen alignment portion with facial landmark detection.
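
A hypothetical sketch of this moments-based centering check; the function names, threshold, and tolerance values are illustrative assumptions, not the actual iRecruit code:

    import cv2

    def pupil_center(eye_roi):
        """Return the (x, y) centroid of the dark pupil region via image moments."""
        gray = cv2.cvtColor(eye_roi, cv2.COLOR_BGR2GRAY)
        _, binary = cv2.threshold(gray, 50, 255, cv2.THRESH_BINARY_INV)
        m = cv2.moments(binary)
        if m["m00"] == 0:          # no dark region detected in this frame
            return None
        return m["m10"] / m["m00"], m["m01"] / m["m00"]

    def is_off_center(center, reference, tolerance=10):
        """True when the detected center drifts outside the allowed range."""
        return (abs(center[0] - reference[0]) > tolerance or
                abs(center[1] - reference[1]) > tolerance)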

Mohini worked on implementing the signal processing aspect of the project. Based on her work from last week, the team determined that the time signal representation of the audio recording was not sufficient, so this week the audio signal was analyzed in the frequency domain. After meeting with the PhD student, we have a couple of ideas to implement for next week (the Hamming window and the log mel filterbank coefficients).

Shilika worked on the signal processing portion of the project. She worked with the team to make modifications to the output of the signal processing algorithm. Modifications included splitting the total audio file into 20 millisecond chunks and trimming the file so there is no excess silence. The output still needs further modifications, which she will continue working on this coming week.
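
A minimal sketch of what the silence-trimming step could look like; the amplitude threshold and function name are assumptions, not our exact implementation:

    import numpy as np

    def trim_silence(samples, threshold=500):
        """Drop leading/trailing samples whose amplitude stays below a threshold."""
        loud = np.where(np.abs(samples.astype(np.int64)) > threshold)[0]
        if loud.size == 0:
            return samples[:0]     # the entire clip is silence
        return samples[loud[0]:loud[-1] + 1]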

Shilika’s Status Report for 10/16/20

This week, I worked with Mohini on the signal processing part. We needed to research and experiment with different ways to trim our audio and scale our x-axis to make all the final outputs the same length. We decided to take a different approach and analyze the Short-Time Fourier Transform (STFT) over 20 millisecond chunks of the whole audio file. After splitting the audio file and applying the Fourier transform to each chunk, we plotted the results on a spectrogram. Unlike before, we were able to see slight similarities when we said the same letter multiple times and differences between the different letters. We additionally met with a PhD student who specializes in speech recognition. He gave us tips on how to further hone our input. For example, he recommended we use a Hamming window with 50% overlap and scale the frequency values so the numbers aren't too small.
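
A rough sketch of this STFT-and-spectrogram analysis using scipy; the file name and plotting details are placeholders, and the recording is assumed to be mono:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import signal
    from scipy.io import wavfile

    rate, audio = wavfile.read("letter_a.wav")
    nperseg = int(0.020 * rate)                     # 20 ms chunks
    f, t, Zxx = signal.stft(audio, fs=rate, window="hamming",
                            nperseg=nperseg, noverlap=nperseg // 2)  # 50% overlap

    plt.pcolormesh(t, f, np.log1p(np.abs(Zxx)))     # log scale keeps small values visible
    plt.xlabel("Time (s)")
    plt.ylabel("Frequency (Hz)")
    plt.show()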

I believe I am still on schedule. The goal last week was to have an output ready so we could use it as the input for the neural network. Though the output needs more modifications, we were able to come up with a solution. This week, I hope to continue my work on the signal processing portion, add all the modifications that were recommended by the PhD student, and solidify the output of the signal processing algorithm.

Shilika’s Status Report for 10/09/2020

This week, I continued to work on the web application platform, questions database, and signal processing. I completed the HTML and CSS of the profile page on our website. This page allows the user to upload a picture of themselves, and contains links to setting up their initial video, picking their skillset, and accessing their previous videos. These links will lead to their respective pages once our facial detection and speech processing are functioning.

I also continued to work on the questions database. I completed the behavioral database, which contains approximately 100 questions that will be randomly assigned to the user. For the technical database, we are collecting a question, an output example, an output for the user to test their code with, and the correct answer for each question. Additionally, for each category (arrays, linked lists, etc.), we will have easy, medium, and hard questions. So far, I have added nine questions with examples, outputs, and answers from LeetCode, and will continue to add questions routinely.

Lastly, I continued the work on the signal processing portion. Building on the foundation from previous weeks, I gained an understanding of what the input into our neural network should look like. I refined and added to my previous code so that it stores the audio in a more accurate integer array, breaks the input into small chunks of audio, and outputs the values in a user-friendly format. I worked with Mohini to see if there are any patterns or similarities between each individual letter, and we were able to find commonalities in the audio signal.

I believe my progress is on schedule. Next week, I hope to continue adding to the technical database and to have an input ready for our neural network. This input will go through many iterations of refinement, but my goal is to have proper, calculated values.

Shilika’s Status Report for 10/02/2020

This week, I created the navigation bars that will be used across the pages in our web application. The top navigation bar has three main components:

  1. The menu button allows you to open the side navigation bar. It has two additional buttons, one that leads you to the behavioral interview page and the other that leads you to the technical interview page.
  2. The profile button leads you to your profile page.
  3. The help button leads you to a page in which our web application features will be explained.

The side navigation bar has two buttons that lead you to the behavioral and technical interview pages.

I also began creating the behavioral and technical databases. I used online resources to gather common questions that are asked in behavioral and technical interviews for software engineering roles.

Lastly, I researched the steps of our speech processing algorithm to detect letters that the user will speak. So far, I have been able to successfully read the audio, convert it to an integer array, and graph the audio. These preliminary steps are the foundation of creating the data we will feed into our neural network.
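
A minimal sketch of those preliminary steps (read the audio, get the integer array, graph it); the file name is a placeholder and the recording is assumed to be mono:

    import matplotlib.pyplot as plt
    from scipy.io import wavfile

    rate, data = wavfile.read("sample.wav")   # data arrives as an integer array
    print(rate, data.dtype, data.shape)

    plt.plot(data)                            # time-domain graph of the audio
    plt.xlabel("Sample index")
    plt.ylabel("Amplitude")
    plt.show()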

I believe that my progress is on schedule. Next week, I aim to complete the CSS and HTML for the user profile page, finish collecting questions for the databases, and get a solid understanding of how the Fourier transform can be used in Python to pre-process the audio signal we are receiving.