Jessica’s Status Update for 11/06/2020

This week, I worked on implementing the initial setup phase and off-center screen alignment detection for the nose, and on updating the behavioral interview page of the web application. The initial setup phase is similar to the eye contact portion: the (X, Y) coordinates of the nose are stored in arrays for the first 10 seconds. The average of those coordinates is then taken, which gives us the frame-of-reference coordinates for what counts as “center.” For the off-center screen alignment detection, we check whether the nose coordinates in the current video frame are within range of the frame-of-reference coordinates. If they are not, we alert the user to align their face with a pop-up message box.
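A minimal sketch of this calibrate-then-check logic is below (the get_nose_coords helper and the pixel tolerance are assumptions for illustration, not the actual iRecruit code):

```python
import time
import numpy as np

CALIBRATION_SECONDS = 10  # length of the initial setup phase
TOLERANCE_PX = 40         # assumed allowed deviation from center, in pixels

def calibrate_center(get_nose_coords):
    """Average the nose (X, Y) coordinates over the first 10 seconds.

    get_nose_coords is a hypothetical helper that returns the (x, y)
    position of the nose in the current video frame.
    """
    xs, ys = [], []
    end_time = time.time() + CALIBRATION_SECONDS
    while time.time() < end_time:
        x, y = get_nose_coords()
        xs.append(x)
        ys.append(y)
    # The averages serve as the frame-of-reference "center" coordinates.
    return np.mean(xs), np.mean(ys)

def is_off_center(nose_x, nose_y, center_x, center_y):
    """Return True if the nose has drifted out of range of the center."""
    return (abs(nose_x - center_x) > TOLERANCE_PX or
            abs(nose_y - center_y) > TOLERANCE_PX)
```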

One change that we made this week was that we decided to split the facial detection portion into three different options. We were thinking about it from the user’s perspective, and thought it would be good to account for different levels of experience with behavioral interviewing. The first option is for beginner-level users, who are unfamiliar with the iRecruit behavioral interviewing platform or with behavioral interviews in general. It allows users to practice with both eye contact and screen alignment, so iRecruit will provide real-time feedback on both aspects. The second and third options are for intermediate-level to advanced-level users, who are familiar with behavioral interviewing and know what they would like to improve upon. The second option allows users to practice with only eye contact, and the third option allows users to practice with only screen alignment. We thought this would be useful if a user knows their strengths and only wants to practice with feedback on one of the interview tactics. I separated these three options into three different code files (facial_detection.py, eye_contact.py, and screen_alignment.py), roughly along the lines of the dispatch sketched below.
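In outline, the split amounts to a simple dispatch over the three modes. The function names here are hypothetical stand-ins; in iRecruit each option lives in its own file:

```python
# Hypothetical sketch of the three practice options; in iRecruit each
# option lives in facial_detection.py, eye_contact.py, or screen_alignment.py.

def practice_eye_contact():
    print("Real-time eye contact feedback only...")

def practice_screen_alignment():
    print("Real-time screen alignment feedback only...")

def practice_both():
    # Beginner option: real-time feedback on both aspects.
    practice_eye_contact()
    practice_screen_alignment()

OPTIONS = {
    "1": ("Eye contact + screen alignment (beginner)", practice_both),
    "2": ("Eye contact only", practice_eye_contact),
    "3": ("Screen alignment only", practice_screen_alignment),
}

choice = input("Select a practice option (1-3): ")
label, run = OPTIONS.get(choice, OPTIONS["1"])  # default to the beginner option
print(f"Selected: {label}")
run()
```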

I updated the behavioral interview page of the web application (see image below) to make the interface more detailed and user-friendly. The page gives an overview and describes the various options available. I learned more about Django, HTML, and CSS in the process, which was very helpful! I believe that we are making good progress with the facial detection part. Next week, I plan on working on the initial setup phase and off-center screen alignment detection for the mouth. This will probably wrap up the main technical implementation for the facial landmark detection portion. I also plan on updating the user interface for the dashboard and technical interview pages on the web application.

Mohini’s Status Update for 11/06/2020

This week, I worked on integrating the signal processing and machine learning algorithms to create a complete speech recognition implementation. First, I finished creating the training data. This involved recording myself speaking a word and running the signal processing algorithm on the recording, which stores a binary representation of the data in a temporary text file. I then manually appended the contents of this temporary file, along with the English representation of the word, to my training data text file. I decided to record each of the 8 categories 8 different times, for a total of 64 samples in the training data set. This process was quite tedious and took a couple of hours to complete, as I had to wait for the signal processing algorithm to run on each sample. I used a similar approach to create the testing data set; currently there are only 7 samples in it, but I will add more in the future.
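In outline, each sample ends up as a labeled line in the training file. A sketch under assumed file names, line format, and category name (the label “arrays” is made up) looks like this:

```python
# A sketch of assembling one training sample; the file names and the
# "label followed by features" line format are assumptions.

def append_sample(temp_file, train_file, label):
    """Append one processed recording and its English label to the
    training data file."""
    with open(temp_file) as f:
        features = f.read().strip()  # binary representation from signal processing
    with open(train_file, "a") as f:
        f.write(f"{label} {features}\n")  # one sample per line

# e.g., after recording one word from one of the 8 categories:
append_sample("temp_sample.txt", "training_data.txt", "arrays")
```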

Next, I used the training and test data sets as input to the neural network implementation. I coded the baseline of this implementation from scratch last year for my 10-301: Introduction to Machine Learning course, and I had to tweak a few things to adapt the code for the purposes of this project. One challenge was formatting the datasets so that reading and processing the files is as simple as possible. Another challenge was ordering the data in the training dataset, since the order of the samples had a significant impact on the accuracy of the model. For example, I noticed that the accuracy of the model decreased if the training dataset had multiple samples of the same word in a row. After overcoming these obstacles, I modified the stochastic gradient descent algorithm to work with these datasets and fine-tuned the parameter matrices. Then I wrote a predict function that uses the optimal parameter matrices determined from training the neural network to predict the corresponding English word for each sample in the test data set. Currently, the accuracy is at 42%, but I will work on improving it.
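For reference, the predict step in this style of network is just a forward pass followed by an argmax. A minimal sketch, assuming the usual one-hidden-layer setup from 10-301 with sigmoid hidden units and a softmax output (alpha and beta being the trained parameter matrices; the exact architecture is an assumption):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(x, alpha, beta):
    """Forward pass: return the index of the most likely word category."""
    a = alpha @ np.append(1.0, x)   # hidden pre-activation (bias folded in)
    z = sigmoid(a)                  # hidden layer activations
    b = beta @ np.append(1.0, z)    # output pre-activation (bias folded in)
    b -= np.max(b)                  # shift for numerical stability
    y_hat = np.exp(b) / np.sum(np.exp(b))  # softmax over the 8 categories
    return int(np.argmax(y_hat))
```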

Finally, I integrated the speech recognition implementation with the Django web app, so that the user can record themselves from the technical interview page and the algorithm returns the predicted word, which is then displayed on the page. Next steps include improving the accuracy of the algorithm and picking a question from the corresponding category to display on the screen.
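The wiring on the Django side might look roughly like this (the view name, template, and record_and_predict helper are assumptions, not the actual iRecruit code):

```python
# views.py -- hypothetical sketch of the technical interview view
from django.shortcuts import render

from . import speech_recognition  # signal processing + neural network pipeline

def technical_interview(request):
    predicted_word = None
    if request.method == "POST":
        # Record the user and run the pipeline to get the predicted word.
        predicted_word = speech_recognition.record_and_predict()
    # The template displays the word on the technical interview page.
    return render(request, "technical_interview.html",
                  {"predicted_word": predicted_word})
```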

The image below is a snapshot of the training data set. Each sample is of length 15000.