Mohini’s Status Report for 11/13/2020

This week, I worked on a number of things. I started by looking into the speech recognition algorithm and thinking of possible ways to increase the accuracy. I created more training data, which helped increase the accuracy by about 5%. I also spent significant time testing the workflow and integration of the algorithm, making sure that the signal processing and machine learning components work well together. 

Second, I worked on the web application this week. I spent a little bit of time cleaning up the backend logic for users to create a new account and log into iRecruit. This included fixing a minor bug so that the user’s username now appears on the dashboard and the navigation bar. I also created a user database to store the information of any user that makes an account with iRecruit. This database will be used on the profile page to keep track of each individual user’s completed behavioral and technical interview practices. Additionally, I worked on the tips page, researching and adding technical interviewing tips for the user’s convenience. 
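
As a rough illustration of what that user database could look like, here is a minimal Django model sketch. The model and field names (InterviewUser, completed_behavioral, completed_technical) are hypothetical placeholders, not the actual iRecruit schema.

```python
# models.py -- hypothetical sketch of a per-user record tied to Django's auth user
from django.contrib.auth.models import User
from django.db import models

class InterviewUser(models.Model):
    # Link each iRecruit profile to the built-in Django account
    user = models.OneToOneField(User, on_delete=models.CASCADE)
    # Counts of completed practice sessions, shown on the profile page
    completed_behavioral = models.IntegerField(default=0)
    completed_technical = models.IntegerField(default=0)

    def __str__(self):
        return self.user.username
```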

The majority of my time was spent continuing to work on the technical interview page. I finished creating the questions database so that each of our eight categories has a couple of questions. I displayed the user’s chosen category (the output of our speech recognition algorithm) on the webpage, as well as a random question associated with that category. I also created an input text box where the user can submit their answer. Next steps include writing backend code in the Django framework to retrieve the user’s answer and check its accuracy. I also plan on displaying a possible correct answer on the screen, so the user can compare theirs to this sample answer if desired. I will be storing the user’s questions and answers in the database, so that a summary of their practices can be displayed on the profile page.
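
To sketch what that backend step might look like, here is a minimal Django view that picks a random question for the predicted category and stores a submitted answer. The model names (Question, PracticeRecord) and the template name are hypothetical, not the actual iRecruit code.

```python
# views.py -- hypothetical sketch, not the actual iRecruit implementation
import random

from django.shortcuts import render

from .models import Question, PracticeRecord  # hypothetical models


def technical_interview(request, category):
    # Pull every question stored for the category predicted by speech recognition
    questions = list(Question.objects.filter(category=category))
    question = random.choice(questions) if questions else None

    if request.method == "POST" and question is not None:
        # Store the question/answer pair so the profile page can summarize it later
        answer = request.POST.get("answer", "")
        PracticeRecord.objects.create(user=request.user,
                                      question=question,
                                      answer=answer)

    return render(request, "technical_interview.html",
                  {"category": category, "question": question})
```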

I believe I am on track, as the skeleton of the technical interview page has been completed. I will spend the rest of the semester trying to improve the speech recognition algorithm and formatting the technical interview page to incorporate the best UI practices; however, I feel that the core of the project has been completed.

 

Team’s Status Update for 11/06/20

This week, we continued working on implementing our respective portions of the project. We made a design decision for the facial detection portion to give users three options to account for different levels of experience with behavioral interviews. The first option is for beginner-level users and allows them to practice with both eye contact and screen alignment. We thought this would be good for users who are unfamiliar with behavioral interviewing or the iRecruit behavioral interviewing platform, to give them maximum feedback. The second and third options are for intermediate-level to advanced-level users and allow them to practice with either only eye contact or only screen alignment. We thought this would be good for users who know their strengths and weaknesses for behavioral interviewing and only wish to receive feedback on one technique. 

Jessica worked on implementing the initial setup phase and off-center screen alignment detection for the nose, and updating the behavioral interview page on the web application. She was able to store the X and Y coordinates of the nose into arrays for the first 10 seconds, and then take the average of those coordinates to calculate the frame of reference coordinates. If the current coordinates of the nose for the video frame are not within range of the frame of reference coordinates, the user is alerted with a pop-up message box. She updated the behavioral interview page to give the user an overview and provide them with the three different options. Next week, she is planning to work on the initial setup phase and off-center screen alignment detection for the mouth, and updating the dashboard and technical interview pages.

Mohini worked on integrating the signal processing and machine learning components together with the Django webapp. The output from the signal processing is saved to a text file, which serves as the input to the machine learning algorithm; the algorithm then writes the predicted category to a separate text file. Django reads from this text file to display the predicted category on the webpage. When the user records the category of questions they are interested in, the webpage runs the speech recognition algorithm and displays the predicted category name. The accuracy of this algorithm is quite low, so the next step is fine-tuning the model to increase the accuracy. 
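
As a rough sketch of the Django end of that hand-off, the view below reads the predicted category from the text file and passes it to the page. The file and template names are placeholders, not the actual project files.

```python
# views_sketch.py -- illustrative hand-off; file and template names are placeholders
from django.shortcuts import render


def show_category(request):
    # The signal processing + machine learning pipeline has already written
    # its prediction to this text file before the page is rendered
    with open("predicted_category.txt") as f:
        category = f.read().strip()
    return render(request, "technical_interview.html", {"category": category})
```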

Shilika worked on the CSS and HTML for the web application portion, and on saving videos to the profile page of the web app. She also worked on the neural network portion of the speech processing aspect. She is researching ways to improve the accuracy of the current neural network and is implementing one more hidden layer in the neural network. Next week, she will continue improving the accuracy of the neural network and, in turn, the speech recognition.

 

Shilika’s Status Update for 11/06/20

This week, I worked on the web application components and the neural network portion. For the web application, I made the CSS and HTML for the login and register pages more user-friendly and appealing. The design now integrates properly with the rest of the web pages as well.

In addition to the CSS, I continued to work on saving the completed behavioral interview videos on the web page. I have not been able to display the video properly; a blank video appears in every web browser that I have tried, such as Firefox, Safari, and Chrome. 
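
One common cause of a blank video in a Django app is that the uploaded media files are not being served during development. This is not a confirmed diagnosis of our issue, but as a minimal sketch of the configuration that would be involved (with placeholder route names, and assuming MEDIA_URL and MEDIA_ROOT are set in settings.py):

```python
# urls.py -- sketch: serve the saved interview videos while developing
# (assumes MEDIA_URL and MEDIA_ROOT are defined in settings.py; routes are placeholders)
from django.conf import settings
from django.conf.urls.static import static
from django.urls import path

from . import views  # hypothetical views module

urlpatterns = [
    path("profile/", views.profile, name="profile"),  # placeholder route
]
urlpatterns += static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)
```

The template would then point a video tag at the stored file's URL, for example a FileField's .url attribute, rather than at a local filesystem path.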

In the neural network portion, I worked with Mohini on the code we are using as our baseline (the neural network homework code from the Machine Learning course at Carnegie Mellon). In the beginning, the neural network was predicting the same output for every training and testing data point we provided. After debugging and testing, we realized that the order in which the training data is provided has an effect on the final predictions. Despite varying the order of the training data, the accuracy on our testing data is still low, at approximately 42%. To improve the accuracy, I decided to implement an additional hidden layer in the neural network. The changes this requires are integrating a second hidden layer after the first hidden layer, initializing the weights associated with it, performing stochastic gradient descent to optimize the weights/parameters, and connecting the new hidden layer to the output classes. I am currently working on applying SGD to the parameters and have been running into index-out-of-bounds bugs. 
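
For illustration, here is a minimal NumPy sketch of the forward pass with the extra hidden layer described above: two sigmoid hidden layers feeding a softmax output, with cross-entropy as the loss. The variable names and shapes are illustrative only, not our actual code.

```python
# nn_sketch.py -- illustrative two-hidden-layer forward pass (not the actual code)
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))          # shift for numerical stability
    return e / np.sum(e)

def forward(x, W1, W2, W3):
    # x: input feature vector; W1, W2: hidden-layer weights; W3: output weights
    h1 = sigmoid(W1 @ x)               # first hidden layer
    h2 = sigmoid(W2 @ h1)              # new, second hidden layer
    y_hat = softmax(W3 @ h2)           # probabilities over the question categories
    return y_hat

def cross_entropy(y_hat, label):
    # label: index of the true category
    return -np.log(y_hat[label])
```

SGD would then backpropagate the cross-entropy loss through all three weight matrices instead of two, which is where the extra indexing (and the out-of-bounds bugs) comes in.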

By next week, I hope to have completed this layer and run it to test whether the accuracy has increased. I will also continue to research other methodologies to improve the accuracy. I also hope to figure out displaying completed behavioral interview videos in the Django webapp. I am behind in this aspect because I intended to finish it this week. In order to get back on track, I will reach out to my team to help me with this portion, as I have not been able to figure it out despite trying multiple possibilities that I found through online resources.

Jessica’s Status Update for 11/06/2020

This week, I worked on implementing the initial setup phase and off-center screen alignment detection for the nose, and updating the web application for the behavioral interview page. The initial setup phase is similar to the eye contact portion, where the coordinates (X, Y) of the nose are stored into arrays for the first 10 seconds. Then, the average of the coordinates is taken, and that gives us the coordinates that will serve as the frame of reference for what is “center.” For the off-center screen alignment detection for the nose, we check if the current coordinates of the nose for the video frame are within range of the frame of reference coordinates. If they are not, we alert the user to align their face with a pop-up message box. 
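
To make the averaging and range check concrete, here is a rough sketch; the tolerance value and function names are made up for illustration and are not our tuned parameters.

```python
# screen_alignment_sketch.py -- illustrative frame-of-reference check (not the actual code)
import numpy as np

TOLERANCE = 40  # pixels; illustrative threshold, not the tuned value

def frame_of_reference(nose_xs, nose_ys):
    # nose_xs, nose_ys: coordinates collected during the 10-second setup phase
    return np.mean(nose_xs), np.mean(nose_ys)

def is_off_center(nose_x, nose_y, ref_x, ref_y, tolerance=TOLERANCE):
    # True when the current nose position drifts outside the allowed range,
    # which is when the pop-up alert would be triggered
    return abs(nose_x - ref_x) > tolerance or abs(nose_y - ref_y) > tolerance
```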

One change that we made this week was that we decided to split up the facial detection portion into three different options. We were thinking about it from the user perspective, and thought that it would be good to account for different levels of experience with behavioral interviewing. The first option is for beginner-level users, who are unfamiliar with the iRecruit behavioral interviewing platform or with behavioral interviews in general. It allows users to practice with both eye contact and screen alignment, so iRecruit will provide real-time feedback for both aspects. The second and third options are for intermediate-level to advanced-level users, who are familiar with behavioral interviewing and know what they would like to improve upon. The second option allows users to practice with only eye contact, and the third option allows users to practice with only screen alignment. We thought this would be useful if a user knows their strengths and only wants to practice with feedback on one of the interview tactics. I separated these three options into three different code files (facial_detection.py, eye_contact.py, and screen_alignment.py).

I was able to update the web application for the behavioral interview page (see image below) to make the interface more detailed and user-friendly. The page gives an overview and describes the various options available. I was able to learn more about Django, HTML, and CSS from this, which was very helpful! I believe that we are making good progress with the facial detection part. Next week, I plan on working on the initial setup phase and off-center screen alignment detection for the mouth. This will probably wrap up the main technical implementation for the facial landmark detection portion. I also plan on updating the user interface for the dashboard and technical interview pages on the web application.

Mohini’s Status Update for 11/06/2020

This week, I worked on integrating the signal processing and machine learning algorithms in order to create a complete speech recognition implementation. First, I finished creating the training data. This involved recording myself speaking a word and running the algorithm, which stores the binary representation of the data in a text file. I manually appended the contents of this temporary text file, along with the English representation of the word, to my training data text file. I decided to record each of the 8 categories 8 different times, for a total of 64 samples in the training data set. This process was quite tedious and took a couple of hours to complete, as I had to wait for the signal processing algorithm to run for each sample. I used a similar approach to create the testing data set. Currently, there are only 7 samples in it, but I will add more samples in the near future.
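
As a rough sketch of the bookkeeping involved, the snippet below appends one processed sample (its binary representation plus the English label) as a line of a training data file. The file names, label, and line format are placeholders, not the actual files.

```python
# build_training_data.py -- illustrative sketch of appending one sample
# (file names, label, and line format are placeholders, not the actual files)

def append_sample(temp_file="temp_sample.txt",
                  train_file="training_data.txt",
                  label="arrays"):
    # Read the binary representation produced by the signal processing step
    with open(temp_file) as f:
        features = f.read().strip()

    # Append "<label> <features>" as one line of the training set
    with open(train_file, "a") as out:
        out.write(f"{label} {features}\n")
```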

Next, I used the training and test data sets as the input to the neural network implementation. I coded the baseline of this implementation from scratch last year for my 10301: Introduction to Machine Learning course. I had to tweak a few things in order to adapt the code for the purposes of this project. One challenge was formatting the datasets so that reading and processing those files is as simple as possible. Another challenge was putting the training dataset in the optimal order, as changing the order of the data had a significant impact on the accuracy of the model. For example, I noticed that the accuracy of the model decreased if the training dataset had multiple samples of the same word in a row. After overcoming these obstacles, I modified the stochastic gradient descent algorithm to work with these datasets and fine-tuned the parameter matrices. Then I wrote a predict function, using the optimal parameter matrices determined from training the neural network, to predict the corresponding English word for each sample in the test data set. Currently, the accuracy is at 42%, but I will work on improving this in the near future. 
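
One simple way to avoid having multiple samples of the same word in a row is a round-robin ordering over the categories; the sketch below is illustrative only and is not the ordering scheme actually used in the project.

```python
# order_sketch.py -- illustrative round-robin ordering so samples of the same
# word are spread out rather than adjacent (not the actual training code)
from collections import defaultdict
from itertools import zip_longest

def round_robin_by_label(samples):
    # samples: list of (label, features) pairs
    by_label = defaultdict(list)
    for label, features in samples:
        by_label[label].append((label, features))

    ordered = []
    # Take one sample from each category in turn until all samples are used
    for group in zip_longest(*by_label.values()):
        ordered.extend(s for s in group if s is not None)
    return ordered
```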

Finally, I integrated the speech recognition implementation with the Django web app so that the user can record themselves from the technical interview page and the algorithm returns the predicted word. This word is then displayed on the technical interview page. Next steps will include improving the accuracy of this algorithm and picking a question from the corresponding category to display on the screen. 

The image below is a snapshot of the training data set. Each sample is of length 15000.

Team’s Status Update for 10/30/20

This week, we continued working on implementing our respective portions of the project, making progress in the three main parts. Jessica worked on implementing a center of frame reference, facial landmark detection, and testing the eye detection portion. She thought it would be helpful to have a centered guideline for users to position themselves accordingly during the initial setup phase, so that they have a reference for the center of the video screen. She continued working on the facial landmark detection, and was able to get the coordinates of the center of the nose and mouth. The eye detection portion was also tested more, and the results seem to align with the accuracy goal. Next week, she will work on completing the initial setup phase for the facial landmark detection, and would like to complete the off-center screen alignment portion for the nose as well. She will also continue testing both the eye detection and screen alignment parts. 

Mohini finalized the signal processing algorithm and started making the training data set for the neural network algorithm. This week concluded the signal processing portion of our project, so she will be focusing on the machine learning portion, as well as integrating the different components of our project together, for the rest of the semester. Next week, she will be working on testing the neural network after finishing generating the rest of the training data set. 

Shilika began reviewing the neural network concepts, as this is the next technical aspect of the technical interview portion that she will help tackle. She also continued to work on the web application to improve the CSS and front-end features to make the app more user-friendly. Next week, she will continue working on the web application and the neural network. 

Shilika’s Status Report for 10/30/20

This week, after finalizing the output of the signal processing, I began to review the concepts of a neural network, which will be the next technical portion of our project. I will be working with Mohini to improve the neural network that we created in a Machine Learning course we previously took. In this algorithm, we use a single-hidden-layer neural network with a sigmoid activation function for the hidden layer, a softmax function on the output layer, and the cross-entropy loss function to gauge the accuracy of our model. I reviewed the concepts behind these activation functions and how the output layer is formed from the input and hidden layers. 

I additionally started working on the web application components of our project again. I worked on how to run the Java code in Django and used the “copy path” command to be able to run the code from a separate directory. I also began working on the profile page again, which is where the user will be able to save their skill set and view previously recorded behavioral interviews. I improved the CSS for the profile page to make it more user-friendly and began to look at saving the videos locally in Django.

Next week, my goal is to be able to save the videos in Django and allow the user to upload a profile photo to the profile page. Additionally, as soon as our training data is ready, I will start implementing ways in which our neural network can be improved to classify our 8 outputs.

Mohini’s Status Report for 10/30/2020

This week, I finalized the output for the signal processing algorithm. I applied the Mel scale filterbank implementation to reduce the dimension of the output to 40 x 257. Once I verified that the output seemed reasonable, my next step was to determine the best way to feed it into the neural network. I experimented with passing the Mel scale filterbank representation directly, but these matrices seemed too similar between different words. Since it was the spectrogram visual representation that differed between words, I decided to save it as an image and pass the grayscale version of the image as the input into the neural network. 
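
A minimal sketch of that idea, using matplotlib and Pillow with placeholder file names; this is illustrative, not the actual pipeline code.

```python
# spectrogram_input_sketch.py -- illustrative: save the filterbank output as an
# image, then load it back as a grayscale feature vector (not the actual code)
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

def filterbank_to_features(filterbank, image_path="spectrogram.png"):
    # filterbank: 2-D Mel filterbank matrix from the signal processing step
    plt.imsave(image_path, filterbank, cmap="viridis")   # save spectrogram image
    gray = Image.open(image_path).convert("L")           # reload as grayscale
    return np.asarray(gray).flatten() / 255.0            # flattened input vector
```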

Once I decided this was the best way to represent the input vector, I began creating the training data for the neural network. We currently have 8 different categories that users can pick their technical questions from. I plan to generate about 10 samples for each category as initial training data. To make a good model, I’d need to generate close to 1000 samples of training data. However, generating each sample requires me to record the word and run the signal processing algorithm, which takes a few minutes. Since this process is somewhat slow, I don’t think it would be practical to generate more than 100 samples of training data. So far, my training data set has approximately 30 samples. 

This week, I also integrated my neural network code with our Django webapp. I wrote my neural network code in Java, so figuring out a way for our Django webapp to access it was a challenge. I ultimately used an “os.system()” command to call my neural net code from the terminal. Next steps include finishing the training data set as well as passing it through the neural network to view the accuracy of the model.
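
For reference, here is a sketch of what such a call could look like; the Java class name and file names are placeholders, not the actual project names.

```python
# run_nn_sketch.py -- illustrative: calling the Java neural network from Python
# via os.system, as described above (class and file names are placeholders)
import os

def run_neural_net(input_path="processed_sample.txt",
                   output_path="predicted_category.txt"):
    # Invokes the separately compiled Java implementation from the terminal
    os.system(f"java NeuralNet {input_path} {output_path}")
```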

 

Jessica’s Status Update for 10/30/2020

This week, I worked on implementing a center-of-frame reference, continuing the facial landmark detection, and testing the eye detection portion. I thought it would be useful to give the user some guidelines in the beginning as to where to position their face. The current implementation draws two thin lines, one spanning the middle of the video frame horizontally and one spanning the middle of the video frame vertically. At the center of the frame (width/2, height/2), there is a circle, which is ideally where the user would center their nose. I thought that these guidelines would serve as a base for what “centered” is, although they do not have to be followed strictly for the facial detection to work.
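
A rough OpenCV sketch of drawing that overlay on each frame; the colors, line thickness, and circle radius are arbitrary illustrative choices, not the actual values.

```python
# guidelines_sketch.py -- illustrative OpenCV overlay for the centering guide
# (colors, thickness, and radius are arbitrary choices, not the tuned values)
import cv2

def draw_guidelines(frame):
    h, w = frame.shape[:2]
    # Thin horizontal and vertical lines through the middle of the frame
    cv2.line(frame, (0, h // 2), (w, h // 2), (255, 255, 255), 1)
    cv2.line(frame, (w // 2, 0), (w // 2, h), (255, 255, 255), 1)
    # Small circle at the center, where the user would ideally place their nose
    cv2.circle(frame, (w // 2, h // 2), 6, (0, 255, 0), 2)
    return frame
```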

I continued implementing the facial landmark detection portion, working off of the model of all of the facial coordinates from last week. I determined that it would be more helpful to get the locations of the center of the nose and mouth, as these are concrete coordinates that we can base the frame of reference on, instead of the full array of facial landmark coordinates. I was able to locate the coordinates of the center of the nose and mouth (by looping through the array and pinpointing which entries correspond to the nose and mouth) and will be using a similar tactic of storing the coordinates into an array during the initial setup period, and then taking the average of those coordinates to use as the frame of reference.
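
To illustrate the indexing step, here is a sketch that pulls nose and mouth centers out of a landmark array; the indices assume the standard 68-point facial landmark layout and may differ from the exact points chosen in our code.

```python
# landmarks_sketch.py -- illustrative: nose/mouth centers from a landmark array
# (indices assume the standard 68-point layout; not necessarily the project's choice)
import numpy as np

NOSE_TIP = 30           # approximate center of the nose in the 68-point layout
MOUTH_INNER = (62, 66)  # top and bottom of the inner lip, averaged for the mouth center

def nose_and_mouth(landmarks):
    # landmarks: (68, 2) array of (x, y) facial landmark coordinates
    nose = landmarks[NOSE_TIP]
    mouth = np.mean([landmarks[i] for i in MOUTH_INNER], axis=0)
    return nose, mouth
```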

I tested the eye detection portion some more, and the accuracy seems to be in the range of what we were hoping for. So far, only a few false positives have been detected. I believe we are on a good track for the facial detection portion, with the eye detection portion working as expected and the facial landmark detection being on its way. Next week, I plan on completing the initial setup phase for the facial landmark part, as well as hopefully, the off-center screen alignment portion for the nose. I also will do more testing of the eye detection portion and begin testing the initial setup phase.

Team Status Update for 10/23/2020

This week, we continued to work on implementing our respective portions of the project.

Jessica continued to work on the facial detection portion, specifically looking into saving videos, the alerts, and the facial landmark part. She was able to get the video saving portion to work, where the VideoWriter class in OpenCV is used to write video frames to an output file. There are currently two options for the alerts, one with audio and one with visuals. Both are able to capture the user’s attention when their eyes are off-center. She began looking into the facial landmark detection and has a working baseline. Next week, she is hoping to get the center of the nose and mouth coordinates from the facial landmark detection to use as frames of reference for screen alignment. She is also hoping to do more testing for the eye detection portion. 
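
A small sketch of the VideoWriter usage described here; the codec, frame rate, and output file name are placeholder choices, not the project's actual settings.

```python
# save_video_sketch.py -- illustrative use of OpenCV's VideoWriter
# (codec, frame rate, and file name are placeholder choices)
import cv2

def open_writer(width, height, path="interview.avi", fps=20.0):
    fourcc = cv2.VideoWriter_fourcc(*"XVID")
    return cv2.VideoWriter(path, fourcc, fps, (width, height))

# Usage inside the capture loop (sketch):
#   writer = open_writer(frame_width, frame_height)
#   writer.write(frame)      # called once per video frame
#   writer.release()         # when the interview recording ends
```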

Shilika worked on the signal processing algorithm and has an input ready for the neural network. She followed the process of applying pre-emphasis, framing, windowing, and then applying a Fourier transform and power spectrum to transform the signal into the frequency domain.
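
As a rough illustration of that chain of steps, here is a minimal NumPy sketch; the frame length, frame step, FFT size, and pre-emphasis coefficient are typical textbook values, not necessarily the ones used in the project.

```python
# preprocessing_sketch.py -- illustrative pre-emphasis, framing, windowing,
# FFT, and power spectrum (parameter values are typical defaults, not ours)
import numpy as np

def power_spectrum(signal, frame_len=400, frame_step=160, nfft=512, alpha=0.97):
    # signal: 1-D NumPy array of audio samples, assumed at least one frame long

    # Pre-emphasis: boost high frequencies
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])

    # Framing: split into overlapping frames
    num_frames = 1 + max(0, (len(emphasized) - frame_len) // frame_step)
    frames = np.stack([
        emphasized[i * frame_step : i * frame_step + frame_len]
        for i in range(num_frames)
    ])

    # Windowing: apply a Hamming window to each frame
    frames = frames * np.hamming(frame_len)

    # FFT and power spectrum (frequency-domain representation)
    mag = np.abs(np.fft.rfft(frames, nfft))
    return (mag ** 2) / nfft
```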

Mohini also continued to work on the signal processing algorithm. The team’s decision to categorize entire words, rather than individual letters, reduces many of our anticipated risks. The signals of many of the individual letters looked quite alike, whereas the signals of the different words have distinct differences. This change greatly simplifies our design and is expected to yield higher accuracy as well. Next steps for Mohini include feeding the signal processing output into the neural network and fine-tuning that algorithm.