B2: iRecruit

iRecruit is an interview assistant intended to help software engineering jobseekers practice for the interview process. Today, jobseekers are challenged with navigating fully virtual interviews and learning how to conduct themselves during the two key stages of the application process: behavioral and technical interviews. Although several written guidelines and programming platforms exist, there are few opportunities to practice simulated interviews. The goal of iRecruit is to provide users with ways to practice for interviews through facial detection for behavioral interviews, and speech recognition (a mix of signal processing and machine learning) for technical interviews.

Here is our final video: http://course.ece.cmu.edu/~ece500/projects/f20-teamb2/2020/12/08/final-video/.

Jessica’s Status Update for 12/04/2020

This week, I worked on combining the eye contact and screen alignment parts together for the facial detection portion, and implemented a way to store summaries about each video recording. I was able to integrate the eye contact part and the screen alignment part for option 1 to alert the user of both subpar eye contact and subpar screen alignment. This required combining the two pieces of code I had written separately for options 2 and 3, so that there are setup phases for both eye detection and facial landmark detection, and the coordinates of the centers of the eyes and the nose/mouth are averaged during the initial 5 seconds. We then have separate frames of reference for the eyes, nose, and mouth. In the respective parts of the code, if the current eye, nose, or mouth coordinates are off-center, the user receives the corresponding alert (eye contact or screen alignment).
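A minimal sketch of how the combined per-frame check could dispatch the two alerts, assuming the reference coordinates have already been averaged during the 5-second setup phase. The function names and pixel tolerance here are illustrative, not our exact code:

```python
# Illustrative sketch of the combined option-1 check (names and the tolerance
# value are assumptions, not the actual iRecruit code).

TOLERANCE = 40  # pixels a point may drift before it counts as off-center (assumed)

def is_off_center(current, reference, tolerance=TOLERANCE):
    """Return True if the current (x, y) point drifted outside the reference box."""
    cx, cy = current
    rx, ry = reference
    return abs(cx - rx) > tolerance or abs(cy - ry) > tolerance

def check_frame(eye_center, nose_center, mouth_center, refs):
    """Decide which alert(s) to raise for a single video frame.

    refs holds the averaged reference coordinates from the setup phase,
    e.g. {"eye": (x, y), "nose": (x, y), "mouth": (x, y)}.
    """
    alerts = []
    if is_off_center(eye_center, refs["eye"]):
        alerts.append("Please re-center your eyes (eye contact).")
    if is_off_center(nose_center, refs["nose"]) or is_off_center(mouth_center, refs["mouth"]):
        alerts.append("Please align your face with the screen (screen alignment).")
    return alerts
```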

At the beginning of our project, we planned to store the video recordings themselves and allow users to view them in the profile section. However, we decided that it would be more helpful to summarize the feedback from each video recording instead. There is a common text file (called behavioral_interview_output.txt) that stores the video summaries. We calculate the interview number by counting the number of lines in the text file, and retrieve the timestamp of when the video practice took place using the Python datetime library. We keep track of the number of times that the user had subpar eye contact and/or screen alignment during a video recording using count variables. The interview number, timestamp, subpar eye contact count, and subpar screen alignment count (for options 2 and 3, subpar screen alignment is “N/A” and subpar eye contact is “N/A,” respectively) are appended to the text file. This text file will be displayed in the behavioral section of the profile page for the user to access.
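A rough sketch of how one summary line could be appended to that file; the helper name and the exact formatting of the entry are hypothetical:

```python
# Hypothetical sketch of appending one interview summary to the shared text file.
from datetime import datetime

SUMMARY_FILE = "behavioral_interview_output.txt"

def append_summary(eye_contact_count, screen_alignment_count):
    # The interview number is the current line count plus one.
    try:
        with open(SUMMARY_FILE, "r") as f:
            interview_number = sum(1 for _ in f) + 1
    except FileNotFoundError:
        interview_number = 1

    timestamp = datetime.now().strftime("%m/%d/%Y %H:%M")
    line = (f"Interview {interview_number} | {timestamp} | "
            f"subpar eye contact: {eye_contact_count} | "
            f"subpar screen alignment: {screen_alignment_count}\n")

    with open(SUMMARY_FILE, "a") as f:
        f.write(line)

# For options 2 and 3, the count that does not apply is passed as "N/A", e.g.:
# append_summary(eye_contact_count=3, screen_alignment_count="N/A")
```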

I believe that we are making good progress on the facial detection portion, as we are wrapping up the technical work and were able to accomplish much of the corresponding profile section as well. Next week, I plan on integrating the text file of video recording summaries into Django for the web application. I also plan on continuing testing for the eye-contact-only option, and beginning testing for the screen-alignment-only option and the integrated eye contact and screen alignment option. I would like to get an idea of the current accuracy of the systems.

Jessica’s Status Update for 11/20/2020

This week, I worked on integrating the behavioral interview questions that Shilika compiled into the facial detection system and implementing the behavioral portion of the tips page. Shilika created a file at the beginning of the semester that contains a large number of common behavioral interview questions. I took a handful of these questions and placed them into a global array in the eye contact and screen alignment code. Then, using Python’s random library, the code chooses one of the questions in the array at random. The chosen question is displayed at the top of the video screen while the user records themselves, so that they can refer back to the question whenever they want to.
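The sketch below illustrates the idea, assuming OpenCV is used both to capture frames and to draw the text overlay; the questions listed are placeholders rather than entries from Shilika's actual file:

```python
import random
import cv2

# Placeholder subset of the question bank; the real file has many more questions.
QUESTIONS = [
    "Tell me about a time you faced a conflict on a team.",
    "Describe a project you are most proud of.",
    "Tell me about a time you missed a deadline.",
]

question = random.choice(QUESTIONS)

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Draw the chosen question near the top of the frame so the user can
    # refer back to it at any point during the recording.
    cv2.putText(frame, question, (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow("Behavioral Interview Practice", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```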

I also worked on implementing the behavioral portion of the tips page, which we decided last week to change from the original help page. I used articles from Indeed and Glassdoor to provide the user with information about behavioral interviews, as well as helpful tips and common behavioral interview questions. This gives the user background on how to prepare for behavioral interviews and what to expect during a typical interview. The tips page will be divided into two sections, one for behavioral and one for technical. I implemented the behavioral section, and Mohini and Shilika will be working on the technical section for technical interview information and tips.

I believe that we are on track for the facial detection portion, as most of the technical implementation is complete and we are now working on improving and fine-tuning. Next week, I plan on working on combining the eye contact and screen alignment parts together for the first option on the behavioral page. I also plan on figuring out how to keep track of the number of times that the user had subpar eye contact and/or screen alignment during their video recording, so we can provide this information to the user on their profile page.

Jessica’s Status Update for 11/13/2020

This week, I worked on implementing the initial setup phase and off-center screen alignment detection for the mouth, and updating the user interface for the home, dashboard, and technical interview pages on the web application. I decided to change the initial setup phase time back to 5 seconds (the original amount), because after running the program multiple times, I realized that 5 seconds is enough time if the user is set up and ready to go; 10 seconds required a lot of sitting around and waiting. The initial setup phase and off-center screen alignment detection for the mouth are similar to those for the nose that I worked on last week. The X and Y coordinates of the mouth are stored into separate arrays for the first 5 seconds. We then take the average of the coordinates, which gives us the frame of reference coordinates for what constitutes “center” for the mouth. For each video frame, we check if the current coordinates of the mouth are within range of the frame of reference coordinates. If they are not (or the nose coordinates are not), then we alert the user with a pop-up message box. If the nose coordinates are not centered, then neither are the mouth coordinates, and vice versa. I wanted to have both the nose and mouth coordinates as points of reference in case the landmark detection for one of them fails unexpectedly.
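A simplified sketch of the setup-phase averaging for the mouth; the timing is approximated with a wall-clock check here, which is an assumption about how the 5-second window is implemented:

```python
import time
import numpy as np

SETUP_SECONDS = 5          # length of the initial setup phase
start_time = time.time()

mouth_xs, mouth_ys = [], []
mouth_reference = None     # frame-of-reference "center" once setup ends

def process_mouth_point(mouth_center):
    """Collect mouth coordinates during setup, then freeze the averaged reference."""
    global mouth_reference
    if time.time() - start_time < SETUP_SECONDS:
        mouth_xs.append(mouth_center[0])
        mouth_ys.append(mouth_center[1])
    elif mouth_reference is None and mouth_xs:
        # Average everything seen during the first 5 seconds.
        mouth_reference = (int(np.mean(mouth_xs)), int(np.mean(mouth_ys)))
    return mouth_reference
```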

I also updated the user interface for the home, dashboard, and technical interview pages on the web application to make the pages more detailed and increase usability. For the home page, I adjusted the font and placement of the login and register buttons. For the dashboard, I reformatted the page to match the behavioral interview page; the dashboard is the user home page, which gives them an overview of what iRecruit has to offer and the various options they can navigate to. I also reformatted the technical interview page to match the behavioral interview page. The technical interview page provides users with information about the different technical question categories and instructions for audio recording themselves.

I believe that we are making good progress, as most of the technical implementation for the facial detection and web application portions is complete at this point. Next week, I plan on integrating the behavioral interview questions that Shilika wrote with the rest of the facial detection code, so that users have a question to answer during the video recording. I also plan on implementing the tips page on the web application. This was originally a help page, but we realized that our dashboard provides all of the information necessary for the user to navigate iRecruit. We thought that it would be better to have an interview tips page, where we give users suggestions on good interviewing techniques, how to practice for interviews, etc.

Jessica’s Status Update for 11/06/2020

This week, I worked on implementing the initial setup phase and off-center screen alignment detection for the nose, and updating the web application for the behavioral interview page. The initial setup phase is similar to the eye contact portion: the (X, Y) coordinates of the nose are stored into arrays for the first 10 seconds. Then, the average of the coordinates is taken, which gives us the coordinates that serve as the frame of reference for what is “center.” For the off-center screen alignment detection for the nose, we check if the current coordinates of the nose in each video frame are within range of the frame of reference coordinates. If they are not, we alert the user to align their face with a pop-up message box.
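For the pop-up alert itself, something along the lines of the tkinter snippet below would work; this is a guess at the mechanism, since any message-box library gives the same effect:

```python
import tkinter as tk
from tkinter import messagebox

def alert_off_center(message="Please align your face with the center of the screen."):
    """Show a pop-up warning without leaving an empty root window on screen."""
    root = tk.Tk()
    root.withdraw()                        # hide the blank root window
    messagebox.showwarning("iRecruit", message)
    root.destroy()
```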

One change that we made this week was that we decided to split up the facial detection portion into three different options. We were thinking about it from the user perspective, and thought that it would be good to account for different levels of experience with behavioral interviewing. The first option is for beginner-level users, who are unfamiliar with the iRecruit behavioral interviewing platform or with behavioral interviews in general. It allows users to practice with both eye contact and screen alignment, so iRecruit will provide real-time feedback on both aspects. The second and third options are for intermediate-level to advanced-level users, who are familiar with behavioral interviewing and know what they would like to improve upon. The second option allows users to practice with only eye contact, and the third option allows users to practice with only screen alignment. We thought this would be useful if a user knows their strengths and only wants to practice with feedback on one of the interview tactics. I separated these three options into three different code files (facial_detection.py, eye_contact.py, and screen_alignment.py).

I was able to update the web application for the behavioral interview page to make the interface more detailed and user-friendly. The page gives an overview and describes the various options available. I was able to learn more about Django, HTML, and CSS from this, which was very helpful! I believe that we are making good progress on the facial detection part. Next week, I plan on working on the initial setup phase and off-center screen alignment detection for the mouth. This will probably wrap up the main technical implementation for the facial landmark detection portion. I also plan on updating the user interface for the dashboard and technical interview pages on the web application.

Jessica’s Status Update for 10/30/2020

This week, I worked on implementing a center-of-frame reference, continuing the facial landmark detection, and testing the eye detection portion. I thought it would be useful to give the user some guidelines at the beginning as to where to position their face. The current implementation draws two thin lines, one spanning the middle of the video frame horizontally and one spanning the middle of the video frame vertically. At the center of the frame (width/2, height/2), there is a circle, which is ideally where the user would center their nose. These guidelines serve as a baseline for what “centered” means, although they do not have to be followed strictly for the facial detection to work.
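The guidelines can be drawn directly onto each frame with OpenCV; a sketch of the idea, where the colors and thicknesses are arbitrary choices rather than our exact values:

```python
import cv2

def draw_guidelines(frame):
    """Draw a horizontal line, a vertical line, and a center circle on the frame."""
    height, width = frame.shape[:2]
    center = (width // 2, height // 2)
    cv2.line(frame, (0, height // 2), (width, height // 2), (255, 255, 255), 1)  # horizontal midline
    cv2.line(frame, (width // 2, 0), (width // 2, height), (255, 255, 255), 1)   # vertical midline
    cv2.circle(frame, center, 8, (0, 0, 255), 1)  # where the nose would ideally sit
    return frame
```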

I continued implementing the facial landmark detection portion, working off of the model of all of the facial landmark coordinates from last week. I determined that it would be more helpful to get the locations of the centers of the nose and mouth, as these are concrete coordinates that we can base the frame of reference on, instead of the entire array of facial landmark coordinates. I was able to locate the coordinates of the centers of the nose and mouth (by looping through the array and pinpointing which entries correspond to the nose and mouth), and I will be using a similar tactic of storing the coordinates into an array during the initial setup period and then taking the average of those coordinates to use as the frame of reference.
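Assuming a standard 68-point landmark layout (which is what our facial landmark model resembles), the nose tip and mouth center can be pulled out roughly as follows; the exact indices our code uses may differ:

```python
import numpy as np

def nose_and_mouth_centers(landmarks):
    """landmarks: array of shape (68, 2) holding (x, y) facial landmark coordinates.

    In the common 68-point layout, index 30 is the nose tip and indices 48-67
    cover the mouth, so the mouth center is taken as their mean.
    """
    landmarks = np.asarray(landmarks)
    nose_center = tuple(landmarks[30])
    mouth_center = tuple(landmarks[48:68].mean(axis=0).astype(int))
    return nose_center, mouth_center
```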

I tested the eye detection portion some more, and the accuracy seems to be in the range we were hoping for. So far, only a few false positives have been detected. I believe we are on a good track for the facial detection portion, with the eye detection working as expected and the facial landmark detection well on its way. Next week, I plan on completing the initial setup phase for the facial landmark part, as well as, hopefully, the off-center screen alignment portion for the nose. I also will do more testing of the eye detection portion and begin testing the initial setup phase.

Jessica’s Status Update for 10/23/2020

This week, I worked on implementing the saving of practice interview videos, the alerts given to the user, and the facial landmark part for screen alignment. Each time the script is run, a video recording begins, and when the user exits out of the recording, it gets saved (currently to a local directory, but hopefully to a database eventually). This is done through the OpenCV library in Python. Similar to how the VideoCapture class is used to capture video frames, the VideoWriter class is used to write video frames to a video file. Each video frame is written to the video output created at the beginning of main().
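A bare-bones version of that recording loop is sketched below; the 640x480 frame size, XVID codec, and output filename are placeholders for whatever the local setup actually uses:

```python
import cv2

cap = cv2.VideoCapture(0)
fourcc = cv2.VideoWriter_fourcc(*"XVID")
# Output path stands in for the local directory we currently save to;
# the frame size must match what the webcam actually delivers.
out = cv2.VideoWriter("practice_interview.avi", fourcc, 20.0, (640, 480))

while True:
    ret, frame = cap.read()
    if not ret:
        break
    out.write(frame)                       # write this frame to the video file
    cv2.imshow("Recording", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # user exits the recording
        break

cap.release()
out.release()
cv2.destroyAllWindows()
```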

I also worked on implementing the alerts given to the user for subpar eye contact. Originally, I thought of doing an audio alert, specifically playing a bell sound when the user’s eyes are off-center. However, this proved pretty distracting, although it was effective in getting the user’s attention. Then, I experimented with a message box alert, which pops up when the user’s eyes are off-center. This proved to be another effective way of getting the user’s attention. I plan on experimenting with both of these options some more, but as of now they both work well to alert the user to re-center.

I began researching the facial landmark portion, and have a basic working model of all of the facial coordinates mapped out. Instead of utilizing each facial feature coordinate, I thought it would be more helpful to get the location of the center of the nose and perhaps the mouth. This way, there are definitive coordinates to use for the frame of reference. If the nose and mouth are off-center, then the rest of the face is also off-center. Next week, I plan on attempting to get the coordinates of the centers of the nose and mouth utilizing facial landmark detection. This requires going through the landmarks array and figuring out which coordinates correspond to which facial feature. I also plan on doing more testing on the eye detection portion, and getting a better sense of the current accuracy.

Jessica’s Status Update for 10/16/2020

This week, I worked on implementing off-center detection and the initial set-up phase for the eye detection portion of the facial detection part. When a user’s eyes wander (subpar eye contact), the system will alert the user within 5 seconds that their eyes are not centered. Centered in this case means within a certain range of the coordinates detected during the initial setup phase. The center coordinates of the irises/pupils are found using moments in OpenCV, which compute the centroid of the iris/pupil image (https://docs.opencv.org/2.4/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html). The center is with respect to a specific origin, which is the left edge of each iris (in other words, 0 is the left edge of each iris, not the left edge of the screen). Each eye actually has the same center because of this origin reference.
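The centroid computation from image moments looks roughly like this, where the input is a grayscale crop of the eye region; the thresholding step and its constant are simplified assumptions:

```python
import cv2

def iris_center(eye_region_gray):
    """Return the (x, y) centroid of the dark iris/pupil blob within an eye crop."""
    # Invert-threshold so the dark iris/pupil becomes the white blob whose
    # moments we measure (the threshold value is an assumed, tunable constant).
    _, binary = cv2.threshold(eye_region_gray, 50, 255, cv2.THRESH_BINARY_INV)
    moments = cv2.moments(binary)
    if moments["m00"] == 0:
        return None  # no blob detected in this frame
    cx = int(moments["m10"] / moments["m00"])
    cy = int(moments["m01"] / moments["m00"])
    return (cx, cy)  # coordinates are relative to the eye crop, not the full frame
```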

The center coordinates are calculated for each eye detection and stored into an array. After the 10 seconds of initial set-up (changed from the 5 seconds in the proposal presentation, because 5 seconds was too short), the average of all the center coordinates is taken to calculate the reference center coordinates (X and Y). This reference center is what the program refers to when determining whether or not the user’s eyes are “off-center.” I also started doing some formal testing, where I keep track of whether or not a user is alerted within 5 seconds of their eyes wandering around. If they are, this constitutes a passing test. If they are not, this constitutes a failing test (false negative). If the user is alerted but their eyes were not wandering around, this is also a failing test (false positive).

I believe that I am on schedule, as getting the off-center detection and initial set-up phase working for the eye detection is a big part of the facial detection portion. Next week, I plan on continuing to test the eye detection part, particularly on other users’ eyes (I will ask my friends if they want to volunteer). I also want to start the screen alignment portion, and research more about facial landmark detection in OpenCV.

Team Status Update for 10/09/2020

This week, the team continued researching and implementing their parts as laid out in the schedule and Gantt chart. The implementation of the project is split up into three main parts: facial detection, signal processing, and machine learning. Each team member is assigned to one of these parts and has been working on making progress with the relevant tasks. A change that was made to the overall design is that, for facial detection, we are no longer going to be detecting posture, because there are too many unknown factors surrounding the task. There were ideas thrown around for defining “good” posture as sitting up straight (e.g., measuring the distance between the shoulders and the face) or not having your head down (e.g., detecting that the mouth is missing). However, if a user has long hair, for instance, shoulder detection is not possible (and we cannot force a user to tie up their hair). Additionally, if the mouth is missing, then we will be unable to detect a face at all, which is another problem.

Jessica continued working on the facial detection portion, and was able to get the eye detection part from last week working in real-time with video. Now, when the script is run, OpenCV’s VideoCapture class is called to capture the frames of the video. The eye detection is then performed on each of these frames to attempt to detect the user’s face and eyes, and the irises/pupils within the eyes. A green circle is drawn around each iris/pupil to keep track of their locations. She is planning to get the off-center detection and initial set-up stage done next week, as well as start formally testing the eye detection. 
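A stripped-down version of that loop, using OpenCV's bundled Haar cascades for face and eye detection; the actual iris/pupil localization inside each eye region is omitted here, and the circle is simply centered on each detected eye box for illustration:

```python
import cv2

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (fx, fy, fw, fh) in face_cascade.detectMultiScale(gray, 1.3, 5):
        face_gray = gray[fy:fy + fh, fx:fx + fw]
        for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(face_gray):
            # Approximate the iris/pupil location with the eye box center and
            # mark it with a green circle.
            center = (fx + ex + ew // 2, fy + ey + eh // 2)
            cv2.circle(frame, center, ew // 4, (0, 255, 0), 2)
    cv2.imshow("Eye Detection", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```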

Mohini started researching how to best represent the audio the user submits as a finite signal. Currently, she is able to save the audio as a finite matrix representing the amplitude of the signal, using the Nyquist theorem. She is working on identifying patterns across different signals representing the same letter through analysis of the Fourier transform. Additionally, Mohini reviewed her knowledge of neural networks and started working on a basic implementation. While she still has a significant amount of work to do to complete the algorithm and improve the accuracy, she has a good understanding of what needs to be done. She will continue to work on and research both the signal processing and machine learning components of the project in the coming week.
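As an illustration of that kind of analysis (not Mohini's actual code), a recorded letter saved as a WAV file can be loaded and its frequency content inspected with NumPy/SciPy; the filename is a placeholder:

```python
import numpy as np
from scipy.io import wavfile

# "letter_a.wav" stands in for one of the recorded audio samples.
sample_rate, samples = wavfile.read("letter_a.wav")
if samples.ndim > 1:
    samples = samples[:, 0]  # keep a single channel if the recording is stereo

# Magnitude spectrum of the finite signal; peaks here are the kinds of
# patterns that can be compared across recordings of the same letter.
spectrum = np.abs(np.fft.rfft(samples))
freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)

dominant_freq = freqs[1:][np.argmax(spectrum[1:])]  # skip the DC bin
print(f"Dominant frequency: {dominant_freq:.1f} Hz")
```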

Shilika continued to work on the web application, the questions databases, and the signal processing portion. She was able to complete the profile page and the behavioral questions database. She made progress on the technical questions database and on the signal processing/speech recognition work. She hopes to have a first round of completed input for the speech recognition neural network by next week.