Team Status Update for 10/09/2020

This week, the team continued researching and implementing their parts as laid out in the schedule and Gantt chart. The implementation of the project is split up into three main parts: facial detection, signal processing, and machine learning. Each team member is assigned to one of these parts and has been working on making progress with the relevant tasks. One change made to the overall design is that, for facial detection, we are no longer going to detect posture, because there are too many unknown factors surrounding the task. There were ideas thrown around for defining “good” posture as sitting up straight (e.g., measuring the distance between the shoulders and the face) or not having the head down (e.g., the mouth not being visible). However, if a user has long hair, for instance, shoulder detection is not possible (and we cannot force a user to tie up their hair). Additionally, if the mouth is not visible, then we are unable to detect a face at all, which is another problem.

Jessica continued working on the facial detection portion, and was able to get the eye detection part from last week working in real-time with video. Now, when the script is run, OpenCV’s VideoCapture class is called to capture the frames of the video. The eye detection is then performed on each of these frames to attempt to detect the user’s face and eyes, and the irises/pupils within the eyes. A green circle is drawn around each iris/pupil to keep track of their locations. She is planning to get the off-center detection and initial set-up stage done next week, as well as start formally testing the eye detection. 

Mohini started researching how to best represent the audio the user submits as a finite signal. Currently, she is able to save the audio as a finite matrix representing the amplitude of the signal, sampled at a rate chosen according to the Nyquist theorem. She is working on identifying patterns across different signals representing the same letter by analyzing their Fourier transforms. Additionally, Mohini reviewed her knowledge of neural networks and started working on a basic implementation of one. While she still has a significant amount of work to complete the algorithm and improve its accuracy, she has a good understanding of what needs to be done. She will continue to work on and research both the signal processing and machine learning components of the project in the coming week.

Shilika continued to work on the web application website, questions databases, and the signal processing portion. She was able to complete the profile page and behavioral questions database. She made progress on the technical questions database and the signal processing for speech recognition. She hopes to have a first round of completed input for the speech recognition neural network by next week.

Mohini’s Status Report for 10/09/2020

This week, I decided to take a break from designing the web pages and focused on starting the research and implementation phases of the speech to text model. I used the sounddevice library in Python to experiment with recording my voice saying different letters of the alphabet. I saved the recordings and tried to identify patterns in the audio recording where I was speaking the same letter. I wasn’t able to identify any obvious patterns just by looking at the amplitude of the signal, so I took the Fourier transform of the signal. This led to viewing the signal in the frequency domain, where I was able to identify similarities between different recordings of the same letter. Next steps here include using the Fourier transform to extract the desired frequencies of each letter. 
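
For reference, here is a minimal sketch of this record-and-inspect workflow. The sample rate, duration, and variable names are assumptions for illustration, not the exact script:

    import numpy as np
    import sounddevice as sd

    FS = 44100          # sample rate in Hz; assumed value, not specified in the report
    DURATION = 1.0      # seconds of audio to record

    # Record a short clip of the user saying a letter (blocks until done).
    recording = sd.rec(int(DURATION * FS), samplerate=FS, channels=1)
    sd.wait()
    signal = recording.flatten()

    # View the clip in the frequency domain with the FFT.
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / FS)

    # Print the strongest frequency components as a rough fingerprint of the letter.
    top = np.argsort(spectrum)[-5:][::-1]
    for i in top:
        print(f"{freqs[i]:8.1f} Hz  amplitude {spectrum[i]:.3f}")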

Additionally, I reviewed the foundations of building a neural network from scratch. After completing this research component, I programmed a basic neural network where I form the optimal parameter matrix by performing gradient descent on a training data set. I’ll explain this a little more. The goal of the neural net is to minimize the mean squared error of categorizing the letters. The input to the neural net is a sample of audio, represented by a vector of some dimension n. There are a number of hidden layers connecting the input to the output, which is a probability distribution over the 26 letters. To get from the input to the output and form the hidden layers along the way, I take linear combinations of the input feature vector with the parameter weight matrix. The hidden layers are then these linear combinations passed through a sigmoid function. In order to minimize the mean squared error, I need to find the optimal parameter weight matrix. This is done through stochastic gradient descent, which involves choosing one sample from the training dataset, calculating the partial derivative of the current mean squared error with respect to each of the weights, and updating the weights by a small step against this derivative. This is repeated for each element in the dataset.
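
To make that concrete, here is a minimal single-hidden-layer sketch of the idea. The hidden-layer size, learning rate, and class/function names are my own illustrative choices, not the actual implementation:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    class LetterNet:
        """One hidden layer, sigmoid activations, mean-squared-error loss."""

        def __init__(self, n_in, n_hidden=64, n_out=26, lr=0.1, seed=0):
            rng = np.random.default_rng(seed)
            self.W1 = rng.normal(0, 0.1, (n_hidden, n_in))   # input -> hidden weights
            self.W2 = rng.normal(0, 0.1, (n_out, n_hidden))   # hidden -> output weights
            self.lr = lr

        def forward(self, x):
            h = sigmoid(self.W1 @ x)     # hidden layer: linear combination + sigmoid
            y = sigmoid(self.W2 @ h)     # output layer: scores over the 26 letters
            return h, y

        def sgd_step(self, x, target):
            """One stochastic gradient descent update on a single training sample."""
            h, y = self.forward(x)
            err = y - target                              # d(MSE)/dy up to a constant
            delta_out = err * y * (1 - y)                 # backprop through output sigmoid
            delta_hid = (self.W2.T @ delta_out) * h * (1 - h)
            self.W2 -= self.lr * np.outer(delta_out, h)
            self.W1 -= self.lr * np.outer(delta_hid, x)
            return float(np.mean(err ** 2))

    # Usage sketch: net = LetterNet(n_in=2048); loss = net.sgd_step(sample_vector, one_hot_label)

The per-sample update in sgd_step is exactly the “choose one training example, differentiate the error, nudge the weights” loop described above.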

I have finished most of the basic implementation of the neural net. However, the accuracy of my algorithm is currently approximately 35% and needs to be improved greatly. I need to research ways to improve the accuracy, most likely by increasing the number of epochs and adjusting the number of hidden layers in the model. Additionally, I need to test the neural net with the input from our signal processing of the audio. Since this component hasn’t been completed yet, I am currently using a dataset from the Internet that consists of images of letters, rather than audio signals of letters.

I believe I am on schedule, as this week I worked on both the signal processing and machine learning components of our project. I will continue to work on fine-tuning the neural net algorithm, as well as brainstorming ways to best represent the audio recording as a finite signal.

Shilika’s Status Report for 10/09/2020

This week, I continued to work on the web application platform, questions database, and signal processing. I completed the HTML and CSS of the profile page on our website. This page allows the user to upload a picture of themselves, and contains links to setting up their initial video, picking their skillset, and accessing their previous videos. These links will lead to the corresponding pages once we have our facial detection and speech processing functioning.

I also continued to work on the questions database. I completed the behavioral database, which contains approximately 100 questions that will be randomly assigned to the user. For the technical database, we are collecting a question, an example output, an output for the user to test their code with, and the correct answer for each question. Additionally, for each category (arrays, linked lists, etc.), we will have easy, medium, and hard questions. So far, I have added nine questions with examples, test outputs, and answers using LeetCode, and will continue to add questions routinely.
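
As a rough illustration of that structure (the model and field names here are hypothetical, not our actual schema), one way to store a technical question in Django could look like this:

    # models.py -- hypothetical sketch of one way to store technical questions.
    from django.db import models

    class TechnicalQuestion(models.Model):
        CATEGORY_CHOICES = [("arrays", "Arrays"), ("linked_lists", "Linked Lists")]
        DIFFICULTY_CHOICES = [("easy", "Easy"), ("medium", "Medium"), ("hard", "Hard")]

        question = models.TextField()           # the prompt shown to the user
        example_output = models.TextField()     # worked example shown with the prompt
        test_output = models.TextField()        # output the user's code is checked against
        correct_answer = models.TextField()     # reference solution
        category = models.CharField(max_length=32, choices=CATEGORY_CHOICES)
        difficulty = models.CharField(max_length=8, choices=DIFFICULTY_CHOICES)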

Lastly, I continued the work on the signal processing portion. Building on the foundation from previous weeks, I gained an understanding of what the input into our neural network should look like. I refined and added to my previous code, which now stores the audio in a more accurate integer array, breaks the input into small chunks of audio, and outputs the values in a user-friendly format. I worked with Mohini to see if there are any patterns or similarities between recordings of each individual letter, and we were able to find commonalities in the audio signals.
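
A small sketch of the chunking step, assuming a WAV recording read with SciPy; the file name and chunk length are placeholders:

    from scipy.io import wavfile

    FRAME_SIZE = 2048   # samples per chunk; illustrative value

    # Read the recording as an integer array (SciPy returns int16 for PCM WAV files).
    rate, samples = wavfile.read("letter_a.wav")   # hypothetical file name
    if samples.ndim > 1:                           # fold stereo down to one channel
        samples = samples.mean(axis=1).astype(samples.dtype)

    # Break the signal into fixed-size chunks, dropping the ragged tail.
    n_frames = len(samples) // FRAME_SIZE
    frames = samples[: n_frames * FRAME_SIZE].reshape(n_frames, FRAME_SIZE)

    # Print a compact summary of each chunk.
    for i, frame in enumerate(frames):
        print(f"chunk {i:3d}: min {frame.min()}  max {frame.max()}")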

I believe my progress is on schedule. Next week, I hope to continue adding to the technical database and have an input for our neural net. This input will go through many iterations of refinement, but my goal is to have properly calculated values.

Jess’ Status Update for 10/09/2020

This week, I worked on implementing the real-time portion of the facial detection part of our project. I wanted to get eye detection working with video, so that when a user eventually records themselves for their interview practice, we are able to track their eyes as they are recording. I was able to do this using Python’s OpenCV library, which has a VideoCapture class to capture video files, image sequences, and cameras. By utilizing this, we are able to continue reading video frames until the user quits out of the video capture. While we are reading video frames, we attempt to detect the user’s face and eyes, and then the irises/pupils within the eyes. The irises/pupils are detected using blob detection (available through the OpenCV library) and a threshold (to determine the cutoff of what becomes black and white), which allows us to process the frame to reveal where the irises/pupils are. Currently, a green circle is drawn around each iris/pupil, like so (it looks slightly scary):

The eye detection works pretty well for the most part, although the user does have to be in a certain position and may have to adjust accordingly. This is why we plan on having the initial set-up phase at the beginning of the process. I believe that I am on schedule, as getting the detection to work in real-time was a main goal for this part of the project. Next week, I plan on getting the off-center detection working, as well as finishing the initial set-up phase. I want to give the user time to align themselves, so that the program can keep track of the “centered” eye coordinates, and then detect whether the eyes are off-center from there. I also need to start formally testing this part of the facial detection.
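
For reference, here is a rough sketch of the kind of capture-and-detect loop described above. The cascade files, threshold value, and default blob-detector settings are illustrative assumptions rather than the exact parameters used:

    import cv2

    # Pre-trained Haar cascades shipped with OpenCV.
    face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

    # Blob detector used to find the dark iris/pupil region inside each eye.
    detector = cv2.SimpleBlobDetector_create()

    THRESHOLD = 60   # grayscale cutoff; would be tuned during the set-up phase

    cap = cv2.VideoCapture(0)          # default webcam
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (fx, fy, fw, fh) in face_cascade.detectMultiScale(gray, 1.3, 5):
            face_gray = gray[fy:fy + fh, fx:fx + fw]
            for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(face_gray):
                eye = face_gray[ey:ey + eh, ex:ex + ew]
                # Threshold so the dark pupil stands out, then look for blobs.
                _, eye_bw = cv2.threshold(eye, THRESHOLD, 255, cv2.THRESH_BINARY)
                for kp in detector.detect(eye_bw):
                    cx = int(fx + ex + kp.pt[0])
                    cy = int(fy + ey + kp.pt[1])
                    cv2.circle(frame, (cx, cy), int(kp.size / 2), (0, 255, 0), 2)
        cv2.imshow("eye tracking", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):   # press q to quit
            break

    cap.release()
    cv2.destroyAllWindows()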

Team Status Update for 10/02/2020

This past week, the team mostly did initial set-up and began the research/implementation process. We wanted to get all of our environments up and running, so that we could have a centralized platform for implementing features. We decided to create a GitHub repository for everyone to access the code and make changes. Each team member is working on their own branch and will make a pull request to master when they are ready to merge. One of the risks that could jeopardize the success of the project is frequent merge conflicts, where team members overwrite each other’s code. By making pull requests from our individual branches, we ensure that master only contains the most up-to-date, working code. No major changes were made to the existing design of the system, as this week was mostly spent familiarizing ourselves with the project environment and getting started with small components of the project.

Jessica started researching and implementing the facial detection part for the behavioral interview portion. She followed the existing design of the system, where a user’s eyes will be detected and tracked to ensure that they make eye contact with the camera. She used the Haar cascades from the OpenCV library in Python to detect a face and eyes in an image. She is planning to complete the real-time portion next week, where eye detection is done on a video stream.

Mohini started designing the basic web app pages and connecting the pages together through various links and buttons. A good portion of her time was dedicated to CSS and making the style of each element visually appealing. Towards the end of the week, she started researching different ways to extract the recorded audio from the user as well as the best way to analyze it.

Shilika focused on creating the navigation bars that will appear across the web pages, which will allow the user to easily go from one page to another. She also began creating databases for the behavioral and technical interview questions and began preliminary steps of the speech processing algorithm. Next week, she will continue working on the web pages on the application and populating the database.

Some photos of the initial wireframe of our web app:

Shilika’s Status Report for 10/02/2020

This week, I created the navigation bars that will be used across the pages in our web application. The top navigation bar has three main components:
    • The menu button opens the side navigation bar.
    • The profile button leads you to your profile page.
    • The help button leads you to a page where our web application’s features are explained.
The side navigation bar itself has two buttons, one that leads you to the behavioral interview page and the other that leads you to the technical interview page.

I also began creating the behavioral and technical databases. I used online resources to collect common questions that are asked in behavioral and technical interviews for software engineering roles.

Lastly, I researched the steps of our speech processing algorithm to detect letters that the user will speak. So far, I have been able to successfully read the audio, convert it to an integer array, and graph the audio. These preliminary steps are the foundation of creating the data we will feed into our neural network.
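
For illustration, a minimal version of these preliminary steps might look like the following, assuming a WAV recording; the file name is a placeholder:

    import matplotlib.pyplot as plt
    from scipy.io import wavfile

    # Read the audio into an integer array (hypothetical file name).
    rate, samples = wavfile.read("recording.wav")
    print(f"{len(samples)} samples at {rate} Hz, dtype {samples.dtype}")

    # Graph the raw waveform against time in seconds.
    times = [i / rate for i in range(len(samples))]
    plt.plot(times, samples)
    plt.xlabel("time (s)")
    plt.ylabel("amplitude")
    plt.title("Raw audio signal")
    plt.show()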

I believe that my progress is on schedule. Next week, I aim to complete the CSS and HTML for the user profile page, finish collecting questions for the databases, and get a solid understanding of how the Fourier transform can be used in Python to pre-process the audio signal we are receiving.

Mohini’s Status Report for 10/02/2020

For our capstone project this week, I set up the basic framework of our web app through Django and installed the necessary libraries and dependencies. Afterwards, I designed the basic wireframes of our web app. This included planning out the different web pages necessary for the complete design and refreshing my HTML/CSS and Django knowledge. After running the design by my group, we decided to have at least 7 pages – home, login, register, dashboard, behavioral, technical, and profile.

A quick breakdown of these pages, with a rough routing sketch after the list:

    • Home: the user is first brought to this page and introduced to iRecruit
      • There are links to the login and register pages as well as a centralized iRecruit logo and background picture. 
    • Login: the user is able to login to their existing account
      • I created a login form that checks for the correct username and password combination. 
    • Register: the user is able to register for our site 
      • I created a registration form where the user can create an account, and the account information is stored in a built-in Django database. 
    • Dashboard: after logging in, the user is brought to this page where they can choose which page they want to view 
      • Currently, this page consists of three buttons that lead to the behavioral, technical, or profile pages. 
    • Behavioral Interview: the user can practice video recording themselves here and iRecruit will give real time feedback 
      • There is a “press to start practicing” button which calls Jessica’s eye detection code. 
    • Technical Interview: the user will ask for a question which iRecruit will provide, and the user can input their answer once done solving the question
      • This page is still relatively empty. 
    • Profile: this page will store the user’s past behavioral interviews, recorded audio skills, and past technical questions answered 
      • This page is still relatively empty. I have started experimenting with different ways to retrieve the audio.
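
Based on the page breakdown above, a minimal sketch of the corresponding URL routing might look like this; the view names are illustrative guesses, not our actual code:

    # urls.py -- illustrative sketch; the actual view names in iRecruit may differ.
    from django.urls import path
    from . import views

    urlpatterns = [
        path("", views.home, name="home"),
        path("login/", views.login_view, name="login"),
        path("register/", views.register, name="register"),
        path("dashboard/", views.dashboard, name="dashboard"),
        path("behavioral/", views.behavioral, name="behavioral"),
        path("technical/", views.technical, name="technical"),
        path("profile/", views.profile, name="profile"),
    ]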

My progress is relatively on schedule as I designated about two weeks to complete the basic web app components. I was able to complete more than half of it during this first week. Next steps for me include starting the research component of how to use signal processing to analyze the audio data received from the user during recording their skills.

Jess’ Status Update for 10/02/2020

This week, I mostly got set up with our coding environment and began the implementation of the facial detection portion. Mohini and Shilika were able to set up our web application using Django, a Python web framework. I went through the code to get an idea of how Django works and what the various files and components do. I also installed the necessary dependencies and libraries, and learned how to run the iRecruit web application.

I also began implementing the facial detection portion for the behavioral interview part of iRecruit. I did some research into Haar cascades and how they work in detecting a face (http://www.willberger.org/cascade-haar-explained/). I also read into Haar cascades in the OpenCV library in Python (https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_objdetect/py_face_detection/py_face_detection.html). OpenCV contains many pre-trained classifiers for features like faces, eyes, and smiles, so we decided to use these for our facial detection. With the help of many online tutorials, I was able to create a baseline script that detects the face and eyes in an image (if they exist). All of the detection is done on the grayscale version of the image, while the drawing (e.g., rectangles around the face and eyes) is performed on the colored image. I was able to get this eye detection working on stock photos.
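
A baseline script along these lines looks roughly like the following; the image file name is a placeholder, and the cascade parameters are typical values rather than necessarily the ones the actual script uses:

    import cv2

    # Load OpenCV's pre-trained Haar cascade classifiers for faces and eyes.
    face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

    img = cv2.imread("stock_photo.jpg")            # hypothetical test image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # detection runs on the grayscale copy

    for (x, y, w, h) in face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5):
        # Draw the face rectangle on the color image.
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
        roi_gray = gray[y:y + h, x:x + w]
        roi_color = img[y:y + h, x:x + w]
        for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(roi_gray):
            cv2.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh), (0, 255, 0), 2)

    cv2.imwrite("detected.jpg", img)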

I believe the progress that I made is on schedule, as we allocated a chunk of time (first 2-3 weeks) to researching the various implementation components. I was able to do research into facial detection in Python OpenCV, as well as start on the actual implementation. I hope to complete the real-time portion by next week, so that we can track a user’s eyes while they are video recording themselves. I also hope to be able to find the initial frame of reference coordinates of the pupils (for the set-up stage).