Rohan’s Status Report for 4/6

This week, I focused on validating the flow state model, developing and validating a focus detector, and applying Shapley values to add explainability to our black-box flow state neural network.

For flow state detection validation, I wore noise-canceling headphones while playing Fireboy and Watergirl, a simple online video game, for 15 minutes. For the next 15 minutes, I worked on implementing the model validation script without the headphones, with ambient noise in the ECE workspace and frequent distractions from Karen. We then looked at the percentage of time the model classified each recording as flow: 0.254% of the intended flow state recording was marked as flow, and 0.544% of the intended not-in-flow recording was marked as flow. These results are clearly not what we expected, but we have a few initial thoughts as to why. First, Fireboy and Watergirl is a two-person game, and playing it alone was much more difficult than I expected and definitely not second nature to me, which is a necessary condition for entering a flow state. We therefore plan to test the flow state model on my roommate Ethan, who plays video games frequently and reportedly enters a flow state often while playing. By validating the model on an activity that is more likely to actually induce a flow state, we expect to see better results. If this does not work, we plan to return to the music setting, see how the model performs on the pianists it was trained on, and then test on pianists it has not seen before to further understand where the model may be overfitting.
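For reference, here is a minimal sketch of the validation check described above, assuming a trained classifier with a scikit-learn-style predict method and pre-extracted band-power feature windows for each recording; the variable names (model, flow_features, distracted_features) are placeholders rather than the exact script:

import numpy as np

def percent_in_flow(model, features: np.ndarray) -> float:
    """Return the percentage of windows in a recording classified as flow (label 1)."""
    preds = model.predict(features)
    return 100.0 * float(np.mean(preds == 1))

# flow_features / distracted_features hold the EEG feature windows for each 15-minute recording
print(f"intended flow recording:        {percent_in_flow(model, flow_features):.1f}% classified as flow")
print(f"intended not-in-flow recording: {percent_in_flow(model, distracted_features):.1f}% classified as flow")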

I also developed a focus detection system trained on focused and distracted recordings of Arnav, Karen, and myself. We validated it by collecting another set of focused and distracted data from Karen, but unfortunately this also produced poor results: the model predicted that 12.6% of the focused recording and 12.3% of the distracted recording were in focus. I realized soon after that the training set contained only 34 high-quality focus data points from Karen compared to 932 high-quality distracted data points. This skew in the data is very likely a strong contributor to the model's poor validation performance. We plan to incorporate this validation data into the training set and retry validation this coming week. As a backup, we inspected the Emotiv Focus Performance Metric values on this data and saw a clear distinction between the focused and distracted datasets, which had average values of 0.4 and 0.311 respectively on a scale from 0 to 1.
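As a rough illustration of the two checks above, here is a hedged sketch assuming the labeled EEG exports are available as CSV files; the file and column names (label, quality, focus_metric) are illustrative assumptions, not the exact format of our data:

import pandas as pd

# 1) Check the class balance of Karen's high-quality training data
train = pd.read_csv("karen_training_data.csv")
high_quality = train[train["quality"] == "high"]
print(high_quality["label"].value_counts())  # reveals the 34 focused vs. 932 distracted skew

# 2) Compare the average Emotiv Focus Performance Metric between validation recordings
for name in ["karen_focused_validation.csv", "karen_distracted_validation.csv"]:
    rec = pd.read_csv(name)
    print(name, rec["focus_metric"].mean())  # e.g., higher average for the focused recording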

Finally, I applied Shapley values to our flow state model to ensure that it is picking up on meaningful features rather than noise. The SHAP package has a wide range of functionality, but I specifically explored the KernelExplainer Shapley value approximator and the summary_plot function to visualize the results. Because computing Shapley values over a large dataset, even via approximation methods, can be extremely computationally intensive, I randomly selected 500 samples from the dataset to compute the Shapley values on. The basic summary plot shows the contribution each feature makes, on each of the 500 points, toward a classification of flow. The bar summary plot shows the mean absolute contribution of each feature to the output classification in either direction (i.e., flow or not in flow). We see that the High Beta and Gamma frequency bands from the AF3 sensor (prefrontal cortex), as well as the Theta frequency band from the Pz sensor (parietal lobe), have a high impact on the model's classification of flow vs. not-in-flow states. These plots help us understand which parts of the brain, and more specifically which EEG frequency bands, correlate most strongly with flow states in the music setting. Because flow states are difficult to induce and detect with high granularity, having access to high-quality flow state ground truth from Professor Dueck is extremely valuable. Given her extensive experience detecting flow states, and Shapley values' ability to explain the contribution of each input to the final output, we can make new progress in understanding what kinds of brain activity correspond with flow states.
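A minimal sketch of this explainability step, assuming model is the trained flow classifier, X the matrix of band-power features, and feature_names the sensor/band labels (these stand in for our actual pipeline):

import numpy as np
import shap

# Summarize the data into a small background set so KernelExplainer stays tractable
background = shap.sample(X, 100)
explainer = shap.KernelExplainer(model.predict, background)

# Approximate Shapley values on 500 randomly selected samples
sample_idx = np.random.choice(len(X), size=500, replace=False)
X_sample = X[sample_idx]
shap_values = explainer.shap_values(X_sample)

# Basic summary plot: per-sample feature contributions toward a flow classification
shap.summary_plot(shap_values, X_sample, feature_names=feature_names)
# Bar summary plot: mean absolute contribution of each feature
shap.summary_plot(shap_values, X_sample, feature_names=feature_names, plot_type="bar")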

Basic Summary Plot:

Bar Summary Plot:

Karen’s Status Report for 4/6

This week, I completed integration of phone pick-up and other-people distraction detection into the backend and frontend of our web application. Both distraction types are now displayed on the Current Session page.

I have also finished the facial recognition implementation. I decided on the Fast MT-CNN model for face detection and the SFace model for facial embeddings, which gave the best balance between accuracy and speed. These form the core of the facial recognition module, with the rest of the logic in the run.py and utils.py scripts. The program now recognizes when the user is no longer recognized or not in frame and reports how long the user was away.

User not recognized:  08:54:02
User recognized:  08:54:20
User was away for 23.920616388320923 seconds
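A hedged sketch of the recognition check and away-time reporting above, assuming a recent DeepFace release where the Fast MT-CNN backend is exposed as "fastmtcnn"; the reference image path and main-loop variables are placeholders rather than the exact run.py/utils.py code:

import time
from deepface import DeepFace

REFERENCE_IMG = "calibration/user_face.jpg"  # assumed path to the user's calibration snapshot

def user_recognized(frame) -> bool:
    """Return True if the calibrated user's face is found in the frame."""
    try:
        result = DeepFace.verify(
            img1_path=frame,              # DeepFace accepts NumPy frames as well as file paths
            img2_path=REFERENCE_IMG,
            model_name="SFace",
            detector_backend="fastmtcnn",
        )
        return result["verified"]
    except ValueError:
        return False                      # raised when no face is detected in the frame

away_since = None
# inside the main per-frame loop:
if user_recognized(frame):
    if away_since is not None:
        print(f"User was away for {time.time() - away_since} seconds")
    away_since = None
elif away_since is None:
    away_since = time.time()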

I also recognized that adding facial recognition significantly slowed down the program, since facial recognition requires a large amount of processing time. Because of this, I implemented asynchronous distraction detection using threading so that consecutive frames can be processed concurrently. I am using the concurrent.futures package to achieve this:

from concurrent.futures import ThreadPoolExecutor
executor = ThreadPoolExecutor(max_workers=8)  # worker threads let consecutive frames be processed concurrently
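For context, here is a minimal sketch of how frames could be handed to this pool; detect_distractions and the capture loop are placeholders for the real per-frame detection logic, not the exact implementation:

from concurrent.futures import ThreadPoolExecutor

import cv2

def detect_distractions(frame):
    # placeholder for the (slow) per-frame work: face recognition, gaze, phone detection, etc.
    ...

executor = ThreadPoolExecutor(max_workers=8)
cap = cv2.VideoCapture(0)
futures = []

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # submit each frame so consecutive frames are processed concurrently
    futures.append(executor.submit(detect_distractions, frame))
    # completed futures can be drained here, e.g. with concurrent.futures.as_completed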

A next step would be distinguishing when the user is simply not in frame vs. when an imposter has taken the place of the user. After that comes integrating the facial recognition data into the frontend and backend of the web app. In the following week, I will focus on facial recognition integration and on properly testing and verifying my individual components.

I have done some initial testing of my distraction detection components. Arnav, Rohan, and I have all used yawning, sleeping, and gaze detection with success, and these modules work well across different users and faces. Initial testing of other-people detection has also shown success and robustness for a variety of users. Phone pick-up detection needs more testing with different users and different colored phones, but initial testing shows success on my phone. I also need to begin verifying that face recognition works for different users, though it has worked well for me so far.

I have already performed some verification of individual components, such as the accuracy of the YOLOv8 phone object detector and the accuracy of MT-CNN and SFace. More thorough validation methods for the components integrated in the project as a whole are listed in our team progress report.

In the coming week I will work on the validation and verification methods. Now that all of the video processing distraction detections are implemented, I will work with Arnav on making the web application cleaner and more user friendly.

Karen’s Status Report for 3/30

This week I focused on integration and facial recognition. For integration, I worked with Arnav to understand the frontend and backend code. I now have a strong understanding of how the distraction data is sent to our custom API so that it can be displayed on the webpage. Sleep, yawning, gaze, phone, and other-people detection are now integrated into the frontend and backend.

I also worked on splitting calibration and distraction detection into separate scripts. This way, calibration data is saved to a file so that it can be retrieved when the user actually begins the work session and reused in future sessions. I updated the backend so that the calibration script is triggered when the user navigates to the calibration page while starting a new session. After calibration is complete, the user clicks the finished-calibration button, which triggers the distraction detection script.
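A minimal sketch of this hand-off under the assumption that calibrate.py writes its metrics to a CSV file that run.py reads at the start of the session; the filename and metric names are illustrative placeholders, not the exact ones in our scripts:

import csv

CALIBRATION_FILE = "calibration_metrics.csv"

def save_calibration(metrics: dict) -> None:
    """Called at the end of calibrate.py to persist the user's baseline metrics."""
    with open(CALIBRATION_FILE, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(metrics.keys())
        writer.writerow(metrics.values())

def load_calibration() -> dict:
    """Called at the start of run.py (and future sessions) to reuse the saved metrics."""
    with open(CALIBRATION_FILE, newline="") as f:
        reader = csv.reader(f)
        keys, values = next(reader), next(reader)
    return {k: float(v) for k, v in zip(keys, values)}

# example usage with hypothetical metric names:
# save_calibration({"neutral_mouth_ratio": 0.32, "yawn_mouth_ratio": 0.78})
# thresholds = load_calibration()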

After the initial testing of different facial recognition models, I began implementing the facial recognition module for our app. So far, the script runs facial recognition on the detected faces and prints to the terminal whether the user was recognized, along with the timestamp. The recognition runs roughly once per second, but this may need to be modified to improve performance.

I also began testing that the distraction detection works on users other than myself. Sleep, yawn, and gaze detection have performed very well on Arnav and Rohan, but we are running into some issues getting phone detection to work on Arnav and Rohan’s computers. I will investigate this issue in the following week.

Overall, my progress is on track. Although I did not finish implementing facial recognition, I have gotten a good start and was able to focus on integration in preparation for the interim demo.

Facial recognition output and screenshot:

User recognized:  21:55:37
User recognized:  21:55:38
User recognized:  21:55:39
User not recognized:  21:55:40
User not recognized:  21:55:41
User not recognized:  21:55:49

Team Status Report for 3/30

We have made significant progress with integrating both the camera-based distraction detection and our EEG focus and flow state classifier into a holistic web application. At this point, all of the signal processing for detecting distractions by camera and identifying focus and flow states via the EEG headset is working well locally. Almost all of these modules have been integrated into the backend, and by the time of our interim demo we expect them to show up on the frontend as well. The greatest remaining risks have to do with our presentation not doing justice to the underlying technology we have built. Given that we have a few more weeks before the final demo, we think we will be able to comfortably iron out any kinks in the integration process and figure out how to present our project in a user-friendly way.

While focusing on integration, we also came up with some new ideas regarding the flow of the app as the user navigates through a work session. Here is one of the flows for when the user opens the app to start a new work session:

  1. Open website
  2. Click the new session button on the website
  3. Click the start calibration button on the website
    1. This triggers calibrate.py
      1. OpenCV window pops up with a video stream for calibration
      2. Press the space key to start neutral face calibration
        1. Press the r key to restart the neutral face calibration
      3. Press the space key to start yawning calibration
        1. Press the r key to restart the yawning calibration
      4. Save calibration metrics to a CSV file
      5. Press the space key to start the session
        1. This automatically closes the window
  4. Click the start session button on the website
    1. This triggers run.py (a sketch of how the backend could launch these scripts follows below)
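As referenced in the flow above, here is a hedged sketch of how the backend could launch these scripts when the corresponding buttons are clicked; the view names, URL wiring, and script paths are assumptions rather than our exact Django code:

import subprocess

from django.http import JsonResponse
from django.views import View

class StartCalibrationView(View):
    """Hit when the user clicks the start-calibration button."""
    def post(self, request):
        # Popen returns immediately, so the HTTP response is not blocked while calibrate.py runs
        subprocess.Popen(["python3", "calibrate.py"])
        return JsonResponse({"status": "calibration started"})

class StartSessionView(View):
    """Hit when the user clicks the start-session button."""
    def post(self, request):
        subprocess.Popen(["python3", "run.py"])
        return JsonResponse({"status": "session started"})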

Rohan’s Status Report for 3/30

This week, in preparation for our interim demo, I have been working with Arnav to get the Emotiv Focus Performance Metric and the flow state detection from our custom neural network integrated with the backend. Next week I plan to apply Shapley values to further understand which inputs contribute most significantly to the flow state classification. I will also test various model parameters, trying to determine the lower and upper bounds on model complexity in terms of the number of layers and neurons per layer. I also need to look into how the Emotiv software computes the FFT for the power values within the frequency bands, which are the inputs to our model. Finally, we will try training our own model for measuring focus to see if we can detect focus using a similar model to our flow state classifier. My progress is on schedule, and I was able to test the live flow state classifier on myself while doing an online typing test, seeing reasonable fluctuations in and out of flow states.

Arnav’s Status Report for 3/30

This week, I made enhancements to the user interface and overall data presentation in preparation for the Interim Demo. 

I incorporated a graph into the React frontend to visualize the distraction data collected from yawning, sleep, and gazing detection. This interactive graph, built using the Chart.js library, dynamically displays the frequency of each type of distraction over the current session. Users can hover over the graph to see detailed statistics on the number of times each distraction has occurred as well as the exact time the distraction occurred. Currently, the graph displays all the data from the current session. 

To help users track the duration of their work or study sessions, I added a session timer to the webpage. This timer is displayed on the Current Session Page, starts automatically when the session begins, and updates in real time.

I also created a calibration page that allows a distinction between the Calibration and the Current Session page. This page features a simple interface with a green button that, when clicked, triggers the run.py Python script to start the OpenCV face detection process. This calibration step ensures that the distraction detection algorithms are finely tuned to the user’s current environment and camera setup.

To provide more comprehensive session summaries, I modified the data payload structure to include a “frequency” item. This addition stores the number of times each type of distraction occurred during the session. Once the user decides to stop the current session, they will be routed to the Session Summary Page which displays statistics on their distraction frequencies. 

Lastly, I worked with Rohan on integrating the EEG data into the Session Summary page. Leveraging Django REST API endpoints, we enabled the real-time display of EEG data. We created an EEGEvent model that stores the epoch_timestamp, formatted_timestamp, and all the relevant data needed to display the flow state detection for the user. 
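A hedged sketch of what the EEGEvent model could look like; beyond the two timestamp fields named above, the remaining fields are illustrative assumptions about "the relevant data" rather than the exact schema:

from django.db import models

class EEGEvent(models.Model):
    epoch_timestamp = models.FloatField()                    # raw Emotiv epoch time
    formatted_timestamp = models.CharField(max_length=64)    # human-readable time for display
    in_flow = models.BooleanField(default=False)             # assumed field: flow / not-in-flow prediction
    focus_metric = models.FloatField(null=True, blank=True)  # assumed field: Emotiv Focus Performance Metric

    class Meta:
        ordering = ["epoch_timestamp"]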

The Current Session Page, Session Summary Page, and Calibration Page look like the following:

(A window pops up for the calibration when the user is on this page. This is not displayed in the picture above.)

My overall progress is going well and I am on schedule. This week I will continue to work with Rohan to display the EEG data on the frontend in the Current Session and Session Summary Pages. The plan is to make a pie chart of the time the user is in a flow state vs. not in a flow state and also display this information in a graph format.

Team Status Report for 3/23

This week we realized that while focus and flow state are closely related, they are distinct states of mind. While people have a shared understanding of focus, flow state is a more elusive term, which means people carry their own internal mental models of what a flow state is and looks like. Given that our ground truth data is based on Prof. Dueck’s labeling of flow states in her piano students, we are shifting from developing a model to measure focus to instead identifying flow states. To stay on track with our initial use case of identifying focused vs. distracted states in work settings, we plan to use Emotiv’s Focus Performance Metric to monitor users’ focus levels and develop our own model to detect flow states. By implementing flow state detection, our project will apply to many fields beyond traditional work settings, including music, sports, and research.

Rohan also discussed our project with his information theory professor, Pulkit Grover, who is extremely knowledgeable about neuroscience, getting feedback on the flow state detection portion of our project. He told us that achieving model test accuracy better than random chance would be a strong result, which we have achieved in our first iteration of the flow detection model. 

We also began integration steps this week. Arnav and Karen collaborated on getting the yawn, gaze, and sleep detections to be sent to the backend, so now these distractions are displayed in the UI in a table format along with snapshots in real-time of when the distraction occurs. Our team also met together to try to get our code running locally on each of our machines. This led us to write a README with information about libraries that need to be installed and the steps to get the program running. This document will help us stay organized and make it easier for other users to use our application.

Regarding challenges and risks for the project this week, we were able to clear up some of the confusion between the focused and flow states, and we are still prepared to add microphone detection if needed. Based on our progress this week, all three stages of the project (camera, EEG, and web app) are developing very well, and we look forward to continuing to integrate all the features.

Rohan’s Status Report for 3/23

In order to better understand how to characterize flow states, I had conversations with friends in various fields and synthesized insights from multiple experts in cognitive psychology and neuroscience including Cal Newport and Andrew Huberman. Focus can be seen as a gateway to flow. Flow states can be thought of as a performance state; while training for sports or music can be quite difficult and requires conscious focus, one may enter a flow state once they have achieved mastery of a skill and are performing for an audience. A flow state also typically involves a loss of reflective self-consciousness (non-judgmental thinking). Interestingly, Prof. Dueck described this lack of self-judgment as a key factor in flow states in music, and when speaking with a friend this past week about his experience with cryptography research, he described something strikingly similar. Flow states typically involve a task or activity that is both second nature and enjoyable, striking a balance between not being too easy or tedious while also not being overwhelmingly difficult. When a person experiences a flow state, they may feel a more “energized” focus state, complete absorption in the task at hand, and as a result, they may lose track of time. 

Given our new understanding of the distinction between focus and flow states, I made some structural changes to our previous focus and now flow state detection model. First of all, instead of classifying inputs as Focused, Neutral, or Distracted, I switched the outputs to just Flow or Not in Flow. Secondly, last week, I was only filtering on high quality EEG signal in the parietal lobe (Pz sensor) which is relevant to focus. Here is the confusion matrix for classifying Flow vs Not in Flow using only the Pz sensor:

Research has shown that increased theta activities in the frontal areas of the brain and moderate alpha activities in the frontal and central areas are characteristic of flow states. This week, I continued filtering on the parietal lobe sensor and now also on the two frontal area sensors (AF3 and AF4) all having high quality. Here is the confusion matrix for classifying Flow vs Not in Flow using the Pz, AF3, and AF4 sensors:

This model incorporates data from the Pz, AF3, and AF4 sensors and classifies input vectors, which include overall power values at each sensor and within each of the 5 frequency bands at each sensor, into either Flow or Not in Flow. It achieves a precision of 0.8644, a recall of 0.8571, and an F1 score of 0.8608. The overall accuracy of this model is improved over the previous one, but the total amount of data is lower due to the additional conditions for filtering out low-quality data.
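These metrics come directly from the held-out predictions; here is a minimal sketch assuming y_true and y_pred are the test labels and model predictions (1 = Flow, 0 = Not in Flow):

from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
print(confusion_matrix(y_true, y_pred))   # rows: true class, columns: predicted class
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")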

I plan on applying Shapley values, a concept that originated in game theory but in recent years has been applied to explainable AI. This will give us a sense of which of our inputs are most relevant to the final classification. It will be interesting to see whether what our model is picking up on ties into the existing neuroscience research on flow states or whether it is seeing something new or different.

My Information Theory professor, Pulkit Grover, introduced me to a researcher in his group this week who is working on a project to improve the equity of EEG headsets so they interface with different types of hair, specifically coarse Black hair, which often prevents standard EEG electrodes from getting a high-quality signal. This is interesting to us because one of the biggest issues and highest risk factors of our project is getting a good EEG signal, since any kind of hair can interfere with the electrodes, which are meant to make skin contact. I also tested our headset on a bald friend to understand whether our signal quality issues come from the headset itself or from hair interference, and I found that the signal quality was much higher on my bald friend, which was very interesting. For our final demo, we are thinking of inviting this friend to wear the headset to make for a more compelling presentation, because we only run the model on high-quality data, so hair interference with non-bald participants would leave the model making very few predictions during our demo.

Arnav’s Status Report for 3/23

This week, I successfully integrated the yawning, gazing, and sleep detection data from the camera and also enabled a way to store a snapshot of the user when the distraction occurs. The yawning, gazing, and sleep detection data is now stored in a table whose columns are Time, Distraction Type, and Image. The table is updated almost instantly, with only a couple of milliseconds of delay, because I am polling the data from the API endpoints every second. This can be sped up if the data needs to be shown on the React page even faster, but that is most likely not needed since the user ideally will not be monitoring this page while they are in a work session. The table appears on the Current Session Page under the Real-Time Updates table.

I was able to get the snapshot of the user by using the following steps: 

I first utilized the run.py Python script to capture images from the webcam, which are stored in current_frame (a NumPy array). Once a distraction state is identified, I encode the associated image into a base64 string directly in the script. This conversion to a text-based format allows me to send the image over HTTP by making a POST request to my Django backend through the requests library, along with other data like the session ID and user ID.
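A minimal sketch of this client-side step, with the endpoint URL and payload fields as assumptions about our API rather than its exact shape:

import base64
import time

import cv2
import requests

def report_distraction(current_frame, distraction_type, session_id, user_id):
    """Encode the latest webcam frame and POST it to the Django backend."""
    ok, buf = cv2.imencode(".jpg", current_frame)
    if not ok:
        return
    payload = {
        "session_id": session_id,
        "user_id": user_id,
        "distraction_type": distraction_type,
        "timestamp": time.time(),
        "image": base64.b64encode(buf.tobytes()).decode("utf-8"),  # text-safe encoding for HTTP transport
    }
    requests.post("http://localhost:8000/api/detections/", json=payload, timeout=5)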

The Django backend, designed with the DetectionEventView class, handles these requests by decoding the base64 string back into a binary image format. Using the DetectionEventSerializer, the incoming data is serialized, and the image is saved in the server’s media path. I then generated a URL that points to the saved image, which can be accessed from the updated data payload. To make the images accessible in my React frontend, I configured Django with a MEDIA_URL, which allows the server to deliver media files. 

Within the React frontend, I implemented a useEffect hook to periodically fetch the latest detection data from the Django backend. This data now includes URLs for the images linked to each detection event. When the React component’s state is updated with this new data, it triggers a re-render, displaying the images using the <img> tag in a dynamically created table. I ensured the correct display of images by concatenating the base URL of my Django server with the relative URLs received from the backend. I then applied CSS to style the table, adjusting image sizing and the overall layout to provide a smooth and user-friendly interface.

 The Current Session Page looks like the following:

I made a lot of progress this week and I am definitely on schedule. I will add in data from phone detection and distractions from surroundings next week. I will also work on creating some sample graphs with the current data we have. If I have some additional time, I will connect with Rohan and start to look into the process of integrating the EEG data into the backend and frontend in real-time.


Karen’s Status Report for 3/23

This week I wrapped up the phone pick-up detection implementation. I completed another round of training of the YOLOv8 phone object detector, using over 1000 annotated images that I collected myself. This round of data contained more phone colors and orientations, making the detector more robust. I also integrated MediaPipe’s hand landmarker into the phone pick-up detector. By comparing the location of the detected phone and the hand over a series of frames, we can ensure that the detected phone is actually in the user’s hand. This further increases the robustness of the phone detection.
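A hedged sketch of the phone-in-hand check described above; "phone_detector.pt" stands in for the custom-trained YOLOv8 weights, and the single-frame overlap rule is a simplified version of the multi-frame logic:

import cv2
import mediapipe as mp
from ultralytics import YOLO

phone_model = YOLO("phone_detector.pt")  # custom YOLOv8 weights (assumed filename)
hands = mp.solutions.hands.Hands(max_num_hands=2, min_detection_confidence=0.5)

def phone_in_hand(frame) -> bool:
    """Return True if a detected phone overlaps any detected hand landmark."""
    h, w = frame.shape[:2]
    boxes = phone_model(frame, verbose=False)[0].boxes.xyxy.tolist()
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if not boxes or not result.multi_hand_landmarks:
        return False
    for x1, y1, x2, y2 in boxes:
        for hand in result.multi_hand_landmarks:
            # landmarks are normalized, so scale them to pixel coordinates before the box test
            if any(x1 <= lm.x * w <= x2 and y1 <= lm.y * h <= y2 for lm in hand.landmark):
                return True
    return False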

After this, I began working more on facial recognition. This is to ensure that the program is actually analyzing the user’s facial features and not someone else’s face in the frame. It will also ensure that it is actually the user working and that they did not replace themselves with another person to complete the work session for them.

I first found a simple Python face recognition library and did some initial testing of it. Although it has a very simple and usable interface, I realized the performance was not sufficient, as it produced too many false positives. Here you can see it identifies two people as “Karen” when only one of them is actually Karen.

I then looked into another Python face recognition library called DeepFace. It has a more complex interface but provides much more customizability, as it contains various models that can be used for face detection and recognition. I did extensive experimentation and research on the different model options in terms of performance and speed, and have landed on Fast-MTCNN for face detection and SFace for facial recognition.

Here you can see the results of my tests for speed for each model:

❯ python3 evaluate_models.py
24-03-21 13:19:19 - Time taken for predictions with VGG-Face: 0.7759 seconds
24-03-21 13:19:20 - Time taken for predictions with Facenet: 0.5508 seconds
24-03-21 13:19:22 - Time taken for predictions with Facenet512: 0.5161 seconds
24-03-21 13:19:22 - Time taken for predictions with OpenFace: 0.3438 seconds
24-03-21 13:19:24 - Time taken for predictions with ArcFace: 0.5124 seconds
24-03-21 13:19:24 - Time taken for predictions with Dlib: 0.2902 seconds
24-03-21 13:19:24 - Time taken for predictions with SFace: 0.2892 seconds
24-03-21 13:19:26 - Time taken for predictions with GhostFaceNet: 0.4941 seconds

Here are some screenshots of tests I ran for performance and speed on different face detectors.

OpenCV face detector (poor performance):

Fast-MTCNN face detector (better performance):

Here is an outline of the overall implementation I would like to follow (a sketch of the embedding comparison appears after the outline):

  • Use MediaPipe’s facial landmarking to rough crop out the face
  • During calibration
    • Do a rough crop out of face using MediaPipe
    • Extract face using face detector
    • Get template embedding
  • During work session
    • Do a rough crop out of face 0 using MediaPipe
    • Extract face using face detector
    • Get embedding and compare with template embedding
    • If below threshold, face 0 is a match
    • If face 0 is a match, everything is good so continue
    • If face 0 isn’t a match, do same process with face 1 to see if there is match
    • If face 0 and face 1 aren’t matches, fallback to using face 0
  • During work session
    • If there haven’t been any face matches in x minute, then user is no longer there
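As noted above, here is a sketch of the template-embedding comparison in this outline, assuming a recent DeepFace version where represent() returns a list of dicts with an "embedding" key; the distance threshold and face-crop variables are illustrative assumptions:

import numpy as np
from deepface import DeepFace

MATCH_THRESHOLD = 0.6  # assumed cosine-distance threshold; would be tuned empirically

def get_embedding(face_img) -> np.ndarray:
    """Embed a (rough-cropped) face image with SFace after Fast-MTCNN detection."""
    rep = DeepFace.represent(
        img_path=face_img,
        model_name="SFace",
        detector_backend="fastmtcnn",
        enforce_detection=False,
    )
    return np.asarray(rep[0]["embedding"])

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# during calibration: store the template embedding
template = get_embedding(calibration_face_crop)

# during the work session: face 0 is a match if its distance falls below the threshold
is_match = cosine_distance(get_embedding(face_0_crop), template) < MATCH_THRESHOLD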

Although I hoped to have a full implementation of facial recognition completed, I spent more time this week exploring and testing the different facial recognition options available to find the best one for our application and outlining an implementation that would work with it. Overall, my progress is still on schedule, taking into account the slack time added.