Team Status Report for 3/16

This week, we ran through some initial analysis of the EEG data. Rohan created data visualizations comparing the readings during the focused, neutral, and distracted states labeled by Professor Dueck. We looked at the averages and standard deviations of power values in the theta and alpha frequency bands, which typically correspond to focus, to see whether there was any clear threshold separating focused and distracted states. Both the summary statistics and the visualizations made it clear that a linear classifier would not be enough to distinguish focused from distracted states.

After examining the data, another consideration we realized was that Professor Dueck labeled the data with very high granularity, as she noted immediately when her students exited a flow state. This could be for a period as short as one second as they turned a page. We realized that our initial hypothesis, that these flow states would correspond closely to focus states in a work setting, was incorrect. In fact, we determined that focus state is a completely distinct concept from flow state. Professor Dueck recognizes a deep flow state which can change with high granularity, whereas focus states are typically measured over longer periods of time.

Based on this newfound understanding, we plan to use the Emotiv performance metrics to determine a threshold value separating focused and distracted states. To preserve the complexity of our project, we are also working on training a model to detect flow states from the raw EEG data we have collected and the flow-state ground truth we have from Professor Dueck.

We were able to do some preliminary analysis of the accuracy of Emotiv’s performance metrics, measuring the engagement and focus metrics of a user in a focused vs. distracted setting. Rohan first read an article while wearing noise-canceling headphones with minimal environmental distractions. He then completed the same task with ambient noise and frequent conversational interruptions. This led to some promising results: the metrics had a lower mean and higher standard deviation in the distracted setting than in the focused setting. This gives us some confidence that we have a solid contingency plan.

There are still some challenges with using the Emotiv performance metrics directly. We will need thresholding or calibration methods to decide what counts as a “focused state” based on the performance metrics, and this will need to work universally across users even though the raw metric values may vary between individuals.
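As a rough illustration of one possible calibration approach (not a committed design), the sketch below normalizes the Emotiv metric against a short per-user baseline so that a single shared threshold can be applied. The function names, baseline values, and threshold are all placeholders.

```python
# Hypothetical per-user calibration sketch: normalize each user's Emotiv focus
# metric against a short calibration baseline so one threshold can be shared.
import numpy as np

def calibrate(baseline_samples):
    """Compute per-user baseline statistics from a calibration period."""
    baseline = np.asarray(baseline_samples, dtype=float)
    return baseline.mean(), baseline.std() + 1e-8  # avoid divide-by-zero

def is_focused(metric_value, user_mean, user_std, z_threshold=-0.5):
    """Flag 'focused' when the metric is not far below the user's own baseline."""
    z = (metric_value - user_mean) / user_std
    return z >= z_threshold

# Example: calibrate on an initial window of a session, then classify new samples.
user_mean, user_std = calibrate([0.62, 0.58, 0.65, 0.60])
print(is_focused(0.40, user_mean, user_std))  # False: well below this user's baseline
```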

In terms of flow state detection, Rohan trained a 4-layer neural network with ReLU activation functions and a cross-entropy loss function and was able to achieve validation loss significantly better than random chance. We plan to experiment with a variety of network configurations (changing the loss function, number of layers, etc.) to see if we can further improve our model’s performance. This initial proof of concept is very promising and could allow us to detect elusive flow states using EEG data, which would have applications in music, sports, and traditional work settings.
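For reference, here is a minimal sketch of the kind of 4-layer ReLU classifier described above. The input dimension, layer widths, and learning rate are illustrative placeholders rather than our exact configuration.

```python
# Minimal PyTorch sketch of a 4-layer ReLU classifier trained with cross-entropy.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(15, 64),   # e.g., 5 band powers x 3 sensors as input features (assumed)
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, 3),    # logits for focused / neutral / distracted
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(features, labels):
    """One optimization step on a batch of (band-power features, state labels)."""
    optimizer.zero_grad()
    loss = criterion(model(features), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```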

Our progress for the frontend and backend, as well as camera-based detections, is on track.

After working through the ethics assignment this week, we also decided it is important for our app to include features that promote mindfulness and to make sure the app does not contribute to the existing culture of overworking and burnout.

Rohan’s Status Report for 3/16

This week I tried to implement a very simple thresholding-based approach to detect flow state. Upon inspecting the averages and standard deviations for the theta and alpha bands (the focus-related frequency bands), I saw that there was no clear distinction between the flow states and the variance was very high. I went on to visualize the data to see if there was any visible linear separation between flow states, and there was not. This told me that we would need to introduce some sort of non-linearity into our model, which led me to implement a simple 4-layer neural network with ReLU activation functions and cross-entropy loss. The visualizations are shown below. One of them uses the frontal lobe sensors AF3 and AF4 and the other uses the parietal lobe sensor Pz. The plots show overall power for each sensor and then the power values for the theta and alpha frequency bands at each sensor. The x-axis is time and the y-axis is power. Green dots represent focused, red distracted, and blue neutral.
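For reference, a plot like the ones described could be produced with something along these lines. The DataFrame column names are assumptions about how the exported EEG data might be organized.

```python
# Sketch of one of these plots, assuming a DataFrame with columns 'time',
# 'AF3_theta' (band power), and 'label' in {'focused', 'distracted', 'neutral'}.
import matplotlib.pyplot as plt

colors = {"focused": "green", "distracted": "red", "neutral": "blue"}

def plot_band_power(df, column="AF3_theta"):
    for label, color in colors.items():
        subset = df[df["label"] == label]
        plt.scatter(subset["time"], subset[column], c=color, s=4, label=label)
    plt.xlabel("Time (s)")
    plt.ylabel("Power")
    plt.title(f"{column} power by labeled state")
    plt.legend()
    plt.show()
```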

When I implemented this model, I trained it on only Ishan’s data, only Justin’s data, and then on all of the data. On Ishan’s data I saw the lowest validation loss of 0.1681, on Justin’s data the validation loss was a bit higher at 0.8485, and on all of the data the validation loss was 0.8614, all of which are better than random chance, which would yield a cross-entropy loss of 1.098. I have attached the confusion matrices for each dataset below, in order. For next steps I will experiment with different learning rates, switch from Adam to the AdamW optimizer with learning-rate scheduling, try using more than 4 layers, try different activation functions, try classifying flow vs. not-flow instead of treating neutral and distracted separately, and try a weighted loss function such as focal loss.
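As a quick sanity check on the numbers above, the snippet below computes the chance-level cross-entropy for three classes and shows how a confusion matrix can be produced from validation predictions. The label lists are placeholders.

```python
# Chance-level baseline and confusion matrix sanity check.
# A uniform guess over 3 classes gives cross-entropy ln(3) ~ 1.0986.
import math
from sklearn.metrics import confusion_matrix

print(math.log(3))  # ~1.0986, the random-chance cross-entropy for 3 classes

# Confusion matrix from model predictions on a validation split
# (y_true and y_pred here are placeholder label lists).
y_true = [0, 1, 2, 1, 0, 2]
y_pred = [0, 1, 1, 1, 0, 2]
print(confusion_matrix(y_true, y_pred, labels=[0, 1, 2]))
```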

Overall my progress is ahead of schedule, as I expected to have to add significantly more complexity to the model to see any promising results. I am happy to see performance much better than random chance with a very simple model and before I have had a chance to play around with any of the hyperparameters. 



Karen’s Status Report for 3/16

This week I focused on improving phone detection. I familiarized myself with the Roboflow platform and how to train my own object detection model on it. Following this, I began the process of training the object detector by collecting a diverse dataset. I recorded videos of several different people holding several different phones in their hands. On the Roboflow platform, I annotated and labeled the phone in each frame. I also applied some augmentations (changes in shear, saturation, and brightness) and ended up with over 1,000 images in the dataset. The results of the training are in the images below. Overall, this process went much more smoothly than training locally using the Ultralytics Python package. The training time was much shorter, and I also obtained much better results using my own custom dataset.

After using the phone detector live, it performs much more robustly than my previous iteration. However, I noticed that it struggles to detect phones in certain orientations, especially when only the thin edge of the phone is visible in frame. In frame, this looks like a very thin rectangle or even a line, so I collected more videos of people holding phones in this orientation. I also noticed poor performance on colored phones, so I will need to collect more data in these situations. I will have to label each frame and will then use the model I have already trained as a starting point to further train on this new data in the coming week.

I have integrated all of the individual detectors into a single module that prints when a behavior or distraction is detected along with the timestamp. It keeps track of behavior “states” as well, so that a distraction is not recorded for every individual frame. I am collaborating with Arnav to translate these print statements into calls to the API he has created to communicate with the backend.
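The sketch below illustrates the general idea behind the per-behavior state tracking, where a distraction event fires when a behavior starts rather than on every frame. The class and its interface are hypothetical, not the exact module code.

```python
# Hypothetical per-behavior state tracker: report a sustained distraction once,
# not on every frame where it is detected.
import time

class BehaviorState:
    def __init__(self, name):
        self.name = name
        self.active = False

    def update(self, detected_this_frame):
        """Return an event string only on the frame where the behavior first appears."""
        if detected_this_frame and not self.active:
            self.active = True
            return f"{self.name} detected at {time.time():.2f}"
        if not detected_this_frame:
            self.active = False
        return None

# Usage: one tracker per detector, updated every frame.
phone_state = BehaviorState("phone_pickup")
for frame_has_phone in [False, True, True, True, False, True]:
    event = phone_state.update(frame_has_phone)
    if event:
        print(event)  # fires only when the behavior starts (twice in this sequence)
```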

This coming week, I will also integrate MediaPipe’s hand pose landmarker so that I can track the hand in frame as well. We only want to count a phone pick-up when the phone is detected in the hand, so I will need to check that the location of the phone is in the vicinity of the user’s hand. Another feature I will be working on in the next week is facial recognition. If there are multiple people in frame, facial recognition will be used to distinguish between the user and any other people in frame. This will ensure that we run facial analysis (sleeping, yawning, and gaze detection) on the right face.
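A rough sketch of the planned phone-in-hand check is shown below. The bounding-box format, landmark format, and margin value are assumptions for illustration.

```python
# Sketch: only count a phone detection as a pick-up if a hand landmark lies
# near the phone's bounding box (all coordinates assumed normalized to [0, 1]).
def phone_in_hand(phone_box, hand_landmarks, margin=0.05):
    """phone_box = (x_min, y_min, x_max, y_max); hand_landmarks = iterable of (x, y)."""
    x_min, y_min, x_max, y_max = phone_box
    for x, y in hand_landmarks:
        if (x_min - margin) <= x <= (x_max + margin) and (y_min - margin) <= y <= (y_max + margin):
            return True
    return False

# Example: a landmark just outside the box still counts within the margin.
print(phone_in_hand((0.4, 0.4, 0.6, 0.7), [(0.62, 0.55)]))  # True
```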

With these updates to the phone detector, my progress is on schedule.

Arnav’s Status Report for 3/16

This week I focused on integrating the camera data with the Django backend and React frontend in real time. I worked mainly on getting the yawning feature to work; the other detections should be easy to integrate now that I have the template in place. The current flow looks like the following: run.py, which is used for detecting all distractions (gaze, yawn, phone pickups, microsleep), now sends a POST request with the detection data to http://127.0.0.1:8000/api/detections/ and also sends a POST request with the current session to http://127.0.0.1:8000/api/current_session. The current session is used to ensure that data from previous sessions is not shown for the session the user is currently working on. The data packet that is sent includes the session_id, user_id, distraction_type, timestamp, and aspect_ratio. For the backend, I created a DetectionEventView, CurrentSessionView, and YawningDataView that handle the POST and GET requests and order the data accordingly. Finally, the frontend fetches the data from these endpoints using fetch('http://127.0.0.1:8000/api/current_session') and fetch(`http://127.0.0.1:8000/api/yawning-data/?session_id=${sessionId}`), polling every 1 second to ensure that it catches any distraction event in real time. Below is a picture of the data that is shown on the React page every time a user yawns during a work session:

The data is ordered so that the latest timestamps are shown first. Once I have all the distractions displayed, then I will work on making the data look more presentable. 
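For reference, the detection side of this flow could look roughly like the sketch below, using the endpoints and payload fields described above. The function names and payload values are placeholders, not the exact run.py code.

```python
# Sketch of the POSTs run.py issues: one per detection event, plus one that
# registers the current session so older data is not shown.
import time
import requests

API_BASE = "http://127.0.0.1:8000/api"

def report_detection(session_id, user_id, distraction_type, aspect_ratio):
    payload = {
        "session_id": session_id,
        "user_id": user_id,
        "distraction_type": distraction_type,
        "timestamp": time.time(),
        "aspect_ratio": aspect_ratio,
    }
    requests.post(f"{API_BASE}/detections/", json=payload, timeout=2)

def report_current_session(session_id):
    # Lets the backend drop data from earlier sessions when rendering the UI.
    requests.post(f"{API_BASE}/current_session", json={"session_id": session_id}, timeout=2)
```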

My progress is on schedule and during the next week, I will continue to work on the backend to ensure that all the data is displayed, and I will put the real-time data in a tabular format. I will also try to add a button to the frontend that automatically triggers the run.py file so that it does not need to be started manually.

Arnav’s Status Report for 3/9

This week I worked with Rohan on building the data labeling platform for Professor Dueck and designing the system for how to collect and filter the data. The Python program is specifically designed for Professor Dueck to annotate students’ focus states as ‘Focused,’ ‘Distracted,’ or ‘Neutral’ during music practice sessions. The platform efficiently records these labels alongside precise timestamps in both Epoch and conventional formats, ensuring compatibility with EEG headset data and ease of analysis across sessions. We also outlined the framework for integrating this labeled data with our machine learning model, focusing on how EEG inputs will be processed to predict focus states. This preparation is crucial for our next steps: refining the model to accurately interpret EEG signals and provide meaningful insights into enhancing focus and productivity.
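A simplified sketch of the core labeling loop is shown below. The key handling, CSV filename, and column layout are illustrative, not the exact program.

```python
# Simplified labeling loop: map a keypress to a focus-state label and log it
# with both epoch and human-readable timestamps (filename and keys assumed).
import csv
import time
from datetime import datetime

LABELS = {"f": "Focused", "d": "Distracted", "n": "Neutral"}

with open("labels.csv", "a", newline="") as f:
    writer = csv.writer(f)
    while True:
        key = input("Label (f/d/n, q to quit): ").strip().lower()
        if key == "q":
            break
        if key in LABELS:
            epoch = time.time()
            writer.writerow([epoch, datetime.fromtimestamp(epoch).isoformat(), LABELS[key]])
```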

Additionally, I worked on integrating a webcam feed into our application. I developed a component named WebcamStream.js. This script prioritizes connecting with an external camera device, if available, before defaulting to the computer’s built-in camera. Users can now view a real-time video feed of themselves directly within the app’s interface. Below is an image of the user when on the application. I will move this to the Calibration page this week.

My progress is on schedule and during the next week, I plan to integrate the webcam feed using MediaPipe instead so that we can directly extract the data on the application itself. I will also continue to work with Rohan on developing the machine learning model for the EEG headset and hopefully have one ready by the end of the week. In addition, I will continue to write code for all the pages in the application.

Team Status Report for 3/9

Part A was written by Rohan, Part B was written by Karen, and Part C was written by Arnav. 

Global Factors

People in the workforce, across a wide variety of disciplines and geographic regions, spend significant amounts of time working at a desk with a laptop or monitor setup. While the average work day lasts 8 hours, most people are only actually productive for 2-3 hours. Improved focus and longer-lasting productivity have many benefits for individuals including personal fulfillment, pride in one’s performance, and improved standing in the workplace. At a larger scale, improving individuals’ productivity also leads to a more rapidly advancing society where the workforce as a whole can innovate and execute more efficiently. Overall, our product will improve individuals’ quality of life and self-satisfaction while simultaneously improving the rate of global societal advancement.

Cultural Factors

In today’s digital age, there’s a growing trend among people, particularly the younger generation and students, to embrace technology as a tool to improve their daily lives. This demographic is highly interested in leveraging technology to improve productivity, efficiency, and overall well-being. Within a culture that also values innovation and efficiency, there is a strong desire to optimize workflows and streamline tasks to achieve better outcomes in less time. Moreover, there’s an increasing awareness of the importance of mindfulness and focus in achieving work satisfaction and personal fulfillment. As a result, individuals seek tools and solutions that help them cultivate mindfulness, enhance focus, and maintain a healthy work-life balance amidst the distractions of the digital world. Our product aligns with these cultural trends by providing users with a user-friendly platform to monitor their focus levels, identify distractions, and ultimately enhance their productivity and overall satisfaction with their work.

Environmental Factors

The Focus Tracker App takes into account the surrounding environment, like background motion/light, interruptions, and conversations, to help users stay focused. It uses sensors and machine learning to understand and react to these conditions. By optimizing work conditions, such as informing the user that the phone is being used too often or that the light is too bright, it encourages a reduction in unnecessary energy consumption. Additionally, the app’s emphasis on creating a focused environment helps minimize disruptions that could affect both the user and their surroundings.

Team Progress

The majority of our time this week was spent working on the design report.

This week, we sorted out the issues we were experiencing with putting together the data collection system last week. In the end, we settled on a two-pronged design: we will use the EmotivPRO application’s built-in EEG data recording to record power readings within each of the frequency bands from the AF3 and AF4 sensors (the two sensors corresponding to the prefrontal cortex), while simultaneously running a simple Python program that takes in Professor Dueck’s keyboard input: ‘F’ for focused, ‘D’ for distracted, and ‘N’ for neutral. While this system felt natural to us, we were not sure whether this type of stateful labeling would match Professor Dueck’s mental model when observing her students. Furthermore, given that Professor Dueck would be deeply focused on observing her students, we were hoping the system would be easy enough for her to use without having to apply much thought to it.

On Monday of this week, we met with Professor Dueck after our weekly progress update with Professor Savvides and Jean for our first round of raw data collection and ground truth labeling. To our great relief, everything ran extremely smoothly, with the EEG coming through with minimal noise and Professor Dueck finding our data labeling system intuitive and natural to use. One of the significant risk factors for our project has been EEG-based focus detection: as with all types of signal processing and analysis, the quality of the raw data and ground truth labels is critical to training a highly performant model. This was a significant milestone because, while we had tested the data labeling system that Arnav and Rohan designed, it was the first time Professor Dueck was using it. We continued to collect data on Wednesday with a different one of Professor Dueck’s students, and that session went equally smoothly. Having secured some initial high-fidelity data with high-granularity ground truth labels, we feel that the EEG aspect of our project has been significantly de-risked.

Going forward, we have to map the logged timestamps from the EEG readings to the timestamps from Professor Dueck’s ground truth labels so we can begin feeding our labeled data into a model for training. This coming week, we hope to complete this linking of the raw data with the labels and train an initial CNN on the resulting dataset. From there, we can assess the performance of the model, verify that the data has a high signal-to-noise ratio, and begin to fine-tune the model to improve upon our base model’s performance.
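The timestamp linking could look roughly like the sketch below, which attaches the most recent ground-truth label to each EEG sample. The filenames and column names are assumptions.

```python
# Sketch of the planned timestamp alignment: each EEG row takes the last label
# issued at or before its epoch timestamp (filenames and columns are placeholders).
import pandas as pd

eeg = pd.read_csv("eeg_power_readings.csv").sort_values("timestamp")
labels = pd.read_csv("ground_truth_labels.csv").sort_values("timestamp")

labeled = pd.merge_asof(eeg, labels, on="timestamp", direction="backward")
labeled.to_csv("labeled_eeg.csv", index=False)
```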

A new risk that could jeopardize the progress of our project is the performance of the phone object detection model. The custom YOLOv8 model that has been trained does not currently meet the design requirements of mAP ≥95%. We may need to lower this threshold, improve the model with further training, or use a pre-trained object detection model. We have already found other datasets that we can further train the model on (like this one) and have also found a pre-trained model on Roboflow that has higher performance than the custom model that we trained. This Roboflow model can be something we fall back on if we cannot get our custom model to perform sufficiently well.

The schedule for camera-based detections was updated to be broken down into the implementation of each type of distraction to be detected. Unit testing and then combining each of the detectors into one module will begin on March 18.

To mitigate the risks associated with EEG data reliability in predicting focus states, we have developed 3 different plans:

Plan A involves leveraging EEG data collected from musicians while Professor Dueck uses her expertise and visual cues to label states of focus and distraction during music practice sessions. This method relies heavily on her understanding of individual focus patterns within a specific, skill-based activity.

Plan B broadens the data collection to include ourselves and other participants engaged in completing multiplication worksheets under time constraints. Here, focus states are identified in environments controlled for auditory distractions using noise-canceling headphones, while distracted states are simulated by introducing conversations during tasks. This strategy aims to diversify the conditions under which EEG data is collected. 

Plan C shifts towards using predefined performance metrics from the Emotiv EEG system, such as Attention and Engagement, setting thresholds to classify focus states. Recognizing the potential oversimplification in this method, we plan to correlate specific distractions or behaviors, such as phone pick-ups, with these metrics to draw more detailed insights into their impact on user focus and engagement. By using language model-generated suggestions, we can create personalized advice for improving focus and productivity based on observed patterns, such as recommending strategies for minimizing phone-induced distractions. This approach not only enhances the precision of focus state prediction through EEG data but also integrates behavioral insights to provide users with actionable feedback for optimizing their work environments and habits.

Additionally, we established a formula for the productivity score we will assign to users throughout the work session. The productivity score calculation in the Focus Tracker App quantifies an individual’s work efficiency by evaluating both focus duration and distraction frequency. It establishes a distraction score (D) by comparing the actual number of distractions (A) encountered during a work session against an expected number (E), calculated from the session’s length under the assumption of one distraction every 5 minutes. If A ≤ E, then D = 1 – 0.5 · A / E, so the distraction score is 0.5 when A = E. If A > E, the distraction score continues to decrease below 0.5 but never turns negative. The productivity score (P) is then determined by averaging the focus fraction and the distraction score: P = (F + D) / 2, where F is the fraction of the session spent focused. This method ensures a comprehensive assessment, with half of the productivity score derived from focus duration and the other half reflecting the impact of distractions.
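For reference, the scoring logic could be implemented along these lines. The A ≤ E branch follows the formula above, while the A > E branch shown here (D = 0.5 · E / A) is an assumed form chosen so the score keeps decreasing toward zero without going negative.

```python
# Sketch of the productivity score. The A <= E branch matches the formula in the
# text; the A > E branch is an assumption (0.5 * E / A) that decays toward zero.
def distraction_score(actual, expected):
    if actual <= expected:
        return 1 - 0.5 * actual / expected
    return 0.5 * expected / actual  # assumed form for A > E

def productivity_score(focus_fraction, actual, expected):
    """Average of the focus fraction and the distraction score."""
    return 0.5 * (focus_fraction + distraction_score(actual, expected))

# Example: a 60-minute session (expected 12 distractions), 8 actual distractions,
# 70% of the session spent focused.
print(productivity_score(0.7, actual=8, expected=12))  # ~0.68
```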

Overall, our progress is on schedule.


Rohan’s Status Report for 3/9

This week I spent a couple of hours working with Arnav to finalize our data collection and labeling system to prepare for our meeting with Professor Dueck. Once this system was implemented, I spent time with two different music students to get the headset calibrated and ready to record the raw EEG data. Finally, on Monday and Wednesday I brought it all together with the music students and Professor Dueck to orchestrate the data collection and labeling process. This involved getting the headset set up and calibrated on each student, helping Professor Dueck get the data labeling system running, and observing as the music students practiced and Professor Dueck labeled them as focused, distracted, or neutral. I watched Professor Dueck observe her students and tried to pick up on the kinds of things she was looking for, while also making sure that she was using the system correctly and not encountering any issues.

I also spent a significant amount of time working on the design report. This involved doing some simple analysis of the first set of data we collected on Monday and making some key design decisions. Once we collected data for the first time on Monday, I looked through the EEG quality of the readings and found that we were generally hovering between 63 and 100 on overall EEG quality. Initially, I figured we would just live with the variable EEG quality and go forward with our plan to pass the power readings from each of the EEG frequency bands from each of the 5 sensors in the headset as input into the model, also adding the overall EEG quality value as input so that the model could account for EEG quality variability. However, on Wednesday when we went to collect data again, we realized that the EEG quality from the two sensors on the forehead (AF3 and AF4) tended to be at 100 for a significant portion of the readings in our dataset. We also learned that brain activity in the prefrontal cortex (located near the forehead) is highly relevant to focus levels. This led us to decide to only work with readings where the EEG quality for both the AF3 and AF4 sensors is 100, and therefore avoid having to pass the EEG quality as input into the model and depend on the model learning to account for variable levels of SNR in our training data. This was a key design decision because it means we can have much higher confidence in the quality of the data going into the model: according to Emotiv, the contact quality and EEG quality are as strong as possible.
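The filtering step this decision implies could look roughly like the following. The filename and quality column names are assumptions about how the EmotivPRO export is structured.

```python
# Sketch of the quality filter: keep only readings where the EEG quality for
# both forehead sensors (AF3, AF4) is 100 (filename and columns assumed).
import pandas as pd

readings = pd.read_csv("eeg_power_readings.csv")
clean = readings[(readings["AF3_quality"] == 100) & (readings["AF4_quality"] == 100)]
print(f"Kept {len(clean)} of {len(readings)} readings with full AF3/AF4 quality")
```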

My progress is on schedule, and this week I plan to link the raw EEG data with the ground truth labels from Professor Dueck as well as implement an initial CNN for focus, distracted, or neutral state detection based on EEG power values from the prefrontal cortex. At that point, I will continue to fine tune the model and retrain as we accumulate more training data from our collaboration with Professor Dueck and her students in the School of Music.

Karen’s Status Report for 3/9

This week I spent the majority of my time working on the design report. Outside of that, I experimented with object detection for phone pick-up detection. One component of phone pick-up detection is phone object recognition, so I trained a YOLOv8 model to detect phones using the MUID-IITR dataset. This was the closest dataset I could find online that matches the scenarios for the Focus Tracker App. The dataset includes images of people using a phone while performing day-to-day activities, along with annotations of the coordinates of the phones in each image. The dataset required some converting to match the YOLOv8 format, and then I used the Ultralytics Python package to train the model. Below are the results of training for 100 epochs. The recall and mAP never exceed 0.8, which does not satisfy the design requirements we specified. Testing the model, I noticed that it sometimes predicted just a hand as a phone. The FPS is also fairly low, at around 10 FPS.
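For reference, an Ultralytics training run of this kind looks roughly like the sketch below. The dataset YAML path and model size are placeholders.

```python
# Sketch of a YOLOv8 training run with Ultralytics; the dataset YAML is a
# placeholder for the converted MUID-IITR annotations.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                        # start from a pretrained checkpoint
results = model.train(data="muid_iitr_phone.yaml", epochs=100, imgsz=640)
metrics = model.val()                             # reports precision, recall, and mAP
```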

There are some other datasets (like this one) that I can try to continue training the model on that contain just the phone itself, which could reduce the false positives where a hand is classified as a phone. My risk mitigation plan for my custom YOLOv8 model not achieving sufficient performance is to use a model that has already been trained, available on Roboflow. This is a YOLOv5 model trained on 3,000+ images of phones and people using phones. This model is linked here. This option may be better because training my own model is very costly (>12 hours for 100 epochs). The FPS for the Roboflow model is also higher (~20 FPS).

I also have a plan to collect and annotate my own data. The MUID-IITR dataset puts a fairly large bounding box around the hand which may be the reason for so many false positives too. Roboflow has a very usable interface for collecting data, annotating images, and training a YOLO model.

Here is the directory with the code for manipulating the data and training my custom YOLOv8 model. And here is the directory with the code for facial recognition.

My progress is overall on schedule, but the custom YOLOv8 model not performing as well as desired is a bit of a setback. In the coming week, I plan to further train this custom model or fall back onto the Roboflow model if it is not successful. I will also integrate the hand landmarker to make the phone pick-up detection more robust by also taking into account the hand that is picking up the phone. I will also further experiment with the face recognition library that I will use for detecting interruptions from others.

Team Status Report for 2/24

This week we finalized our slides for the design presentation last Monday and incorporated the feedback received from the students and professors into our design report. We split up the work for the design report and plan to have it finalized by Wednesday so that we can get the appropriate feedback before the due date on Friday. We are also working on building a data labeling platform for Professor Dueck and plan to meet with her this week so that we can begin the data-gathering process. No changes have been made to our schedule, and we are planning for risk mitigation by doing additional research into microphone- and LLM-based approaches in case the EEG headset does not provide the accurate results we are looking for. Overall, we are all on schedule and have completed our individual tasks as well. We are looking forward to implementing more features of our design this week.

Rohan’s Status Report for 2/24

This week I secured the EmotivPRO subscription, which had been blocking our progress on EEG-based focus state detection. With the subscription, we can now build out the data labeling platform for Professor Dueck and begin implementing a basic detection model that takes in the EmotivPRO performance metrics and outputs a focus state: focused, distracted, or neutral. I was able to collect some initial readings wearing the headset myself while working at home. I began familiarizing myself with the Emotiv API, connected to the headset via Python code, and collected performance metric data from the headset. I am currently encountering an error when trying to download the performance metric data from the headset to a CSV on my laptop, which I suspect is an issue with the way the license is configured or with not properly passing credentials somewhere in the script. I also spent a significant amount of time working on the design report, which is due next week. Finally, I began researching what kinds of detection models would lend themselves to our EEG-based focus level detection and settled on a 1D (time-series-tailored) convolutional neural network, which I will begin experimenting with as soon as we finalize our data collection platform and determine the format in which we will read in the data. Overall, my progress is still on schedule. Looking forward to next week, I plan to implement the data collection platform with Arnav, do some further CNN research and testing, and finalize our design report for submission.
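As a starting point for that experimentation, a 1D CNN of the kind I have in mind could look roughly like this. The channel count, window length handling, and layer sizes are placeholders, not a finalized architecture.

```python
# Minimal PyTorch sketch of a 1D CNN over windows of EEG band-power values
# (channels = frequency bands here; all sizes are illustrative assumptions).
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv1d(in_channels=5, out_channels=16, kernel_size=5, padding=2),  # 5 frequency bands
    nn.ReLU(),
    nn.MaxPool1d(2),
    nn.Conv1d(16, 32, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),
    nn.Linear(32, 3),  # focused / distracted / neutral logits
)
```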