Karen’s Status Report for 3/16

This week I focused on improving phone detection. I familiarized myself with the Roboflow platform and how to train my own object detection model on it. I then began training the object detector by collecting a diverse dataset: I recorded videos of several different people holding several different phones in their hands. On the Roboflow platform, I annotated and labeled the phone in each frame. I also applied some augmentations (changes in shear, saturation, and brightness) and ended up with over 1000 images in the dataset. The results of the training are in the images below. Overall, this process went much more smoothly than training locally using the Ultralytics Python package: the training time was much shorter, and I obtained much better results using my own custom dataset.

Using the phone detector live, it performs much more robustly than my previous iteration. However, I noticed that it struggles to detect phones in certain orientations, especially when only the thin edge of the phone is visible in frame. In frame, this looks like a very thin rectangle or even a line, so I collected more videos of people holding phones in this orientation. I also noticed poor performance on colored phones, so I will need to collect more data in these situations as well. I will label each frame and then use the model I have already trained as a starting point to further train on this new data in the coming week.
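As a rough sketch of that fine-tuning step, assuming we train locally with the Ultralytics package and that the weights from the Roboflow-trained run are exported as best.pt (the dataset path is illustrative):

```python
# Fine-tune the existing phone detector on the newly collected edge-on and
# colored-phone data, starting from the previous weights rather than from scratch.
from ultralytics import YOLO

model = YOLO("best.pt")  # checkpoint from the earlier training run (assumed file name)
model.train(
    data="phone_edge_cases/data.yaml",  # new annotated frames (illustrative path)
    epochs=50,
    imgsz=640,
)
metrics = model.val()  # check mAP on a held-out split before adopting the new weights
print(metrics.box.map50)
```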

I have integrated all of the individual detectors into a single module that prints when a behavior or distraction is detected along with the timestamp. It keeps track of behavior “states” as well, so that a distraction is not recorded for every individual frame. I am collaborating with Arnav to translate these print statements into calls to the API he has created to communicate with the backend.
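A minimal sketch of the kind of per-behavior state tracking described above, assuming each detector reduces to a per-frame boolean (the class and function names are illustrative):

```python
import time

class DistractionState:
    """Logs a distraction once when it starts, instead of once per frame."""

    def __init__(self, name):
        self.name = name
        self.active = False

    def update(self, detected_this_frame):
        if detected_this_frame and not self.active:
            self.active = True
            print(f"{self.name} detected at {time.strftime('%H:%M:%S')}")
        elif not detected_this_frame:
            self.active = False

phone_state = DistractionState("Phone pick-up")
yawn_state = DistractionState("Yawn")

# Inside the main frame loop, each state is updated from its detector's output:
# phone_state.update(phone_detected(frame))
# yawn_state.update(yawn_detected(frame))
```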

This coming week, I will also integrate MediaPipe's hand pose landmarker so that I can track the hand in frame as well. We only want to count a phone pick-up when the phone is detected in the hand, so I will need to check that the location of the phone is in the vicinity of the user's hand. Another feature I will be working on in the next week is facial recognition. If there are multiple people in frame, facial recognition will be used to distinguish between the user and any other people in frame. This will ensure that we run facial analysis (sleeping, yawning, and gaze detection) on the right face.
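A rough sketch of how the phone-in-hand check could work, assuming the phone detector returns a pixel bounding box and the MediaPipe hand landmarker returns normalized landmark coordinates (the margin value and all names are assumptions):

```python
def phone_in_hand(phone_box, hand_landmarks, frame_w, frame_h, margin=50):
    """Return True if any hand landmark falls near the detected phone box.

    phone_box: (x1, y1, x2, y2) in pixels from the phone detector.
    hand_landmarks: iterable of landmarks with normalized .x and .y fields.
    margin: slack in pixels around the phone box (tunable assumption).
    """
    x1, y1, x2, y2 = phone_box
    for lm in hand_landmarks:
        px, py = lm.x * frame_w, lm.y * frame_h  # convert normalized coords to pixels
        if (x1 - margin) <= px <= (x2 + margin) and (y1 - margin) <= py <= (y2 + margin):
            return True
    return False
```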

With these updates to the phone detector, my progress is on schedule.

Arnav’s Status Report for 3/16

This week I focused on integrating the camera data with the Django backend and React frontend in real time. I worked mainly on getting the yawning feature to work, and the other detections should be easy to integrate now that I have the template in place. The current flow looks like the following: run.py, which is used for detecting all distractions (gaze, yawn, phone pickups, microsleep), now sends a POST request with the detection data to http://127.0.0.1:8000/api/detections/ and also sends a POST request for the current session to http://127.0.0.1:8000/api/current_session. The current session is used to ensure that data from previous sessions is not shown for the session the user is currently working on. The data packet that is currently sent includes the session_id, user_id, distraction_type, timestamp, and aspect_ratio. For the backend, I created a DetectionEventView, CurrentSessionView, and YawningDataView that handle the POST and GET requests and order the data accordingly. Finally, the frontend fetches the data from these endpoints using fetch('http://127.0.0.1:8000/api/current_session') and fetch(`http://127.0.0.1:8000/api/yawning-data/?session_id=${sessionId}`) and polls every second to ensure that it catches any distraction event in real time. Below is a picture of the data that is shown on the React page every time a user yawns during a work session:

The data is ordered so that the latest timestamps are shown first. Once all of the distractions are displayed, I will work on making the data look more presentable.
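For reference, a minimal sketch of the kind of POST that run.py could send when a yawn is detected; the endpoint and payload fields come from the description above, while the helper function and the use of the requests library are assumptions:

```python
import time
import requests

API_BASE = "http://127.0.0.1:8000/api"

def report_detection(session_id, user_id, distraction_type, aspect_ratio):
    """Send one detection event to the Django backend (illustrative sketch)."""
    payload = {
        "session_id": session_id,
        "user_id": user_id,
        "distraction_type": distraction_type,  # e.g. "yawn"
        "timestamp": time.time(),
        "aspect_ratio": aspect_ratio,
    }
    response = requests.post(f"{API_BASE}/detections/", json=payload, timeout=5)
    response.raise_for_status()
```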

My progress is on schedule. During the next week, I will continue to work on the backend to ensure that all of the data is displayed, and I will put the real-time data in a tabular format. I will also try to add a button to the frontend that automatically triggers the run.py file so that it does not need to be run manually.

Arnav’s Status Report for 3/9

This week I worked with Rohan on building the data labeling platform for Professor Dueck and designing the system for collecting and filtering the data. The Python program is specifically designed for Professor Dueck to annotate students' focus states as 'Focused,' 'Distracted,' or 'Neutral' during music practice sessions. The platform records these labels alongside precise timestamps in both epoch and conventional formats, ensuring compatibility with the EEG headset data and ease of analysis across sessions. We also outlined the framework for integrating this labeled data with our machine learning model, focusing on how EEG inputs will be processed to predict focus states. This preparation is crucial for our next steps: refining the model to accurately interpret EEG signals and provide meaningful insights into enhancing focus and productivity.
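A minimal sketch of what the labeling loop could look like; the 'F'/'D'/'N' key scheme matches the team status report below, while the CSV layout, file name, and use of input() (rather than a raw key listener) are simplifying assumptions:

```python
import csv
import time
from datetime import datetime

LABELS = {"f": "Focused", "d": "Distracted", "n": "Neutral"}

with open("labels.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["epoch_time", "readable_time", "label"])
    while True:
        key = input("Label (F/D/N, Q to quit): ").strip().lower()
        if key == "q":
            break
        if key in LABELS:
            now = time.time()  # epoch timestamp, matching the EEG recording clock
            writer.writerow([now, datetime.fromtimestamp(now).isoformat(), LABELS[key]])
            f.flush()  # persist each label immediately
```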

Additionally, I worked on integrating a webcam feed into our application. I developed a component named WebcamStream.js. This script prioritizes connecting to an external camera device, if available, before defaulting to the computer's built-in camera. Users can now view a real-time video feed of themselves directly within the app's interface. Below is an image of the user within the application. I will move this to the Calibration page this week.
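The component itself is written in React, but as a rough Python/OpenCV analog of the same fallback logic (the device indices are assumptions):

```python
import cv2

def open_camera(external_index=1, builtin_index=0):
    """Try the external camera first, then fall back to the built-in webcam."""
    cap = cv2.VideoCapture(external_index)
    if not cap.isOpened():
        cap.release()
        cap = cv2.VideoCapture(builtin_index)
    return cap
```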

My progress is on schedule and during the next week, I plan to integrate the webcam feed using MediaPipe instead so that we can directly extract the data on the application itself. I will also continue to work with Rohan on developing the machine learning model for the EEG headset and hopefully have one ready by the end of the week. In addition, I will continue to write code for all the pages in the application.

Team Status Report for 3/9

Part A was written by Rohan, Part B was written by Karen, and Part C was written by Arnav. 

Global Factors

People in the workforce, across a wide variety of disciplines and geographic regions, spend significant amounts of time working at a desk with a laptop or monitor setup. While the average work day lasts 8 hours, most people are only actually productive for 2-3 hours. Improved focus and longer-lasting productivity have many benefits for individuals including personal fulfillment, pride in one’s performance, and improved standing in the workplace. At a larger scale, improving individuals’ productivity also leads to a more rapidly advancing society where the workforce as a whole can innovate and execute more efficiently. Overall, our product will improve individuals’ quality of life and self-satisfaction while simultaneously improving the rate of global societal advancement.

Cultural Factors

In today's digital age, there's a growing trend among people, particularly the younger generation and students, to embrace technology as a tool to improve their daily lives. This demographic is highly interested in leveraging technology to improve productivity, efficiency, and overall well-being. Within a culture that values innovation and efficiency, there is also a strong desire to optimize workflows and streamline tasks to achieve better outcomes in less time. Moreover, there's an increasing awareness of the importance of mindfulness and focus in achieving work satisfaction and personal fulfillment. As a result, individuals seek tools and solutions that help them cultivate mindfulness, enhance focus, and maintain a healthy work-life balance amidst the distractions of the digital world. Our product aligns with these cultural trends by providing users with a user-friendly platform to monitor their focus levels, identify distractions, and ultimately enhance their productivity and overall satisfaction with their work.

Environmental Factors

The Focus Tracker App takes into account the surrounding environment, such as background motion and lighting, interruptions, and conversations, to help users stay focused. It uses sensors and machine learning to understand and react to these conditions. By helping the user optimize their work conditions, for example by informing them that the phone is being used too often or that the light is too bright, it encourages a reduction in unnecessary energy consumption. Additionally, the app's emphasis on creating a focused environment helps minimize disruptions that could affect both the user and their surroundings.

Team Progress

The majority of our time this week was spent working on the design report.

This week, we sorted out the issues we were experiencing with putting together the data collection system last week. In the end, we settled on a two-pronged design: we will use the EmotivPRO application's built-in EEG recording system to record power readings within each of the frequency bands from the AF3 and AF4 sensors (the two sensors corresponding to the prefrontal cortex), while simultaneously running a simple Python program that takes in Professor Dueck's keyboard input: 'F' for focused, 'D' for distracted, and 'N' for neutral. While this system felt natural to us, we were not sure whether this type of stateful labeling system would match Professor Dueck's mental model when observing her students. Furthermore, given that Professor Dueck would be deeply focused on observing her students, we were hoping that the system would be easy enough for her to use without having to apply much thought to it.

On Monday of this week, we met with Professor Dueck after our weekly progress update with Professor Savvides and Jean for our first round of raw data collection and ground truth labeling. To our great relief, everything ran extremely smoothly, with the EEG quality coming through with minimal noise and Professor Dueck finding our data labeling system intuitive and natural to use. One of the significant risk factors for our project has been EEG-based focus detection: as with all types of signal processing and analysis, the quality of the raw data and ground truth labels is critical to training a highly performant model. This was a significant milestone because, while we had tested the data labeling system that Arnav and Rohan designed, it was the first time Professor Dueck was using it. We continued to collect data on Wednesday with a different student of Professor Dueck's, and this session went equally smoothly. Having secured some initial high-fidelity data with high-granularity ground truth labels, we feel that the EEG aspect of our project has been significantly de-risked.

Going forward, we have to map the logged timestamps from the EEG readings to the timestamps of Professor Dueck's ground truth labels so we can begin feeding our labeled data into a model for training. This coming week, we hope to complete this linking of the raw data with the labels and train an initial CNN on the resulting dataset. From there, we can assess the performance of the model, verify that the data has a high signal-to-noise ratio, and begin to fine-tune the model to improve upon our base model's performance.
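A minimal sketch of how this timestamp linking could be done, assuming both the EmotivPRO export and Professor Dueck's label log are CSV files keyed by epoch time (the file and column names are assumptions):

```python
import pandas as pd

eeg = pd.read_csv("eeg_band_power.csv")   # epoch_time plus AF3/AF4 band-power columns
labels = pd.read_csv("labels.csv")        # epoch_time, label (Focused/Distracted/Neutral)

eeg = eeg.sort_values("epoch_time")
labels = labels.sort_values("epoch_time")

# Attach to each EEG reading the most recent label at or before its timestamp.
merged = pd.merge_asof(eeg, labels, on="epoch_time", direction="backward")
merged.to_csv("labeled_eeg.csv", index=False)
```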

A new risk that could jeopardize the progress of our project is the performance of the phone object detection model. The custom YOLOv8 model we trained does not currently meet the design requirement of mAP ≥ 95%. We may need to lower this threshold, improve the model with further training, or use a pre-trained object detection model. We have already found other datasets that we can further train the model on (like this one), and we have also found a pre-trained model on Roboflow that performs better than the custom model we trained. This Roboflow model is something we can fall back on if we cannot get our custom model to perform sufficiently well.

The schedule for camera-based detections was updated to be broken down into the implementation of each type of distraction to be detected. Unit testing and then combining each of the detectors into one module will begin on March 18.

To mitigate the risks associated with EEG data reliability in predicting focus states, we have developed three different plans:

Plan A involves leveraging EEG data collected from musicians, with Professor Dueck using her expertise and visual cues to label states of focus and distraction during music practice sessions. This method relies heavily on her understanding of individual focus patterns within a specific, skill-based activity.

Plan B broadens the data collection to include ourselves and other participants engaged in completing multiplication worksheets under time constraints. Here, focus states are identified in environments controlled for auditory distractions using noise-canceling headphones, while distracted states are simulated by introducing conversations during tasks. This strategy aims to diversify the conditions under which EEG data is collected. 

Plan C shifts towards using predefined performance metrics from the Emotiv EEG system, such as Attention and Engagement, setting thresholds to classify focus states. Recognizing the potential oversimplification in this method, we plan to correlate specific distractions or behaviors, such as phone pick-ups, with these metrics to draw more detailed insights into their impact on user focus and engagement. By using language model-generated suggestions, we can create personalized advice for improving focus and productivity based on observed patterns, such as recommending strategies for minimizing phone-induced distractions. This approach not only enhances the precision of focus state prediction through EEG data but also integrates behavioral insights to provide users with actionable feedback for optimizing their work environments and habits.

Additionally, we established a formula for the productivity score we will assign to users throughout the work session. The productivity score calculation in the Focus Tracker App quantifies an individual's work efficiency by evaluating both focus duration and distraction frequency. It establishes a distraction score (D) by comparing the actual number of distractions (A) encountered during a work session against an expected number (E), calculated based on the session's length with an assumption of one distraction every 5 minutes. The baseline distraction score (D) starts at 0.5. If A <= E, then D = 1 – 0.5 * A / E. If A > E, then

This ensures the distraction score decreases but never turns negative.  The productivity score (P) is then determined by averaging the focus fraction and the distraction score. This method ensures a comprehensive assessment, with half of the productivity score derived from focus duration and the other half reflecting the impact of distractions.
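A small sketch of this calculation; note that the expression for the A > E case is an assumption (the exact formula is omitted above), chosen so the distraction score decays toward zero without going negative:

```python
def productivity_score(focus_minutes, session_minutes, actual_distractions):
    """Average of the focus fraction and the distraction score, per the write-up."""
    E = max(session_minutes / 5.0, 1.0)  # expected: one distraction per 5 minutes (guard for short sessions)
    A = actual_distractions
    if A <= E:
        distraction_score = 1 - 0.5 * A / E
    else:
        distraction_score = 0.5 * E / A  # assumed form for A > E: decreasing, never negative
    focus_fraction = focus_minutes / session_minutes
    return (focus_fraction + distraction_score) / 2
```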

Overall, our progress is on schedule.

 

Rohan’s Status Report for 3/9

This week I spent a couple of hours working with Arnav to finalize our data collection and labeling system in preparation for our meeting with Professor Dueck. Once this system was implemented, I spent time with two different music students getting the headset calibrated and ready to record the raw EEG data. Finally, on Monday and Wednesday I brought it all together with the music students and Professor Dueck to orchestrate the data collection and labeling process. This involved getting the headset set up and calibrated on each student, helping Professor Dueck get the data labeling system running, and observing as the music students practiced and Professor Dueck labeled them as focused, distracted, or neutral. I watched Professor Dueck observe her students and tried to pick up on the kinds of things she was looking for, while also making sure that she was using the system correctly and not encountering any issues.

I also spent a significant amount of time working on the design report. This involved doing some simple analysis on the first set of data we collected on Monday and making some key design decisions. Once we collected data for the first time on Monday, I looked through the EEG quality of the readings and found that we were generally hovering between 63 and 100 in overall EEG quality. Initially, I figured we would just live with the variable EEG quality and go forward with our plan: pass in the power readings from each of the EEG frequency bands from each of the five sensors in the headset as input to the model, and also add the overall EEG quality value as input so that the model could take EEG quality variability into account. However, on Wednesday when we went to collect data again, we realized that the EEG quality from the two sensors on the forehead (AF3 and AF4) tended to be at 100 for a significant portion of the readings in our dataset. We also learned that brain activity in the prefrontal cortex (located near the forehead) is highly relevant to focus levels. This led us to decide to only work with readings where the EEG quality for both the AF3 and AF4 sensors is 100, and therefore avoid having to pass the EEG quality in as model input and depend on the model learning to account for variable levels of SNR in our training data. This was a key design decision because it means we can have much higher confidence in the quality of the data going into the model: according to Emotiv, the contact quality and EEG quality are as strong as possible.
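A small sketch of the filtering this decision implies, assuming the EmotivPRO export is a CSV with per-sensor EEG quality columns (the file and column names are assumptions):

```python
import pandas as pd

df = pd.read_csv("eeg_band_power.csv")

# Keep only samples where both prefrontal sensors report maximum EEG quality.
clean = df[(df["AF3_EEG_Quality"] == 100) & (df["AF4_EEG_Quality"] == 100)]
clean.to_csv("eeg_band_power_clean.csv", index=False)
```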

My progress is on schedule, and this week I plan to link the raw EEG data with the ground truth labels from Professor Dueck, as well as implement an initial CNN for focused, distracted, or neutral state detection based on EEG power values from the prefrontal cortex. At that point, I will continue to fine-tune the model and retrain as we accumulate more training data from our collaboration with Professor Dueck and her students in the School of Music.
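As a rough starting point for that model, a minimal PyTorch sketch of one possible architecture; the window length, channel count, and layer sizes are all assumptions rather than a final design:

```python
import torch.nn as nn

WINDOW = 32  # consecutive band-power readings per training example (assumption)

class FocusCNN(nn.Module):
    """1D CNN over windows of AF3/AF4 band-power values, predicting
    focused / distracted / neutral."""

    def __init__(self, n_channels=10, n_classes=3):  # 5 bands x 2 sensors (assumption)
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):  # x: (batch, n_channels, WINDOW)
        return self.classifier(self.features(x).squeeze(-1))
```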