Karen’s Status Report for 3/9

This week I spent the majority of my time working on the design report. Outside of that, I experimented with object detection for phone pick-up detection. One component of phone pick-up detection is phone object recognition, so I trained a YOLOv8 model to detect phones using the MUID-IITR dataset. This was the closest dataset I could find online to match the scenarios for the Focus Tracker App. The dataset includes images of people using a phone while performing day-to-day activities, along with annotations of the phone coordinates in each image. The dataset required some conversion to match the YOLOv8 format, after which I used the Python package Ultralytics to train the model. Below are the results of training for 100 epochs. The recall and mAP never exceed 0.8, which does not satisfy the design requirements we specified. When testing the model, I also noticed that it sometimes predicted a bare hand as a phone. The frame rate is fairly low as well, at ~10 FPS.
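
For reference, here is a minimal sketch of the training flow with Ultralytics, assuming the converted dataset is described by a data.yaml; the paths and the single "phone" class are placeholders, not the exact files I used:

```python
from ultralytics import YOLO

# Start from a pretrained YOLOv8 nano checkpoint and fine-tune it on the
# converted MUID-IITR data; the annotations must first be rewritten in
# YOLO format (class x_center y_center width height, normalized to [0, 1]).
model = YOLO("yolov8n.pt")
model.train(data="muid_iitr/data.yaml", epochs=100, imgsz=640)

# Evaluate on the validation split to check recall and mAP
metrics = model.val()
print(metrics.box.mr, metrics.box.map50, metrics.box.map)
```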

There are some other datasets (like this one) that I can continue training the model on that contain just the phone itself, which could help prevent the false positives where a hand is classified as a phone. My risk mitigation plan, in case my custom YOLOv8 model does not achieve sufficient performance, is to use an already-trained model available on Roboflow: a YOLOv5 model trained on 3000+ images of phones and people using phones. This model is linked here. This option may be better because training time is very costly (>12 hours for 100 epochs). The frame rate for the Roboflow model is also higher (~20 FPS).

I also have a plan to collect and annotate my own data. The MUID-IITR dataset puts a fairly large bounding box around the hand, which may also explain many of the false positives. Roboflow has a very usable interface for collecting data, annotating images, and training a YOLO model.

Here is the directory with the code for manipulating the data and training my custom YOLOv8 model. And here is the directory with the code for facial recognition.

My progress is overall on schedule, but the custom YOLOv8 model not performing as well as desired is a bit of a setback. In the coming week, I plan to continue training this custom model, or fall back on the Roboflow model if that is unsuccessful. I will also integrate the hand landmarker to make phone pick-up detection more robust by taking into account the hand that is picking up the phone, and I will continue experimenting with the face recognition library that I will use for detecting interruptions from others.

Team Status Report for 2/24

This week we finalized our slides for the design presentation last Monday and incorporated the feedback received from students and professors into our design report. We split up the work for the design report and plan to have it finalized by Wednesday so that we can get appropriate feedback before the due date on Friday. We are also working on building a data labeling platform for Professor Dueck and plan to meet with her this week so that we can begin the data-gathering process. No changes have been made to our schedule, and we are planning for risk mitigation by doing additional research on microphone- and LLM-based approaches in case the EEG headset does not provide the accuracy we are looking for. Overall, we are all on schedule and have completed our individual tasks. We are looking forward to implementing more features of our design this week.

Rohan’s Status Report for 2/24

This week I secured the EmotivPRO subscription, which had been blocking our progress on EEG-based focus state detection. With the subscription, we can now build out the data labeling platform for Professor Dueck and begin implementing a basic detection model that takes in the EmotivPRO performance metrics and outputs a focus state: focused, distracted, or neutral. I was able to collect some initial readings wearing the headset myself while working at home. I began familiarizing myself with the Emotiv API, connected to the headset via Python code, and collected performance metric data from the headset. I am currently encountering an error when trying to download the performance metric data from the headset to a CSV on my laptop, which I suspect is an issue with how the license is configured or with credentials not being passed properly somewhere in the script. I also spent a significant amount of time working on the design report, which is due next week. Finally, I began researching what kinds of detection models would lend themselves to our EEG-based focus level detection and settled on a 1D convolutional neural network (tailored to time-series data), which I will begin experimenting with as soon as we finalize our data collection platform and determine the format in which we will read the data. Overall, my progress is still on schedule. Looking forward to next week, I plan to implement the data collection platform with Arnav, do further CNN research/testing, and finalize our design report for submission.
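
As a reference for the CSV export step, here is a minimal sketch of the Cortex API flow I am working with; the client ID/secret are placeholders, error handling is omitted, and the response handling is simplified from the real JSON-RPC protocol:

```python
import csv
import json
import ssl
import websocket  # pip install websocket-client

# Placeholder credentials from the Emotiv developer console
CLIENT_ID = "your-client-id"
CLIENT_SECRET = "your-client-secret"

# Cortex runs locally and uses a self-signed certificate
ws = websocket.create_connection("wss://localhost:6868",
                                 sslopt={"cert_reqs": ssl.CERT_NONE})

def call(method, params, rid):
    """Send one JSON-RPC request to the Cortex service and return the reply."""
    ws.send(json.dumps({"jsonrpc": "2.0", "id": rid,
                        "method": method, "params": params}))
    return json.loads(ws.recv())

call("requestAccess", {"clientId": CLIENT_ID, "clientSecret": CLIENT_SECRET}, 1)
token = call("authorize", {"clientId": CLIENT_ID,
                           "clientSecret": CLIENT_SECRET}, 2)["result"]["cortexToken"]

headset = call("queryHeadsets", {}, 3)["result"][0]["id"]
call("controlDevice", {"command": "connect", "headset": headset}, 4)
session = call("createSession", {"cortexToken": token, "headset": headset,
                                 "status": "active"}, 5)["result"]["id"]

# Subscribe to the "met" stream (performance metrics such as attention)
call("subscribe", {"cortexToken": token, "session": session,
                   "streams": ["met"]}, 6)

# Append streamed samples to a CSV; each "met" message carries one row
with open("performance_metrics.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for _ in range(300):
        msg = json.loads(ws.recv())
        if "met" in msg:
            writer.writerow([msg["time"]] + msg["met"])
```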

Arnav’s Status Report for 2/24

This week I worked on setting up the React frontend, Django backend, and the database for our web application and made sure that all necessary packages/libraries are installed. The Home/Landing page looks very similar to the UI planned in Figma last week. I used React functional components for the layout of the page and was able to manage state and side effects efficiently. I integrated a bar graph, line graph, and scatter plot into the home page using Recharts (a React library for creating interactive charts). I made sure that the application's structure is modular, with reusable components, so that it will be easy to add the future pages that are part of the UI design. Regarding the backend, I did some experimentation and research with Axios for API calls to see what would be the best way for the frontend and backend to interact, especially for real-time updates. Django's default database is SQLite, and once we have data ready to store, the migration to a PostgreSQL database will be very easy. All of the code written for the features mentioned above has been pushed to a separate branch of our team's shared GitHub repository: https://github.com/karenjennyli/focus-tracker.

Lastly, I also did some more research on how we can use MediaPipe along with React/Django to show the user's live camera feed. The live camera feed can be embedded directly into the React application, utilizing the webcam through Web APIs like navigator.mediaDevices.getUserMedia. The processed data from MediaPipe, which might include landmarks or other analytical metrics, will be sent to the Django backend via RESTful APIs. This data will then be serialized using Django's REST framework and stored in the database.
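
As a concrete sketch of that backend path, a Django REST framework model, serializer, and viewset for storing per-frame metrics might look like the following; the names and fields are hypothetical, not our finalized schema:

```python
# models.py: hypothetical table for MediaPipe-derived focus metrics
from django.db import models

class FocusMetric(models.Model):
    session_id = models.CharField(max_length=64)
    timestamp = models.DateTimeField(auto_now_add=True)
    metric_type = models.CharField(max_length=32)  # e.g. "yawn", "head_pose"
    value = models.FloatField()

# serializers.py: serialize the model for the REST API
from rest_framework import serializers

class FocusMetricSerializer(serializers.ModelSerializer):
    class Meta:
        model = FocusMetric
        fields = "__all__"

# views.py: a viewset the React frontend can POST metrics to via Axios
from rest_framework import viewsets

class FocusMetricViewSet(viewsets.ModelViewSet):
    queryset = FocusMetric.objects.all()
    serializer_class = FocusMetricSerializer
```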

My progress is currently on schedule and during the next week, I plan to write code for the layout of the Calibration and Current Session Pages and also get the web camera feed to show up on the application using MediaPipe. Additionally, I will do more research on how to integrate the data received from the Camera and EEG headset into the backend and try to write some basic code for that.

Karen’s Status Report for 2/24

This week I implemented head pose estimation. My implementation involves solving the perspective-n-point (PnP) pose computation problem: the goal is to find the rotation that minimizes the reprojection error from 3D-2D point correspondences. I am using five points on the face for these correspondences: the 3D points of a reference face looking forward without any rotation, and the corresponding 2D points obtained from MediaPipe's facial landmarks. I then solve for the Euler angles given the rotation matrix. This gives the roll, pitch, and yaw of the head, which tells us if the user's head is pointed away from the screen or looking around the room. The head pose estimator module is on GitHub here.
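
For reference, here is a minimal sketch of this pipeline with OpenCV; the 3D reference coordinates, the choice of the fifth landmark, and the focal-length approximation are illustrative assumptions rather than the exact values in my module:

```python
import cv2
import numpy as np

# 3D points of a reference face looking forward (millimeter-scale model);
# values are illustrative, not the exact model used in the project
model_points = np.array([
    (0.0, 0.0, 0.0),       # nose tip
    (0.0, -63.6, -12.5),   # chin
    (-43.3, 32.7, -26.0),  # left eye outer corner
    (43.3, 32.7, -26.0),   # right eye outer corner
    (0.0, -28.9, -24.1),   # mouth center (assumed fifth point)
], dtype=np.float64)

def head_pose(image_points, frame_size):
    """image_points: (5, 2) pixel coords of the same landmarks from MediaPipe."""
    h, w = frame_size
    focal = w  # common approximation: focal length ~ image width
    camera_matrix = np.array([[focal, 0, w / 2],
                              [0, focal, h / 2],
                              [0, 0, 1]], dtype=np.float64)
    dist_coeffs = np.zeros((4, 1))  # assume no lens distortion
    ok, rvec, tvec = cv2.solvePnP(model_points, image_points,
                                  camera_matrix, dist_coeffs)
    rot, _ = cv2.Rodrigues(rvec)       # rotation vector -> 3x3 matrix
    angles, *_ = cv2.RQDecomp3x3(rot)  # Euler angles in degrees
    pitch, yaw, roll = angles
    return roll, pitch, yaw
```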

I also began experimenting with phone pick-up detection. My idea is to use a combination of hand detection and phone object detection to detect the user picking up and using their phone. I am using MediaPipe's hand landmark detection, which can detect where a hand is in the frame. For object detection, I looked into various algorithms, including SSD (Single Shot Detector) and YOLO (You Only Look Once). After reviewing some papers [1, 2] on these algorithms, I decided to go with YOLO for its higher performance.
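
For the hand detection half, a minimal sketch using MediaPipe's Python solution; the webcam index and confidence threshold are arbitrary choices:

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

cap = cv2.VideoCapture(0)  # default webcam
with mp_hands.Hands(max_num_hands=1,
                    min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV captures BGR
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                mp.solutions.drawing_utils.draw_landmarks(
                    frame, hand, mp_hands.HAND_CONNECTIONS)
        cv2.imshow("hands", frame)
        if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
            break
cap.release()
cv2.destroyAllWindows()
```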

I was able to find some pre-trained YOLOv5 models for mobile phone detection on Roboflow. Roboflow is a platform that streamlines the process of building and deploying computer vision models and allows for the sharing of models and datasets. One of the models and datasets is linked here. Using Roboflow's inference Python API, I can load this model and use it to perform inference on images. Two other models [1, 2] performed similarly. All of them had trouble recognizing the phone when it was tilted in the hand. I think I will need a better dataset with images of people holding a phone in hand rather than just the phone by itself. I was able to find this dataset on Kaggle.
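
The inference call itself is short; a minimal sketch, with the API key, workspace, project slug, and version number as placeholders:

```python
from roboflow import Roboflow  # pip install roboflow

# Placeholder credentials and project identifiers
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace().project("phone-detection")
model = project.version(1).model

# Run inference on one frame, keeping predictions above 40% confidence
result = model.predict("frame.jpg", confidence=40, overlap=30).json()
for pred in result["predictions"]:
    print(pred["class"], pred["confidence"],
          pred["x"], pred["y"], pred["width"], pred["height"])
```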

Overall, my progress is on schedule. In the following week, I hope to train and test a smartphone object detection model that performs better than the pre-trained models I found online. I will then try to integrate it with the hand landmark detector to detect phone pick-ups.

In the screenshots below, the yaw is negative when looking left and positive when looking right.

Below are screenshots of the pre-trained mobile phone object detector and MediaPipe’s hand landmark detector.

Arnav’s Status Report for 2/17

This week I made a final draft of the wireframes and mockups for the web application. I finalized the Home/Landing, Calibration, Current Session, Session Summary, and Session History pages. These are the main pages of our web application that users will interact with. Below are pictures of some of the updated pages:

I also did some research regarding integrating the camera feed and its metrics into the backend/frontend code. We can break this process into the following steps: capturing the camera feed with MediaPipe and OpenCV, frontend integration with React, backend integration with Django, and communication between the frontend and backend. We can create a Python script using OpenCV to capture the camera feed; this involves capturing the video feed, displaying video frames, and releasing the capture at the end of the script. We can use React to display the live feed via the react-webcam library, capture the processed metrics from the Python script, and send the metrics to the backend via API calls and the Django REST framework. Our PostgreSQL database will be used to store user sessions, focus metrics, timestamps, and any other relevant data. Lastly, we will use Axios or the Fetch API to make asynchronous requests to the backend. For real-time data display, WebSockets (Django Channels) or long polling to continuously push data from the backend to the frontend will be the best options.
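
As a sketch of the WebSocket option, a minimal Django Channels consumer could look like the following; the consumer, group, and message-type names are hypothetical:

```python
# consumers.py: push focus metrics to the frontend over a WebSocket
import json
from channels.generic.websocket import AsyncWebsocketConsumer

class MetricsConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        # Join a shared group so backend code can broadcast to all clients
        await self.channel_layer.group_add("metrics", self.channel_name)
        await self.accept()

    async def disconnect(self, close_code):
        await self.channel_layer.group_discard("metrics", self.channel_name)

    # Handler for group messages of type "metrics.update"
    async def metrics_update(self, event):
        await self.send(text_data=json.dumps(event["data"]))
```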

Overall, my progress is on schedule. In the next week, I will start writing basic code to set up the React frontend and the Django backend and begin implementing the UI I have created so far in Figma. I will set up the PostgreSQL database and make sure we can store data accurately and efficiently. In addition, I will try to get the camera feed on the Calibration page of the web application using the steps described above.

Team Status Report for 2/17

Public Health, Social, and Economic Impacts

Concerning public health, our product addresses the growing concern with digital distractions and their impact on mental well-being. By helping users monitor their focus and productivity levels during work sessions, along with how those levels correlate with environmental distractions such as digital devices, our product gives users insight into their work habits and phone usage, potentially improving both their mental well-being in work environments and their relationship with digital devices.

For social factors, our product addresses an issue that affects almost everyone today. Social media bridges people across various social groups but is also a significant distraction designed to efficiently draw and maintain users’ attention. Our product aims to empower users to track their focus and understand what factors play into their ability to enter focus states for extended periods of time.

The development and implementation of the Focus Tracker App can have significant economic implications. Firstly, by helping individuals improve their focus and productivity, our product can contribute to overall efficiency in the workforce. Increased productivity often translates to higher output per unit of labor, which can lead to economic growth. Businesses will benefit from a more focused and productive workforce, resulting in improved profitability and competitiveness in the market. Additionally, our app's ability to help users identify distractions can lead to a better understanding of time management and resource allocation, which are crucial economic factors in optimizing production. In summary, our product will have a strong impact on economic factors by enhancing workforce efficiency, improving productivity, and aiding businesses in better managing distractions and resources.

Progress Update

The Emotiv headset outputs metrics for various performance states via the EmotivPRO API, including attention, relaxation, frustration, interest, cognitive stress, and more. We plan to compute correlation metrics (perhaps inverse correlations) between the various performance metrics. A further understanding of how some performance metrics interact with one another, for example the effect of interest in a subject or of cognitive stress on attention, could prove extremely useful to users in evaluating what factors affect their ability to maintain focus on the task at hand. We also plan to look at this data in conjunction with Professor Dueck's focused vs. distracted labeling to understand what threshold of performance metric values denotes each state of mind.
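
A minimal sketch of the correlation computation, assuming the performance metrics have been exported to a CSV with one column per metric (the column names are assumptions):

```python
import pandas as pd

# Hypothetical export of EmotivPRO performance metrics, one row per timestamp
df = pd.read_csv("performance_metrics.csv")

# Pairwise Pearson correlations; negative entries would indicate the
# inverse relationships we are looking for (e.g. stress vs. attention)
cols = ["attention", "relaxation", "frustration", "interest", "stress"]
print(df[cols].corr())
```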

On Monday, we met with Professor Dueck and her students to get some more background on how she works with her students and understands their flow states/focus levels. We discussed the best way for us to collaborate and collect data that would be useful for us. We plan to create a simple Python script that will record the start and end of focus and distracted states with timestamps using the laptop keyboard. This will give us a ground truth of focus states to compare with the EEG brainwave data provided by the Emotiv headset.
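
A minimal sketch of what that labeling script could look like (the keys and CSV layout are assumptions we still need to confirm with Professor Dueck):

```python
import time

# Log focus-state transitions with timestamps; "f" = focused,
# "d" = distracted, "n" = neutral, "q" = quit
states = {"f": "focused", "d": "distracted", "n": "neutral"}

with open("labels.csv", "a") as f:
    while True:
        key = input("state (f/d/n/q): ").strip().lower()
        if key == "q":
            break
        if key in states:
            f.write(f"{time.time()},{states[key]}\n")
            print(f"logged {states[key]}")
```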

This week we also developed a concrete risk mitigation plan in case the EEG Headset does not produce accurate results. This plan integrates microphone data, PyAudioAnalysis/MediaPipe for audio analysis, and Meta’s LLaMA LLM for personalized feedback into the Focus Tracker App.

We will use the microphone on the user’s device to capture audio data during work sessions and implement real-time audio processing to analyze background sounds and detect potential distractions. The library PyAudioAnalysis will help us extract features from the audio data, such as speech, music, and background noise levels. MediaPipe will help us with real-time audio visualization, gesture recognition, and emotion detection from speech. PyAudioAnalysis/MediaPipe will help us categorize distractions based on audio cues and provide more insight into the user’s work environment. Next, we will integrate Meta’s LLaMA LLM to analyze the user’s focus patterns and distractions over time. We will train the LLM on a dataset of focus-related features, including audio data, task duration, and other relevant metrics. The LLM will generate personalized feedback and suggestions based on the user’s focus data.
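
As a sketch of the feature extraction step, pyAudioAnalysis exposes short-term features that could drive the distraction categorization; the file name and window sizes below are illustrative:

```python
from pyAudioAnalysis import audioBasicIO, ShortTermFeatures

# Load a recorded work-session clip and extract short-term features
# over 50 ms windows with 25 ms steps
rate, signal = audioBasicIO.read_audio_file("session_audio.wav")
features, feature_names = ShortTermFeatures.feature_extraction(
    signal, rate, 0.050 * rate, 0.025 * rate)

# features is (num_features x num_windows); names include energy,
# spectral centroid, zero-crossing rate, etc.
print(features.shape, feature_names[:5])
```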

In addition, we will provide actionable insights such as identifying common distractions, suggesting productivity techniques, or recommending changes to the work environment that will further help the user improve their productivity. Lastly, we will display the real-time focus metrics and detected distractions on multiple dashboards, similar to the camera and EEG headset metrics we have planned.

To test the integration of microphone data, we will conduct controlled experiments where users perform focused tasks while the app records audio data. We will analyze the audio recordings to detect distractions such as background noise, speech, and device notifications. Specifically, we will measure the accuracy of distraction detection by comparing it against manually annotated data, aiming for a detection accuracy of at least 90%. Additionally, we will assess the app’s real-time performance by evaluating the latency between detecting a distraction and providing feedback, aiming for a latency of less than 3 seconds. 

Lastly, we prepared for our design review presentation and considered our product’s public health, social, and economic impacts. Overall, we made great progress this week and are on schedule.

Karen’s Status Report for 2/17

This week I finished implementing yawning and microsleep detection. These behaviors will help us understand a user's productivity during a work session. I used this paper as inspiration for how to detect yawning and microsleeps. I calculate the mouth and eye aspect ratios, which tell us how open or closed the mouth and eyes are. If a ratio exceeds a certain threshold for a set amount of time, a yawn or microsleep detection is triggered. I implemented this using MediaPipe's face landmark detection rather than Dlib as used in the paper, because MediaPipe is reported to have higher accuracy and also provides more facial landmarks to work with.
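
For reference, a minimal sketch of the aspect ratio computation; the six-landmark ordering follows the standard EAR formulation, and the mapping to MediaPipe's mesh indices is omitted:

```python
import numpy as np

def aspect_ratio(pts):
    """pts: six (x, y) landmarks; one horizontal pair (p1, p4) and two
    vertical pairs (p2, p6) and (p3, p5), as in the EAR formulation."""
    p1, p2, p3, p4, p5, p6 = (np.asarray(p) for p in pts)
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)

# Eyes: a low ratio sustained over many frames suggests a microsleep.
# Mouth: a high ratio sustained over many frames suggests a yawn.
```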

Calibration and determining an appropriate threshold to trigger a yawn or microsleep detection proved to be more difficult than expected. For the detector to work on all users with different eye and mouth shapes, I added a calibration step at the start of the program. It first measures the ratios on a neutral face, then the ratios while the user is yawning, and finally the ratios while the user's eyes are closed. These measurements are used to determine the corresponding thresholds. I normalize the ratios by calculating a Z-score for each measurement. My implementation also ensures that the detectors are triggered once for each yawn and each instance of a microsleep, regardless of their duration. After finishing the implementation, I spent some time organizing the detectors into individual modules so that the code is easier to refactor and understand. My most recent commit, with yawning and microsleep detection, can be accessed here.
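
A minimal sketch of the normalization idea, assuming the calibration step collects a list of ratios for each facial state:

```python
import numpy as np

def calibrate(neutral_ratios):
    """Mean/std of the ratio on a neutral face, for Z-scoring."""
    return np.mean(neutral_ratios), np.std(neutral_ratios)

def z_score(ratio, mu, sigma):
    # Per-user normalization lets one threshold work across different
    # eye and mouth shapes
    return (ratio - mu) / sigma
```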

I began exploring options for head pose detection and will follow a similar approach to that proposed in this paper.

Overall, I am on schedule and making good progress. In the coming week, I will finish implementing head pose estimation to track where the user’s head is facing. This will help us track how long the user is looking at/away from their computer screen, which can be correlated to their focus and productivity levels. If this is complete, I will look into and begin implementing object detection to detect phone pick-ups.

Below is a screenshot of the yawning and microsleep detection program with some debugging messages to show the ratios and their thresholds.

Rohan’s Status Report for 2/17

This week I spent time understanding how to improve the contact quality of the EEG headset. I set the headset up on myself and, after making some adjustments, finally reached 100% contact quality. I met with Justin, one of the piano players Professor Dueck trains, to teach him how to wear the headset and introduce him to the EmotivPRO software. I have also continued to research methods for detecting focus via EEG, including training an SVM or CNN on the EEG frequency bands delta, theta, and alpha, which correspond closely to attention. We learned that EmotivPRO already provides detection of attention, interest, cognitive stress, and other brain states in the form of numerical performance metrics. We are thinking of doing some further processing on these numbers to show the user a binary indicator of whether they are focused or not, as well as providing insight into what factors are playing a role in their focus levels. My progress is on schedule, but I am waiting on the purchase of the EmotivPRO subscription, which is currently blocking me from prototyping with the EEG data from the headset. I will follow up with the ECE inventory/purchasing team to ensure that this does not become an issue given our schedule. In the next week, I hope to set up the EEG focus state data labeling system for Professor Dueck and begin researching/computing correlation metrics between the various performance metrics.
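
For the frequency-band route, here is a minimal sketch of the kind of feature extraction involved, assuming raw EEG sampled at the Insight's 128 Hz rate (the band edges are the conventional ones):

```python
import numpy as np
from scipy.signal import welch

def band_power(channel, fs, lo, hi):
    """Average power of one EEG channel within [lo, hi] Hz."""
    freqs, psd = welch(channel, fs=fs, nperseg=fs * 2)
    mask = (freqs >= lo) & (freqs <= hi)
    return np.trapz(psd[mask], freqs[mask])

fs = 128  # Emotiv Insight sampling rate
signal = np.random.randn(fs * 10)  # 10 s of placeholder data

# Delta (1-4 Hz), theta (4-8 Hz), alpha (8-12 Hz) features that could
# feed an SVM or CNN
features = [band_power(signal, fs, lo, hi) for lo, hi in
            [(1, 4), (4, 8), (8, 12)]]
```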

Rohan’s Status Report for 2/10

This past week, I prepared and presented the proposal presentation. I acquired the Emotiv Insight headset from the ECE inventory and did some initial testing/investigation. I read research papers that studied attention using EEG data to get a better understanding of how to process the raw EEG data and what kinds of models people have had success with in the past. I set up the headset on myself and connected it to my laptop via Bluetooth. At this point, I encountered some issues trying to get a high-fidelity sensor connection to my head. I experimented with adjusting the headset on my head according to the specifications and applying saline solution to the sensor tips. Eventually, I was able to get the sensor contact quality to hold steady in the 60-70% range. I also realized that we will need an EmotivPRO subscription to export any of the raw EEG data off the headset, so I filled out the order form and reached out to Quinn about how to go about getting the license. My progress is on schedule. In the next week, I need to chat with Jean to get feedback on how to improve our sensor contact quality, or at least understand what range is acceptable for signal processing. I need to secure the EmotivPRO subscription so we can port the raw EEG data from the headset. At that point, I will work with Arnav to develop a training data labeling platform that Professor Dueck will use to label her students' raw EEG data as focused, distracted, or neutral. Finally, with the EmotivPRO subscription, I can also start setting up some simple signal processing pipelines to preprocess and interpret the EEG data from the headset to detect focus levels.