Karen’s Status Report for 2/24

This week I implemented head pose estimation. My implementation solves the Perspective-n-Point (PnP) pose computation problem: the goal is to find the rotation (and translation) that minimizes the reprojection error between 3D-2D point correspondences. I am using 5 points on the face for these correspondences: the 3D points come from a model of a face looking straight ahead with no rotation, and the 2D points are obtained using MediaPipe’s facial landmarks. I then solve for the Euler angles given the rotation matrix. This gives the roll, pitch, and yaw of the head, which tell us whether the user’s head is pointed away from the screen or whether they are looking around the room. The head pose estimator module is on GitHub here.
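The sketch below shows the math around the PnP step. It is a minimal NumPy version that projects 3D model points with a candidate rotation and translation, measures the reprojection error that a PnP solver minimizes, and converts a rotation matrix into Euler angles. It rests on a few assumptions. The intrinsics use the common approximation in which focal length equals image width. The Z-Y-X angle convention is a choice, not something the report specifies. All function names are illustrative. In an OpenCV-based implementation, the minimization itself would typically be handled by cv2.solvePnP, with cv2.Rodrigues turning its rotation vector into a matrix. The report does not say which solver the module uses.

```python
import numpy as np

def rot_x(deg):
    a = np.radians(deg)
    return np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])

def rot_y(deg):
    a = np.radians(deg)
    return np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])

def rot_z(deg):
    a = np.radians(deg)
    return np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])

def euler_to_rotation(pitch, yaw, roll):
    """Compose R = Rz(roll) @ Ry(yaw) @ Rx(pitch), angles in degrees (assumed convention)."""
    return rot_z(roll) @ rot_y(yaw) @ rot_x(pitch)

def rotation_to_euler(R):
    """Invert euler_to_rotation; valid away from gimbal lock (|yaw| < 90 degrees)."""
    sy = np.hypot(R[0, 0], R[1, 0])  # equals |cos(yaw)|
    pitch = np.arctan2(R[2, 1], R[2, 2])
    yaw = np.arctan2(-R[2, 0], sy)
    roll = np.arctan2(R[1, 0], R[0, 0])
    return tuple(float(np.degrees(a)) for a in (pitch, yaw, roll))

def camera_matrix(width, height):
    """Approximate intrinsics: focal length = image width, principal point = image center."""
    return np.array([[width, 0, width / 2], [0, width, height / 2], [0, 0, 1]], dtype=float)

def project_points(points_3d, R, t, K):
    cam = points_3d @ R.T + t          # model frame -> camera frame
    uvw = cam @ K.T                    # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide -> pixel coordinates

def reprojection_error(points_3d, points_2d, R, t, K):
    """Mean pixel distance between projected model points and observed landmarks.

    PnP solvers minimize the (squared) version of these residuals over R and t.
    """
    residuals = project_points(points_3d, R, t, K) - points_2d
    return float(np.mean(np.linalg.norm(residuals, axis=1)))
```

The sign of each angle (for example, yaw being negative when looking left) depends on the axis conventions of the 3D face model and the camera frame, so they are worth checking against screenshots like the ones below.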

I also began experimenting with phone pick-up detection. My idea is to combine hand detection and phone object detection to detect the user picking up and using their phone. I am using MediaPipe’s hand landmark detection, which can detect where a hand is in the frame. For object detection, I looked into various algorithms, including SSD (Single Shot MultiBox Detector) and YOLO (You Only Look Once). After reviewing some papers [1, 2] on these algorithms, I decided to go with YOLO for its higher performance.
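One way to combine the two detectors is sketched below. It assumes a pick-up means that a detected phone box overlaps a hand's bounding box for several consecutive frames. The overlap rule, the 0.3 overlap cut-off, the 5-frame streak, and all names are hypothetical. Hand boxes are built from MediaPipe's normalized (x, y) hand landmark coordinates, so phone boxes would need to be normalized by the frame size as well.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), normalized to [0, 1]

def hand_box(landmarks: Sequence[Tuple[float, float]], margin: float = 0.05) -> Box:
    """Bounding box around one hand's landmarks, padded so a held phone still overlaps."""
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    return (min(xs) - margin, min(ys) - margin, max(xs) + margin, max(ys) + margin)

def overlap_fraction(phone: Box, hand: Box) -> float:
    """Fraction of the phone box's area that lies inside the hand box."""
    ix = max(0.0, min(phone[2], hand[2]) - max(phone[0], hand[0]))
    iy = max(0.0, min(phone[3], hand[3]) - max(phone[1], hand[1]))
    area = (phone[2] - phone[0]) * (phone[3] - phone[1])
    return ix * iy / area if area > 0 else 0.0

@dataclass
class PickupDetector:
    min_overlap: float = 0.3  # hypothetical tuning value
    min_frames: int = 5       # hypothetical: consecutive frames required
    _streak: int = 0
    _active: bool = False

    def update(self, phone: Optional[Box], hands: Sequence[Box]) -> bool:
        """Feed one frame; returns True exactly once when a pick-up starts."""
        holding = phone is not None and any(
            overlap_fraction(phone, h) >= self.min_overlap for h in hands
        )
        self._streak = self._streak + 1 if holding else 0
        if self._streak >= self.min_frames and not self._active:
            self._active = True
            return True
        if self._streak == 0:
            self._active = False  # phone put down; re-arm for the next pick-up
        return False
```

Requiring a streak of frames trades a little latency for robustness against single-frame false positives from either detector.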

I found some pre-trained YOLOv5 models for mobile phone detection on Roboflow. Roboflow is a platform that streamlines the process of building and deploying computer vision models and allows users to share models and datasets. One of the models and datasets is linked here. Using Roboflow’s inference Python API, I can load this model and use it to perform inference on images. Two other models [1, 2] performed similarly. They all had trouble recognizing the phone when it was tilted in the hand. I think I will need a better dataset with images of people holding the phone, rather than images of the phone by itself. I found such a dataset on Kaggle.
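The helper below sketches how detector output could be turned into boxes for the pick-up logic. It assumes the predictions arrive as a JSON-like dict with a "predictions" list, where each box is given in center format (x, y, width, height) along with a confidence and a class name. That response shape, the "phone" label, and the 0.5 cut-off are assumptions to verify against the actual model output, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels

def to_corners(pred: Dict) -> Box:
    """Convert a center-format prediction (x, y, width, height) to corner format."""
    half_w, half_h = pred["width"] / 2, pred["height"] / 2
    return (pred["x"] - half_w, pred["y"] - half_h, pred["x"] + half_w, pred["y"] + half_h)

def phone_boxes(response: Dict, label: str = "phone", min_conf: float = 0.5) -> List[Tuple[float, Box]]:
    """Keep predictions of one class above a confidence cut-off, most confident first."""
    hits = [
        (p["confidence"], to_corners(p))
        for p in response.get("predictions", [])
        if p.get("class") == label and p["confidence"] >= min_conf
    ]
    return sorted(hits, key=lambda hit: hit[0], reverse=True)
```

Lowering min_conf is one quick way to probe whether tilted phones are being detected with low confidence or missed entirely, before retraining on the hand-held dataset.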

Overall, my progress is on schedule. In the following week, I hope to train and test a smartphone object detection model that performs better than the pre-trained models I found online. I will then try to integrate it with the hand landmark detector to detect phone pick-ups.

In the screenshots below, the yaw is negative when looking left and the yaw is positive when looking right.

Below are screenshots of the pre-trained mobile phone object detector and MediaPipe’s hand landmark detector.

Arnav’s Status Report for 2/17

This week I worked on making a final draft of the wireframes and mockups for the Web Application. I finalized the Home/ Landing, Calibration, Current Session, Session Summary, and Session History pages. These are the main pages for our web application that the users will interact with. Below are the pictures of some of the updated pages:

I also researched how to integrate the camera feed and camera metrics into the backend and frontend code. We can break this process into the following steps: capturing the camera feed with MediaPipe and OpenCV, frontend integration with React, backend integration with Django, and communication between the frontend and backend. We can write a Python script that uses OpenCV to capture the camera feed; it will capture the video stream, display the video frames, and release the capture at the end of the script. On the frontend, React will receive the processed metrics from the Python script, and the react-webcam library can provide the video feed; the metrics are then sent to the backend via API calls handled by the Django REST framework. Our PostgreSQL database will store user sessions, focus metrics, timestamps, and any other relevant data. We will use Axios or the Fetch API to make asynchronous requests to the backend. For real-time data display, WebSockets (via Django Channels) or long polling, which continuously deliver data from the backend to the frontend, will be the best options.
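Below is a minimal Python sketch of the capture-and-send step. It assumes that metrics are POSTed as JSON to a Django REST framework endpoint. The URL, payload fields, and helper names are hypothetical, and the metrics are placeholders. The code uses only standard OpenCV calls (VideoCapture, read, imshow, waitKey, release, destroyAllWindows) plus the Python standard library.

```python
import json
import time
import urllib.request
from typing import Optional

API_URL = "http://localhost:8000/api/metrics/"  # hypothetical Django REST framework endpoint

def build_payload(session_id: int, metrics: dict, timestamp: Optional[float] = None) -> dict:
    """Bundle one sample of camera metrics with its session and Unix timestamp."""
    return {
        "session": session_id,
        "timestamp": time.time() if timestamp is None else timestamp,
        "metrics": metrics,
    }

def post_metrics(payload: dict, url: str = API_URL) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status

def run_capture(session_id: int, camera_index: int = 0, post_every_s: float = 1.0) -> None:
    """Capture frames, display them, and periodically send metrics; press 'q' to stop."""
    import cv2  # imported here so the helpers above work without OpenCV installed

    cap = cv2.VideoCapture(camera_index)
    last_post = 0.0
    try:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            # Placeholder metrics; the MediaPipe-based detectors would fill these in.
            metrics = {"frame_width": frame.shape[1], "frame_height": frame.shape[0]}
            now = time.time()
            if now - last_post >= post_every_s:  # throttle API calls instead of posting every frame
                post_metrics(build_payload(session_id, metrics, now))
                last_post = now
            cv2.imshow("Focus Tracker", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        cap.release()  # release the camera at the end of the script
        cv2.destroyAllWindows()
```

For example, run_capture(session_id=1) would open the default camera and send one metrics sample per second until q is pressed. Throttling keeps the backend load predictable. With the WebSockets option, the frontend would receive updates without issuing one HTTP request per sample.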

Overall, my progress is on schedule. In the next week, I will start writing basic code to set up the React frontend and the Django backend, and begin implementing the UI I have created so far in Figma. I will set up the PostgreSQL database and make sure we can store data accurately and efficiently. In addition, I will try to get the camera feed working on the Calibration page of the web application using the steps described above.

Team Status Report for 2/17

Public Health, Social, and Economic Impacts

Concerning public health, our product will address the growing concern with digital distractions and their impact on mental well-being. Our product will help users monitor their focus and productivity levels during work sessions, along with how these correlate with environmental distractions such as digital devices. This will give users insight into their work habits and phone usage, potentially improving their mental well-being at work and their relationship with digital devices.

For social factors, our product addresses an issue that affects almost everyone today. Social media bridges people across various social groups but is also a significant distraction designed to efficiently draw and maintain users’ attention. Our product aims to empower users to track their focus and understand what factors play into their ability to enter focus states for extended periods of time.

The development and implementation of the Focus Tracker App can have significant economic implications. Firstly, by helping individuals improve their focus and productivity, our product can contribute to overall efficiency in the workforce. Increased productivity often translates to higher output per unit of labor, which can lead to economic growth. Businesses will benefit from a more focused and productive workforce, resulting in improved profitability and competitiveness in the market. Additionally, our app’s ability to help users identify distractions can lead to a better understanding of time management and resource allocation, which are crucial economic factors in optimizing production. In summary, our product will have a strong impact on economic factors by enhancing workforce efficiency, improving productivity, and aiding businesses in better managing distractions and resources.

Progress Update

The Emotiv headset outputs metrics for various performance states via the EmotivPRO API, including attention, relaxation, frustration, interest, cognitive stress, and more. We plan to compute correlations (possibly inverse ones) between these performance metrics. A better understanding of how the metrics interact could prove extremely useful to users in evaluating which factors affect their ability to maintain focus on the task at hand. One example is how interest in a subject or cognitive stress affects attention. We also plan to look at this data alongside Professor Dueck’s focused vs. distracted labels to understand which threshold values of each performance metric denote each state of mind.
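Both analyses are sketched below. The sketch computes a Pearson correlation between two metric series, then searches for the cut-off on one metric that best agrees with focused/distracted labels. It makes several assumptions. The metric series are taken to be already resampled onto a common timeline, and the labels to have been aligned to the same timestamps. Higher values are assumed to indicate focus. The function names are hypothetical and are not part of the EmotivPRO API.

```python
import numpy as np

def metric_correlation(a, b) -> float:
    """Pearson correlation between two metric series sampled at the same timestamps."""
    return float(np.corrcoef(a, b)[0, 1])

def best_threshold(values, focused):
    """Find the cut-off on one metric that best matches focused (True) / distracted (False) labels.

    Assumes higher values indicate focus; returns (threshold, fraction of samples labeled correctly).
    """
    values = np.asarray(values, dtype=float)
    focused = np.asarray(focused, dtype=bool)
    best_t, best_acc = float("nan"), -1.0
    for t in np.unique(values):  # every observed value is a candidate cut-off
        acc = float(np.mean((values >= t) == focused))
        if acc > best_acc:
            best_t, best_acc = float(t), acc
    return best_t, best_acc
```

A strongly negative correlation (for example, between cognitive stress and attention) would be one instance of the inverse relationships mentioned above. A threshold fitted this way should be checked on sessions it was not fitted to.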

On Monday, we met with Professor Dueck and her students to get some more background on how she works with her students and understands their flow states/focus levels. We discussed the best way for us to collaborate and collect data that would be useful for us. We plan to create a simple Python script that will record the start and end of focus and distracted states with timestamps using the laptop keyboard. This will give us a ground truth of focus states to compare with the EEG brainwave data provided by the Emotiv headset.
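A minimal sketch of such a labeling script follows. It turns key presses into timestamped focused/distracted intervals and writes them to a CSV file. The key bindings (f, d, e to end the current state, q to quit), the CSV columns, and the function names are hypothetical. Keys are read line by line from standard input, so each key press is followed by Enter. Unix timestamps are used on the assumption that they make alignment with the headset's recorded data straightforward.

```python
import csv
import sys
import time
from typing import List, Tuple

KEYS = {"f": "focused", "d": "distracted"}  # hypothetical key bindings

def events_to_intervals(events: List[Tuple[float, str]]) -> List[Tuple[str, float, float]]:
    """Convert (timestamp, key) presses into (state, start, end) intervals.

    'f' or 'd' starts a new state (closing any open one), 'e' ends the current state,
    'q' ends it and stops; any other key is ignored.
    """
    intervals: List[Tuple[str, float, float]] = []
    current, start = None, 0.0
    for ts, key in events:
        if key not in KEYS and key not in ("e", "q"):
            continue  # ignore unrecognized keys
        if current is not None:  # any valid key closes the open interval
            intervals.append((current, start, ts))
            current = None
        if key == "q":
            break
        if key in KEYS:
            current, start = KEYS[key], ts
    return intervals

def main(path: str = "labels.csv") -> None:
    """Read one key per line from stdin until 'q' (or end of input), then write the intervals."""
    print("f = focused, d = distracted, e = end state, q = quit (press Enter after each key)")
    events: List[Tuple[float, str]] = []
    while True:
        line = sys.stdin.readline()
        key = line.strip().lower() if line else "q"  # treat end of input as quit
        events.append((time.time(), key))
        if key == "q":
            break
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["state", "start_unix", "end_unix"])
        writer.writerows(events_to_intervals(events))
```

Calling main() during a practice session would produce a labels.csv whose intervals can be joined against the EEG timeline.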

This week we also developed a concrete risk mitigation plan in case the EEG Headset does not produce accurate results. This plan integrates microphone data, PyAudioAnalysis/MediaPipe for audio analysis, and Meta’s LLaMA LLM for personalized feedback into the Focus Tracker App.

We will use the microphone on the user’s device to capture audio data during work sessions and implement real-time audio processing to analyze background sounds and detect potential distractions. The library PyAudioAnalysis will help us extract features from the audio data, such as speech, music, and background noise levels. MediaPipe will help us with real-time audio visualization, gesture recognition, and emotion detection from speech. PyAudioAnalysis/MediaPipe will help us categorize distractions based on audio cues and provide more insight into the user’s work environment. Next, we will integrate Meta’s LLaMA LLM to analyze the user’s focus patterns and distractions over time. We will fine-tune the LLM on a dataset of focus-related features, including audio data, task duration, and other relevant metrics. The LLM will generate personalized feedback and suggestions based on the user’s focus data.
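As a rough illustration of measuring background noise levels, the sketch below computes short-term RMS levels in dBFS and flags spans that are much louder than the ambient level. It is plain NumPy rather than the PyAudioAnalysis API, and it is a deliberately simple stand-in for that library's feature extraction. It assumes a mono signal scaled to [-1, 1]. It also assumes the median frame level represents ambient noise. The 50 ms frame, the 25 ms hop, and the 15 dB margin are hypothetical choices.

```python
import numpy as np

def frame_levels_db(signal, sample_rate, frame_ms=50.0, hop_ms=25.0):
    """Short-term RMS level (dBFS) of a mono signal scaled to [-1, 1]."""
    x = np.asarray(signal, dtype=float)
    frame = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    levels = []
    for start in range(0, len(x) - frame + 1, hop):
        rms = np.sqrt(np.mean(x[start:start + frame] ** 2))
        levels.append(20 * np.log10(max(rms, 1e-10)))  # floor avoids log10(0) on digital silence
    return np.array(levels)

def loud_events(levels_db, frame_ms=50.0, hop_ms=25.0, margin_db=15.0):
    """Return (start_s, end_s) spans where the level exceeds the ambient (median) level by margin_db."""
    levels_db = np.asarray(levels_db)
    loud = levels_db > np.median(levels_db) + margin_db
    events, start = [], None
    for i, is_loud in enumerate(loud):
        if is_loud and start is None:
            start = i
        elif not is_loud and start is not None:
            events.append((start * hop_ms / 1000, (i - 1) * hop_ms / 1000 + frame_ms / 1000))
            start = None
    if start is not None:  # signal ended while still loud
        events.append((start * hop_ms / 1000, (len(loud) - 1) * hop_ms / 1000 + frame_ms / 1000))
    return events
```

Loudness alone cannot tell a dog bark from conversation. That distinction is what the classification features from PyAudioAnalysis or MediaPipe would add on top.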

In addition, we will provide actionable insights, such as identifying common distractions, suggesting productivity techniques, or recommending changes to the work environment, that will further help users improve their productivity. Lastly, we will display the real-time focus metrics and detected distractions on multiple dashboards, similar to those we have planned for the camera and EEG headset metrics.

To test the integration of microphone data, we will conduct controlled experiments where users perform focused tasks while the app records audio data. We will analyze the audio recordings to detect distractions such as background noise, speech, and device notifications. Specifically, we will measure the accuracy of distraction detection by comparing it against manually annotated data, aiming for a detection accuracy of at least 90%. Additionally, we will assess the app’s real-time performance by evaluating the latency between detecting a distraction and providing feedback, aiming for a latency of less than 3 seconds. 
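The evaluation could be scored as sketched below. Each manually annotated distraction onset is matched to the earliest unused detection that occurs within the 3-second latency budget. From those matches the sketch reports the hit rate, the number of unmatched detections (false alarms), and the mean latency. Defining "detection accuracy" as this hit rate is an assumption, and the function name is hypothetical. False alarms are reported separately because a hit rate alone would reward a detector that fires constantly.

```python
from typing import Dict, List

def evaluate_detections(annotated: List[float], detected: List[float], max_latency_s: float = 3.0) -> Dict[str, object]:
    """Match each annotated onset (seconds) to the earliest unused detection within max_latency_s after it."""
    unused = sorted(detected)
    latencies = []
    for onset in sorted(annotated):
        match = next((d for d in unused if 0.0 <= d - onset <= max_latency_s), None)
        if match is not None:
            latencies.append(match - onset)
            unused.remove(match)  # each detection can confirm only one annotation
    return {
        "hit_rate": len(latencies) / len(annotated) if annotated else 0.0,
        "false_alarms": len(unused),
        "mean_latency_s": sum(latencies) / len(latencies) if latencies else None,
    }
```

Under this scoring, the 90% target would correspond to hit_rate >= 0.9, with every matched latency already under 3 seconds by construction.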

Lastly, we prepared for our design review presentation and considered our product’s public health, social, and economic impacts. Overall, we made great progress this week and are on schedule.

Karen’s Status Report for 2/17

This week I finished implementing yawning and microsleep detection. These behaviors will help us understand a user’s productivity during a work session. I used this paper as inspiration for how to detect yawning and microsleeps. I calculate the mouth and eye aspect ratios, which tell us how open or closed the mouth and eyes are. If a ratio crosses its threshold (the mouth ratio rising above it, or the eye ratio dropping below it) for a set amount of time, it triggers a yawn or microsleep detection. I implemented this using MediaPipe’s face landmark detection rather than Dlib, which the paper uses, because MediaPipe is reported to have higher accuracy and also provides more facial landmarks to work with.
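For reference, the widely used six-point aspect ratio is sketched below. It is the sum of two vertical landmark distances divided by twice the horizontal distance, so it shrinks as the eye (or mouth) closes. The report does not say which MediaPipe landmark indices or which exact mouth formula it uses. This version therefore takes six (x, y) points in a stated order, and applying the same form to the mouth is an assumption. The function names are hypothetical.

```python
import numpy as np

def aspect_ratio(points):
    """Six-point aspect ratio: (|p2-p6| + |p3-p5|) / (2 * |p1-p4|).

    points: six (x, y) pairs ordered p1..p6, where p1/p4 are the horizontal corners,
    p2/p3 lie on the upper lid (or lip) and p6/p5 lie directly below them.
    """
    p = np.asarray(points, dtype=float)
    vertical = np.linalg.norm(p[1] - p[5]) + np.linalg.norm(p[2] - p[4])
    horizontal = np.linalg.norm(p[0] - p[3])
    return float(vertical / (2.0 * horizontal))

def eye_ratio(left_eye, right_eye):
    """Average both eyes so a single squint or landmark glitch has less effect."""
    return (aspect_ratio(left_eye) + aspect_ratio(right_eye)) / 2.0
```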

Calibration and determining an appropriate threshold to trigger a yawn or microsleep detection proved more difficult than expected. For the detector to work on all users with different eye and mouth shapes, I added a calibration step at the start of the program. It first measures the ratios on a neutral face. It then measures the ratios when the user is yawning, and then the ratios when the user’s eyes are closed. These measurements are used to determine the corresponding thresholds. I normalize the ratios by calculating a Z-score for each measurement. My implementation also ensures that the detectors are triggered once for each yawn and each instance of a microsleep, regardless of their duration. After finishing the implementation, I spent some time organizing the detectors into individual modules so that the code is easier to understand and reuse. The code with my most recent commit with yawning and microsleep detection can be accessed here.
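The sketch below shows one way the calibration and triggering could fit together. Neutral-face samples give a mean and standard deviation for Z-scores. The threshold sits halfway between the neutral face (z = 0) and the calibrated yawn or closed-eye samples. A detector fires once when the condition has held for a minimum number of frames, then re-arms when the condition ends. The halfway rule, the frame-count form of "a set amount of time", and all names are assumptions, not the committed implementation.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence

@dataclass
class Calibration:
    """Neutral-face statistics for one ratio (eye or mouth)."""
    mean: float
    std: float

    @classmethod
    def from_neutral(cls, samples: Sequence[float]) -> "Calibration":
        std = statistics.pstdev(samples)
        return cls(statistics.fmean(samples), std if std > 0 else 1e-6)  # avoid division by zero

    def z(self, value: float) -> float:
        return (value - self.mean) / self.std

def midpoint_threshold(calib: Calibration, event_samples: Sequence[float]) -> float:
    """Z-score halfway between the neutral face (z = 0) and the calibrated yawn / closed-eye ratios."""
    return statistics.fmean(calib.z(v) for v in event_samples) / 2.0

@dataclass
class EventDetector:
    calib: Calibration
    threshold_z: float
    min_frames: int        # e.g. frame rate * minimum duration in seconds
    above: bool = True     # True: fire when z rises above (yawn); False: when it falls below (eyes closed)
    _count: int = 0
    _fired: bool = False

    def update(self, ratio: float) -> bool:
        """Feed one frame's ratio; returns True once per event, however long it lasts."""
        z = self.calib.z(ratio)
        active = z > self.threshold_z if self.above else z < self.threshold_z
        if not active:
            self._count, self._fired = 0, False  # event over; re-arm
            return False
        self._count += 1
        if self._count >= self.min_frames and not self._fired:
            self._fired = True
            return True
        return False
```

Keeping a per-user Calibration object is what lets the same detector code adapt to different eye and mouth shapes.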

I began exploring options for head pose detection and will follow a similar approach to that proposed in this paper.

Overall, I am on schedule and making good progress. In the coming week, I will finish implementing head pose estimation to track where the user’s head is facing. This will help us track how long the user is looking at/away from their computer screen, which can be correlated to their focus and productivity levels. If this is complete, I will look into and begin implementing object detection to detect phone pick-ups.

Below is a screenshot of the yawning and microsleep detection program with some debugging messages to show the ratios and their thresholds.

Team Status Report for 02/10

This week, as a team, we incorporated all the feedback from the proposal presentation and started coming up with a more concrete/ detailed plan of how we will implement each feature/ data collection for the camera and EEG headset. 

We defined which behaviors and environmental factors we will detect via camera. These include drowsiness (a combination of eye and mouth/yawn tracking), off-screen gazing (eye and head tracking), background motion, phone pick-ups, lighting (research shows that bright blue light is better for promoting focus), and interacting with or being interrupted by other people. We were able to order and pick up the Emotiv headset from inventory and started researching the best way to use it. We also came up with a risk mitigation plan in case EEG focus level detection fails. This would shift the Focus Tracker App toward behavioral and environmental distraction detection, with a microphone as an additional input source. The microphone will help us track overall ambient noise levels and instances of louder noises, such as construction, dog barking, and human conversation. There will also be a section for customized feedback and recommendations on ways to improve productivity, implemented via an LLM.

Lastly, we met with Dr. Jocelyn Dueck to discuss collaborating on our project. We will draw on her expertise in understanding the flow/focus states of her students. She will help us collect training data for EEG-based focus level detection, as she is very experienced in telling when her students are in a focused vs. unfocused state while practicing. She proposed using anti-myopia pinhole glasses to artificially induce higher focus levels, which can be used for collecting training data and evaluating performance.

Overall, we made great progress this week and are on schedule. The main existing design of our project stayed the same, with only minor adjustments made to the content of our proposal/ design following the feedback from our presentation last week. We look forward to continuing our progress into next week.  

Arnav’s Status Report for 02/10

This week I spent time researching both frontend/ backend technologies for web application development and UI design frameworks for creating wireframes and designing mockups. Regarding frontend/ backend technologies, the Focus Tracker App would benefit from combining React and Django. React’s component-based architecture can easily render the dynamic and interactive UI elements needed for tracking focus levels, and Django’s backend is well suited to handling user data and analytics. React’s virtual DOM also ensures efficient updates, which is crucial for real-time feedback. However, this tech stack has some trade-offs: Django is not as asynchronous as Node.js, which could matter for real-time features, though Django Channels can mitigate this. Vue.js is considered simpler and more straightforward than React, but it does not include as much functionality. React also offers better support for data visualization libraries (react-google-charts, D3.js, Recharts). For the database, PostgreSQL works well alongside our Python-based ML models and is well supported by Django.

I also drafted some wireframes on Figma for our app’s Landing Page, Calibration Page (for the camera and EEG headset), and the Current Session Page. Below are pictures:

My progress is on schedule. In the next week, I plan to have the wireframes and mockup designs of all the pages complete. This includes the following pages: Home/ Landing, Features, About, Calibration, Current Session, Session Summary, Session History, and Top Distractions. I will also create a clear and detailed plan (including diagrams) for the code architecture. This plan will cover how the frontend and backend will interact and which buttons will navigate the user to which pages.


Karen’s Status Report for 02/10

I spent this week more thoroughly researching and exploring CV and ML libraries I can use to implement distraction and behavior detection via a camera. I found MediaPipe and Dlib, both Python-compatible libraries that can be used for facial landmark detection. I plan to use these libraries to help detect drowsiness, yawning, and off-screen gazing. MediaPipe can also be used for object detection, which I plan to experiment with for phone pick-up detection. Here is a document summarizing my research and brainstorming for camera-based distraction and behavior detection.

I also looked into and experimented with a few existing implementations of drowsiness detection. From this research and experimentation, I plan to use facial landmark detection to calculate the eye aspect ratio and mouth aspect ratio, and potentially a trained neural network to predict the drowsiness of the user.

Lastly, I submitted an order for a 1080p web camera that I will use to produce consistent camera results.

Overall, my progress is on schedule.

In the coming week, I hope to have a preliminary implementation of drowsiness detection. I would like to have successful yawning and closed eye detection via eye aspect ratio and mouth aspect ratio. I will also collect data and train a preliminary neural network to classify images as drowsy vs. not. If time permits, I will also begin experimentation with head tracking and off-screen gaze detection.

Below is a screenshot of me experimenting with the MediaPipe face landmark detection.

Below is a screenshot of me experimenting with an existing drowsiness detector.