Team Status Report for March 8, 2025

Project Risks and Mitigation Strategies

At this point the only hardware delay we’re still waiting on is the displays. The original displays we ordered were cancelled, so similar alternate parts had to be identified and purchased. These come from sellers with much more recent activity, and before ordering we asked for and received confirmation that the parts were still in stock, so we are much more hopeful that they will arrive when expected. We then, of course, have to hope they work- they fill the same technical niche as the original part, as small, lightweight displays with optical lenses meant for heads-up displays- but since they are slightly different, we no longer have proof that the deconstruction we have planned will work exactly as expected.

Beyond hardware, another key area of risk is ensuring the gesture-based interactions function smoothly and intuitively. As part of my work on the gesture system, I have been refining the detection algorithms and ensuring that they align well with user expectations. The primary risk is potential latency or inconsistency in recognizing gestures, especially in dynamic environments, so I am looking at sensitivity tuning and error correction methods to mitigate this; one possible smoothing approach is sketched below.
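
This is a minimal, hypothetical sketch of a majority-vote smoother over a sliding window of per-frame recognizer outputs; the class name, window size, confidence threshold, and gesture labels are placeholders, not part of our actual codebase.

    from collections import Counter, deque

    class GestureSmoother:
        """Debounce raw per-frame gesture labels with a sliding majority vote."""

        def __init__(self, window_size=5, min_confidence=0.6):
            self.window = deque(maxlen=window_size)
            self.min_confidence = min_confidence

        def update(self, label, confidence):
            # Treat low-confidence detections as "no gesture" to reduce flicker.
            self.window.append(label if confidence >= self.min_confidence else None)
            if len(self.window) < self.window.maxlen:
                return None  # wait for a full window before reporting anything
            # Only report a gesture once it wins a strict majority of the window.
            most_common, count = Counter(self.window).most_common(1)[0]
            if most_common is not None and count > len(self.window) // 2:
                return most_common
            return None

    # Example: noisy per-frame outputs settle onto a stable "swipe_left".
    smoother = GestureSmoother()
    for label, conf in [("swipe_left", 0.9), ("swipe_left", 0.2), ("swipe_left", 0.8),
                        ("swipe_left", 0.7), ("thumbs_up", 0.5)]:
        print(smoother.update(label, conf))  # None four times, then "swipe_left"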

Changes to System Design

No changes to the system design were made this week.

Schedule Progress

Independent schedule progress is addressed in our individual reports. While the order of some tasks has shuffled, our work is roughly on track.

 

Meeting Specific Needs

 

Part A was written by Diya, part B was written by Rebecca, and part C was written by __.

Part A: … with consideration of global factors. 

The product solution we are designing aims to meet the need for intuitive, hands-free interaction in augmented reality environments. By incorporating a gesture-based input system, users can interact naturally without relying on physical controllers, improving accessibility and ease of use. The gesture recognition is designed to be responsive and adaptive to each user’s input.

Part B: … with consideration of cultural factors.

Food is, of course, a foundational part of just about every culture everywhere on Earth. It’s one of the first things people may turn to when trying to reconnect as an adult with a culture they missed out on in childhood, for various reasons, or lost track of somewhere along the line- or when reaching out to friends and family of different cultures. But starting to learn how to cook, or learning how to cook in a style one is entirely unfamiliar with, can be a daunting undertaking. By lowering this barrier to entry, we hope that people will be more encouraged to attempt this particular path to connecting with a new culture, be it their own or their loved ones’.

Part C: … with consideration of environmental factors.


We are trying to design for as low an environmental cost as we can. Our physical frame is designed to last through long use by being as sturdy as possible. We plan on using as small an EC2 instance as needed to deploy our web app and store any databases. And inherently, cooking at home is better for the environment: less single-use plastic is used for packaging, there are fewer delivery emissions, and food waste is lessened if someone is able to more easily cook the food they want to eat when they want it. The recipes we include are all simple recipes that aim to use ingredients the user will probably already have, which also reduces food waste. Overall, home cooking is more environmentally friendly than ordering takeout, which is what many people who don’t feel comfortable cooking at home will end up doing- CookAR hopes to bridge that gap and get people to start home cooking. Though there are additional environmental costs associated with creating a new physical glasses product and running a website, we aim to be intentional about what we design and what resources we use, in a way that is as environmentally conscious as possible.

Rebecca’s Status Report for March 8, 2025

Report

I have learned that despite being supposedly a very mainstream device, the Raspberry Pi is… remarkably unintuitive. I’m using Raspberry Pi OS Lite to run the Rasppi headless and ssh into it, though for a still-unclear reason my computer does not seem to be able to resolve the Rasppi’s hostname and I have to use the IP address directly. This has only worked for this week’s development because I have direct access to my router and its IP address assignments at home, and I will have to resolve this issue immediately upon returning to campus. Figuring out how to get into the Rasppi took far, far too long, because every single tutorial and answered question and Guide To Headless Rasppis that I could find online assumed that you could resolve the hostname, which is a very reasonable assumption, and simply bizarrely untrue in my case. I don’t know.

The Raspberry Pi OS Imager also doesn’t tell you the name of the OS version you’re flashing, and even on the main website it’s just kind of… a throwaway inline parenthetical comment- despite that name being the main thing the entire community uses to refer to the major versions of the OS, and despite how many things change between them. This was a conscious decision. Why would you do it this way.

After figuring out the issue and getting into the board, getting it to talk to the camera was relatively simple (though I had the cable in upside down for a bit, which was deeply frustrating to discover after an hour and a half of debugging. So it goes). I’m using the native Raspberry Pi Camera Module, which is, you know, supposed to be the native camera and therefore straightforward to use, but you would just not believe the number of problems I have had because I’m using a native Pi camera instead of a USB camera.

First photograph captured from the Pi camera! It’s blurry and poorly exposed because I’ve left the protective plastic tab over the lens, since it still has to travel back to Pittsburgh. I expect the quality to be better once I take that off.

I also discovered that OpenCV’s primary image capture method VideoCapture(camera_id) is not compatible with libcamera, the regular Raspberry Pi camera library, because of course it isn’t. Surely nobody would ever want to use OpenCV straightforwardly on a minimal Raspberry Pi. Surely that couldn’t be an extremely common desire and mainstream goal. Can’t imagine.

However, Picamera2, the Bookworm Python wrapper for libcamera, is configurable enough to be made compatible with MediaPipe.

(As an aside: all of the libraries I used this week were available via pip, which also seems to be the simplest way to install MediaPipe- except for Picamera2, which was only accessible with apt; I set the include-system-site-packages flag in my pyvenv.cfg to true to be able to use it from the venv. A typical example of that file is below.)
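
The flag lives in the pyvenv.cfg file at the root of the virtual environment; the home path and version lines here are illustrative, and include-system-site-packages is the only line I changed.

    home = /usr/bin
    include-system-site-packages = true
    version = 3.11.2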

This is the MediaPipe on Raspberry Pi tutorial I started from. It doesn’t work on its own, because it relies on the OpenCV method that doesn’t work, but I used it and the associated tutorials linked to set up the Python environment (sigh. why did it have to be Python) and MediaPipe installation.

I found this document, which describes exactly what I want to do, with the sole caveat that it’s ten years out of date. Picamera has been displaced by Picamera2, which has been significantly streamlined, so the translation isn’t 1:1, and I’m not familiar enough with either library to do a quality translation. Sigh.

I ended up being able to scavenge bits and pieces from this document and from the Picamera2 repo examples to make a trial script which captures images off the camera and streams them via OpenCV (in this case over my ssh tunnel, which was very slow, but I hope some of that is the ssh streaming and it will speed up when I cut that). The core of that script looks roughly like the sketch below.
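
This is a simplified sketch, not the exact script; the resolution is a placeholder, and Picamera2’s “RGB888” format (somewhat confusingly) hands back BGR-ordered arrays, which is conveniently what OpenCV expects.

    # Minimal Picamera2 -> OpenCV preview loop; a simplified sketch, not the actual trial script.
    import cv2
    from picamera2 import Picamera2

    picam2 = Picamera2()
    picam2.configure(picam2.create_preview_configuration(
        main={"size": (640, 480), "format": "RGB888"}))
    picam2.start()

    try:
        while True:
            frame = picam2.capture_array()        # numpy array straight off libcamera
            cv2.imshow("camera", frame)           # over X11 forwarding this is slow
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        picam2.stop()
        cv2.destroyAllWindows()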

I was then able to graft the working Picamera2 image-capture script onto the MediaPipe script provided in the first tutorial. I’m just using a generic model right now, not our own custom gesture language, but it is proof that the software works on the hardware. If only just barely. At this point it ran extraordinarily slowly: there was truly an untenable amount of lag between my hand motions and what I saw on the screen, and even more between the motion of the frames on the screen and the MediaPipe overlay. Making it run faster became a critical priority.
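
The grafted script has roughly the following shape; for brevity this sketch uses MediaPipe’s legacy mp.solutions Hands API, which is not necessarily the exact API the tutorial’s script uses, and the confidence thresholds are placeholders.

    # Picamera2 frames fed into MediaPipe hand tracking -- a simplified sketch.
    import cv2
    import mediapipe as mp
    from picamera2 import Picamera2

    picam2 = Picamera2()
    picam2.configure(picam2.create_preview_configuration(
        main={"size": (640, 480), "format": "RGB888"}))
    picam2.start()

    mp_hands = mp.solutions.hands
    mp_draw = mp.solutions.drawing_utils

    with mp_hands.Hands(max_num_hands=1,
                        min_detection_confidence=0.5,
                        min_tracking_confidence=0.5) as hands:
        while True:
            frame = picam2.capture_array()                # BGR-ordered array
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB
            results = hands.process(rgb)
            if results.multi_hand_landmarks:
                for landmarks in results.multi_hand_landmarks:
                    mp_draw.draw_landmarks(frame, landmarks, mp_hands.HAND_CONNECTIONS)
            cv2.imshow("hands", frame)                    # debug stream only
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break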

Image capture of the MediaPipe hand tracker running on the Raspberry Pi.

I modified the camera configuration to tell the software reading the camera both the resolution that I wanted out of it (which was already there) and the raw native resolution of the camera; the change is sketched below. This seemed to fix my zoom problems- the camera’s field of view was far smaller than I had expected or wanted, and it seemed to have just been cutting a 640×480 box out of the center of the FOV. With access to the native resolution, it appears to bin the pixels down to the desired resolution much more cleanly. Additionally, I fixed the framerate, which had previously just been “whatever the software can handle”. Pinning it at 1.5fps sped up MediaPipe’s response time greatly, improved its accuracy, and made the lag functionally disappear (even while still streaming the output). It also kept the board from getting as dang hot as it was before; Raspberry Pis since the 3 underclock when they hit 60C, and according to my temp gun that’s about where I was hanging before I fixed the framerate, so that was probably also contributing to the lag.
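
This is a sketch rather than the exact code: the output size is ours, the raw size is whatever the sensor reports, and FrameDurationLimits is specified in microseconds per frame (so 1.5fps is roughly 666,666µs).

    from picamera2 import Picamera2

    picam2 = Picamera2()

    FPS = 1.5
    frame_us = int(1_000_000 / FPS)   # ~666,666 us per frame pins the rate at 1.5fps

    config = picam2.create_preview_configuration(
        # The stream the software actually reads, at the resolution we want.
        main={"size": (640, 480), "format": "RGB888"},
        # Also request a raw stream at the sensor's full native resolution, so the
        # pipeline scales the whole field of view down instead of center-cropping.
        raw={"size": picam2.sensor_resolution},
        # Fix the framerate instead of "whatever the software can handle".
        controls={"FrameDurationLimits": (frame_us, frame_us)},
    )
    picam2.configure(config)
    picam2.start()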

Image capture of the MediaPipe hand tracker working on the Raspberry Pi.

1.5fps is a little lower than I wanted it to be, though. I switched the framerate and recognition outputs to plain print statements and turned off the streaming, and was able to trivially double my framerate to 3fps. This hits the spec requirement!

If possible, I’d like to try to pull OpenCV entirely out of the script (with the possible exception of its streaming feature, for debugging purposes), since Picamera2 seems to have all of the OpenCV functionality I’m using in a much more lightweight, Raspberry Pi-native library. I believe this may help me improve the responsiveness of MediaPipe, and it will certainly make the script cleaner, with fewer redundant, overkill tools. However, since it works just fine as is, this is not a high priority.

Progress Schedule

I’ve shuffled around my tasks slightly, accelerating the work on MediaPipe while pushing off the HDMI output slightly, so I’m ahead on one section while being behind on another. I’ve also had to put off measuring the power consumption of the Rasppi until I had the recognition model working- in retrospect, I don’t know why measuring the power consumption was placed ahead of getting the most power-hungry algorithm working. I’m not particularly worried about the lead time on the battery, so I’m fine with that getting estimated and selected a bit later than expected.

Next Week’s Deliverables

Originally next week was meant to be the MediaPipe recognition week, while this week was for the HDMI out, but this has been flipped; I plan on working on the code which will generate the display images next week. Additionally, I’ll have to figure out how to log into the Rasppi on the school’s internet connection when I don’t know its IP address directly, which may take a nontrivial amount of time.

Team Status Report for February 22, 2025

Project Risks and Mitigation Strategies

  • A key bottleneck in the project is the delay in receiving the Raspberry Pis, which don’t arrive until Monday. This impacts our ability to test power consumption and system performance. To mitigate this delay, Rebecca has already obtained the SD cards and flashed the OS onto them, so the boards can be booted immediately upon arrival, allowing us to start testing early in the week.
  • If there are additional delays with hardware setup, we will proceed with software side testing on local machines and simulate hardware behavior to continue the development.
  • Charvi already has the profile, following, and registration functionality figured out and is integrating and debugging these components.
  • Diya has set up gesture recognition locally and is currently testing its accuracy. If accuracy issues arise, we will adjust the model parameters, consider alternative gesture recognition models, or refine preprocessing techniques.
  • Rebecca has drafted the headset CAD so a base exists for the mount points and, as mentioned above, prepped the SD cards and found instructions for installing and running OpenCV on a Raspberry Pi, to jumpstart our work on this.

Changes to System Design

  • No changes to the system design were made this week.

Schedule Progress

  • A few of the Rasppi-testing-related tasks expected to be done this week have been pushed to next week on account of the boards not arriving. No other changes have been made. Some of next week’s tasks may be pushed into spring break, on account of this delay and possibly underestimating the time the design report will take to write, but our slack time should catch all of it.

Rebecca’s Status Report for February 22, 2025

Report

  • The remaining parts (Rasppis, SD cards, camera, HDMI cables for testing) were ordered Monday/Tuesday, and most of them arrived by Friday. The parts that remain to arrive are, unfortunately, the Rasppis themselves, which is bottlenecking my progress.
  • I’ve drafted the CAD as expected (see below) which took. so long. Just so very many more hours than I thought, which, yeah, probably should have seen that one coming. Note for future modelling: do not use splines. Splines are the highway to incompletely constrained sketches. God, why did I use splines.

  • I’ve flashed the SD cards with the Raspberry Pi OS so I can boot them as soon as they arrive (expected Monday). Diya and I can sit down and check the model, and run the tests I need for power draw then.

Progress Schedule 

  • A few of the tasks I expected to be done this Friday/Saturday did not get done because of the delivery delay. I cannot measure the power consumption of a board I do not have.
  • If I don’t get horribly unlucky, this should be done early next week; some of next week’s tasks may end up getting pushed into spring break, but we have that slack time there for that very reason. Most of the time dedicated to this class for the upcoming week is likely to be spent writing the design report.

Next Week’s Deliverables 

  • The design report, obviously, is due at the end of next week. This is a team deliverable.
  • The MediaPipe/OpenCV-on-Rasppi tests I expected to do this week. We’ll know the power consumption, and then I can figure out what kind of battery I’ll need.

Team Status Report for February 15, 2025

Project Risks and Mitigation Strategies

  • The most significant risk is that the AR display won’t be arriving until mid-March, so the integration has had to be pushed back. The board has an HDMI output, so we can test the system using a computer monitor instead of the AR display. If the AR display arrives later than expected, we will conduct most of our testing on an external monitor.
  • Two other significant risks are that the gesture recognition algorithm cannot run on the Raspberry Pi, or that the power demand of it running is too high for a reasonably-weighted battery. If either of these are true, we can offload some of the computational power onto the web app via the board’s wireless functionality.

Changes to System Design

  • We are adding a networking and social feature to the web app. This involves adding a scoring incentive for completing recipes; scores are then translated into levels that are displayed on the user profile. Users can follow each other and view each other’s progress on profiles. We will also deploy our application on AWS EC2. This change was necessary to add complexity back into the project, since we are directly using the pre-trained model from MediaPipe for gesture recognition. A rough sketch of the scoring-to-levels idea follows.
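
This is purely illustrative; the actual point values, level thresholds, function names, and web framework are not decided here.

    # Hypothetical score-to-level mapping for user profiles -- illustrative only.
    def points_for_recipe(difficulty: str) -> int:
        """Award more points for harder recipes to incentivize trying new things."""
        return {"easy": 10, "medium": 20, "hard": 35}.get(difficulty, 10)

    def level_from_score(score: int) -> int:
        """Each successive level costs progressively more points."""
        level = 1
        threshold = 50
        while score >= threshold:
            level += 1
            score -= threshold
            threshold += 25
        return level

    # Example: a user who has completed 3 easy and 2 medium recipes.
    score = 3 * points_for_recipe("easy") + 2 * points_for_recipe("medium")
    print(score, level_from_score(score))   # 70 -> level 2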

Schedule Progress

We have reworked and detailed our Gantt chart.

Meeting Specific Needs

Part A was written by Diya, Part B was written by Charvi, and Part C was written by Rebecca.

Please write a paragraph or two describing how the product solution you are designing will meet a specified need…

Part A: … with respect to considerations of public health, safety or welfare.

By using the gesture interaction method, users can navigate recipes without touching the screen, which reduces cross-contamination risks, especially when handling raw ingredients. Additionally, by guiding users through a recipe step by step, the product helps them focus on one task at a time, which allows beginner cooks to gain confidence. With gesture control, users also face fewer distractions such as phones or tablets, minimizing the risk of accidents in the kitchen. Moreover, the social network feature allows users to track their progress and connect with other beginner cooks, promoting a sense of community amongst new cooks.

Part B: … with consideration of social factors.

Our target user group is “new cooks” – this includes people who don’t cook often, younger adults and children in new environments where they have to start cooking for themselves, and people who have been bored or confused by cooking on their own. Our CookAR product will allow people to connect with one another on a platform focused on cooking and trying out new recipes, which will lead them to be motivated by like-minded peers to further their culinary knowledge and reach. In addition, by gamifying the cooking process- allowing users to level up based on how many new recipes they have tried- CookAR will also motivate people to try new recipes and engage with each other’s profiles, which display the same information about which recipes were tried and how many.

Part C: … with consideration of economic factors.

A lightweight headset made from relatively inexpensive parts- fifteen-dollar Raspberry Pi boards, a few dollars at most for each of the rest of the peripherals; the most expensive part is the FLCoS display, and even that is only a few tens of dollars for a single item off the shelf- is ideal for a target audience of people trying to get into something they haven’t done much, or any, of before, and so an audience unlikely to want to spend a lot of money on a tool like this. Compared to a more generalized heads-up display on the market (or rather, formerly on the market) like the Google Glass, which retailed for $1500, this construction is cheap while still being fairly resilient.

Additionally, this same hardware and software framework could be generalized to a wide variety of tasks with marginal changes, and a hypothetical “going-into-production” variant of this product would very easily be able to swap out the four-year-old Raspberry Pi Zero W for something taking advantage of those years of silicon development- for instance, additional accelerators like an NPU tailored to our needs- in a manner such that scale offsets the increase in individual part price.

Rebecca’s Status Report for February 15, 2025

Report

  • I spent much of this week reworking the hardware decisions, because I realized in our meeting Tuesday morning, after walking through the hardware specs of the ESP32s and the demands of the software, that they almost certainly would not cut it. I decided to open the options to boards that demand 5V input, or recommend 5V input for heavy computation, and to achieve this voltage by using a boost board on a 3.7V LiPo battery. After considering a wide variety of boards I narrowed my options down to two:
    • The Luckfox Pico Mini, which is based on a Raspberry Pi Pico; it is extremely small (~5g) but has image-processing and neural-network accelerators. It has more RAM than an ESP32 (64MB in the spec, about 34MB usable space according to previous users), but still not a huge amount.
    • The Raspberry Pi Zero W, which has more RAM than the Luckfox (512MB) and a quad-core chip. It is also about twice the size of the Luckfox (~10g), but has a native AV out, which seems to be fairly unusual, and Bluetooth LE capability. This makes it ideal for running the microdisplay, which takes AV in, so I will not have to additionally purchase a converter board.
  • The decision was primarily which board to use for the camera input. Without intensive testing, it seems to me that if either is capable of running the MediaPipe/OpenCV algorithm we plan to use for gesture recognition, both would be- so it comes down to weight, speed, and ease of use.
  • Ultimately I’ve decided to go with two Raspberry Pi Zero W boards, as learning the development process for two different boards- even if closely related boards, as these are- would cost more time than I have to give. Additionally, if the Rasppi is not capable of running the algorithm, it already has wireless capability, so it is simpler to offload some of the computation onto the web app than it would be if I had to acquire an additional Bluetooth shield for the Luckfox, or pipe information through the other Rasppi’s wireless connection.
  • Power consumption will be an issue with these more powerful boards. After we get the algorithm running on one, I plan to test its loaded power consumption and judge the size of the battery I will need to meet our one-hour operation spec from there.
  • Additionally, considering the lightness of the program that runs the display (as the Rasppi was chosen to run this part for its native AV out, not for its computational power), it may be possible to run both peripherals from a single board. I plan to test this once we have the recognition algorithm and a simple display generation program functional. If so, I will be able to trade the weight of the board I’m dropping for 10g more battery, which would give me more flexibility on lifetime.
  • Because of the display’s extremely long lead time, I plan to develop and test the display program using the Rasppi’s HDMI output, so it will be almost entirely functional- only needing to switch over to AV output- when the display arrives, and I can bring it online immediately.

Progress Schedule

Due to the display’s extremely long lead time and the changes we’ve made to the specs of the project, we’ve reworked our schedule from the ground up. The new Gantt chart can be found in the team status report for this week.

Next Week’s Deliverables

  • The initial CAD draft got put off because I sank so much time into board decisions. I believe this is okay because selecting the right hardware now will make our lives much easier later. Additionally, the time at which I’ll be able to print the headset frame has been pushed out significantly, so I’ve broken up the CAD into several steps, which are marked on the Gantt chart. The early draft, which is just the shape of the frame (and includes the time for me to become reacquainted with OnShape) should be mostly done by the end of next week. I expect this to take maybe four or five more hours.
  • The Rasppis will be ordered on Amazon Prime. They will arrive very quickly. At the same time I will order the camera, microHDMI->HDMI converter and an HDMI cable, so I can boot the boards immediately upon receipt and get their most basic I/O operational this week or very early next week.

Team Status Report for February 8, 2025

Project Risks and Mitigation Strategies 

  • Gesture Recognition Accuracy and Performance Issues
    • Risk: the accuracy of the gesture detection might be inconsistent or there might be limitations in the model chosen
    • Mitigation: test multiple approaches (MediaPipe, CNNs, Optical Flow) to determine the most robust method and then fine tune the model
    • If vision recognition is very unreliable, explore other sensor-based alternatives such as integrating IMUs for gesture detection
  • Microcontroller compatibility
    • Risk: the microcontroller needs to support the real time data processing for the gesture recognition and AR display without latency issues
    • Mitigation: carefully evaluate microcontroller options to ensure compatibility with CV model. The intended camera board is designed for intensive visual processing.
      • If the microcontroller is not suitable for the CV model, we will look into offloading some of the processing power from the microcontroller to the laptop. This may require sending a great deal of data wirelessly and must be approached with caution.

Changes to the System Design 

  • Finalizing the device selection: There are fewer development board options than modules; however, we need a development board, as we do not have the time to sink into creating our own environment. So we will be using the ESP32-DevKitC-VE development board, which carries a WROVER-E module. This has the most storage capacity for its form factor and a reasonable price.
  • Refining the computer vision model approach: Initially we only considered a CNN-based classification model for gesture recognition, but after more research we are also testing MediaPipe and Optical Flow for potential improvements

Schedule Progress

Our deadlines do not start until next week, so our schedule remains the same.

 

Rebecca’s Status Report for February 8, 2025

Report

  • I have researched & decided upon specific devices for use in the project. I will need two microcontrollers, a microdisplay, a small camera, and a battery, all of which combined are reasonable to mount to a lightweight headset.
    • The microcontroller I will use for the display is the ESP32-WROVER-E (datasheet linked), via the development kit ESP32-DevKitC-VE. I will additionally use an ESP32-Cam module for the camera and controller.
      • I considered a number of modules and development boards. I decided that it was necessary to purchase a development board rather than just the module, as it is both less expensive and saves me time interfacing with the controller, since the development board comes with a micro USB port for loading instructions from the computer as well as easily accessible pinouts.
      • The datasheet for the ESP32-Cam notes that a 5V power supply is recommended; however, it is possible to power it from the 3.3V supply.
      • The ESP32-Cam module does not have a USB port on the board, so I will also need to use an ESP-32-CAM-MB Adapter. As this is always required, these are usually sold in conjunction with the camera board.
    • The display I will use is a 0.2″ FLCoS display, which comes with an optics module so the image can be reflected from the display onto a lens.
    • The camera I will use is an OV2640 camera as part of the ESP32-Cam module.
    • The battery I will use is a 3.3V rechargeable battery, likely a LiPo or LiFePO4 battery, but I need to nail down the current draw requirements for the rest of my devices before I finalize exactly which power supply I’ll use.
  • I have found an ESP32 library for generating composite video, which is the input that the microdisplay takes. The github is here.
  • I have set up and have begun to get used to an ESP-IDF environment (it works in VSCode). I have also used the Arduino IDE before, which seems to be the older preferred environment for programming ESP32s.
  • I have begun to draft the CAD for the 3D-printed headset.

Progress Schedule

  • Progress is on schedule. Our schedule’s deadlines do not begin until next week.
  • I’m worried about the lead time on the FLCoS display. I couldn’t find anyone selling a comparable device with a quicker lead time (though I could find several displays that were much larger and cost several hundred dollars). The very small size (0.2″) seems to be fairly unusual. I may have to reshuffle some tasks around if it does not arrive before the end of February/spring break. This could delay the finalization of our hardware.

Next Week’s Deliverables

  • By the end of the weekend (Sunday) I plan to have submitted the purchasing forms for the microcontrollers, camera, and display, so that I can talk to my TA Monday for approval, and the orders can go out on Tuesday. In the time between now and Tuesday, I’ll finalize my battery choice so it can hopefully go through on Thursday, or early the following week.
  • By the end of next week I plan to have the CAD for the 3D-printed headset near-complete, with the specific exception of the precise dimensions for the device mounting points, for which I expect to need physical measurements that I can’t get from the spec sheets. Nailing down these dimensions should only require modifying a few constraints, assuming my preliminary estimates are accurate, so when the devices come in (the longest lead time is the display, which seems to be a little longer than two weeks) I expect CAD completion to take no more than an hour or so, with printing doable within a day or so thereafter.
  • I plan to finish reading through the ESP32 composite video library and begin to write the code for the display generation so that when it is delivered I can quickly proof successful communication and begin testing.
  • I plan to work through the ESP32-Cam guide so that when it arrives (much shorter lead time than the display) I can begin to test and code it, and we can validate the wireless connections.