Team’s Status Report for October 18

Most Significant Risks and Management

The primary risk identified was Fingertip Positional Accuracy, specifically along the keyboard’s depth (Z-axis). Previous geometric methods yielded significant positional errors, which threatened the system’s ability to distinguish between adjacent keys (e.g., confusing Q, A, or Z) and thus made reliable typing impossible. To manage this risk, our contingency plan was the rapid implementation of the Pixel Centroid Method. This technique calculates the statistically stable Center of Mass (Centroid) of the actual finger pixels, providing a highly stable point of contact that successfully mitigates the positional ambiguity risk.

Changes to System Design

A necessary change was introduced to the Fingertip Tracking Module design. We transitioned from geometric projection methods to an Image Processing Refinement Pipeline (the Pixel Centroid Method). This was required because the original methods lacked the vertical accuracy needed for key mapping. The cost was one additional week, but this is offset by the substantial increase in tracking stability and accuracy, preventing larger integration costs down the line.

Updated Schedule

No significant changes have occurred to the overall project schedule.

Part A: Global Factors
Across developing regions, many users rely primarily on smartphones or tablets as their only computing devices, yet struggle with slow or error-prone touchscreen typing due to small screen sizes or limited familiarity with digital interfaces. By using the built-in camera and no additional hardware, our system provides a universally deployable typing interface that works on any flat surface, making it more practical for students, remote workers, and multilingual users worldwide. For instance, an English learner in rural India could practice typing essays on a table without needing a Bluetooth keyboard, or a freelance translator in South America could work comfortably on a tablet during travel. Because all computation happens locally on-device, the system can function without internet access, which is essential for regions with limited connectivity, while also ensuring user privacy. This design supports equitable access to digital productivity tools and aligns with sustainable technology trends by reducing electronic waste and dependence on specialized hardware.

Part B: Cultural Factors
HoloKeys is designed to fit how people learn and use technology in classrooms, libraries, community centers, and travel settings. Because QWERTY is the most widely used layout, the interface aligns with familiar motor patterns and reduces training time. Instructions and tutorials are written in plain, idiom-free text that can be easily translated into other languages. Visual overlays are adjustable (font size, key size, contrast), allowing users to tune the interface to their needs. Because expectations around camera use vary, HoloKeys defaults to privacy-forward behavior: a clear indicator whenever the camera is active, no recording or image retention by default, and concise explanations of how and why the camera is used.

Part C: Environmental Factors
Unlike traditional hardware keyboards, our solution requires minimal physical manufacturing, shipping, or disposal, thereby reducing material waste and overall carbon footprint. The system relies primarily on existing mobile devices, with only a small stand or holder as an optional accessory. This holder can also serve as a regular phone or tablet stand, further extending its lifespan and utility. By minimizing the need for new electronic components and leveraging devices users already own, our design helps reduce electronic waste and promotes more sustainable technology practices.

Part A was written by Hanning Wu, Part B by Yilei Huang, and Part C by Joyce Zhu.

Hanning’s Status Report for October 18

This week I added a calibration instruction panel and a small finite-state machine (FSM) to the camera webpage. The FSM explicitly manages idle → calibrating → typing: when a handsDetected hook flips true, the UI enters calibrating for 10 s (driven by performance.now() inside requestAnimationFrame) and shows a banner with a live progress bar; on timeout it transitions to typing, where we’ll lock the keyboard pose. The module exposes setHandPresence(bool) for the real detector, is resilient to brief hand-detection dropouts, and keeps preview mirroring separate from processing so saved frames aren’t flipped. I also wired lifecycle guards (visibilitychange/pagehide) so tracks stop cleanly, and left stubs to bind the final homography commit at the typing entry.
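Roughly, the state handling follows the sketch below (simplified: updateBanner is a placeholder hook, and the real module waits out brief dropouts instead of resetting immediately):

  // Minimal sketch of the calibration FSM; the real module tolerates brief dropouts
  // and binds the homography commit when entering "typing".
  const CALIBRATION_MS = 10000;
  let state = 'idle';
  let calibrationStart = null;
  let handsPresent = false;

  // Hook called by the hand detector (currently simulated, later the real signal).
  function setHandPresence(present) {
    handsPresent = present;
  }

  function updateBanner(currentState, now) {
    // Placeholder: the real page renders the banner and live progress bar here.
  }

  function tick(now) {                      // `now` is a performance.now()-style timestamp
    if (state === 'idle' && handsPresent) {
      state = 'calibrating';
      calibrationStart = now;
    } else if (state === 'calibrating') {
      if (!handsPresent) {
        state = 'idle';                     // the real FSM rides out short dropouts instead
      } else if (now - calibrationStart >= CALIBRATION_MS) {
        state = 'typing';                   // keyboard pose would be locked here
      }
    }
    updateBanner(state, now);
    requestAnimationFrame(tick);
  }

  requestAnimationFrame(tick);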

I’m on schedule. Next week, I’ll integrate this web framework with Yilei’s calibration process: replace the simulated handsDetected with the real signal, feed Yilei’s pose/plane output into the FSM’s “commit” step to fix the keyboard layout, and run end-to-end tests on mobile over HTTPS (ngrok/Cloudflare Tunnel) to verify the calibration→typing flow works in the field.

current webpage view:

Yilei’s Status Report for October 18

This week, I added all the remaining keys and organized them to resemble a Mac keyboard. I expanded the layout to include the full number row, punctuation, Return, Shift, Delete, Space, arrows, and the Mac modifiers.
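For reference, the layout can be thought of as rows of key labels roughly like the sketch below (abbreviated and not the actual overlay data; wider keys such as Space, Shift, and Return carry an extra width factor when drawn):

  // Row-based sketch of the Mac-like layout (labels only; widths handled separately).
  const LAYOUT_ROWS = [
    ['`', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0', '-', '=', 'Delete'],
    ['Tab', 'Q', 'W', 'E', 'R', 'T', 'Y', 'U', 'I', 'O', 'P', '[', ']', '\\'],
    ['CapsLock', 'A', 'S', 'D', 'F', 'G', 'H', 'J', 'K', 'L', ';', "'", 'Return'],
    ['Shift', 'Z', 'X', 'C', 'V', 'B', 'N', 'M', ',', '.', '/', 'Shift'],
    ['Fn', 'Ctrl', 'Option', 'Cmd', 'Space', 'Cmd', 'Option', 'Left', 'Up/Down', 'Right'],
  ];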

Although I didn’t finish as much as I had hoped (I had also intended to add a control for adjusting the keyboard’s vertical scale, i.e., its height), I am still roughly on schedule. The calibration and overlay are functionally complete enough that I can wrap up those controls next week without slipping the overall plan. To stay on track, I’ll start with a basic scaling slider and polish it after integration.

Next week I hope to add a slider to adjust the keyboard height and another to adjust the top-bottom ratio. In parallel, I’ll start working on the tap-decision logic and outline a testing plan for my tap-decision component by itself. The goal is to validate my decision module independently, then integrate with the actual tap-detection signals that my teammate is building toward the end of week 7.


Joyce’s Status Report for October 18

What I did this week:

This week, I successfully resolved critical stability issues in fingertip tracking by implementing a new and highly effective technique: Pixel Centroid analysis. This robust solution moves beyond relying on a single, unstable MediaPipe landmark. It works by isolating the fingertip area in the video frame, applying a grayscale threshold to identify the finger’s precise contour, and then calculating the statistically stable Center of Mass (Centroid) as the final contact point. This system, demonstrated in our multi-method testing environment, includes a crucial fallback mechanism to the previous proportional projection method, completing the core task of establishing reliable, high-precision fingertip tracking.
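A condensed sketch of the idea in OpenCV.js, assuming a small region of interest (ROI) has already been cropped around the MediaPipe tip landmark; the threshold value and fallback wiring below are illustrative, not the exact production parameters:

  // Sketch: refine the fingertip estimate as the centroid of finger pixels in a small ROI.
  // `roiMat` is an RGBA cv.Mat cropped around the MediaPipe tip landmark (assumed given).
  function refineTipByCentroid(roiMat, roiOriginX, roiOriginY, fallbackPoint) {
    const gray = new cv.Mat();
    const bin = new cv.Mat();
    cv.cvtColor(roiMat, gray, cv.COLOR_RGBA2GRAY);
    cv.threshold(gray, bin, 128, 255, cv.THRESH_BINARY);   // illustrative fixed threshold

    // Image moments give the center of mass of the foreground (finger) pixels.
    const m = cv.moments(bin, true);
    gray.delete(); bin.delete();

    if (m.m00 === 0) {
      // No finger pixels found: fall back to the proportional projection estimate.
      return fallbackPoint;
    }
    return { x: roiOriginX + m.m10 / m.m00, y: roiOriginY + m.m01 / m.m00 };
  }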

Scheduling:

I am currently on schedule. The stability provided by the Pixel Centroid method has successfully mitigated the primary technical risk related to keypress accuracy.

What I plan to do next week:

Next week’s focus is on Task 4.1: Tap Detection Logic. I will implement the core logic for detecting a keypress by analyzing the fingertip’s movement along the Z-axis (depth). This task involves setting a movement threshold, integrating necessary debouncing logic to ensure accurate single keypress events, and evaluating the results to determine if complementary tap detection methods are required.
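The parameters are still to be determined, but the core of the logic could look roughly like this sketch, where TAP_DELTA and DEBOUNCE_MS are placeholder values rather than chosen thresholds:

  // Rough sketch of Z-axis tap detection with debouncing (all constants are placeholders).
  const TAP_DELTA = 0.02;     // minimum depth change to count as a press (units TBD)
  const DEBOUNCE_MS = 150;    // ignore repeated triggers within this window

  let baselineZ = null;
  let lastTapTime = 0;

  // Called once per frame with the fingertip depth estimate and a timestamp.
  function detectTap(z, nowMs) {
    if (baselineZ === null) {
      baselineZ = z;
      return false;
    }
    const pressed = (baselineZ - z) > TAP_DELTA;        // finger moved past the threshold
    const debounced = (nowMs - lastTapTime) > DEBOUNCE_MS;
    // Slowly track the resting depth so drift doesn't cause false positives.
    baselineZ = 0.95 * baselineZ + 0.05 * z;
    if (pressed && debounced) {
      lastTapTime = nowMs;
      return true;                                      // emit exactly one keypress event
    }
    return false;
  }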

Joyce’s Status Report for October 4

What I did this week:
This week, I worked on implementing the second fingertip tracking method for our virtual keyboard system. While our first method expands on the direct landmark detection of MediaPipe Hands to locate fingertips, this new approach applies OpenCV.js contour and convex hull analysis to identify fingertip points based on curvature and filtering. This method aims to improve robustness under varied lighting and in situations where the surface color is similar to skin tone. The implementation is mostly complete, but more testing, filtering code, and parameter tuning are needed before it can be fully compared with the MediaPipe approach.
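A trimmed sketch of the contour/convex-hull step in OpenCV.js, assuming a binary hand mask already exists; the curvature-based filtering mentioned above is omitted here:

  // Sketch: collect candidate fingertip points from the hand silhouette.
  // `handMask` is a binary cv.Mat (255 = hand); curvature filtering is left out.
  function fingertipCandidates(handMask) {
    const contours = new cv.MatVector();
    const hierarchy = new cv.Mat();
    cv.findContours(handMask, contours, hierarchy, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE);

    const candidates = [];
    for (let i = 0; i < contours.size(); i++) {
      const contour = contours.get(i);
      const hull = new cv.Mat();
      cv.convexHull(contour, hull, false, true);        // hull vertices on the silhouette
      for (let j = 0; j < hull.rows; j++) {
        // Hull vertices are fingertip candidates; the real method keeps only those
        // whose local contour curvature looks like a fingertip.
        candidates.push({ x: hull.data32S[j * 2], y: hull.data32S[j * 2 + 1] });
      }
      hull.delete(); contour.delete();
    }
    contours.delete(); hierarchy.delete();
    return candidates;
  }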

Scheduling:
I am slightly behind schedule because fingertip detection has taken longer than expected. I decided to explore multiple methods to ensure reliable tracking accuracy, since fingertip detection directly impacts keypress precision. However, I plan to decrease the time spent on some minor tasks originally planned for the next few weeks, and potentially ask teammates for help to catch up.

What I plan to do next week:
Next week, I will finish the second method, test and compare both fingertip tracking methods to evaluate accuracy and responsiveness, and then refine the better-performing one for integration into the main key detection pipeline.

Team’s Status Report for October 4

We made a design adjustment: the camera needs to sit at a different angle than originally planned. To find a suitable angle for our current gesture recognition model, we’ll first use an adjustable, angle-changeable device holder to test several angles, then commit by purchasing or fabricating a fixed-angle holder once we identify the ideal angle.
Current risk we found: mobile devices couldn’t open the HTML file directly (no real web origin; only Microsoft Edge could open it, and it blocked camera access).
Mitigation: we now host the page on a local server on the laptop and connect from the phone over the LAN (a real origin, so permissions work), with HTTPS or a tunnel available if needed.

Schedule updates: Junyu’s fingertip recognition task shifts out by an additional week because it is taking longer than expected: we want to test multiple methods to ensure accuracy, and fingertip detection is crucial to the accuracy of key detection. Hanning’s tasks for weeks 3 and 4 switch to align with Yilei’s calibration process. Yilei reports no schedule change.

current schedule:

Hanning’s Status Report for October 4

This week I found and fixed the “mobile can’t open HTML” issue by serving the page from a local server on my laptop and connecting to it from a phone on the same Wi-Fi (instead of opening it via file://). I verified that modules load correctly and that camera permission works when the page is accessed via a real origin, and documented the steps (bind to 0.0.0.0, visit http://<LAN-IP>:<port>, or use HTTPS/tunnel when needed). I completed a basic text editor under the camera preview: an independent <textarea> wired with helper APIs (insertText, pressKey, focusEditor). I also began research on autocorrection methods, finding lightweight approaches (rule-based edits, edit-distance/keyboard adjacency, and small n-gram/LM strategies) and noting how we can plug them into the editor’s input path.
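Conceptually, the helpers are thin wrappers around the <textarea>, roughly like the sketch below (the element id and the handful of special keys shown are simplifying assumptions, not the final implementation):

  // Minimal sketch of the editor helpers around the <textarea> (real versions differ).
  const editor = document.getElementById('editor');   // hypothetical element id

  function focusEditor() {
    editor.focus();
  }

  function insertText(text) {
    const start = editor.selectionStart;
    const end = editor.selectionEnd;
    editor.value = editor.value.slice(0, start) + text + editor.value.slice(end);
    editor.selectionStart = editor.selectionEnd = start + text.length;
  }

  function pressKey(key) {
    // Only a couple of special keys are sketched here; everything else is inserted as text.
    if (key === 'Backspace') {
      const start = editor.selectionStart;
      if (start > 0) {
        editor.value = editor.value.slice(0, start - 1) + editor.value.slice(editor.selectionEnd);
        editor.selectionStart = editor.selectionEnd = start - 1;
      }
    } else if (key === 'Enter') {
      insertText('\n');
    } else {
      insertText(key);
    }
  }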
I’m on schedule. Next week, I plan to display a calibration instruction panel on the webpage and push the autocorrection prototype further. There is also a slight change to my schedule: originally calibration instructions were slated for this week and the editor for next week, but I swapped them to align with my teammates’ timelines.

Yilei’s Status Report for October 4

This week, I replaced the placeholder trapezoid with a QWERTY keyboard overlay and corrected calibration so the left and right index fingertips land exactly on the F/J key centers. I ensured the video is mirrored but the labels are not, set the keyboard to the correct orientation relative to our intended hand orientation, and added a 10-second calibration window that automatically freezes the keyboard, with a recalibrate button to restart (the timer only counts down while both hands are visible).
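In rough terms, the placement step amounts to something like the following sketch; the three-key F-to-J spacing and uniform key width are simplifying assumptions rather than our exact constants:

  // Sketch: derive keyboard scale and orientation from the F/J fingertip points.
  // On a QWERTY home row, the F and J key centers sit about 3 key-widths apart.
  function keyboardFromFJ(leftIndexTip, rightIndexTip) {
    const dx = rightIndexTip.x - leftIndexTip.x;
    const dy = rightIndexTip.y - leftIndexTip.y;
    const keyWidth = Math.hypot(dx, dy) / 3;          // image-space width of one key
    const angle = Math.atan2(dy, dx);                 // in-plane rotation of the home row
    // Anchor the layout so the F key center sits exactly on the left index fingertip.
    return { keyWidth, angle, fAnchor: { x: leftIndexTip.x, y: leftIndexTip.y } };
  }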

I’m on schedule. I have made progress on calibration, defining the keyboard’s size and position from fingertip separation, and the AR overlay. 

Next week, I plan to add the remaining keys (number row, punctuation, and Shift, Enter, and Space), introduce tilt-aware horizontal and vertical scaling based on a fixed keyboard tilt (moving from “looks like perspective” to a consistent homography), and work on the image keyboard mapping (imagePointToKey function).
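Since imagePointToKey is still upcoming, here is only a flat (non-tilt-aware) sketch of what it might do, assuming the keyboard is described by an origin, key size, rotation angle, and row list; the tilt-aware homography version would replace the coordinate transform:

  // Sketch: map an image point to a key on a flat, rotated keyboard grid (no tilt yet).
  // `kb` is assumed to hold { origin, keyWidth, keyHeight, angle, rows }, where rows is
  // an array of arrays of key labels.
  function imagePointToKey(pt, kb) {
    // Rotate the point into the keyboard's own axis-aligned frame.
    const dx = pt.x - kb.origin.x;
    const dy = pt.y - kb.origin.y;
    const cos = Math.cos(-kb.angle);
    const sin = Math.sin(-kb.angle);
    const u = dx * cos - dy * sin;
    const v = dx * sin + dy * cos;

    const row = Math.floor(v / kb.keyHeight);
    const col = Math.floor(u / kb.keyWidth);
    if (row < 0 || row >= kb.rows.length) return null;
    if (col < 0 || col >= kb.rows[row].length) return null;
    return kb.rows[row][col];                         // key label, e.g. 'F'
  }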

Yilei’s Status Report for September 27

This week I built a file that integrates Joyce’s MediaPipe Hands component to auto-calibrate the keyboard from the left and right index fingertips (F/J) (a new method compared with last week’s four-tap approach) and renders a mirrored, perspective trapezoid outline aligned to those fingertips. I am on schedule for Tasks 1.2 and 1.3. Next week I plan to complete mapping keyboard positions to keys (Task 1.4) and finalize a basic calibration flow that locks the keyboard layout after a 10-second hold.

Joyce’s Status Report for September 27

Accomplishments:
This week I transitioned from using MediaPipe Hands in Python to testing its JavaScript version with my computer webcam for real-time detection. I integrated my part into Hanning’s in-browser pipeline and verified that fingertip landmarks display correctly in live video. During testing, I noticed that when the palm is viewed nearly edge-on (appearing more like a line than a triangle), the detection becomes unstable—positions shake significantly or the hand is not detected at all. To address this, we plan to tilt the phone or tablet so that the camera captures the palm from a more favorable angle.

After completing the initial hand landmark detection, I began work on fingertip detection. Since MediaPipe landmarks fall slightly behind the true fingertip tips, I researched three refinement methods:

  1. Axis-based local search: extend along the finger direction until leaving a hand mask to find the most distal pixel.
  2. Contour/convex hull: analyze the silhouette of the hand to locate fingertip extrema.
  3. CNN heatmap refinement: train a small model on fingertip patches to output sub-pixel tip locations.

I have started prototyping the first method using OpenCV.js and tested it live on my webcam to evaluate alignment between the refined points and the actual fingertips. This involved setting up OpenCV.js, building a convex hull mask from landmarks, and implementing an outward search routine.
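A compressed sketch of that outward search, assuming the binary hand mask and the landmark-derived finger direction are already available; the step count and bounds handling are simplified placeholders:

  // Sketch: walk outward from the MediaPipe tip along the finger direction until leaving
  // the hand mask, and report the last in-mask pixel as the refined tip.
  // `mask` is a binary cv.Mat (255 = hand); `tip` and `dir` come from the landmarks.
  function refineTipByAxisSearch(mask, tip, dir, maxSteps = 40) {
    const len = Math.hypot(dir.x, dir.y) || 1;
    const step = { x: dir.x / len, y: dir.y / len };  // unit vector toward the fingertip
    let last = { x: tip.x, y: tip.y };
    for (let i = 0; i < maxSteps; i++) {
      const x = Math.round(tip.x + step.x * i);
      const y = Math.round(tip.y + step.y * i);
      if (x < 0 || y < 0 || x >= mask.cols || y >= mask.rows) break;
      if (mask.ucharPtr(y, x)[0] === 0) break;        // left the hand silhouette
      last = { x, y };
    }
    return last;                                      // most distal in-mask pixel found
  }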

Next Week’s Goals:

  1. Complete testing and evaluation of the axis-based local search method.
  2. Implement the contour/convex hull approach for fingertip refinement.
  3. Collect comparison results between the two methods, and decide whether implementing the CNN heatmap method is necessary.