Team’s Status Report for December 6

Most Significant Risks and Mitigation
Our major risk this week continued to be tap detection accuracy. Despite several rounds of tuning thresholds, filtering sudden CV glitches, and improving motion heuristics, the camera-only method still failed to meet our accuracy requirement.

To mitigate this risk, we made a decisive design adjustment: adding external hardware support through pressure-sensitive fingertip sensors. Each sensor is attached to a fingertip and connected to an Arduino mounted on the back of the hand. We use two Arduinos in total (one per hand), each supporting four sensors. The Arduino performs simple edge detection (“tapped” vs. “idle”) and sends these states to our web app, which replaces our existing camera-based tap module with a sensor-signal → key → text-editor pipeline. This hardware-assisted approach reduces false negatives, which were our biggest issue.

Changes to System Design
Our system now supports two interchangeable tap-detection modes:

  1. Camera-based tap mode (our original pipeline).
  2. Pressure-sensor mode (hardware-assisted tap events from Arduino).

The rest of the system, including fingertip tracking, keyboard overlay mapping, and text-editor integration, remains unchanged. The new design preserves our AR keyboard’s interaction model while introducing a more robust and controllable input source. We are now testing both methods side by side to measure accuracy, latency, and overall usability, ensuring that we still meet our project requirements even if the pure CV solution remains unreliable.
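
A minimal sketch of how the two modes can share the same downstream pipeline (the class and function names below are illustrative, not our real module names):

```javascript
// Sketch: both tap-detection modes emit the same event shape, so the rest of
// the pipeline (key lookup, text editor) never needs to know which one is
// active. The names here (TapSource, handleTap, ...) are illustrative only.
class TapSource {
  constructor(name) {
    this.name = name;
    this.listeners = [];
  }
  onTap(cb) {
    this.listeners.push(cb);
  }
  emit(handIndex, fingerIndex) {
    const event = { source: this.name, handIndex, fingerIndex, time: performance.now() };
    this.listeners.forEach((cb) => cb(event));
  }
}

// One instance per mode; the CV loop or the Web Serial reader calls emit().
const cameraTaps = new TapSource('camera');
const pressureTaps = new TapSource('pressure');

// Shared downstream handler: map finger -> key -> text editor (details omitted).
function handleTap(event) {
  console.log(`tap from ${event.source}: hand ${event.handIndex}, finger ${event.fingerIndex}`);
}

// Switching modes only changes which source the handler subscribes to.
function useMode(mode) {
  const source = mode === 'pressure' ? pressureTaps : cameraTaps;
  source.onTap(handleTap);
}

useMode('pressure');
pressureTaps.emit(0, 1); // e.g. left hand, index finger reports a tap
```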

Unit Tests (fingertip)
We evaluated fingertip accuracy by freezing frames, then manually clicking fingertip positions in a fixed left-to-right order and comparing them against our detected fingertip locations over 14 valid rounds (10 fingers each). The resulting mean error is only ~11 px (|dx| ≈ 7 px, |dy| ≈ 7 px), which corresponds to well under ¼ key-width in X and ½ key-height in Y. Thus, the fingertip localization subsystem meets our spatial accuracy requirement.
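
The error statistics are computed roughly as follows; this is a hedged sketch with placeholder points, not our actual measurement script:

```javascript
// Sketch of the fingertip-accuracy computation: compare manually clicked
// ground-truth points against detected fingertip positions, in pixels.
// The points below are placeholders, not our measured data.
const groundTruth = [{ x: 120, y: 340 }, { x: 180, y: 338 }];
const detected    = [{ x: 126, y: 333 }, { x: 172, y: 345 }];

function errorStats(truth, pred) {
  let sumDx = 0, sumDy = 0, sumDist = 0;
  for (let i = 0; i < truth.length; i++) {
    const dx = Math.abs(pred[i].x - truth[i].x);
    const dy = Math.abs(pred[i].y - truth[i].y);
    sumDx += dx;
    sumDy += dy;
    sumDist += Math.hypot(dx, dy);
  }
  const n = truth.length;
  return { meanDx: sumDx / n, meanDy: sumDy / n, meanError: sumDist / n };
}

console.log(errorStats(groundTruth, detected));
// Compare meanDx / meanDy against the 1/4 key-width and 1/2 key-height limits.
```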

We also conducted unit tests for calibration by timing 20 independent calibration runs and confirming the average time met our ≤15 s requirement.

System Tests 
We measured tap-event latency by instrumenting four timestamps (A–D) in our pipeline: tap detection (A), event reception in app.js (B), typing-logic execution (C), and character insertion in the text editor (D). The end-to-end latency (A→D) is 7.31 ms, which is within expected timing bounds.
A→B: 7.09 ms
A→C: 7.13 ms
A→D: 7.31 ms
B→D: 0.22 ms
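
The instrumentation itself is just a handful of performance.now() marks around each stage; a minimal sketch of the idea (the mark helper and the placement comments are illustrative, not our exact code):

```javascript
// Sketch of the A–D latency instrumentation using performance.now().
// The pipeline stages referenced in comments are placeholders for our modules.
const marks = {};

function markStage(stage) {
  marks[stage] = performance.now();
}

function reportLatencies() {
  console.log('A→B', (marks.B - marks.A).toFixed(2), 'ms'); // tap detection -> app.js
  console.log('A→C', (marks.C - marks.A).toFixed(2), 'ms'); // -> typing logic
  console.log('A→D', (marks.D - marks.A).toFixed(2), 'ms'); // -> character inserted
  console.log('B→D', (marks.D - marks.B).toFixed(2), 'ms');
}

// Example of where the marks would go in the real pipeline:
markStage('A'); // inside tap detection, when a tap is confirmed
markStage('B'); // inside app.js, when the tap event is received
markStage('C'); // when the typing logic resolves the key
markStage('D'); // right after the character is inserted into the editor
reportLatencies();
```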

For accuracy, we performed tap-accuracy experiments by collecting ground-truth taps and measuring detection and false-positive rates across extended typing sequences under controlled illuminance values (146, 307, and 671 lux).

  • Tap detection rate = correct / (correct + undetected) = 19.4%
  • Mistap (false positive) rate = false positives / (correct + undetected) = 12.9%

Hanning’s Status Report for December 6

This week I explored a new hardware-assisted tap detection method for our virtual keyboard. Instead of relying solely on camera-based motion, we connected pressure-sensitive resistors to each fingertip and fed the signals into an Arduino. Initially, we could only see the pressure values in the Arduino Serial Monitor, and I wasn’t sure how to turn that serial output into something JavaScript could use. To solve this, I implemented a new module, pressure.js, that uses the Web Serial API to read the Arduino’s serial stream directly in the browser. The Arduino sends simple newline-delimited messages like “sensorIndex,state” (e.g., 1,tapped), and pressure.js parses each line into a structured event with handIndex, sensorIndex, fingerName, fingerIndex, and state.
Within pressure.js, I added helper logic to normalize different tap labels (tapped, tap, pressed, 1, etc.) into a single “tap” event and then invoke user-provided callbacks (onState and onTap). This lets the rest of our web framework treat Arduino pressure events exactly like our previous tap detection output: we can map fingerIndex to the existing Mediapipe fingertip landmarks and reuse the same downstream pipeline that maps fingertips to keys and updates the text editor. In other words, Arduino signals are now fully integrated as JavaScript events/variables instead of being stuck in the serial monitor.
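
A simplified sketch of the reading/normalization idea is shown below; it is not the actual pressure.js (the hand/finger lookup tables and error handling are trimmed, and the baud rate is an assumption):

```javascript
// Simplified sketch of reading newline-delimited "sensorIndex,state" messages
// over the Web Serial API and normalizing them into tap events. This illustrates
// the approach, not the actual pressure.js module; requestPort() must be called
// from a user gesture (e.g. a "Connect sensors" button click).
const TAP_LABELS = new Set(['tapped', 'tap', 'pressed', '1']);

async function readPressureSensors({ onState, onTap }) {
  const port = await navigator.serial.requestPort(); // user picks the Arduino
  await port.open({ baudRate: 9600 });               // assumed baud rate

  const reader = port.readable.pipeThrough(new TextDecoderStream()).getReader();
  let buffer = '';

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += value;

    // Split the stream back into complete lines, e.g. "1,tapped"
    let newline;
    while ((newline = buffer.indexOf('\n')) >= 0) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (!line) continue;

      const [sensorText, rawState] = line.split(',');
      if (sensorText === undefined || rawState === undefined) continue;

      const sensorIndex = Number(sensorText);
      const state = TAP_LABELS.has(rawState.trim().toLowerCase()) ? 'tap' : 'idle';
      const event = { sensorIndex, state };

      if (onState) onState(event);
      if (state === 'tap' && onTap) onTap(event);
    }
  }
}
```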

Schedule change: I am converting my slack week into implementing this pressure-sensor solution.

Hanning’s Status Report for November 22

What I did this week
This week I restructured the dataflow pipeline for the entire system. Previously, the virtual keyboard and physical keyboard behaved like two separate input paths feeding into the text editor. I rewrote the logic so that the editor only ever receives standard keystroke events, regardless of whether they originate from the physical keyboard or from our CV-driven virtual keyboard. This required reorganizing the flow of signals from the fingertip detection module, converting virtual taps into synthetic KeyboardEvent objects that fully match real hardware keyboard input. With this unified pipeline, the text editor no longer needs to know the difference between physical and virtual typing, simplifying integration and reducing bugs caused by mismatched interfaces.
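
Conceptually, a virtual tap is converted into the same kind of event the editor already listens for; a rough sketch (the editor element id and the chosen key are placeholders):

```javascript
// Sketch: convert a virtual tap into a synthetic KeyboardEvent so the editor's
// existing keydown handler treats it like physical typing. Synthetic events are
// not "trusted" by the browser, so the editor's own handler (not the browser
// default action) is responsible for actually inserting the character.
const editor = document.getElementById('editor'); // placeholder element id

function dispatchVirtualKey(key) {
  const event = new KeyboardEvent('keydown', {
    key,            // e.g. 'a', 'Shift', 'Backspace'
    bubbles: true,
    cancelable: true,
  });
  editor.dispatchEvent(event);
}

// Example: a tap that resolved to the key under the fingertip
dispatchVirtualKey('a');
```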

I also fixed several UI behavior issues. First, I implemented the Shift UI logic we planned last week: Shift presses now update the on-screen keyboard layout dynamically, showing the shifted characters while Shift is active and reverting afterward. Second, I made “Focus on Editor” trigger automatically when the user starts typing. Third, I limited camera selection to the front camera only, since using the rear camera results in a mirrored video feed and breaks calibration symmetry.
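
For the camera restriction, the front camera can be requested directly through a getUserMedia facingMode constraint; a minimal sketch (the video element id is a placeholder):

```javascript
// Sketch: request only the front ("user"-facing) camera via a getUserMedia
// constraint instead of exposing a rear-camera choice in the device picker.
async function startFrontCamera() {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: 'user' }, // front camera on mobile devices
    audio: false,
  });
  const video = document.getElementById('preview'); // placeholder element id
  video.srcObject = stream;
  await video.play();
}
```
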
I’m behind schedule since I haven’t started testing. I will try to catch up next week.
What I plan to do next week
Next week I will begin testing the new unified dataflow and verifying that the text editor behaves correctly under virtual keyboard input.

Over the course of these tasks, I had to pick up several new skills on my own. Most of my learning came from studying GitHub repositories, reading browser and JavaScript API documentation, and searching Stack Overflow whenever I hit a problem. I learned how to debug JavaScript and HTML more effectively using the console (checking event flow, inspecting DOM elements, and printing intermediate fingertip and keystroke data). I also learned how mobile browsers handle camera usage, including permissions, mirroring behavior, and device-specific constraints. For local development, I learned how to properly run a local server to avoid camera-access issues. On the text editor side, I explored how keystrokes are handled, how autocorrection works, and how to integrate synthetic events cleanly.

Team’s Status Report for November 22

Most Significant Risks and Mitigation
This week, our main challenge came from tap detection instability. If the sensitivity is too high, the system picks up random glitches as taps; if we reduce sensitivity, normal taps get missed while glitches still sometimes pass through. Overall, it’s still difficult for the model to reliably distinguish between a real tap, natural hand movement, and a sudden CV glitch.
To mitigate this, we worked on two short-term fixes:
1. Filtering “teleport” motion — when fingertip coordinates jump too fast between frames, we now label these samples as glitches and discard them (see the sketch after this list).
2. Re-tuning tap sensitivity — we are testing a middle-ground threshold that keeps normal taps detectable without letting small jitters trigger fake keys.
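
A rough sketch of the teleport filter, assuming a simple per-frame pixel-displacement threshold (the threshold value and names are illustrative, not our tuned settings):

```javascript
// Sketch of the "teleport" filter: if a fingertip moves further between two
// consecutive frames than a plausible hand motion allows, treat the new sample
// as a CV glitch and drop it. The threshold below is illustrative only.
const MAX_PIXELS_PER_FRAME = 80;

let lastPosition = null;

function filterTeleport(position) {
  if (lastPosition) {
    const jump = Math.hypot(position.x - lastPosition.x, position.y - lastPosition.y);
    if (jump > MAX_PIXELS_PER_FRAME) {
      return null; // glitch: discard this frame's sample, keep the previous one
    }
  }
  lastPosition = position;
  return position;
}

// Example usage inside the per-frame loop:
console.log(filterTeleport({ x: 100, y: 200 })); // accepted
console.log(filterTeleport({ x: 400, y: 50 }));  // rejected as a teleport
```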

Changes to System Design
While we continue tuning the glitch-filtering pipeline, we also started researching a new design direction: Reconstructing approximate 3D finger movement from the camera stream.
The idea is that true taps correspond to vertical motion toward the desk, whereas random movement is usually horizontal or diagonal. If we can estimate whether a finger is moving “downward” vs “across,” tap detection becomes much more robust.
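
As a first approximation (before any real 3D reconstruction), the direction test could look something like the sketch below, which uses 2D image-space displacement as a stand-in; everything here is illustrative rather than an implemented module:

```javascript
// Sketch: classify fingertip motion as "downward" (toward the desk) vs "across",
// using 2D image-space displacement as a stand-in for real 3D reconstruction.
// In image coordinates, y grows downward, so a tap tends to show up as a large +dy.
function classifyMotion(prev, curr) {
  const dx = curr.x - prev.x;
  const dy = curr.y - prev.y;
  if (dy > 0 && Math.abs(dy) > 2 * Math.abs(dx)) {
    return 'downward';   // candidate tap motion
  }
  return 'across';       // lateral/diagonal movement, likely not a tap
}

console.log(classifyMotion({ x: 100, y: 200 }, { x: 102, y: 230 })); // "downward"
console.log(classifyMotion({ x: 100, y: 200 }, { x: 140, y: 205 })); // "across"
```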

Schedule Changes
We may need more time for testing and tuning, so we plan to convert our Week 11 slack week into a testing/verification week.

Hanning’s Status Report for November 15

What I did this week:
This week, I worked on additional features for the autocorrection system in the text editor. I focused on building the rule-based autocorrection engine, which handles common typos, capitalization fixes, and safe word replacements using a dictionary-style approach. In parallel, I also explored possible machine learning–based autocorrection strategies. Additionally, I encountered an integration issue with the modifier keys (Shift, Control, Option, Command): after Joyce merged fingertip detection into the full pipeline, these modifiers stopped functioning correctly when tapped, so I have been debugging and tracing the event flow to restore proper toggle behavior.
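
The rule-based engine is essentially a dictionary lookup over word boundaries plus a few capitalization rules; a small sketch of the approach (the typo table is a tiny illustrative sample, not our full rule set):

```javascript
// Sketch of the rule-based autocorrection: a dictionary of safe replacements
// plus a simple first-letter capitalization rule. Entries are examples only.
const TYPO_TABLE = {
  teh: 'the',
  adn: 'and',
  recieve: 'receive',
};

function autocorrectWord(word) {
  const lower = word.toLowerCase();
  return TYPO_TABLE[lower] ?? word;
}

function autocorrectText(text) {
  // Correct word by word, then capitalize the first letter of the text.
  const corrected = text
    .split(/(\s+)/)                 // keep whitespace tokens
    .map((token) => (/\s/.test(token) ? token : autocorrectWord(token)))
    .join('');
  return corrected.charAt(0).toUpperCase() + corrected.slice(1);
}

console.log(autocorrectText('teh quick brown fox adn the dog'));
// -> "The quick brown fox and the dog"
```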

Scheduling:
My progress is slightly behind schedule due to time spent diagnosing the modifier key integration issue.

What I plan to do next week:
Next week, I will add a visual signal for Shift presses and use that signal to dynamically update the on-screen keyboard layout, displaying the shifted key characters while Shift is active and reverting when it’s toggled off. I will also fix and refine several user interface elements, including the functions of the “Focus on Editor” and “Select Camera” buttons. (We should always use the front camera of mobile devices, and that camera view should be mirrored to fit our finger detection module.)

Verification (Web Framework + Text Editor):
For my part, most verification is functional rather than heavily quantitative, but I still have a few tests planned. For the web framework, I’ll run basic end-to-end checks on different devices (laptop + phone) to make sure the camera loads, switching cameras works, calibration starts correctly, and the UI buttons (focus editor, select camera, black mode, etc.) behave consistently. I’ll also measure simple responsiveness by logging the time between a simulated key event and the character showing up in the editor, just to confirm the typing pipeline isn’t adding noticeable delay. For the text editor, I’ll use small scripted tests that feed in sequences of pressKey/insertText calls (including toggling Shift/Control/Option/Command) and check whether the final text matches what we expect. I also prepared a small list of common typos to see whether the rule-based autocorrection fixes them correctly without breaking normal words. This gives me a quick accuracy snapshot and helps make sure nothing behaves unpredictably when we integrate everything together.

Hanning’s Status Report for November 8

What I did this week:

This week, I converted all the modifier keys—Shift, Control, Option, and Command—from hold-based behavior to tap-based toggles. Because our fingertip detection system can only recognize discrete tap events rather than continuous presses, this redesign ensures that these keys can now be activated or deactivated through single taps, behaving more naturally in our touchless environment. I implemented a new modifier state manager that tracks each key’s toggle state, allowing Shift to function like Caps Lock and ensuring that Control, Option, and Command can maintain their “pressed” states across multiple keystrokes. I am on schedule.
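
The modifier state manager is essentially a small toggle map; a sketch of the idea (names are illustrative; Option and Command correspond to Alt and Meta in DOM key names):

```javascript
// Sketch of the modifier state manager: each modifier toggles on a tap and stays
// active across subsequent keystrokes until it is tapped again.
const modifiers = { Shift: false, Control: false, Alt: false, Meta: false };

function tapModifier(name) {
  modifiers[name] = !modifiers[name];
  return modifiers[name]; // also useful for updating the on-screen keyboard UI
}

function applyModifiers(key) {
  const active = Object.keys(modifiers).filter((m) => modifiers[m]);
  const output = modifiers.Shift && key.length === 1 ? key.toUpperCase() : key;
  return { key: output, modifiers: active };
}

tapModifier('Shift');             // Shift now behaves like Caps Lock
console.log(applyModifiers('a')); // { key: 'A', modifiers: ['Shift'] }
tapModifier('Shift');             // tapped again -> released
console.log(applyModifiers('a')); // { key: 'a', modifiers: [] }
```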

What I plan to do next week:
Next week, I will focus on preparing for the upcoming project showcase. My goal is to refine the demo experience by ensuring smooth integration between the fingertip detection, calibration, and typing modules, and to polish the user interface for a clear and stable demonstration.

Team’s Status Report for November 1

Most Significant Risks and Management
This week, we identified a new risk concerning hover versus contact ambiguity (the system’s difficulty in determining whether a user’s fingertip is truly resting on the keyboard plane or merely hovering above it). This issue directly affects tap accuracy, as vertical finger movements in midair could be misinterpreted as valid keystrokes. To mitigate this, we refined our tap detection mechanism by incorporating gesture-based state validation. Specifically, the algorithm now verifies that every tap motion begins with an “in-air” finger gesture and ends with an “on-surface” gesture, as determined by the relative positions and flexion of the fingertips. Only if this air-to-surface transition coincides with a rapid downward motion is the tap event confirmed.
This approach reduces false positives from hovering fingers and improves robustness across users with different hand postures.

Changes to System Design
The system’s tap detection algorithm has been upgraded from a purely velocity-based method to a state-transition-driven model. The previous implementation relied solely on instantaneous speed, distance, and velocity drop thresholds to identify tap events, which worked well for clear, strong taps but struggled with subtle finger motions or resting gestures. The new design introduces two additional layers:

  1. Finger State Classification: Each fingertip is now labeled as either on-surface or in-air based on its relative position, curl, and height within the calibrated plane.

  2. State Transition Validation: A tap is recognized only when a downward motion sequence transitions from in-air → on-surface within a short temporal window.

By coupling spatial and temporal evidence, the system should be able to differentiate between deliberate keystrokes and incidental finger motion.
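
In outline, the detector keeps a per-finger state and only fires when an in-air → on-surface transition coincides with fast downward motion inside a short window; a hedged sketch (the thresholds and field names are illustrative placeholders, not our tuned values):

```javascript
// Sketch of the state-transition tap model: a tap fires only when a finger that
// was recently "in-air" lands "on-surface" with a rapid downward motion.
// Thresholds below are illustrative placeholders.
const TAP_WINDOW_MS = 200;     // in-air -> on-surface must happen within this window
const MIN_DOWN_SPEED = 0.8;    // px per ms of downward motion, placeholder

const fingers = new Map();     // fingerId -> { state, lastInAirTime, lastY, lastTime }

function updateFinger(fingerId, { state, y, time }) {
  const prev = fingers.get(fingerId) ??
    { state: 'in-air', lastInAirTime: time, lastY: y, lastTime: time };
  let tapped = false;

  if (state === 'in-air') {
    prev.lastInAirTime = time;
  } else if (state === 'on-surface' && prev.state === 'in-air') {
    const dt = time - prev.lastTime;
    const downSpeed = dt > 0 ? (y - prev.lastY) / dt : 0;  // image y grows downward
    const recentAir = time - prev.lastInAirTime <= TAP_WINDOW_MS;
    tapped = recentAir && downSpeed >= MIN_DOWN_SPEED;
  }

  fingers.set(fingerId, { state, lastInAirTime: prev.lastInAirTime, lastY: y, lastTime: time });
  return tapped;
}

// Example: finger 3 hovers, then lands quickly -> tap
console.log(updateFinger(3, { state: 'in-air', y: 180, time: 0 }));      // false
console.log(updateFinger(3, { state: 'on-surface', y: 230, time: 40 })); // true
```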

Updated Schedule
Hanning’s original plan for this week was to implement the keystroke event handling module. However, since fingertip output data is not yet fully stable, that task is postponed to next week. Instead, Hanning focused on developing the copy-paste function for the text editor and assisted in integrating existing components of the computer vision and calibration pipelines.

Hanning’s Status Report for November 1

What I did this week:
This week, I improved the built-in text editor module by adding new functionality, including copy, paste, and basic autocorrection features (not ML-based, but built on rules and a dictionary) to make typing and editing more seamless within our web interface. I also recorded several reference videos using our mobile device holder, capturing typing from multiple camera angles to better analyze fingertip visibility and calibration alignment under different lighting and perspectives. In addition, I worked on integrating these updates with our existing calibration and camera framework to ensure that the editor can properly receive text input once fingertip-triggered keystrokes are implemented.

Scheduling:
There was a slight change to the task plan: originally, this week’s task was focused on keystroke event implementation, while the copy-paste functionality was scheduled for next week. However, due to delays in the fingertip detection component, I handled the text editor functions this week instead, and the keystroke event logic will be postponed.

What I plan to do next week:
I’ll work with Joyce to integrate Joyce’s fingertip detection module into the unified web framework and wire its output to actual keystroke events in the text editor. Hopefully, this will complete the data flow from fingertip detection to visible character input.

Hanning’s Status Report for October 25

What I did this week: I merged our previously separate modules into a single, working page so that Joyce’s and Yilei’s parts could run together. Specifically, I unified camera_setup.html, fingertip_detection.html, and the calibration app.js from index0927.html (Yilei’s calibration file) into one loop: a shared camera pipeline (device picker, mirrored preview, hidden frame buffer for pixel ops) feeds both fingertip detection and calibration; the F/J calibration computes a keyboard quad (variable-width rows, height/top-bottom shaping), and I render the QWERTY overlay on the same canvas. I added a method switcher for fingertip sourcing (M1 landmark tip, M2 projection, M5 gradient with threshold/extension knobs), normalized coordinates so the preview can remain mirrored while detection/overlay run in un-mirrored pixel space, and exposed simple text I/O hooks (insertText/pressKey) so detected points can drive keystrokes. I also cleaned up merge artifacts, centralized the run loop and status controls (live/landmarks/black screen), and kept the 10-second “freeze on stable F/J” behavior for predictable calibration. I’m on schedule this week.

What I plan to do next week: I’ll pair with Joyce to fold her fingertip detector into this pipeline, add basic stabilization/debounce, and wire tip contacts to the keystroke path (tap FSM, modifiers, and key labeling). The goal is to land end-to-end typing from fingertip events and begin measuring latency/accuracy against our targets.

Hanning’s Status Report for October 18

This week I added a calibration instructor and a small finite-state machine (FSM) to the camera webpage. The FSM explicitly manages idle → calibrating → typing: when a handsDetected hook flips true, the UI enters calibrating for 10 s (driven by performance.now() inside requestAnimationFrame) and shows a banner with a live progress bar; on timeout it transitions to typing, where we’ll lock the keyboard pose. The module exposes setHandPresence(bool) for the real detector, is resilient to brief hand-detection dropouts, and keeps preview mirroring separate from processing so saved frames aren’t flipped. I also wired lifecycle guards (visibilitychange/pagehide) so tracks stop cleanly, and left stubs to bind the final homography commit at the typing entry.
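
Schematically, the FSM looks like the sketch below (trimmed down: the banner/progress-bar updates and the homography commit are stubbed out as comments):

```javascript
// Sketch of the calibration FSM: idle -> calibrating (10 s) -> typing.
// UI updates and the final keyboard-pose commit are stubbed out as comments.
const CALIBRATION_MS = 10000;

let state = 'idle';
let calibrationStart = null;

function setHandPresence(handsDetected) {
  if (state === 'idle' && handsDetected) {
    state = 'calibrating';
    calibrationStart = performance.now();
  }
  // Brief detection dropouts while calibrating are tolerated: no reset on false.
}

function tick(now) {
  if (state === 'calibrating') {
    const progress = Math.min((now - calibrationStart) / CALIBRATION_MS, 1);
    // updateProgressBar(progress);  // UI hook, omitted here
    if (progress >= 1) {
      state = 'typing';
      // commitKeyboardPose();       // lock the calibrated keyboard layout here
    }
  }
  requestAnimationFrame(tick);
}

requestAnimationFrame(tick);
```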

I’m on schedule. Next week, I’ll integrate this web framework with Yilei’s calibration process: replace the simulated handsDetected with the real signal, feed Yilei’s pose/plane output into the FSM’s “commit” step to fix the keyboard layout, and run end-to-end tests on mobile over HTTPS (ngrok/Cloudflare Tunnel) to verify the calibration→typing flow works in the field.

Current webpage view: (screenshot omitted)