Most Significant Risks and Mitigation
Our major risk this week continued to be tap detection accuracy. Despite several rounds of tuning thresholds, filtering sudden CV glitches, and improving motion heuristics, the camera-only method still failed to meet our accuracy requirement.
To mitigate this risk, we made a decisive design adjustment: adding external hardware support through pressure-sensitive fingertip sensors. Each sensor is attached to a fingertip and connected to an Arduino mounted on the back of the hand. We use two Arduinos in total (one per hand), each supporting four sensors. The Arduino performs simple edge detection ("tapped" vs. "idle") and sends these states to our web app, where they replace our existing tap module in the sensor signal → key → text-editor pipeline. This hardware-assisted approach reduces false negatives, which were our biggest issue.
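The per-sensor "tapped" vs. "idle" edge detection can be sketched as follows. This is a minimal Python model of the logic, not our actual firmware; the ADC threshold value is a placeholder assumption, not our tuned setting.

```python
TAP_THRESHOLD = 500  # assumed threshold on a 0-1023 ADC scale, not our tuned value

class EdgeDetector:
    """Emits a 'tapped' event only on the idle -> pressed transition."""
    def __init__(self, threshold=TAP_THRESHOLD):
        self.threshold = threshold
        self.pressed = False

    def update(self, adc_value):
        """Return 'tapped' on a rising edge, otherwise 'idle'."""
        now_pressed = adc_value > self.threshold
        event = "tapped" if (now_pressed and not self.pressed) else "idle"
        self.pressed = now_pressed
        return event

# Sustained pressure produces a single tap event, not repeats:
d = EdgeDetector()
events = [d.update(v) for v in [100, 600, 700, 100]]
# → ['idle', 'tapped', 'idle', 'idle']
```

Reporting only the transition (rather than the raw pressure level) keeps the serial traffic to the web app small and avoids one physical press producing repeated key events.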
Changes to System Design
Our system now supports two interchangeable tap-detection modes:
- Camera-based tap mode (our original pipeline).
- Pressure-sensor mode (hardware-assisted tap events from Arduino).
The rest of the system, including fingertip tracking, keyboard overlay mapping, and text-editor integration, remains unchanged. The new design preserves our AR keyboard’s interaction model while introducing a more robust and controllable input source. We are now testing both methods side by side to measure accuracy, latency, and overall usability, ensuring that we still meet our project requirements even if the pure CV solution remains unreliable.
Unit Tests (fingertip)
We evaluated fingertip accuracy by freezing frames, then manually clicking fingertip positions in a fixed left-to-right order and comparing them against our detected fingertip locations over 14 valid rounds (10 fingers each). The resulting mean error is only ~11 px (|dx| ≈ 7 px, |dy| ≈ 7 px), which corresponds to well under ¼ key-width in X and ½ key-height in Y. Thus, the fingertip localization subsystem meets our spatial accuracy requirement.
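The reduction from clicked/detected point pairs to the error figures above can be sketched like this. The sample coordinates are made-up illustrative data, not our measured rounds.

```python
import math

# Manually clicked ground truth vs. detected fingertip positions, in pixels.
# These two points are illustrative placeholders, not our dataset.
clicked  = [(102, 200), (150, 205)]
detected = [(109, 207), (143, 199)]

dxs  = [abs(c[0] - d[0]) for c, d in zip(clicked, detected)]
dys  = [abs(c[1] - d[1]) for c, d in zip(clicked, detected)]
errs = [math.hypot(c[0] - d[0], c[1] - d[1]) for c, d in zip(clicked, detected)]

mean_abs_dx = sum(dxs) / len(dxs)    # mean |dx| in px
mean_abs_dy = sum(dys) / len(dys)    # mean |dy| in px
mean_err    = sum(errs) / len(errs)  # mean Euclidean error in px
```

Comparing `mean_abs_dx` and `mean_abs_dy` against the key width and height is what gives the under-¼-key-width and under-½-key-height margins.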
We also conducted unit tests for calibration by timing 20 independent calibration runs and confirming the average time met our ≤15 s requirement.
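The calibration check itself is a simple average over the timed runs. The durations below are illustrative placeholders, not our 20 recorded runs.

```python
# Per-run calibration durations in seconds (illustrative placeholders).
runs_s = [12.1, 13.4, 11.8, 14.0, 12.6]

mean_s = sum(runs_s) / len(runs_s)
assert mean_s <= 15.0, f"calibration requirement missed: {mean_s:.1f} s"
```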
System Tests
We measured tap-event latency by instrumenting four timestamps (A–D) in our pipeline: tap detection (A), event reception in app.js (B), typing-logic execution (C), and character insertion in the text editor (D). The end-to-end latency (A→D) is 7.31 ms, which is within expected timing bounds.
A→B: 7.09 ms
A→C: 7.13 ms
A→D: 7.31 ms
B→D: 0.22 ms
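The deltas above are pairwise differences of the four instrumented timestamps. In this sketch the timestamps are expressed in milliseconds relative to tap detection (A), with values chosen to reproduce the measured deltas; the dict-based layout is illustrative, not our instrumentation code.

```python
# Timestamps in ms relative to tap detection (A); values reproduce the
# measured deltas, the representation is illustrative.
t = {"A": 0.0, "B": 7.09, "C": 7.13, "D": 7.31}

deltas = {
    "A→B": t["B"] - t["A"],  # detection -> app.js reception
    "A→C": t["C"] - t["A"],  # detection -> typing logic
    "A→D": t["D"] - t["A"],  # detection -> character inserted (end to end)
    "B→D": t["D"] - t["B"],  # time spent inside the web app
}
```

The small B→D gap shows that nearly all of the end-to-end latency sits upstream of the web app, in detection and event transport.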
For accuracy, we performed tap-accuracy experiments by collecting ground-truth taps and measuring detection and false-positive rates across extended typing sequences under controlled illuminance values (146, 307, and 671 lux).
- Tap detection rate = correct / (correct + undetected) = 19.4%
- Mistap (false positive) rate = false positives / (correct + undetected) = 12.9%
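Both rates share the same denominator, the number of ground-truth taps (correct + undetected). The counts below are illustrative placeholders chosen to reproduce the reported percentages, not our raw tallies.

```python
# Illustrative counts (not our raw data), chosen to match the reported rates.
correct = 6
undetected = 25
false_positives = 4

ground_truth = correct + undetected              # ground-truth taps
detection_rate = correct / ground_truth          # ≈ 19.4 %
mistap_rate = false_positives / ground_truth     # ≈ 12.9 %
```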
