Team’s Status Report for December 6

Most Significant Risks and Mitigation
Our major risk this week continued to be tap detection accuracy. Despite several rounds of tuning thresholds, filtering sudden CV glitches, and improving motion heuristics, the camera-only method still failed to meet our accuracy requirement.

To mitigate this risk, we made a decisive design adjustment: adding external hardware support through pressure-sensitive fingertip sensors. Each sensor is attached to a fingertip and connected to an Arduino mounted on the back of the hand. We use two Arduinos in total (one per hand), each supporting four sensors. The Arduino performs simple edge detection (“tapped” vs. “idle”) and sends these states to our web app, where they replace the camera-based tap module in the sensor signal → key → text-editor pipeline. This hardware-assisted approach reduces false negatives, which were our biggest issue.
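
As a point of reference, here is a minimal sketch of the per-sensor edge-detection logic, written in Python for readability; the actual firmware runs on the Arduinos, and the threshold and variable names here are illustrative assumptions:

    # Illustrative sketch of the per-sensor edge detection each Arduino performs.
    # PRESS_THRESHOLD and the sensor count are assumptions, not the firmware values.
    PRESS_THRESHOLD = 512          # ADC reading above which the pad counts as pressed
    prev_pressed = [False] * 4     # one state per fingertip sensor on this hand

    def detect_edges(readings):
        """Return indices of sensors that transitioned idle -> tapped this cycle."""
        tapped = []
        for i, value in enumerate(readings):
            pressed = value > PRESS_THRESHOLD
            if pressed and not prev_pressed[i]:
                tapped.append(i)          # rising edge: report a tap event to the app
            prev_pressed[i] = pressed     # falling edge simply returns to idle
        return tapped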

Changes to System Design
Our system now supports two interchangeable tap-detection modes:

  1. Camera-based tap mode (our original pipeline).
  2. Pressure-sensor mode (hardware-assisted tap events from Arduino).

The rest of the system, including fingertip tracking, keyboard overlay mapping, and text-editor integration, remains unchanged. The new design preserves our AR keyboard’s interaction model while introducing a more robust and controllable input source. We are now testing both methods side by side to measure accuracy, latency, and overall usability, ensuring that we still meet our project requirements even if the pure CV solution remains unreliable.

Unit Tests (fingertip)
We evaluated fingertip accuracy by freezing frames, then manually clicking fingertip positions in a fixed left-to-right order and comparing them against our detected fingertip locations over 14 valid rounds (10 fingers each). The resulting mean error is only ~11 px (|dx| ≈ 7 px, |dy| ≈ 7 px), which corresponds to well under ¼ key-width in X and ½ key-height in Y. Thus, the fingertip localization subsystem meets our spatial accuracy requirement.
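
A minimal sketch of how the per-finger pixel errors can be aggregated from the clicked ground-truth points and the detected fingertip positions (array shapes and names are illustrative):

    import numpy as np

    # ground_truth and detected: arrays of shape (rounds, 10, 2) holding (x, y) pixel
    # coordinates, with fingers clicked/detected in the same left-to-right order.
    def fingertip_error(ground_truth, detected):
        diff = detected - ground_truth                       # per-finger (dx, dy)
        mean_abs_dx = np.mean(np.abs(diff[..., 0]))          # mean |dx| in pixels
        mean_abs_dy = np.mean(np.abs(diff[..., 1]))          # mean |dy| in pixels
        mean_dist = np.mean(np.linalg.norm(diff, axis=-1))   # mean Euclidean error
        return mean_abs_dx, mean_abs_dy, mean_dist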

We also conducted unit tests for calibration by timing 20 independent calibration runs and confirming the average time met our ≤15 s requirement.
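
A sketch of the timing harness, assuming a callable run_calibration() that wraps the calibration flow (the name is hypothetical):

    import time

    def time_calibration(run_calibration, runs=20):
        """Time independent calibration runs and return the mean duration in seconds."""
        durations = []
        for _ in range(runs):
            start = time.perf_counter()
            run_calibration()                     # the calibration routine under test
            durations.append(time.perf_counter() - start)
        return sum(durations) / len(durations)    # compare against the 15 s average requirement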

System Tests 
We measured tap-event latency by instrumenting four timestamps (A–D) in our pipeline: tap detection (A), event reception in app.js (B), typing-logic execution (C), and character insertion in the text editor (D). The end-to-end latency (A→D) is 7.31 ms, which is within our expected timing bounds. A sketch of the instrumentation follows the measurements below.
A→B: 7.09 ms
A→C: 7.13 ms
A→D: 7.31 ms
B→D: 0.22 ms
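
A minimal sketch of this kind of timestamp instrumentation; our actual instrumentation lives in the JavaScript pipeline, so the Python here is only illustrative:

    import time

    timestamps = {}

    def mark(stage):
        """Record a high-resolution timestamp for one pipeline stage (A, B, C, or D)."""
        timestamps[stage] = time.perf_counter()

    # ...call mark("A") at tap detection, mark("B") on event reception,
    # mark("C") after typing logic, mark("D") after character insertion...

    def latency_ms(start, end):
        """Elapsed time between two recorded stages, in milliseconds."""
        return (timestamps[end] - timestamps[start]) * 1000.0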

For accuracy, we performed tap-accuracy experiments by collecting ground-truth taps and measuring detection and false-positive rates across extended typing sequences under controlled illuminance values (146, 307, and 671 lux).

  • Tap detection rate = correct / (correct + undetected) = 19.4%
  • Mistap (false positive) rate = false positives / (correct + undetected) = 12.9%
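
Both rates follow directly from the tallied counts; a small sketch of the computation (variable names are illustrative):

    def tap_metrics(correct, undetected, false_positives):
        attempted = correct + undetected              # ground-truth taps in the sequence
        detection_rate = correct / attempted          # correct / (correct + undetected)
        mistap_rate = false_positives / attempted     # false positives / (correct + undetected)
        return detection_rate, mistap_rate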

Team’s Status Report for November 22

Most Significant Risks and Mitigation
This week, our main challenge came from tap detection instability. If the sensitivity is too high, the system picks up random glitches as taps; if we reduce sensitivity, normal taps get missed while glitches still sometimes pass through. Overall, it’s still difficult for the model to reliably distinguish between a real tap, natural hand movement, and a sudden CV glitch.
To mitigate this, we worked on two short-term fixes:
1. Filtering “teleport” motion: when fingertip coordinates jump implausibly far between frames, we label these samples as glitches and discard the frames (see the sketch after this list).
2. Re-tuning tap sensitivity: we are testing a middle-ground threshold that keeps normal taps detectable without letting small jitters trigger spurious key events.
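
A minimal sketch of the teleport filter, assuming per-frame fingertip coordinates in pixels (the jump threshold shown is illustrative, not our tuned value):

    MAX_JUMP_PX = 80          # illustrative threshold; the tuned value differs
    prev_positions = {}       # finger id -> (x, y) from the last accepted frame

    def is_glitch(finger_id, x, y):
        """Flag a fingertip sample as a glitch if it 'teleports' too far in one frame."""
        prev = prev_positions.get(finger_id)
        if prev is not None:
            dx, dy = x - prev[0], y - prev[1]
            if (dx * dx + dy * dy) ** 0.5 > MAX_JUMP_PX:
                return True                     # discard this frame's sample for the finger
        prev_positions[finger_id] = (x, y)
        return False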

Changes to System Design
While we continue tuning the glitch-filtering pipeline, we also started researching a new design direction: reconstructing approximate 3D finger movement from the camera stream.
The idea is that true taps correspond to vertical motion toward the desk, whereas random movement is usually horizontal or diagonal. If we can estimate whether a finger is moving “downward” vs “across,” tap detection becomes much more robust.
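
One way to express this heuristic, assuming we can recover a rough per-frame 3D displacement for each fingertip (the axis convention and threshold are illustrative assumptions):

    import numpy as np

    def is_downward_motion(displacement, vertical_ratio=0.7):
        """Classify a 3D fingertip displacement as tap-like (mostly toward the desk)."""
        d = np.asarray(displacement, dtype=float)       # (dx, dy, dz), dz toward the desk
        norm = np.linalg.norm(d)
        if norm < 1e-6:
            return False                                # no meaningful motion
        return d[2] / norm > vertical_ratio             # dominated by the downward component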

Schedule Changes
We may need more time for testing and tuning, so we plan to convert our Week 11 slack week into a testing/verification week.

Team’s Status Report for November 15

Most Significant Risks and Management
The main risk we identified this week is that our original test plan may not be sufficient to convincingly demonstrate that the system meets its performance requirements. In particular, the earlier accuracy and usability tests did not clearly separate natural human typing errors from errors introduced by our system, and the single-key tap test was too basic to represent realistic typing behavior. To manage this, we reframed our evaluation around within-participant comparisons, where each user types comparable text using both our virtual keyboard and a standard keyboard. This paired design allows us to interpret performance differences as properties of our system, while retaining the single-key tap test only as a preliminary verification step before more comprehensive evaluations.

Design Changes, Rationale, and Cost Mitigation
No major changes were made to the core interaction or system architecture; instead, our design changes focus on verification and validation. We shifted from treating accuracy and usability as absolute metrics for our system alone to treating them as relative metrics benchmarked against a standard keyboard used by the same participants, making the results more interpretable and defensible. We also moved from a single basic accuracy test to a layered approach that combines the original single-key tap check with a more realistic continuous-typing evaluation supported by detailed logging. The primary cost is the additional effort to implement standardized logging and paired-data analysis, which we mitigate by reusing prompts, using a common logging format, and concentrating on a small number of carefully structured experiments.

Updated Schedule
Because these changes affect how we will test rather than what we are building, the overall scope and milestones are unchanged, but our near-term schedule has been adjusted. Our current priority is to complete integration of all subsystems and the logging infrastructure so that the system can generate the detailed event data required for the revised tests. Once logging is in place, we will run internal pilot trials to verify that prompts, logging, and analysis scripts work end to end, followed by full accuracy and usability studies in which participants use both our virtual keyboard and a baseline keyboard. The resulting paired data will then be used to assess whether we meet the performance requirements defined in the design report.

Validation Testing Plan
Accuracy testing: Each participant will type two similar paragraphs: one using our virtual keyboard and one using a standard physical keyboard. In each condition, they will type for one minute and may correct their own mistakes as they go. We will record the typing process and, because we know the reference paragraph, we can infer the intended key at each point in time and compare it to the key recognized by the system. We will then compute accuracy for both keyboards and compare them to separate user error from errors introduced by our keyboard. Our goal is for the virtual keyboard’s accuracy to be within 5 percentage points of each participant’s accuracy on the physical keyboard.
Usability / speed testing: For usability, each participant will again type similar paragraphs on both the physical keyboard and our virtual keyboard. In both conditions, they will type for one minute, correcting mistakes as needed, and are instructed to type as fast as they comfortably can. We will measure words per minute on each keyboard. For users whose typing speed on the physical keyboard is at or below 40 WPM, we require that their speed on the virtual keyboard drop by no more than 10%. For users who naturally type faster than this range, we will still record and analyze their speed drop to understand how performance scales with higher baseline typing speeds.
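
The pass criteria reduce to simple paired comparisons per participant; a sketch of the checks, assuming accuracies in [0, 1] and speeds in WPM:

    def meets_accuracy_requirement(virtual_acc, physical_acc):
        """Virtual-keyboard accuracy within 5 percentage points of the physical baseline."""
        return (physical_acc - virtual_acc) <= 0.05

    def meets_speed_requirement(virtual_wpm, physical_wpm):
        """For baselines at or below 40 WPM, allow at most a 10% speed drop."""
        if physical_wpm > 40:
            return None                 # faster typists: record the drop, no hard requirement
        drop = (physical_wpm - virtual_wpm) / physical_wpm
        return drop <= 0.10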

Team’s Status Report for November 8

Our most significant current risk is inaccurate tap detection, which can lead to mis-taps. Right now, taps are inferred largely from the vertical displacement of a fingertip. This causes two main failure modes: when one finger taps, a neighboring finger may move slightly and be incorrectly interpreted as a second tap, and when the entire hand shifts forward, the fingertips show a large vertical-displacement-like motion, so a tap is detected even though no single finger has actually tapped. To manage this risk, we added a per-hand cooldown between taps so that each hand maintains a short cooldown window after a detected tap. Further candidate taps from the same hand are suppressed during this period, which reduces false second taps caused by passive finger motion. We plan to introduce a user-adjustable tap sensitivity slider that controls the cooldown duration so users can tune the system to their own typing style and speed. To manage the second failure mode, we plan to monitor the landmarks on the back of the hand in addition to the fingertip. If both fingertip and back-of-hand landmarks move together, we will treat this as whole-hand motion and discard that tap candidate, whereas if the fingertip moves relative to a relatively stable back of the hand, we will accept it as a true tap.
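
A sketch of the per-hand cooldown and whole-hand-motion check, assuming per-frame downward displacements (in pixels) for the fingertip and a back-of-hand landmark; the cooldown duration and thresholds are illustrative:

    import time

    COOLDOWN_S = 0.15          # illustrative; the sensitivity slider would adjust this
    last_tap_time = {"left": 0.0, "right": 0.0}

    def accept_tap(hand, fingertip_dy, back_of_hand_dy,
                   tap_threshold=12.0, hand_motion_ratio=0.6):
        """Accept a candidate tap only if the hand is out of cooldown and the
        fingertip moved relative to a comparatively stable back of the hand."""
        now = time.monotonic()
        if now - last_tap_time[hand] < COOLDOWN_S:
            return False                                    # suppress rapid second taps
        if fingertip_dy < tap_threshold:
            return False                                    # not enough downward motion
        if back_of_hand_dy > hand_motion_ratio * fingertip_dy:
            return False                                    # whole hand moved, not a tap
        last_tap_time[hand] = now
        return True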

Previously, our Top/Bottom slider only horizontally compressed the top of the keyboard, which meant that perspective was approximated along one dimension only and the top rows could appear misaligned relative to a real keyboard. We now apply a per-row vertical scaling derived from the same top-bottom ratio so that both width and height follow a consistent perspective model.
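
A sketch of the per-row scaling, assuming a top/bottom ratio in (0, 1] where 1 means no perspective compression (function and parameter names are illustrative):

    def row_scale(row_index, num_rows, top_bottom_ratio):
        """Linearly interpolate a scale factor from 1.0 at the bottom row to
        top_bottom_ratio at the top row; applied to both key width and row height."""
        if num_rows <= 1:
            return 1.0
        t = row_index / (num_rows - 1)          # 0 at the bottom row, 1 at the top row
        return 1.0 + t * (top_bottom_ratio - 1.0)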

We don’t have any schedule changes this week.

Team’s Status Report for November 1

Most Significant Risks and Management
This week, we identified a new risk concerning hover versus contact ambiguity (the system’s difficulty in determining whether a user’s fingertip is truly resting on the keyboard plane or merely hovering above it). This issue directly affects tap accuracy, as vertical finger movements in midair could be misinterpreted as valid keystrokes. To mitigate this, we refined our tap detection mechanism by incorporating gesture-based state validation. Specifically, the algorithm now verifies that every tap motion begins with an “in-air” finger gesture and ends with an “on-surface” gesture, as determined by the relative positions and flexion of the fingertips. Only if this air-to-surface transition coincides with a rapid downward motion is the tap event confirmed.
This approach reduces false positives from hovering fingers and improves robustness across users with different hand postures.

Changes to System Design
The system’s tap detection algorithm has been upgraded from a purely velocity-based method to a state-transition-driven model. The previous implementation relied solely on instantaneous speed, distance, and velocity drop thresholds to identify tap events, which worked well for clear, strong taps but struggled with subtle finger motions or resting gestures. The new design introduces two additional layers:

  1. Finger State Classification: Each fingertip is now labeled as either on-surface or in-air based on its relative position, curl, and height within the calibrated plane.

  2. State Transition Validation: A tap is recognized only when a downward motion sequence transitions from in-air → on-surface within a short temporal window.

By coupling spatial and temporal evidence, the system should be able to differentiate between deliberate keystrokes and incidental finger motion.
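
A condensed sketch of the state-transition check layered on top of the velocity test (the state labels, window length, and speed threshold are illustrative):

    from collections import deque

    WINDOW_FRAMES = 6                 # illustrative temporal window

    class TapValidator:
        """Confirm a tap only when an in-air -> on-surface transition coincides
        with a rapid downward motion within a short window of frames."""
        def __init__(self):
            self.history = deque(maxlen=WINDOW_FRAMES)   # recent (state, downward_speed)

        def update(self, state, downward_speed, speed_threshold=8.0):
            self.history.append((state, downward_speed))
            if state != "on_surface" or len(self.history) < 2:
                return False
            was_in_air = any(s == "in_air" for s, _ in list(self.history)[:-1])
            fast_down = max(v for _, v in self.history) > speed_threshold
            return was_in_air and fast_down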

Updated Schedule
Hanning’s original plan for this week was to implement the keystroke event handling module. However, since fingertip output data is not yet fully stable, that task is postponed to next week. Instead, Hanning focused on developing the copy-paste function for the text editor and assisted in integrating existing components of the computer vision and calibration pipelines.

Team’s Status Report for October 25

Most Significant Risks and Management
The primary project risk was that the HTML/JavaScript web app might not run on mobile devices due to camera access restrictions—mobile browsers require a secure (HTTPS) context for getUserMedia. This could have blocked essential testing for calibration, overlay alignment, and latency on real devices. The team mitigated this risk by deploying the app to GitHub Pages (which provides automatic HTTPS), converting all asset links to relative paths, and adding a user-triggered “Start” button to request camera permissions. The solution was verified to load securely via https:// and successfully initialize the mobile camera stream.

Changes to System Design
The system has transitioned to a Gradient-Based Tip Detection method, addressing the core limitations of the previous Interest Box and Centroid Method. The earlier approach calculated the contact point by finding the pixel centroid within a fixed Region of Interest (ROI) after applying a single color threshold. While effective in controlled conditions—especially with stable lighting and a dark background—its performance degraded significantly under variable lighting or background changes. This dependency on a fixed threshold required constant manual tuning or complex adaptive algorithms. The new method overcomes these issues by projecting a search vector and detecting sharp color gradients between the fingertip and surface using a robust combination of RGB and HSL data. Although we had only briefly explored this approach earlier, its improved handling of color transitions now makes it more consistent and reliable. By focusing on the physical edge contrast, it achieves stable fingertip contact detection across diverse environments, enhancing both accuracy and practicality.
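
A simplified sketch of the gradient search along a projected vector, assuming an RGB image array and stepping from inside the finger toward the surface; the step count and feature weighting are illustrative:

    import colorsys
    import numpy as np

    def find_tip_along_vector(image, start, direction, steps=40):
        """Walk from 'start' along 'direction' (a unit vector in pixel units) and return
        the point with the largest combined RGB+HSL change between consecutive samples."""
        start = np.asarray(start, dtype=float)
        direction = np.asarray(direction, dtype=float)
        prev = None
        best_score, best_point = 0.0, None
        for i in range(steps):
            x, y = (start + i * direction).astype(int)
            if not (0 <= y < image.shape[0] and 0 <= x < image.shape[1]):
                break
            r, g, b = image[y, x][:3] / 255.0
            h, l, s = colorsys.rgb_to_hls(r, g, b)
            feat = np.array([r, g, b, h, l, s])
            if prev is not None:
                score = np.linalg.norm(feat - prev)       # sharpness of the color transition
                if score > best_score:
                    best_score, best_point = score, (x, y)
            prev = feat
        return best_point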

Updated Schedule
Joyce has spent additional time refining the fingertip detection algorithm after finding that the previous method was unstable under certain lighting and background conditions. Consequently, she plans to compress Task 4 (Tap Detection) into a shorter period and may request assistance from teammates for testing to ensure that project milestones remain on schedule.

 

Team’s Status Report for October 18

Most Significant Risks and Management

The primary risk identified was Fingertip Positional Accuracy, specifically along the keyboard’s depth (Z-axis). Previous geometric methods yielded significant positional errors, which threatened the system’s ability to distinguish between adjacent keys (e.g., confusing Q, A, or Z) and thus made reliable typing impossible. To manage this risk, our contingency plan was the rapid implementation of the Pixel Centroid Method. This technique calculates the statistically stable Center of Mass (Centroid) of the actual finger pixels, providing a highly stable point of contact that successfully mitigates the positional ambiguity risk.

Changes to System Design

A necessary change was introduced to the Fingertip Tracking Module design. We transitioned from geometric projection methods to an Image Processing Refinement Pipeline (the Pixel Centroid Method). This was required because the original methods lacked the vertical accuracy needed for key mapping. The cost was one additional week of time, but this is mitigated by the substantial increase in tracking stability and accuracy, preventing major integration costs down the line.
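
A minimal sketch of the centroid computation over a thresholded ROI (the mask construction is omitted and assumed to come from the color-threshold step):

    import numpy as np

    def finger_centroid(roi_mask):
        """Given a boolean mask of finger pixels inside the ROI, return the centroid
        (x, y) in ROI coordinates, or None if no finger pixels were found."""
        ys, xs = np.nonzero(roi_mask)
        if xs.size == 0:
            return None
        return float(xs.mean()), float(ys.mean())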

Updated Schedule

No significant changes have occurred to the overall project schedule.

Part A global factors
Across developing regions, many users rely primarily on smartphones or tablets as their only computing devices, yet struggle with slow or error-prone touchscreen typing due to small screen sizes or limited literacy in digital interfaces. By using the built-in camera and no additional hardware, our system provides a universally deployable typing interface that can work on any flat surface. It’s more practical for students, remote workers, and multilingual users worldwide. For instance, an English learner in rural India could practice typing essays on a table without needing a Bluetooth keyboard, or a freelance translator in South America could work comfortably on a tablet during travel. Because all computation happens locally on-device, the system can function without internet access, which is essential for regions with limited connectivity, while also ensuring user privacy. This design supports equitable access to digital productivity tools and aligns with sustainable technology trends by reducing electronic waste and dependence on specialized hardware.

Part B cultural factors
HoloKeys is designed to fit how people learn and use technology in classrooms, libraries, community centers, and travel settings. Because QWERTY is the most widely used layout, the interface aligns with familiar motor patterns and reduces training time. Instructions and tutorials are written in plain, idiom-free text that can be easily translated into other languages. Visual overlays are adjustable (font size, key size, contrast), allowing users to tune the interface to their needs. Because expectations around camera use vary, HoloKeys defaults to privacy-forward behavior: clear camera active indicators, no recording or image retention by default, and concise explanations of how and why the camera is used.

Part C environmental factors
Unlike traditional hardware keyboards, our solution requires minimal physical manufacturing, shipping, or disposal, thereby reducing material waste and overall carbon footprint. The system relies primarily on existing mobile devices, with only a small stand or holder as an optional accessory. This holder can also serve as a regular phone or tablet stand, further extending its lifespan and utility. By minimizing the need for new electronic components and leveraging devices users already own, our design helps reduce electronic waste and promotes more sustainable technology practices.

Part A was written by Hanning Wu, part B was written by Yilei Huang and part C was written by Joyce Zhu.

Team’s Status Report for October 4

We made a design adjustment: The camera needs to sit at a different angle than originally planned. To find a suitable angle for our current gesture recognition model, we’ll first use an adjustable, angle-changeable device holder to test several angles, then commit by purchasing or fabricating a fixed-angle holder once we identify the ideal angle.
Current risk we found: mobile devices could not open the HTML file directly (no real web origin; only Microsoft Edge would open it, and it still blocked camera access).
Mitigation: we now host the page on a local server on the laptop and connect from the phone over the LAN (a real origin, so camera permissions work), with HTTPS or a tunnel available if needed.
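
As a concrete example, serving the project folder from the laptop with Python's built-in web server is enough for LAN testing (the port is arbitrary):

    # run in the project folder on the laptop, then open http://<laptop-ip>:8000 on the phone
    python3 -m http.server 8000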

Schedule updates: Junyu scheduled an additional week for fingertip recognition, shifting later tasks out, because this task is taking longer than expected: we want to test multiple methods to ensure accuracy, and fingertip detection is crucial to the accuracy of key detection. Hanning’s tasks for weeks 3 and 4 switch to align with Yilei’s calibration process. Yilei reports no schedule change.


Team’s Status Report for September 27

The main risk is the reliability of hand detection using MediaPipe. When the palm is viewed nearly edge-on (appearing more like a line than a triangle), the detected position shakes significantly or the hands may not be detected at all, which threatens accurate fingertip tracking. To manage this, we are tilting the camera to improve hand visibility and applying temporal smoothing to stabilize landmark positions. We have also recently updated the design to incorporate vibration detection into the tap detection pipeline, since relying on vision alone can make it difficult to distinguish between hovering and actual keystrokes.
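
A sketch of the temporal smoothing applied to landmark positions, using an exponential moving average (the smoothing factor is an illustrative assumption):

    class LandmarkSmoother:
        """Exponential moving average over per-landmark (x, y) positions."""
        def __init__(self, alpha=0.4):        # smaller alpha = heavier smoothing
            self.alpha = alpha
            self.state = {}                   # landmark id -> smoothed (x, y)

        def smooth(self, landmark_id, x, y):
            prev = self.state.get(landmark_id)
            if prev is None:
                smoothed = (x, y)
            else:
                a = self.alpha
                smoothed = (a * x + (1 - a) * prev[0], a * y + (1 - a) * prev[1])
            self.state[landmark_id] = smoothed
            return smoothed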

Part A public health, safety or welfare

Our product supports public welfare by improving accessibility and comfort in digital interaction. Because it operates entirely through a camera interface, it can benefit users who find it difficult to press down physical keys due to mobility, dexterity, or strength limitations. By requiring no additional hardware or forceful contact, the system provides a low-effort and inclusive way to input text. In terms of psychological well-being, all processing is performed locally on the user’s device, and no video frames or images are stored or transmitted. This protects personal privacy and reduces anxiety related to data security or surveillance. By combining accessibility with privacy-preserving design, the system enhances both the welfare and peace of mind of its users.

Part B social factors

Our virtual keyboard system directly addresses the growing social need for inclusive, portable, and accessible computing. In many educational and professional settings—such as shared classrooms, libraries, and public workspaces—users must often type on the go without carrying physical hardware, which may be costly, impractical, or socially disruptive. By enabling natural typing on any flat surface, our design reduces barriers for mobile students and low-income users without access to external peripherals. For example, a commuter can take notes on a tray table during a train ride, or a student with limited finger dexterity can type with adaptive finger placement during lectures. Socially, this technology supports a more equitable digital experience by removing dependency on specialized devices, promoting inclusivity in both educational and workplace contexts. It also respects users’ privacy by running entirely on-device and not transmitting camera data to the cloud.

Part C economic factors
As a web app, HoloKeys meets the need for portable, hardware-free typing while minimizing costs for both users and providers. Users don’t buy peripherals or install native software; they simply open a URL. This shifts total cost of ownership away from hardware toward a service with negligible marginal cost and lowers adoption barriers for students, travelers, and anyone for whom carrying a keyboard is impractical. Additionally, HoloKeys may modestly substitute for portable Bluetooth keyboards but is largely complementary to laptops; its primary use cases are phone-and-tablet-first contexts where a full laptop is unnecessary or inconvenient.

Part A was written by Joyce Zhu, part B was written by Hanning Wu and part C was written by Yilei Huang.

Team Status Report for September 20

This week (ending Sep 20) we aligned on a web application as the primary UI, with an optional server path only if heavier models truly require it. We’re prioritizing an in-browser pipeline to keep latency low and deployment simple, while keeping a small Python fallback available. We also validated hand detection on iPad photos using MediaPipe Hands / MediaPipe Tasks – Hand Landmarker and found it sufficient for early fingertip landmarking.

On implementation, we added a simple browser camera capture to grab frames and a Python 3.9 script using MediaPipe Hands to run landmark detection on those frames. The model reliably detected one or two hands in our test images and produced annotated outputs.
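
A condensed version of that validation script, assuming the classic MediaPipe Hands solution API and a test image on disk (the file names are illustrative):

    import cv2
    import mediapipe as mp

    mp_hands = mp.solutions.hands
    mp_drawing = mp.solutions.drawing_utils

    image = cv2.imread("ipad_test_frame.jpg")                  # illustrative file name
    with mp_hands.Hands(static_image_mode=True, max_num_hands=2,
                        min_detection_confidence=0.5) as hands:
        results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            mp_drawing.draw_landmarks(image, hand_landmarks, mp_hands.HAND_CONNECTIONS)
    cv2.imwrite("annotated_output.jpg", image)                 # annotated result for review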