Team’s Status Report for September 27

The main risk is the reliability of hand detection using MediaPipe. When the palm is viewed nearly edge-on (appearing more like a line than a triangle), the detected landmarks jitter significantly or the hand may not be detected at all, which threatens accurate fingertip tracking. To manage this, we are tilting the camera to improve hand visibility and applying temporal smoothing to stabilize landmark positions. We have also updated the design to incorporate vibration detection into the tap-detection pipeline, since relying on vision alone makes it difficult to distinguish hovering from actual keystrokes.
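As a rough sketch of the temporal smoothing we have in mind, an exponential moving average over each landmark is one simple option. The TypeScript below is illustrative only; the smoothing factor is an assumed placeholder, not a tuned value.

```ts
// Minimal exponential-moving-average smoother for hand landmarks.
// alpha closer to 1 trusts the new frame more; closer to 0 smooths harder.
// 0.4 is an assumed starting point, not a tuned value.
type Landmark = { x: number; y: number; z: number };

class LandmarkSmoother {
  private prev: Landmark[] | null = null;
  constructor(private alpha = 0.4) {}

  smooth(current: Landmark[]): Landmark[] {
    // First frame (or hand re-acquired with a different landmark count): no history yet.
    if (!this.prev || this.prev.length !== current.length) {
      this.prev = current.map(p => ({ ...p }));
      return this.prev;
    }
    // Blend each coordinate with the previous smoothed value.
    this.prev = current.map((p, i) => ({
      x: this.alpha * p.x + (1 - this.alpha) * this.prev![i].x,
      y: this.alpha * p.y + (1 - this.alpha) * this.prev![i].y,
      z: this.alpha * p.z + (1 - this.alpha) * this.prev![i].z,
    }));
    return this.prev;
  }

  // Call when the hand is lost so stale history is not blended into a new detection.
  reset(): void {
    this.prev = null;
  }
}
```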

Part A: Public Health, Safety or Welfare

Our product supports public welfare by improving accessibility and comfort in digital interaction. Because it operates entirely through a camera interface, it can benefit users who find it difficult to press down physical keys due to mobility, dexterity, or strength limitations. By requiring no additional hardware or forceful contact, the system provides a low-effort and inclusive way to input text. In terms of psychological well-being, all processing is performed locally on the user’s device, and no video frames or images are stored or transmitted. This protects personal privacy and reduces anxiety related to data security or surveillance. By combining accessibility with privacy-preserving design, the system enhances both the welfare and peace of mind of its users.

Part B: Social Factors

Our virtual keyboard system directly addresses the growing social need for inclusive, portable, and accessible computing. In many educational and professional settings—such as shared classrooms, libraries, and public workspaces—users must often type on the go without carrying physical hardware, which may be costly, impractical, or socially disruptive. By enabling natural typing on any flat surface, our design reduces barriers for mobile students and low-income users without access to external peripherals. For example, a commuter can take notes on a tray table during a train ride, or a student with limited finger dexterity can type with adaptive finger placement during lectures. Socially, this technology supports a more equitable digital experience by removing dependency on specialized devices, promoting inclusivity in both educational and workplace contexts. Moreover, it respects users’ privacy by running entirely on-device and not transmitting camera data to the cloud.

Part C: Economic Factors

As a web app, HoloKeys meets the need for portable, hardware-free typing while minimizing costs for both users and providers. Users don’t buy peripherals or install native software; they simply open a URL. This shifts the total cost of ownership away from hardware and toward a service with negligible marginal cost, which lowers adoption barriers for students, travelers, and anyone for whom carrying a keyboard is impractical. Additionally, HoloKeys may modestly substitute for portable Bluetooth keyboards but is largely complementary to laptops; its primary use cases are phone- and tablet-first contexts where a full laptop is unnecessary or inconvenient.

Part A was written by Joyce Zhu, Part B by Hanning Wu, and Part C by Yilei Huang.

Hanning’s Status Report for September 27

This week I focused on two coding tasks and one deliverable with the team. Early in the week I restructured the camera webpage code into three modules (an HTML framework plus camera.js and snapshot.js), which fixed local serving issues on a specific browser (Safari) and makes future integration easier. I then started implementing a built-in text editor below the video preview (a textarea plus helper APIs like insertText/pressKey) so that we can type into a real target. In parallel, I worked with my teammates to complete the design presentation slides (the web app and testing sections). Next week I plan to keep working on the text editor and begin a basic auto-correction implementation.
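For reference, here is a hypothetical sketch of how the insertText/pressKey helpers could sit on top of the textarea; the element id and the exact key handling are assumptions, not the final API.

```ts
// Hypothetical sketch of the editor helpers mentioned above, wired to a <textarea>.
// The "editor" element id and the key-handling choices are assumptions.
const editor = document.getElementById("editor") as HTMLTextAreaElement;

// Insert text at the current caret position and move the caret past it.
function insertText(text: string): void {
  editor.setRangeText(text, editor.selectionStart, editor.selectionEnd, "end");
  editor.dispatchEvent(new Event("input", { bubbles: true }));
}

// Handle non-printing keys; single printable characters fall through to insertText.
function pressKey(key: string): void {
  if (key === "Backspace") {
    const hasSelection = editor.selectionStart !== editor.selectionEnd;
    const start = hasSelection
      ? editor.selectionStart
      : Math.max(0, editor.selectionStart - 1);
    editor.setRangeText("", start, editor.selectionEnd, "end");
  } else if (key === "Enter") {
    insertText("\n");
  } else if (key.length === 1) {
    insertText(key);
  }
}
```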

Joyce’s Status Report for September 20

This week I selected and validated a hand-detection model for our hardware-free keyboard prototype. I set up a Python 3.9 environment and integrated MediaPipe Hands, adding a script that processes static images and supports two-hand detection with annotated landmarks and bounding boxes. Using several test photos shot on an iPad under typical indoor lighting, the model consistently detected one or two hands and their fingertips; failures occasionally occur, and more testing is needed to understand their causes. Next week I’ll keep refining the script so that the model consistently detects both hands, and then start estimating the landing points of the fingertips.

Yilei’s Status Report for September 20

This week I worked on Task 1.1, surface detection and plane fitting. I integrated OpenCV.js into our project and implemented the calibration flow where the user taps four corners of the typing surface. From these taps, I now compute the homography matrix that maps image pixels to the keyboard plane. My progress is on schedule. For next week, I plan to add a minimal overlay to visualize the calibration results and begin preparing the mapping function for Task 1.4.
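As an illustration of the pixel-to-plane mapping, the TypeScript sketch below uses OpenCV.js’s getPerspectiveTransform on the four tapped corners (with exactly four correspondences this yields the 3x3 homography directly); the corner ordering and plane dimensions are assumptions for the example, not the actual calibration parameters.

```ts
// Sketch of the pixel-to-plane mapping, assuming OpenCV.js is loaded as the global `cv`
// and the four tapped corners arrive as [x, y] pixel pairs in
// top-left, top-right, bottom-right, bottom-left order.
declare const cv: any;

// Keyboard-plane size in arbitrary units (placeholder values, not the real layout).
const PLANE_W = 300;
const PLANE_H = 100;

function computeHomography(corners: [number, number][]) {
  const src = cv.matFromArray(4, 1, cv.CV_32FC2, corners.flat());
  const dst = cv.matFromArray(4, 1, cv.CV_32FC2, [
    0, 0,             // top-left
    PLANE_W, 0,       // top-right
    PLANE_W, PLANE_H, // bottom-right
    0, PLANE_H,       // bottom-left
  ]);
  const H = cv.getPerspectiveTransform(src, dst); // 3x3 image-pixel -> plane transform
  src.delete();
  dst.delete();
  return H; // caller is responsible for H.delete()
}

// Map one fingertip pixel into keyboard-plane coordinates.
function toPlane(H: any, x: number, y: number): [number, number] {
  const pt = cv.matFromArray(1, 1, cv.CV_32FC2, [x, y]);
  const out = new cv.Mat();
  cv.perspectiveTransform(pt, out, H);
  const mapped: [number, number] = [out.data32F[0], out.data32F[1]];
  pt.delete();
  out.delete();
  return mapped;
}
```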

Team Status Report for September 20

This week (ending Sep 20) we aligned on a web application as the primary UI, with an optional server path only if heavier models truly require it. We’re prioritizing an in-browser pipeline to keep latency low and deployment simple, while keeping a small Python fallback available. We also validated hand detection on iPad photos using MediaPipe Hands / MediaPipe Tasks – Hand Landmarker and found it sufficient for early fingertip landmarking.

On the implementation side, we added a simple browser camera-capture page to grab frames and a Python 3.9 script that uses MediaPipe Hands to run landmark detection on those frames. The model reliably detected one or two hands in our test images and produced annotated outputs.
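For context, a minimal TypeScript sketch of the in-browser detection path using the MediaPipe Tasks Hand Landmarker is shown below (the validation script itself is Python; the WASM URL, model path, and element id here are placeholders):

```ts
// Minimal sketch of browser-side two-hand detection with the MediaPipe Tasks
// Hand Landmarker. The WASM URL, model path, and element id are placeholders.
import { FilesetResolver, HandLandmarker } from "@mediapipe/tasks-vision";

async function detectHandsInImage(): Promise<void> {
  const vision = await FilesetResolver.forVisionTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm" // assumed CDN location
  );
  const landmarker = await HandLandmarker.createFromOptions(vision, {
    baseOptions: { modelAssetPath: "hand_landmarker.task" }, // local copy of the model
    runningMode: "IMAGE",
    numHands: 2,
  });

  const photo = document.getElementById("testPhoto") as HTMLImageElement;
  const result = landmarker.detect(photo);

  // One array of 21 normalized landmarks per detected hand; index 8 is the index fingertip.
  for (const hand of result.landmarks) {
    const tip = hand[8];
    console.log(`index fingertip at (${tip.x.toFixed(3)}, ${tip.y.toFixed(3)})`);
  }

  landmarker.close();
}
```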

Hanning’s Status Report for September 20

This week I focused on building the camera input pipeline. Early in the week I set up a webpage that requests camera permission and streams live video using the MediaDevices API, with controls to start/stop and pick a camera (front/rear), and a frame loop that draws each frame to a hidden canvas for processing. Later in the week I added single-frame capture: I can now grab the current video frame and export it as a JPEG (via canvas, with optional ImageCapture when available). Next week I plan to write an API that wires these frames into the CV pipeline and begin basic keystroke-event prototyping.
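A condensed sketch of that capture path is included below for reference; the element id, facing-mode default, and JPEG quality are assumptions rather than the exact implementation.

```ts
// Condensed sketch of the capture path described above: stream the camera into a
// <video> element, draw the current frame to a canvas, and export it as a JPEG blob.
// The "preview" element id, facingMode default, and 0.9 JPEG quality are assumptions.
const video = document.getElementById("preview") as HTMLVideoElement;
const canvas = document.createElement("canvas"); // hidden canvas used for processing

async function startCamera(facingMode: "user" | "environment" = "environment") {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode },
    audio: false,
  });
  video.srcObject = stream;
  await video.play();
}

// Draw the current video frame to the canvas and export it as a JPEG blob.
function captureFrame(): Promise<Blob | null> {
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  const ctx = canvas.getContext("2d")!;
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  return new Promise(resolve => canvas.toBlob(resolve, "image/jpeg", 0.9));
}

function stopCamera(): void {
  const stream = video.srcObject as MediaStream | null;
  stream?.getTracks().forEach(t => t.stop());
  video.srcObject = null;
}
```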