This week I worked primarily on the poster, presentation slides, and video while my teammates made final improvements: Jenny improved the gesture detection and Zacchaeus added an eraser feature and additional colors. For my part of the video, I recorded a segment covering our evaluation process and the tradeoffs we made as the semester unfolded. This week I will also add those sections to our final paper, along with the other necessary updates and changes.
Sebastien’s Status Report for 5/1/2021
Over the past two weeks I transitioned all of the existing code to use MediaPipe instead of the pipeline I wrote from scratch using Boost. Even though MediaPipe is written almost entirely in C++, only the Python API is documented, so a lot of the work was digging through the source code to figure out what was actually going on and how to pipe the existing hand pose estimator into our own drawing subsystem. I finally got everything working together on Tuesday, and since then I've added an input stream that reads key presses so that we can support more drawing modes. When I did that, the pipeline came to a screeching halt; it turned out to be an issue with how MediaPipe synchronizes the input streams of the graphs it is used to build. After spelunking in the source again, I figured out how to properly synchronize the graph, which fixed the problem. Finally, I added a mode for drawing straight lines.
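For reference, here is a minimal sketch of the kind of per-frame feeding that keeps the graph from stalling. The stream names (`input_video`, `keypress`) and the timestamp bookkeeping are assumptions for illustration, not our exact graph; the key idea is that every input stream gets a packet at the same, monotonically increasing timestamp, so the default synchronization never waits forever on a stream that has no packet for that timestamp.

```cpp
// Sketch only: stream names and timestamp handling are illustrative.
#include "mediapipe/framework/calculator_graph.h"
#include "mediapipe/framework/port/status.h"

namespace mp = mediapipe;

absl::Status FeedFrame(mp::CalculatorGraph& graph, mp::Packet frame_packet,
                       int pressed_key, int64_t frame_index) {
  const mp::Timestamp ts(frame_index);
  MP_RETURN_IF_ERROR(
      graph.AddPacketToInputStream("input_video", frame_packet.At(ts)));
  // Feed the key stream on every frame, even when no key was pressed (-1),
  // so the graph's synchronization is never left waiting on it.
  MP_RETURN_IF_ERROR(graph.AddPacketToInputStream(
      "keypress", mp::MakePacket<int>(pressed_key).At(ts)));
  return absl::OkStatus();
}
```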
Sebastien’s Status Report for 4/3/2021
Earlier this week we got together and finally integrated everything in C++, which entailed working through some dependency management troubles with my teammates, who use different Linux distributions. It's probably better to figure that stuff out now rather than later. Our meeting with Professor Kim earlier this week made it clear that "draw on screen" is far more useful than "draw on camera input". So I spent some time reading the X11 manual to see how best to get a feed of what's being drawn onto the screen. At first glance the `getImage` function seemed to provide that functionality, but unfortunately it's quite slow, as it blocks the rendering process while copying the entire frame, including transparency. It turns out it's faster to just create a secondary consumer of X11 input from the UI and read that directly into a CV matrix. I have something that *mostly* works: the images are still a bit messed up because the channel ordering in X11's RGB image format is different from OpenCV's. Additionally, I abstracted away our input so we should be able to swap between different input sources while the application is running.
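As a quick illustration of the channel-ordering fix, here's a sketch that uses Xlib's `XGetImage` (the slow path mentioned above) purely for brevity: a 32-bit ZPixmap comes out as BGRA, so a single `cvtColor` drops the extra byte rather than re-ordering anything by hand. Treat it as a sketch, not our actual capture path.

```cpp
#include <X11/Xlib.h>
#include <X11/Xutil.h>
#include <opencv2/opencv.hpp>

cv::Mat grab_screen_bgr(Display* dpy) {
  Window root = DefaultRootWindow(dpy);
  XWindowAttributes attrs;
  XGetWindowAttributes(dpy, root, &attrs);
  XImage* img = XGetImage(dpy, root, 0, 0, attrs.width, attrs.height,
                          AllPlanes, ZPixmap);
  // Wrap the X11 buffer without copying, then convert (which copies) and free.
  cv::Mat bgra(attrs.height, attrs.width, CV_8UC4, img->data);
  cv::Mat bgr;
  cv::cvtColor(bgra, bgr, cv::COLOR_BGRA2BGR);
  XDestroyImage(img);
  return bgr;
}
```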
Sebastien’s Status Report for 3/27/2021
This week I finished the pipeline in Rust, but I was unfortunately having a lot of difficulty using existing wrappers around Video4Linux to output to the virtual camera device, so I ended up just rewriting it all in C++. Once there, I had to learn about various pixel formats and figure out how to configure the output device correctly: the output from OpenCV is a matrix, but the virtual camera device is ultimately just a file descriptor, so I had to pick a format that OpenCV knew how to convert to and that did not require re-ordering values in the matrix's data buffer before writing it out. Now it works, so I can Zoom into class through a virtual camera called "Whiteboard Pal" 😀
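Here's a rough sketch of that setup. The device path and the choice of RGB24 are assumptions for illustration; the real constraint is simply that `cvtColor`'s output buffer can be written to the file descriptor as-is.

```cpp
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/videodev2.h>
#include <opencv2/opencv.hpp>

int open_virtual_camera(const char* path, int width, int height) {
  int fd = open(path, O_WRONLY);                 // e.g. the v4l2loopback device
  v4l2_format fmt{};
  fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
  ioctl(fd, VIDIOC_G_FMT, &fmt);                 // start from current settings
  fmt.fmt.pix.width = width;
  fmt.fmt.pix.height = height;
  fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_RGB24;  // 3 bytes per pixel, packed
  fmt.fmt.pix.field = V4L2_FIELD_NONE;
  fmt.fmt.pix.bytesperline = width * 3;
  fmt.fmt.pix.sizeimage = width * height * 3;
  ioctl(fd, VIDIOC_S_FMT, &fmt);
  return fd;
}

void write_frame(int fd, const cv::Mat& bgr) {
  cv::Mat rgb;
  cv::cvtColor(bgr, rgb, cv::COLOR_BGR2RGB);     // contiguous buffer, ready to write
  write(fd, rgb.data, rgb.total() * rgb.elemSize());
}
```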
Sebastien’s Status Report for 3/13/2021
This week I implemented a significant portion of our system's pipeline, which is more or less a wrapper of threads and synchronization around the CV and drawing code (the part that maintains a mask of pixels to be applied to every frame). I quickly discovered that builds, dependency management, and a lot of other things are quite clunky and tedious in C++ compared to Rust, a language I am more comfortable with, so I used Rust for this instead. But since OpenCV itself is written in C++, I wrote some foreign-function-interface (FFI) bindings and a well-defined (and typed 😀) function signature for our CV models, which will be written in C++, to implement. In other words, the pipeline code can simply call functions written in C++ that perform any CV / ML tasks using OpenCV and return their respective outputs. And we can use Rust's wonderful build, dependency management, installation, and testing tool, cargo, to compile and link the C++ code as well, without any makefiles or mess of headers.
During the process I gathered more specific details about each of the system's functional blocks: how exactly to do the thread synchronization, what parameters they take, and what data structures they use. We can include these in our design report, though all of them may change in the future.
Right now the C++ model functions are just dummies that always return the same result, since my focus is making sure that we can get from a camera feed of frames to ((x, y), gesture_state) pairs to a virtual loopback camera, and to have it be fast and free of concurrency bugs. At this point I've got the FFI (mostly) working and a rough first pass at thread synchronization working as well. Next week I'll be fixing that minor FFI bug and working on the loopback camera.
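For a sense of what the FFI boundary looks like, here's a rough sketch of the C++ side, declaration plus the current dummy stub. The names, the (x, y, gesture) layout, and the BGR frame convention are illustrative assumptions, not the exact signatures in our repo.

```cpp
#include <cstdint>

extern "C" {

// Plain-old-data result so it crosses the Rust/C++ boundary cleanly.
struct CvResult {
  int32_t x;
  int32_t y;
  uint8_t gesture_active;  // nonzero while the "draw" gesture is detected
};

// Called once per frame by the Rust pipeline; `data` points at a
// width * height * 3 BGR buffer.
CvResult process_frame(const uint8_t* data, int32_t width, int32_t height);

}  // extern "C"

// Current dummy implementation: ignores the frame and always reports the same
// point, so the threading and loopback plumbing can be tested end to end.
extern "C" CvResult process_frame(const uint8_t*, int32_t, int32_t) {
  return CvResult{320, 240, 1};
}
```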
Sebastien’s Status Report for 3/6/2021
This week I mostly spent time reading about how Linux and macOS handle virtual devices and mentally cataloguing OpenCV's very large set of rather useful APIs. A lot of this was because we needed to make a precise decision about what we were doing, and at first there wasn't much of a consensus, so we all spent some time reading and learning about each of the possibilities and how much time they would take. In particular, I was looking into Professor Kim's suggestion of being able to draw straight onto the screen and then pipe the result into Zoom (or, in theory, any other application), which means we need a virtual camera interface. On Linux, there's a kernel module, part of the Video4Linux project, that creates loopback devices: more or less a file that a program can write frame data to, with a corresponding file from which other programs can read that data. macOS has a dedicated library with far less documentation that seems far less simple to use, so we decided to stick with Linux as our target platform and write the entire thing in C++. I also created an updated software architecture diagram for the system, which as of now will be a pipeline of four threads that use message-passing channels from the Boost library for synchronization.
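As a rough illustration of that thread architecture, here are two adjacent pipeline stages connected by a single-producer/single-consumer queue. Using `boost::lockfree::spsc_queue` is my stand-in for "a Boost message-passing channel"; the exact primitive, capacity, and number of stages may well change.

```cpp
#include <boost/lockfree/spsc_queue.hpp>
#include <opencv2/opencv.hpp>
#include <thread>

// Channel between the capture stage and the CV stage (fixed capacity of 8).
boost::lockfree::spsc_queue<cv::Mat, boost::lockfree::capacity<8>> frames;

void capture_stage() {
  cv::VideoCapture cam(0);
  cv::Mat frame;
  while (cam.read(frame)) {
    // Push a deep copy so the consumer owns its own buffer; if the queue is
    // full the frame is simply dropped.
    frames.push(frame.clone());
  }
}

void cv_stage() {
  cv::Mat frame;
  for (;;) {
    if (frames.pop(frame)) {
      // run gesture detection / drawing on `frame` here
    }
  }
}

int main() {
  std::thread producer(capture_stage);
  std::thread consumer(cv_stage);
  producer.join();
  consumer.join();
}
```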
Sebastien’s Status Report for 2/27/2021
This week, after I made some notes, spent some time thinking about what I was going to say, and gave the presentation, Zacchaeus, Jenny, and I went over the feedback on Friday and agreed that, though the course wants quantifiable metrics, this project's use case is ultimately rather hard to quantify. Aside from latency and frame-rate numbers, which mostly matter because poor values render the product unusable, there aren't really any good and sane ways to measure anything that falls under the umbrella of "accuracy". The former can be improved by optimizing a solution, but that presupposes having a solution that's "good" in the first place, and "good" in this case is hard to measure. So we more or less decided that it's best to just iterate as fast as possible. To that end, I opened a team GitHub repo and implemented a suite of abstractions that should make it as easy as possible to iterate on models and their associated pre-processing steps. I also started implementing a tool for collecting and labeling raw frame data for gesture detection, which is most likely the harder part of this.
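The labeling tool is still early; the sketch below shows the general shape of it: display the live camera feed and save the current frame into a class-specific directory on a key press. The key bindings and directory layout are placeholders, not the final design.

```cpp
#include <opencv2/opencv.hpp>
#include <string>

int main() {
  cv::VideoCapture cam(0);
  cv::Mat frame;
  int saved = 0;
  while (cam.read(frame)) {
    cv::imshow("label me", frame);
    int key = cv::waitKey(1);
    if (key == 'q') break;
    if (key == 'd' || key == 'n') {
      // 'd' = drawing gesture, 'n' = not drawing; directories assumed to exist.
      std::string dir = (key == 'd') ? "data/drawing/" : "data/not_drawing/";
      cv::imwrite(dir + std::to_string(saved++) + ".png", frame);
    }
  }
  return 0;
}
```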
Sebastien’s Status Report for 2/20/2021
This week the team spent a good deal of time going back and forth about narrowing down a particular idea for our project after receiving some feedback from Professor Kim. We weighed what would be usable against what is technically feasible and decided to build an "air draw" whiteboard in the form of a "server" that consumes camera frames and produces a "drawing feed": roughly a sequence of (x, y, isDrawing) tuples sent over an inter-process-communication channel. A generic module like this can be used by many applications. We want to build an actual UI with it, but we also want to focus on the whiteboard server itself rather than on building a UI. Luckily, it's possible to open a UNIX-domain socket in a browser via the WebAssembly System Interface (WASI), so right now I'm fairly sure we're going to accomplish this by forking excalidraw and adding a WebAssembly stub that reads the "drawing feed" and calls the relevant JavaScript functions to draw on the canvas. I also set up a team Notion workspace as a place to stay organized; it's easier to use than Google Docs and has task management built in, including Kanban boards that are viewable as a Gantt chart.
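To make the "drawing feed" idea concrete, here's a sketch of the producer side writing one fixed-size record per frame to a UNIX-domain socket. The socket path, the record layout, and the helper names are all assumptions for illustration; the actual wire format is still to be decided.

```cpp
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

struct DrawingEvent {
  int32_t x;
  int32_t y;
  uint8_t is_drawing;  // 1 while the draw gesture is active
} __attribute__((packed));

int connect_feed(const char* path) {
  int fd = socket(AF_UNIX, SOCK_STREAM, 0);
  sockaddr_un addr{};
  addr.sun_family = AF_UNIX;
  std::strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
  if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
    close(fd);
    return -1;
  }
  return fd;
}

void send_event(int fd, int32_t x, int32_t y, bool drawing) {
  DrawingEvent ev{x, y, static_cast<uint8_t>(drawing)};
  write(fd, &ev, sizeof(ev));
}
```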