Introduction and Project Summary

When we watch a video on YouTube, we can now enjoy the benefits of video captioning. However, when multiple speakers talk at the same time, captioning systems cannot tell one speaker's speech from another's, which often produces very confusing captions.

Our group presents EyeHear, an audio-visual device and system. The device is designed to be light and compact so that it is easy to carry. The system adopts state-of-the-art visual and audio-processing techniques to produce enhanced real-time video with captions for each speaker. This allows users not just to hear but also to see what each speaker is saying at any given time.