Francesca’s Status Report for Oct 25

This week, I worked on testing Tesseract. Since our goal is to achieve 99% OCR accuracy, we need to test the software heavily first, as part of our risk mitigation strategy. I searched online for datasets that are both appropriate and large enough to test on. So far, I’ve been working with FUNSD (Form Understanding in Noisy Scanned Documents), which should be helpful for scanned or printed text, and Tobacco800, which seems to have greater font variance and even handwritten components. I’m currently writing the script to run Tesseract on these datasets, and I expect to be mostly wrapped up with Tesseract testing by the end of the week.
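As a rough illustration of what the scoring side of such a script could look like (a sketch under assumptions, not the actual script): the example below measures character-level accuracy as one minus the edit distance divided by the length of the ground-truth text. Whether the 99% target is measured per character or per word is an assumption here, and the names `edit_distance`, `char_accuracy`, and `evaluate` are hypothetical. The OCR engine is passed in as a plain function so the scoring logic can be checked without Tesseract installed.

```python
from typing import Callable, Iterable, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, hypothesis: str) -> float:
    """Character accuracy of one OCR result against its ground truth."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return max(0.0, 1.0 - edit_distance(reference, hypothesis) / len(reference))


def evaluate(samples: Iterable[Tuple[str, str]],
             ocr: Callable[[str], str]) -> float:
    """Aggregate accuracy over (image_path, ground_truth) pairs.

    `ocr` is any function mapping an image path to recognized text
    (hypothetical hook; the real engine is plugged in by the caller).
    """
    total_chars = 0
    total_errors = 0
    for image_path, truth in samples:
        total_errors += edit_distance(truth, ocr(image_path))
        total_chars += len(truth)
    return 1.0 - total_errors / total_chars if total_chars else 1.0
```

With the `pytesseract` wrapper and Pillow installed, the hook could be something like `lambda p: pytesseract.image_to_string(Image.open(p))`. Loading the FUNSD or Tobacco800 ground truth into `(image_path, text)` pairs is left out, since that depends on each dataset's annotation format.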

This week, I also contacted the CMU Director of the Office of Disability Resources, who gave us helpful advice on the feasibility of our design. We have also confirmed that the Western Pennsylvania School for Blind Children is interested in helping us with our testing.

I believe our progress is about on track. We had some delays in ordering due to the AWS outage, but we feel good otherwise and are placing our orders ASAP. This coming week, I will not be attending class on Monday because I will be in Washington, DC attending the American Institute for Medical and Biological Engineering Public Policy Institute. I will still be able to work on my Tesseract testing asynchronously, and I will coordinate with my group members to make sure we keep making progress.
