Weekly Status Report #7: 10/27 – 11/3

Celine:

This week, my goal was to further examine the dewarping function and implement a version of it myself. I came up with my own algorithm, but so far it is not completely functional. Because we already have a working dewarping script, I will leave it be and move on to other problems I have been identifying. These problems are:
1) Removing images from the page, as the dewarping doesn't seem to handle images well. When given a page like this one:

the dewarping algorithm identifies contours within the image, but because of the complexity of the figure, some of the contours erroneously end up with an area of zero. This is an issue because finding the centroid of each contour requires dividing by the area, which causes a division by zero.
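The fix is a simple guard before the division. Below is a minimal pure-Python sketch; in the actual pipeline the moment dict would come from OpenCV's cv2.moments() on each contour, but the guard logic is the same:

```python
def safe_centroid(m):
    """Return the (cx, cy) centroid from a contour's moment dict, or None.

    `m` uses the same keys as the dict returned by cv2.moments():
    "m00" is the contour area, "m10"/"m01" are the first-order moments.
    Degenerate contours (single points or thin lines) can have m00 == 0,
    so we guard the division instead of crashing.
    """
    if m["m00"] == 0:
        return None  # degenerate contour: no meaningful centroid
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])

# A normal contour and a degenerate (zero-area) one:
print(safe_centroid({"m00": 4.0, "m10": 8.0, "m01": 12.0}))  # (2.0, 3.0)
print(safe_centroid({"m00": 0.0, "m10": 0.0, "m01": 0.0}))   # None
```

Contours that return None can simply be skipped when computing centroids.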

2) Skipping the "Chapter" heading or page title that shows up on each page. These often get misread (e.g., Monday with a Mad Genius = b'MWZJ/ WM a (91/611 genius) because of their fancier script.

3) Correcting words that have a space erroneously inserted (e.g., "J ack", "N either")

If these issues are corrected, then the image processing/OCR portion will be essentially complete. My working/proposed solutions are as follows:

1) By dilating the image (binarizing, inverting, and convolving with a non-unit rectangular kernel), I am able to connect the largest areas of high-frequency content in the image, such as figures and page folds. I then identify the connected components using a built-in OpenCV function, which efficiently finds and labels connected components so that each one can be recovered by searching the image for its label. Large areas can then be zeroed out, producing a result like:
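To make the labeling-and-erasing step concrete, here is a pure-Python sketch of the idea on a toy binary grid. It is a stand-in for OpenCV's much faster cv2.connectedComponentsWithStats; the grid, area threshold, and 4-connectivity choice are illustrative assumptions:

```python
from collections import deque

def zero_large_components(grid, max_area):
    """Label connected components of 1-pixels (4-connectivity) and zero out,
    in place, any component larger than max_area pixels.

    Large components correspond to figures or page folds; small ones are
    likely text and are kept.
    """
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if grid[y][x] and not seen[y][x]:
                # BFS to collect all pixels of this component
                comp, q = [], deque([(y, x)])
                seen[y][x] = True
                while q:
                    cy, cx = q.popleft()
                    comp.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                if len(comp) > max_area:  # large blob: likely a figure, erase it
                    for cy, cx in comp:
                        grid[cy][cx] = 0
    return grid

# A 6-pixel blob (erased) and a 2-pixel blob (kept), with max_area = 4:
page = [[1, 1, 0, 0, 1],
        [1, 1, 0, 0, 1],
        [1, 1, 0, 0, 0]]
zero_large_components(page, 4)
```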

2) a) A histogram along the vertical axis can be taken to identify where the lines of text are in an image. If a line of text is relatively far from the other lines, it is most likely not a line we want to keep, as the title usually sits further from the text body than the body lines sit from each other.

b) Identify the starting position of each line; the most common starting points will be the left margin, or the left margin plus a tab. Anything starting elsewhere is not a valid body-text line.
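The projection-histogram idea in 2a can be sketched in a few lines. This assumes the per-row histogram (count of dark pixels per row of the binarized page) has already been computed; the example numbers are made up to show a title band separated from the body by a large gap:

```python
def text_line_bands(ink_rows):
    """Group consecutive 'inked' rows into (start_row, end_row) text-line
    bands; rows with zero ink separate bands."""
    bands, start = [], None
    for i, ink in enumerate(ink_rows + [0]):  # trailing 0 flushes the last band
        if ink and start is None:
            start = i
        elif not ink and start is not None:
            bands.append((start, i - 1))
            start = None
    return bands

def gap_before(bands, k):
    """Blank rows between band k-1 and band k."""
    return bands[k][0] - bands[k - 1][1] - 1

# Title at rows 0-1, then body lines; the title's gap (5 rows) is much
# larger than the body's inter-line gaps (1 row), so it can be dropped.
hist = [5, 5, 0, 0, 0, 0, 0, 3, 3, 0, 3, 3, 0, 3, 3]
bands = text_line_bands(hist)
print(bands)               # [(0, 1), (7, 8), (10, 11), (13, 14)]
print(gap_before(bands, 1))  # 5
print(gap_before(bands, 2))  # 1
```

A band whose surrounding gap is several times the typical inter-line gap is a good candidate for the title/heading to skip.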

3) a) Use autocorrect

b) Identify solo characters and check whether each one is valid as a standalone character (in English, essentially only "a" and "I").
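Combining 3b with a dictionary check also handles the split-word problem from issue 3. Below is a pure-Python sketch; VOCAB and VALID_SOLO are toy placeholders, and real code would use a proper wordlist or spell-checker:

```python
# Toy stand-ins for a real dictionary; "a" and "I" are the only letters
# that are valid as standalone English words.
VOCAB = {"jack", "neither", "was", "here", "i", "a", "am"}
VALID_SOLO = {"a", "i"}

def fix_split_words(text):
    """Rejoin OCR tokens like 'J ack' -> 'Jack'.

    If a single-letter token is not valid on its own and merging it with
    the next token yields a dictionary word, merge the two tokens.
    """
    tokens, out, i = text.split(), [], 0
    while i < len(tokens):
        t = tokens[i]
        if (len(t) == 1 and t.lower() not in VALID_SOLO
                and i + 1 < len(tokens)
                and (t + tokens[i + 1]).lower() in VOCAB):
            out.append(t + tokens[i + 1])
            i += 2
        else:
            out.append(t)
            i += 1
    return " ".join(out)

print(fix_split_words("J ack was here"))  # Jack was here
print(fix_split_words("N either am I"))   # Neither am I  ("I" is left alone)
```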

I also attempted to install all the packages needed to run my code on the Raspberry Pi, but I ran into trouble with conflicting package installations uninstalling one another. Additionally, when linking the pieces of my code together, there were many conflicting input and output types, specifically around images being BGR or binarized.
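One way to tame the BGR-vs-binarized mismatch is a small normalizing helper at each stage boundary that accepts either format. A NumPy-only sketch (the threshold value and luma weights are illustrative; OpenCV's cvtColor plus Otsu thresholding would be the production route):

```python
import numpy as np

def ensure_binary(img, thresh=128):
    """Normalize an image to a single-channel 0/255 uint8 binary mask.

    Accepts either a 3-channel BGR array or an already-grayscale/binary
    2-D array, so each pipeline stage doesn't have to care what the
    previous stage produced.
    """
    arr = np.asarray(img)
    if arr.ndim == 3:  # BGR -> grayscale with standard luma weights
        arr = arr[..., 2] * 0.299 + arr[..., 1] * 0.587 + arr[..., 0] * 0.114
    return ((arr >= thresh) * 255).astype(np.uint8)
```

Calling ensure_binary at the top of each stage makes the stages composable regardless of what color space the caller passed in.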

This coming week, mainly tomorrow, I will focus on solving the second issue listed above and make some attempts at the third. I will also compile all of the scripts I've written into one, to be used during the demo to showcase a complete run-through of the OCR portion of our project. I will also work with Effie to get the installations done on the Pi. Our group goal this week is to be ready for the demo on Wednesday!


Indu:

This week I got back the gear with the mounting hub installed from the Maker Space employee who performed the service for me, after I had described to him in detail how I planned to do it. I then attached the gear and hub to the stepper motor, and my team and I were able to make the conveyor chain move. The chain, however, had trouble turning smoothly because some of its links were not flexible. This week I am trying to resolve this by lubricating the chain with coconut oil, which so far appears to be loosening it.

As for the stand, I am still building the wooden mount that will attach to the magic arm stand we will borrow from Hunt Library for testing; it should be stable by this weekend. For the wheel pivot device, I have cut out all the pieces, but I need to use the lathe to connect the motor driving the pivot to the wood. Since I do not have access to or training on the machine shop, a Maker Space employee told me he could operate the instrument for me on Monday morning. I therefore hope to finish the lathe work and complete the device on Monday, so that my team and I can test all the components before our demo on Wednesday.

Please see this link for the progress I’ve made this week:  https://youtu.be/2qwMH35M7o4


Effie:

This week I worked on configuring a Bluetooth speaker to interact with the Pi, which was actually a lot harder than I was expecting, and researched a few open-source text-to-speech libraries with Celine. I tried implementing each of them, and for the purposes of the demo I chose the eSpeak library, which can nicely take text and create an audio file that I then play through the speaker. I like this approach because, in addition to reading the book out loud, I think it will be nice if we keep all the page images as well as the audio, so we can construct a PDF and an audiobook while we're at it! That said, the voicing is not so nice, so we will probably switch to another library at a later date. Additionally, this week I got all my code for operating the servo, the DC motor (for the wheels), the stepper motor (for the conveyor belt), the camera, and now text-to-speech working together in sync in one nice big loop! I will work on threading over the weekend so that playing the audio (and text detection, once we integrate that too) doesn't stall the operation of the motors. Professor Low noted I should focus this week on code for the demo, so I will circle back to trying to motorize the camera's zooming/panning late next week. Again, I'm not sure it's possible, but we'll see!
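The text-to-WAV step above can be driven from Python by shelling out to the eSpeak command-line tool. A minimal sketch, assuming the `espeak` binary is on the PATH (the speed value and filenames are illustrative):

```python
import shutil
import subprocess

def espeak_wav_cmd(text, wav_path, speed_wpm=150):
    """Build the eSpeak command line that renders `text` to a WAV file.

    Uses eSpeak's -w flag (write output to a WAV file) and -s (speed in
    words per minute). Returned as a list so subprocess.run needs no
    shell quoting.
    """
    return ["espeak", "-s", str(speed_wpm), "-w", wav_path, text]

def speak_to_file(text, wav_path):
    """Render text to wav_path if eSpeak is installed; True on success."""
    if shutil.which("espeak") is None:
        return False  # eSpeak not installed on this machine
    subprocess.run(espeak_wav_cmd(text, wav_path), check=True)
    return True

print(espeak_wav_cmd("Hello", "page.wav"))
```

The resulting WAV can then be played through the Bluetooth speaker (e.g., with `aplay` or a Python audio library), and the files can be kept for the audiobook idea.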

This coming week I plan to work with my partners to integrate my code and the current device wiring (both in software, with Celine's text-detection algorithms, and in hardware, with Indu's stand and page-turning arms) nicely for our demo, and to make my code more modular so that post-demo we'll have an easier time fine-tuning how each device is operated and timed.

Please see this video for my progress this week: https://youtu.be/b0slTXKf1DU
