Weekly Status Report #9: 11/10 – 11/17

All:

This week we visited the library together and selected more books to test on. In the coming few weeks we will be testing the different books to decide which to use for our final demo.

Celine:

This week I focused on enabling SSH on my Windows computer, where I have been running and testing the runOCR code, and on optimizing that code. From testing with an Ethernet cable connecting my laptop and the Raspberry Pi, I’ve found that the direct Ethernet link essentially eliminates the download/upload time for the images and text files. However, I have only executed these commands from my laptop to the Pi, not the other way around. Over the course of the week I have been looking up how to set up OpenSSH on my laptop and make it available for the Pi to connect to, over Ethernet or the internet. Tomorrow I will go to the lab to test uploading/downloading data from the Raspberry Pi to my laptop.
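Once SSH is in place, the offloading step itself should be small. Here is a rough sketch of how the Pi could push an image to my laptop and kick off the OCR remotely; the address, paths, and invocation are placeholders, and it assumes key-based SSH login is already configured:

    import subprocess

    # Placeholder address and paths for the direct Ethernet link to my laptop.
    LAPTOP = "celine@192.168.2.1"
    REMOTE_DIR = "~/bookreader/images"

    def offload_ocr(image_path):
        """Copy a captured page image from the Pi to the laptop over scp,
        then run the OCR script there over SSH."""
        subprocess.run(["scp", image_path, LAPTOP + ":" + REMOTE_DIR + "/"], check=True)
        subprocess.run(["ssh", LAPTOP, "python runOCR.py"], check=True)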

After recalling and reexamining some notes from previous signals classes, I down-sampled the image to speed up the dewarping computation. Where a full-scale image once took more than 100 seconds to fully process, it now takes a little more than 30 seconds, and that is for the case when the page set is full of text.

There is no decrease in accuracy, either:

Page sets full of text have more lines, which in turn produce more data points to analyze. Down-sampling/decimating the image leaves fewer data points overall, so fewer parameters need to be processed during parameter optimization, as can be seen in these snapshots.
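The down-sampling itself is a one-liner with OpenCV. A minimal sketch, where the exact scale factor is illustrative and still being tuned against OCR accuracy:

    import cv2

    def downsample(image, factor=0.5):
        """Decimate the page image before dewarping. factor=0.5 halves each
        dimension, so the parameter optimizer sees roughly a quarter of the
        data points; the value here is a guess to tune, not final."""
        return cv2.resize(image, None, fx=factor, fy=factor,
                          interpolation=cv2.INTER_AREA)  # area interpolation for clean decimation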

Although this simple fix cut the computation time by around 60%, I will still go through the rest of the code to see where further optimizations can be made, mostly in the removal of images and headers. I will also test this optimized code on the Raspberry Pi tomorrow.

I touched a little bit on using the autocorrect package for Python, and found that it only works on a word-by-word basis; it does nothing to inputs of more than one word.
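A small example of the behavior, using the spell() function the version we installed exposes (newer releases wrap this in a Speller class instead):

    from autocorrect import spell

    print(spell("speling"))   # single words get corrected: "spelling"
    print(spell("J ack"))     # multi-word input comes back unchanged

    # Workaround: correct the OCR output one token at a time. Note this still
    # cannot merge split words like "N either" back into "Neither".
    line = "N either of them spoke"
    corrected = " ".join(spell(word) for word in line.split())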

As a group, our main focus this week was how to move from our mid-semester demo to a cohesive final product in the next three weeks. If the decreased processing time described above is still not enough to make running on the Pi worthwhile, we will need the Pi to outsource the image processing computation. First we will get outsourcing working over Ethernet to my laptop; after that, we will have the Pi connect to my laptop over WiFi. Once that works, and since upload/download time should not differ much whichever server we use, I will look at using AWS to run the computations. The advantages of AWS are that we won’t need a physical machine available whenever we want to run the image processing from the Pi, and that AWS has its own TTS API, Polly. We could use AWS S3 to store our images and text files, and possibly set up an AWS Lambda function that is triggered when we upload a new image and runs the image-processing code. I just set up an AWS account today, so I can begin looking into how to set that up soon.
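To make the Lambda idea concrete, here is a rough sketch of the shape such a function could take, triggered by an S3 ObjectCreated event; the bucket layout and the run_ocr call are hypothetical stand-ins for our pipeline:

    import boto3

    s3 = boto3.client("s3")

    def handler(event, context):
        """Fires when a new page image lands in the bucket."""
        record = event["Records"][0]["s3"]
        bucket = record["bucket"]["name"]
        key = record["object"]["key"]                    # e.g. "pages/page_0042.jpg"
        s3.download_file(bucket, key, "/tmp/page.jpg")   # Lambda offers /tmp as scratch space
        text = run_ocr("/tmp/page.jpg")                  # stand-in for our OCR pipeline
        s3.put_object(Bucket=bucket, Key=key + ".txt", Body=text.encode())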

In this next week, I will finish enabling the connection over Ethernet, then move to connecting to my laptop over WiFi, and at least start trying out AWS to scope out whether it is worth adopting at this point. I will also fiddle with my OCR code to see if I can keep cutting down computation time. Additionally, I will document my code and start preparing materials for our final design paper. Finally, I will collect more text outputs from image samples taken from the books to quantify the accuracy of the OCR.

 

Indu:

This week my team and I planned what to do moving forward and how to make sure all the parts of our project meet our constraints as we get closer to the final demo. I spent the week building the stand, including a considerable amount of time trying to drill holes into acrylic to screw the camera in. After drilling by hand unsuccessfully, I used SolidWorks to make a file that could be laser cut, but that was not successful either. Since the acrylic is proving to be more trouble than it is worth, I will stick with a wooden plank that will have screws for holding the camera up.

In the Maker Space, I was able to build the stand; now it just needs the camera attached, which will be done by this weekend. I also made a second wheel pivot device for flipping pages in reverse. As of now, both wheel pivot devices are done and can be held firmly in place on the board we are using.

I also ordered extra parts to secure the page down after it has been turned: we have noticed that even when a page is successfully separated and turned, it sometimes stands up in the air and needs something to bring it fully down.

In the coming week I will be looking into how to connect the microphone we ordered to the Pi and how to use it to start up our device, as mentioned in our stretch goals. I also plan to help Effie with text-to-speech issues as they arise.

 

Effie:

This week our class time was spent mostly on the ABET surveys and questions. That said, I spent considerable time outside of class soldering the new servo hat for the Raspberry Pi and getting it working with our servos; these weren’t ready for our demo last week because the hat hadn’t arrived. I simplified my code considerably and got the servos turning in series, timed with the rotation of the wheel motors, so the page-flipping operation is pretty much there. Now we hopefully just need to try out a bunch of speeds/pulse widths on the servos/motors until we find the ideal speeds that make page turning work consistently.
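The speed sweep should be easy to script. A sketch, assuming the hat is Adafruit's 16-channel servo board driven through its ServoKit library (the channel number and throttle values are guesses to iterate on):

    import time
    from adafruit_servokit import ServoKit  # assumes the Adafruit 16-channel servo hat

    kit = ServoKit(channels=16)

    # Try a range of throttles on the page-separating servo (channel 0 is a guess)
    # and note which speed turns a page cleanly without tearing or double-feeding.
    for throttle in (0.2, 0.3, 0.4, 0.5):
        kit.continuous_servo[0].throttle = throttle
        time.sleep(1.5)                       # let the arm finish its motion
        kit.continuous_servo[0].throttle = 0  # stop between trials
        input("throttle=%.1f -- did the page turn cleanly? (Enter for next)" % throttle)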

I rewrote the basic threading implementation so that audio can play at the same time as page flipping and text detection for the next page, though I still have more to do on that next week. I also played with setting up my Mac to act as an SSH server so that I can actually scp files from the Pi to my Mac (until now we could only go the other direction). Next week I aim to try out the text-detection code Celine wrote on my Mac (I’ll have to install the dependencies), since running it on a MacBook Pro might be the fastest and easiest option (we’ve had trouble connecting through Windows, and it runs too slowly on the Pi itself).
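The structure I am converging on looks roughly like this; capture_page, detect_text, play_audio, and flip_page are placeholders for our actual routines:

    import threading

    def read_book_loop():
        audio_thread = None
        while True:
            image = capture_page()          # placeholder: camera capture
            text = detect_text(image)       # placeholder: Celine's OCR step
            if audio_thread is not None:
                audio_thread.join()         # don't talk over the previous page
            audio_thread = threading.Thread(target=play_audio, args=(text,))
            audio_thread.start()            # audio plays in the background...
            flip_page()                     # ...while the motors turn the page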

I also looked further into espeak and experimented with some other voicings, which do sound better, and tried to get Google’s text-to-speech library working on the Pi, but was unsuccessful. This coming week I aim to get Google text-to-speech working on my Mac as well, which might make more sense anyway: we would effectively offload the intensive computation (both text detection and text-to-audio production) onto the Mac, which can do it much faster than the Pi, leaving the Pi to handle the mechatronics of the project.
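For the record, the espeak experimenting is just a matter of varying the voice flag. A quick sketch of how the variants can be compared from Python (the voice names are espeak's built-in variants; the sample text and speed are arbitrary):

    import subprocess

    SAMPLE = "Monday with a Mad Genius"

    # The "+f3"/"+m5" suffixes select espeak's alternate female/male voicings,
    # some of which sound noticeably less robotic than the default.
    for voice in ("en", "en+f3", "en+f4", "en+m5"):
        subprocess.run(["espeak", "-v", voice, "-s", "150", SAMPLE])

    # espeak can also write a wav instead of playing it immediately:
    subprocess.run(["espeak", "-v", "en+f3", "-w", "page.wav", SAMPLE])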

Lastly, as a group we spent time mapping out our progress and clarifying responsibilities going forward.

Weekly Status Report #8: 11/3 – 11/10

Everyone:

Please see this link for a demo:  https://www.youtube.com/watch?v=weT1ZYr_ntI&feature=youtu.be

 

Celine:

This week was a very busy week for us all! I finally got the OpenCV installation finished on the Pi, but it had to be done in a virtualenv because of the conflicting Python 2 and 3 installations on the Pi. In the coming week or so I may try to get it working outside the virtualenv, as I’ve read online that virtual environments can slow down processes. Nonetheless, this setup served our purpose of demonstrating our working page-flipping and reading system.

 

For my individual work, I implemented the header removal I mentioned last post by running a connected-components analysis (which labels every contiguous region of the same intensity, 1, with a number identifying that region) and removing everything besides the largest connected component, which is always the text. I know this holds for this book series because the pictures in the books never cut the text into pieces, and I remove the images before running any dewarping or processing in the keepText function.
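In OpenCV terms, the keepText step can look roughly like this sketch; the dilation (which merges the body text into one blob so it dwarfs the header) and the kernel size are illustrative additions, not the exact parameters I use:

    import cv2
    import numpy as np

    def keep_largest_component(binary):
        """Zero out everything except the largest connected component.
        `binary` is an inverted, binarized page (text pixels = 255)."""
        # Dilate so the body text merges into one big blob; 15x15 is a guess.
        fat = cv2.dilate(binary, np.ones((15, 15), np.uint8))
        n, labels, stats, _ = cv2.connectedComponentsWithStats(fat)
        largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])  # label 0 is background
        return np.where(labels == largest, binary, 0).astype(binary.dtype)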

I fixed the issue where the script would hang indefinitely while running; it turned out to be a stdout flushing problem. After adding flush=True as a parameter to my print statements, I no longer see the indefinite hanging.
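Concretely, the change is just:

    # Without flush=True the messages sat in the stdout buffer,
    # which made the script look like it had hung.
    print("dewarp: optimizing projection parameters...", flush=True)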

I’ve also begun adding debugging and timing statements to my code so that I can work on optimizing the parts that take the longest to run. The most time-consuming part is now the optimization of projection parameters in the dewarping phase of the image processing. The entire process takes at most two minutes (for a page set with two pages full of text), and the dewarping accounts for around 80 to 100 of those 120 seconds.
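The timing statements are just a thin wrapper around each stage; a sketch, where dewarp_page is a stand-in name for the dewarping step:

    import time

    def timed(label, fn, *args, **kwargs):
        """Run fn and report how long it took, for finding the slow stages."""
        start = time.time()
        result = fn(*args, **kwargs)
        print("%s: %.1f s" % (label, time.time() - start), flush=True)
        return result

    # e.g. warped = timed("dewarp", dewarp_page, image)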

Because of the slow run-time of the OCR script on the Pi, there are two solutions I can implement: have the Pi SSH into my laptop and run the code there, or push the input image to AWS S3 and then either trigger an AWS Lambda function that runs the OCR code or SSH into an AWS EC2 instance and run the code there.
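The S3 route would be very little code on the Pi's side. A sketch with a placeholder bucket name, where the upload event itself is what would trigger the Lambda:

    import boto3

    s3 = boto3.client("s3")

    # Push the captured page; an ObjectCreated trigger on the bucket
    # would then fire the Lambda that runs the OCR.
    s3.upload_file("capture.jpg", "bookreader-pages", "pages/capture.jpg")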

For the next few weeks I will focus on optimizing the OCR and enabling SSH on my laptop. As a team we will discuss whether we want to move our processing to the cloud by weighing the additional benefits it might bring.

Effie:

This week I mostly spent on integrating all the moving components and helping my partners bring everything together for the demo. In particular, after resolving several path/dependency issues, I had to rewrite a good chunk of last week’s code to work properly with the text-detection script Celine worked on (capturing images, spawning a new thread that hangs while waiting for a text file to be created from that image, and spawning another thread to output the audio, all while still running the page flipping). I wrote a basic version that worked for the demo, but I still have some threading issues to hammer out this week, along with quirks of Espeak (our current, rather temperamental, choice of text-to-audio library). As the week progresses, I aim to make my code much cleaner and simpler (since the current version was quickly thrown together for the demo), and will look more into other text-to-speech libraries, since Espeak sounds funny.
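The waiting thread is the slightly unusual piece, so here is a sketch of its shape; the polling interval, file name, and play_audio are placeholders:

    import os
    import threading
    import time

    def wait_and_speak(txt_path):
        """Hang until the text-detection script writes its output file,
        then hand the text to the audio step."""
        while not os.path.exists(txt_path):
            time.sleep(0.5)                 # poll while the OCR runs elsewhere
        with open(txt_path) as f:
            text = f.read()
        play_audio(text)                    # placeholder: Espeak wrapper

    # Spawned right after capturing an image, while page flipping continues:
    # threading.Thread(target=wait_and_speak, args=("page_0042.txt",)).start()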

Indu:

On Monday, with the help of the employees at the Maker Space, I was able to finish building the wheel pivot device that separates pages. Effie and I then began testing the device, but the servo motor wasn’t working with the Raspberry Pi. We all tried again on Tuesday without success, so we are waiting for the Pi hat to come in and will use that with the motor, which will hopefully get the motor working on the device. The device can currently be moved up and down by hand, so it should work under motor power: I looked up the motor’s torque specifications and confirmed with a Maker Space employee that they are sufficient for the dimensions I calculated for the device.

On Tuesday we also worked on our timeline for the rest of the semester, so I will be spending the remaining weeks building out all the parts for the final demo, such as another wheel pivot device for the other side of the book (to flip pages backwards), a height-adjustable stand, and a simpler page-turning mechanism, since the conveyor chain mechanism is not proving consistent.

After the demo, I went with Effie to get acrylic pieces at the IDEATE workspace in Hunt Library. Later in the week, Celine assisted me in the Maker Space and took more videos of me working, such as cutting the acrylic.

Please see this video: https://www.youtube.com/watch?v=jl1ilr-w4_A&feature=youtu.be

 

Weekly Status Report #7: 10/27 – 11/3

Celine:

This week, my task was to further examine the dewarping function and how to implement it myself. I came up with my own algorithm, but so far it is not completely functional. Because we already have a working dewarping script, I will leave it be and move on to other problems I have been identifying. These problems are:
1) Removing images from the page, as the dewarping doesn’t seem to deal well with images. When given a page like this one:

the dewarping algorithm identifies contours within the image, but because of the complexity of the figure, some of the contours erroneously end up with an area of zero. This is an issue because finding the centroid of each contour involves dividing by the area (a guard for this is sketched after this list).

2) Skipping the “Chapter” or page title that shows up on each page. These often get misread (e.g. Monday with a Mad Genius = b’MWZJ/ WM a (91/611 genius) because of their fancier scripts.

3) Correcting words that have a space inserted (e.g. “J ack”, “N either”)
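For problem 1, the immediate crash is avoidable with a guard on the contour moments; a sketch:

    import cv2

    def safe_centroids(contours):
        """Centroids of the dewarper's contours, skipping the zero-area ones
        that complex figures produce (cx = m10/m00 divides by the area m00)."""
        centroids = []
        for c in contours:
            m = cv2.moments(c)
            if m["m00"] == 0:   # degenerate contour -- would divide by zero
                continue
            centroids.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
        return centroids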

If these issues are corrected, then the image processing/OCR side will be relatively complete. My working/proposed solutions are as follows:

1) By dilating the image (binarizing, inverting, and convolving with a non-unit rectangle), I am able to connect up the largest areas of high-frequency content in the image, such as figures and page folds. I then identify the connected components using a built-in OpenCV function, which efficiently finds and labels the connected areas so that each component can be located by searching the image for its label. Large areas can then be zeroed out, producing a result like:

2) a) A histogram along the vertical axis can be taken to identify where the lines of text sit in the image. If a line of text is relatively far from the other lines, it is most likely not one we want to keep, as the title usually sits further from the text body than the body lines do from each other (a sketch of this is given after this list).

b) Identify the starting position of each line; the most common starting points will be the left margin, or the left margin plus a tab. Anything starting elsewhere is not a valid body-text line.

3) a) Use autocorrect

b) Identify solo characters and check whether each one is valid as a standalone character.
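For solution 2a, a sketch of the histogram idea with NumPy; the gap threshold is a guess to be tuned:

    import numpy as np

    def text_line_bands(binary):
        """Vertical histogram: sum the ink in each row, then split the non-empty
        rows into contiguous bands -- one band per detected line of text."""
        profile = (binary > 0).sum(axis=1)
        rows = np.where(profile > 0)[0]
        return np.split(rows, np.where(np.diff(rows) > 1)[0] + 1)

    def drop_isolated_lines(bands, gap_factor=1.5):
        """Drop any line that sits unusually far from all of its neighbors --
        typically the chapter title. gap_factor is a guess to be tuned."""
        centers = np.array([b.mean() for b in bands])
        gaps = np.diff(centers)
        threshold = gap_factor * np.median(gaps)
        kept = []
        for i, band in enumerate(bands):
            above = gaps[i - 1] if i > 0 else np.inf
            below = gaps[i] if i < len(gaps) else np.inf
            if min(above, below) <= threshold:  # close to at least one neighbor
                kept.append(band)
        return kept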

I also attempted to install all the packages needed to run my code on the Raspberry Pi, but I ran into some trouble with conflicting package installations uninstalling one another. Additionally, when linking all the pieces of my code together, there were many conflicting input and output types, specifically around images being BGR or binarized.

This coming week, mainly tomorrow, I will focus on solving the second issue listed above, and make some attempts to solve the third. I will also compile all of the scripts I’ve written into one, to be used during the demo to showcase a complete run-through of the OCR portion of our project. I will also work with Effie to get the installations done on the Pi. Our group goal this week is to be ready for the demo on Wednesday!!

 

Indu:

This week I got back the gear with the mounting hub fitted inside it from the Maker Space employee who performed this service for me, after I had described to him in detail how I planned to do it. I then attached the gear with the mounting hub to the stepper motor, and my team and I were able to make the conveyor chain move. The chain, however, had trouble turning smoothly because some of its links are stiff. This week I am trying to resolve this by lubricating the chain with coconut oil, which so far appears to be loosening it.

For the stand, I am still building the wooden mount that will attach to the magic arm stand we will borrow from Hunt Library for testing; this should be stable by this weekend. For the wheel pivot device, I have cut out all the pieces needed so far, but I need the lathe to connect the motor driving the pivot to the wood. Since I do not have access/training for the machine shop, a Maker Space employee told me he could operate the machine for me on Monday morning. I hope to finish that step and complete the device on Monday so that my team and I can test all the components before Wednesday, when we are demoing.

Please see this link for the progress I’ve made this week:  https://youtu.be/2qwMH35M7o4

 

Effie:

This week I worked on configuring a Bluetooth speaker to interact with the Pi, which was actually a lot harder than I was expecting, and researched a few open-source text-to-audio libraries with Celine. I tried implementing each of them, and for the purposes of the demo I chose the Espeak library, which can nicely take text and create an audio file that I can then play through the speaker. I like this approach because, in addition to reading the book out loud, I think it will be nice if we can keep all the pictures of the pages as well as the audio, to construct a pdf and audiobook too while we’re at it! That said, the voicing is not so nice, so we’ll probably use another library at a later date. Additionally, this week I got all my code for operating the servo, DC motor (for the wheels), stepper motor (for the conveyor belt), camera, and now text-to-speech working together in sync in one nice big loop! I will work on threading over the weekend so that playing the audio (and text detection, once we sync that in too) doesn’t stall the operation of the motors. Professor Low noted I should focus this week on code for the demo, so I will circle back to trying to motorize the zooming/panning on the camera later next week... again, not sure it’s possible, but we’ll see!

This week I plan to work with my partners to get my code and the current wiring of devices integrated cleanly for our demo (both in software, with Celine’s text-detection algorithms, and physically in hardware, with Indu’s stand and page-turning arms), and to make my code more modular so that post-demo we’ll have an easier time fine-tuning how to operate and time each device.

Please see this video for my progress this week: https://youtu.be/b0slTXKf1DU