Final Post

In this post, please find the links to our project demonstration videos and our project poster.

Videos: https://www.youtube.com/playlist?list=PLHx3LBXlFOFvlAct12RH7VJdlIcVhgwt9

Poster: aesop Project Poster

Thank you to all of the Teaching Assistants and Professors of 18-500, and all of the people who gave us their support along the way!

Weekly Status Report #9: 11/10 – 11/17

All:

This week we visited the library together and selected more books to test on. In the coming few weeks we will be testing the different books to decide which ones to use for our final demo.

Celine:

This week I focused on enabling SSH on my Windows computer, where I have been running and testing the runOCR code, and on optimizing that code. From testing with an Ethernet cable connecting my laptop and the Raspberry Pi, I’ve found that the Ethernet cable eliminates the download/upload time of the images and text files. However, I have only executed these commands from my laptop to the Pi, not the other way around. Over the course of the week I have been looking up how to set up OpenSSH on my laptop and make it available for the Pi to connect to, over Ethernet or the internet. Tomorrow I will be going to the lab to test uploading/downloading data from the Raspberry Pi to my laptop.
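For reference, here is a minimal sketch of what the Pi-side transfer and remote kickoff could look like once OpenSSH is enabled on the laptop, written with the paramiko package; the hostname, credentials, and paths are placeholders, not our actual setup.

```python
# Minimal sketch (untested): push a captured image from the Pi to the laptop
# over SSH with paramiko, then kick off the OCR script remotely. Hostname,
# account, and paths below are placeholders.
import paramiko

LAPTOP_HOST = "192.168.1.10"   # hypothetical laptop address on the Ethernet link
USERNAME = "celine"            # hypothetical account with OpenSSH enabled
PASSWORD = "********"          # or switch to key-based auth

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(LAPTOP_HOST, username=USERNAME, password=PASSWORD)

# Copy the page image over, then run the OCR script on the laptop.
sftp = client.open_sftp()
sftp.put("page_set.jpg", "/home/celine/aesop/page_set.jpg")
sftp.close()

stdin, stdout, stderr = client.exec_command(
    "python3 /home/celine/aesop/runOCR.py page_set.jpg")
print(stdout.read().decode())
client.close()
```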

After recalling and reexamining some notes from previous signals classes, I down-sampled the image to speed up the dewarping computation. Where the full-scale image once took more than 100 seconds to fully process, it now takes a little more than 30 seconds, and this is for the case when the page set is full of text.
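The down-sampling itself is a single OpenCV call; a rough sketch of the idea (the 0.5 scale factor here is illustrative, not the final value we will settle on):

```python
# Rough sketch of the down-sampling step before dewarping (assuming OpenCV);
# the scale factor is illustrative and still being tuned for speed vs. accuracy.
import cv2

def downsample(gray_page, scale=0.5):
    """Decimate the page image to cut the dewarping optimization time."""
    h, w = gray_page.shape[:2]
    return cv2.resize(gray_page, (int(w * scale), int(h * scale)),
                      interpolation=cv2.INTER_AREA)  # area averaging avoids aliasing
```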

There is no decrease in accuracy, either:

Page sets with full text have more lines in them, which in turn produce more data points to be analyzed. By down-sampling/decimating the image, though, there are fewer data points overall! Thus fewer parameters need to be processed during parameter optimization, as can be seen in these snapshots:

Although this simple fix cut down the computation time by around 60%, I will still investigate the rest of the code to see where optimizations can be made, mostly in the removal of images and headers. I will also test this optimized code on the Raspberry Pi tomorrow.

I also touched a little bit on using the autocorrect package for Python, and found that it only works on a word-by-word basis; it does not do anything with inputs of more than one word:
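To use it on full OCR output we would have to split the text and correct it word by word; a small sketch of that workaround, assuming the word-level spell() entry point from the release we tried (newer releases may expose a different interface):

```python
# Sketch of a word-by-word workaround for the autocorrect package, which only
# handles single words. Assumes the word-level spell() function from the
# version we tested; newer releases may differ.
from autocorrect import spell

def correct_text(ocr_text):
    """Run autocorrect over each whitespace-separated token of the OCR output."""
    return " ".join(spell(word) for word in ocr_text.split())

print(correct_text("Stuart was watcing the brids"))  # illustrative input
```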

As a group, the main focus of this week was how to move from our mid-semester demo to a final, cohesive product in the next three weeks. If the decreased processing time described above is still not enough to make running on the Pi worthwhile, we will have the Pi outsource the image processing computation. First we will have it working over Ethernet to my laptop. After that we will move to having it connect to my laptop over WiFi. Once that is working, and since the upload/download time should not differ much with a different server, I will look at using AWS for running the computations. The advantages of AWS are that we won’t need a physical machine available whenever we want to run the image processing from the Pi, and that AWS has its own TTS API called Polly. We will be able to use AWS S3 to store our images and text files, and possibly set up an AWS Lambda function that is triggered when we upload a new image and runs the image processing code. I just set up an AWS account today, so I can begin looking into how to set that up soon.
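As a first experiment with that pipeline, uploading a captured image from the Pi to S3 with boto3 would look roughly like the sketch below; the bucket name and key are placeholders, and the Lambda trigger itself would be configured separately in the AWS console.

```python
# Rough sketch (not yet tested on our account): push a captured page image to
# S3 so a Lambda function configured on that bucket can run the OCR code.
# The bucket name and key prefix are placeholders.
import boto3

s3 = boto3.client("s3")
s3.upload_file("page_set.jpg",          # local image captured on the Pi
               "aesop-page-images",     # hypothetical bucket name
               "uploads/page_set.jpg")  # key the Lambda trigger would watch
```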

In this next week, I will finish enabling connection over Ethernet, then move to connecting my laptop over WiFi, and at least start trying out AWS, scoping out whether or not it is worth picking up at this point in time. I will also fiddle around a bit with my OCR code to see if I can continue cutting down computation time. Additionally, I will be documenting my code and start preparing materials for our final design paper. Finally, I will collect more text outputs from image samples I will get from the books to quantify the accuracy of the OCR.

 

Indu:

This week my team and I made plans for what we should do moving forward and how we are going to make sure all the parts of our project meet our constraints as we get closer to the final demo. I spent this week building the stand, and spent a considerable amount of time trying to drill holes into acrylic to screw the camera in. After being unsuccessful with drilling by hand, I used SolidWorks to make a file that could be laser cut, but that was not successful either. Since this is proving to be too much trouble, I will stick with a wooden plank that will have screws for holding the camera up.

In the Maker Space, I was able to build the stand, so now it just needs the camera attached to it, which will be done by this weekend. I also made a second wheel pivot device to do page flipping in reverse. As of now, both wheel pivot devices are done and can be held firmly in place on the board we are using.

I also ordered extra parts to secure the page down after it has been turned; we have noticed that even when we are able to get a page separated and turning, it sometimes stands up in the air and needs something to bring it fully down.

In the coming week I will be looking more into how to connect the microphone we ordered to the Pi and how to use it to start up our device, as mentioned in our stretch goals. I also plan to help Effie with text-to-speech issues as they arise.

 

Effie:

This week our class time was spent mostly on the ABET surveys and questions. That said, I did spend considerable time outside of class getting the new servo hat for the Raspberry Pi soldered together and working with our servos, as they weren’t ready for our demo last week because the hat hadn’t arrived yet. I simplified my code considerably and got the servos turning in series and timed with rotating the motors for the wheels, so the operation of the page flipping is pretty much there. Now we hopefully will be able to just try out a bunch of speeds/pulse widths on the servos/motors until we find the ideal speeds that make page turning work consistently.
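For reference, a stripped-down sketch of the series timing, assuming the Adafruit PCA9685 driver that the servo hat uses; the channel numbers, pulse counts, and delays are placeholders we still need to tune, and spin_wheel() stands in for the real DC-motor routine.

```python
# Stripped-down sketch of the "servos in series, timed with the wheels" idea.
# Assumes the Adafruit_PCA9685 driver for the servo hat; channels, pulse
# counts, and delays are placeholders, and spin_wheel() is a stand-in.
import time
import Adafruit_PCA9685

pwm = Adafruit_PCA9685.PCA9685()
pwm.set_pwm_freq(60)                   # standard analog-servo frequency

LIFT_SERVO, FLIP_SERVO = 0, 1          # hypothetical hat channels
UP_PULSE, DOWN_PULSE = 600, 150        # pulse counts out of 4096, to be tuned

def spin_wheel():
    time.sleep(1.0)                    # stand-in for driving the wheel motor

def flip_page():
    pwm.set_pwm(LIFT_SERVO, 0, UP_PULSE)    # drop the wheel arm onto the page
    spin_wheel()                            # scrunch the page edge
    pwm.set_pwm(FLIP_SERVO, 0, UP_PULSE)    # sweep the flipping arm across
    time.sleep(1.0)
    pwm.set_pwm(FLIP_SERVO, 0, DOWN_PULSE)  # return both arms to rest
    pwm.set_pwm(LIFT_SERVO, 0, DOWN_PULSE)
```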

I rewrote the basic threading implementation so that audio can be played at the same time as page flipping and text detection for the next page, but I still have some more to do on that next week. I played with setting up my Mac to act as an SSH server so that I can actually scp files over from the Pi to my Mac (until now we could only do it in the other direction). As such, next week I aim to try out the text-detection code Celine wrote on my Mac (I’ll have to install the dependencies), since running it on the MacBook Pro might be the fastest and easiest option to deal with (we’ve had trouble connecting it through Windows, and it runs too slowly on the Pi itself).
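The threading has roughly the shape of the sketch below; play_audio, flip_page, and run_ocr are stand-ins for the real routines, and the timings are arbitrary.

```python
# Sketch of the threading idea: play the audio for the current page while the
# page flip and the OCR for the next page run in parallel. The three workers
# below are stand-ins for the real routines.
import threading
import time

def play_audio(page):   # stand-in for espeak playback of the current page
    time.sleep(2)

def flip_page():        # stand-in for the motor/servo flipping routine
    time.sleep(2)

def run_ocr(page):      # stand-in for capturing and OCRing the next page
    time.sleep(2)

def read_book(num_pages):
    for page in range(num_pages):
        threads = [threading.Thread(target=play_audio, args=(page,)),
                   threading.Thread(target=flip_page),
                   threading.Thread(target=run_ocr, args=(page + 1,))]
        for t in threads:
            t.start()
        for t in threads:
            t.join()    # wait for all three before moving to the next page

read_book(3)
```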

I also looked further into espeak and messed with some other voicings, which do sound better, and tried to get Google’s text-to-speech library working on the Pi, but was unsuccessful. This coming week I aim to get Google text-to-speech working on my Mac as well, which might make more sense anyway: we would effectively be offloading the intensive computation (both text detection and text-to-audio production) onto the Mac, which can do it much faster than the Pi, and leaving the Pi to handle the mechatronics of the project.

Lastly, as a group we spent time mapping out our progress and clarified responsibilities going forward.

Weekly Status Report #8: 11/3 – 11/10

Everyone:

Please see this link for a demo:  https://www.youtube.com/watch?v=weT1ZYr_ntI&feature=youtu.be

 

Celine:

This week was a very busy week for us all! I finally got the OpenCV installation finished on the Pi, but it had to be done in a virtualenv because of the conflicting Python 2 and 3 installations on the Pi. In the coming week or so I can try to get it working outside of the virtualenv, as I’ve read online that virtual environments can slow down processes. Nonetheless, this setup worked for our purpose of being able to demonstrate our working page flipping and reading system.

 

For my individual work, I was able to implement the header removal that I mentioned last post by doing a connected-components analysis (which labels every contiguous region of intensity 1 with a number corresponding to that region) and removing everything besides the largest connected component, which is always the text. I know this is true for this book series because the pictures in the book do not cut the text into pieces, and I remove the images before any dewarping or processing in the keepText function.
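A condensed sketch of that header-removal idea is below; the real keepText function has more preprocessing around it, and the dilation kernel size here is illustrative.

```python
# Condensed sketch of the header-removal idea: dilate so the body text merges
# into one blob, label the connected components, and mask out everything
# except the largest one. Kernel size is illustrative, not the tuned value.
import cv2
import numpy as np

def keep_largest_component(binary_page):
    """binary_page: uint8 image with text pixels = 255, background = 0."""
    blob = cv2.dilate(binary_page, np.ones((15, 15), np.uint8))
    num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(blob)
    if num_labels <= 1:
        return binary_page                       # nothing but background
    areas = stats[1:, cv2.CC_STAT_AREA]          # skip label 0 (background)
    largest = 1 + int(np.argmax(areas))
    mask = (labels == largest).astype(np.uint8)
    return binary_page * mask                    # keep only the text body
```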

I fixed the issue where the script would hang indefinitely while running. It turned out to be a stdout flushing error. After adding flush=True as a parameter to my print statements, I no longer see problems with indefinite hanging while running the script.

I’ve also begun adding debugging and timing statements to my code so that I can work on optimizing the parts that take the longest to run. I know that the most time-consuming part of the code now is the optimization of projection parameters in the dewarping phase of the image processing. The entire process lasts at most two minutes, which happens when running on a page set with two pages full of text, and the dewarping takes up around 80 to 100 of those 120 seconds.
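The timing statements are just perf_counter calls wrapped around each stage, along the lines of this sketch (the stage label and wrapped function names are hypothetical):

```python
# Illustrative timing wrapper for finding which OCR stages dominate runtime;
# the stage label and wrapped function here are hypothetical names.
import time

def timed(label, fn, *args, **kwargs):
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print("{}: {:.1f} s".format(label, time.perf_counter() - start), flush=True)
    return result

# e.g. dewarped = timed("dewarp", dewarp_page, page_image)  # hypothetical names
```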

Because of the slow runtime of the OCR script on the Pi, there are two solutions I can implement: have the Pi SSH into my laptop and run the code there, or push the input image to AWS S3 and either trigger an AWS Lambda function that runs the OCR code or SSH into an AWS EC2 instance and run the code there.

For the next few weeks I will be focusing on optimization of the OCR and enabling SSH on my laptop. As a team we will discuss whether or not we want to move our processing to the cloud by weighing the additional benefits running in the cloud might bring.

Effie:

This week I mostly spent on integrating all the moving components and helping my partners bring everything together for the demo. In particular, after resolving several path/dependency issues, I had to rewrite a good chunk of my code from last week to work properly with the text-detection script Celine worked on (by capturing images, spawning a new thread that hangs while waiting for a text file to be created from that image, and spawning another thread to output the audio, all while still running the page flipping). I wrote a basic version which worked for the demo, but I still have some threading issues and some quirks of Espeak (our current choice of text-to-audio library, which is, well, temperamental) to hammer out this week. As the week progresses, I aim to make my code much cleaner and simpler (since I quickly scraped together the current version to showcase for the demo), and will look more into other text-to-speech libraries since Espeak sounds funny.
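The thread that waits on the text file is essentially a polling loop; a simplified sketch is below, with the file name and poll interval as placeholders.

```python
# Simplified sketch of the thread body that hangs until the OCR output file
# appears, then hands the text off to the audio step. The path and poll
# interval are placeholders.
import os
import time
import threading

def wait_for_text(txt_path, on_ready, poll_s=0.5):
    while not os.path.exists(txt_path):
        time.sleep(poll_s)          # hang until the OCR script writes the file
    with open(txt_path) as f:
        on_ready(f.read())          # e.g. spawn the audio playback from here

t = threading.Thread(target=wait_for_text,
                     args=("page_003.txt", lambda text: print(text[:80])))
t.start()
```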

Indu:

On Monday, with the help of the employees at the Maker Space, I was able to finish building the wheel pivot device to separate pages. Effie and I then began testing out the device, but the servo motor wasn’t working with the Raspberry Pi. We all tried again on Tuesday, but it did not work, so we are waiting for the Pi hat to come in so that we can use it with the motor, which will hopefully get the motor working on the device. Currently the device can be moved up and down by hand, so it should work with the motor; I looked up the torque specifications and confirmed with a Maker Space employee that they should be sufficient for the dimensions I calculated for the device.

On Tuesday we also worked on our timeline for the rest of the semester, so I will be spending the rest of the semester building out all the parts for the final demo, such as another wheel pivot device for the other side of the book (to flip pages backwards), a height-adjustable stand, and a simpler page-turning mechanism, since the conveyor chain mechanism does not appear to be consistent.

Thus, after the demo I went with Effie to get acrylic pieces at the IDEATE work space in Hunt Library. Later on in the week, Celine assisted me in the Maker Space and took more videos of me working, such as me cutting the acrylic.

Please see this video: https://www.youtube.com/watch?v=jl1ilr-w4_A&feature=youtu.be

 

Weekly Status Report #7: 10/27 – 11/3

Celine:

This week, I set out to further examine the dewarping function and how to implement it myself. I came up with my own algorithm, but so far it is not completely functional. Because we currently have a working script for dewarping, I will leave it be and move on to other problems that I have been identifying. These problems are:
1) Removing images from the page, as the dewarping doesn’t seem to deal well with images. When given a page like this one:

the dewarping algorithm identifies contours within the image, but because of the complexity of the figure, some of the contours erroneously end up with an area of zero. This is an issue because finding the centroid of each contour involves dividing by the area.
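Concretely, the centroid comes from the contour moments, with the area m00 in the denominator; a simple guard against the degenerate contours would look roughly like this sketch:

```python
# The centroid of a contour is computed from its moments, dividing by the
# area m00; a zero-area contour makes this blow up. A simple guard (sketch):
import cv2

def safe_centroid(contour):
    m = cv2.moments(contour)
    if m["m00"] == 0:          # degenerate contour from a complex figure
        return None            # skip it instead of dividing by zero
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])
```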

2) Skipping the “Chapter” heading or page title that shows up on each page. These often get misread (e.g. Monday with a Mad Genius = b’MWZJ/ WM a (91/611 genius) because of their fancier scripts.

3) Correcting words that have a space inserted (e.g. “J ack”, “N either”).

If these issues are corrected, then the image processing/OCR problem will be relatively complete. My working/proposed solutions are as such:

1) By dilating the image (binarizing, inverting, and convolving with a non-unit rectangle), I am able to connect the largest areas of high-frequency content in the image, such as pictures and page folds. I then identify the connected components in the image using a built-in OpenCV function. This function efficiently finds and labels areas of connected components so that each component can be located by searching the image for its label. Large areas can be zeroed out, producing a result like:

2)  a) A histogram along the vertical axis can be taken to identify where the lines of text in an image are (see the sketch after this list). If a line of text is relatively far away from the other lines of text, then it is most likely not a line we want to keep, as the title is usually further away from the text body than the body lines are from each other.

b) Identify the starting position of each line, as the most common starting points will be the left margin, or the left margin and a tab. Anything starting after this is not a valid body text line.

3) a) Use autocorrect

b) Identify solo characters and check whether each is valid as a solo character.
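For 2a, the histogram is just a row-wise sum of the binarized page; a small sketch of how the title line could be flagged is below (the thresholds, in particular the 1.8x gap factor, are placeholders to be tuned on real pages).

```python
# Sketch for solution 2a: a row-wise sum of the binarized page gives a
# profile whose bands are text lines; a gap much larger than the typical
# body-line spacing suggests a title/header next to it. Thresholds are
# placeholders to be tuned on real pages.
import numpy as np

def find_line_rows(binary_page):
    """binary_page: uint8 image with text pixels = 255, background = 0."""
    profile = binary_page.sum(axis=1)                  # ink per image row
    rows = np.where(profile > 0.05 * profile.max())[0]
    if rows.size == 0:
        return []
    breaks = np.where(np.diff(rows) > 1)[0]            # gaps between row runs
    starts = np.r_[rows[0], rows[breaks + 1]]
    ends = np.r_[rows[breaks], rows[-1]]
    return list(zip(starts, ends))                     # (top, bottom) per line

def flag_title_gaps(lines):
    """Return indices of gaps much larger than the typical line spacing."""
    gaps = np.array([b[0] - a[1] for a, b in zip(lines, lines[1:])])
    if gaps.size == 0:
        return []
    typical = np.median(gaps)
    return [i for i, g in enumerate(gaps) if g > 1.8 * typical]
```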

I also attempted to install all the necessary packages needed to run my code on the Raspberry Pi, but I ran into some trouble with conflicting package installations uninstalling one another. Additionally, when attempting to link all of the pieces of my code together, there were many conflicting input and output types, specifically around images being BGR or binarized.

This coming week, mainly tomorrow, I will focus on solving the second issue listed above, and make some attempts to solve the third.  I will also compile all of the scripts I’ve written into one, to be used during the demo to showcase a complete run through of the OCR portion of our project. I will also work with Effie to get the installations done on the Pi. Our group goal this week is to be ready for the demo on Wednesday!!

 

Indu:

This week I got back the gear with the mounting hub inside it from the Maker Space employee who performed this service for me, after I had previously described to him in detail how I was planning on doing this. I then attached the gear with the mounting hub to the stepper motor, and my team and I were then able to make the conveyor chain move. The chain, however, had issues turning smoothly because it was not flexible in all parts of the links. This week I am trying to resolve this by lubricating the chain with coconut oil, which so far appears to be loosening the chain.

In terms of the stand, I am still in the process of building the wooden mount that will attach to the magic arm stand we will borrow from Hunt Library for testing, but this should be stable by this weekend. In terms of the wheel pivot device, I have cut out all the pieces needed for it so far, but I need to use the lathe to be able to connect the motor driving the pivot to the wood. As I do not have access to or training for the machine shop, a Maker Space employee informed me that he would be able to use the instrument for me on Monday morning. Thus, I hope to get that done on Monday and finish building the device the same day so that my team and I can test out all the components prior to Wednesday, when we are demoing.

Please see this link for the progress I’ve made this week:  https://youtu.be/2qwMH35M7o4

 

 

 

Effie:

This week I worked on configuring a Bluetooth speaker to interact with the Pi, which was actually a lot harder than I was expecting, and researched with Celine a few open-source text-to-audio libraries to use. I tried out implementing each of them, and for purposes of the demo I chose to use the Espeak library, which can nicely take text and create an audio file that I can then play out through the speaker. I like this approach because, in addition to reading the book out loud, I think it will be nice if we can keep all the pictures of the pages as well as the audio to construct a PDF and audiobook too while we’re at it! That said, the voicing is not so nice, so we’ll probably use another library at a later date. Additionally, this week I got all my code for operating the servo, DC motor (for the wheels), stepper motor (for the conveyor belt), the camera, and now text-to-speech all working together in one script, in sync, in one nice big loop! I will work on threading over the weekend so that playing the audio (and text detection once we sync that in too) doesn’t stall the operation of the motors. Professor Low noted I should focus this week on working on code for the demo, so I will circle back to trying to motorize the zooming/panning on the camera later next week... again, not sure it’s possible, but we’ll see!
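The text-to-wav-to-speaker step is short; roughly the shape of it, calling the espeak and aplay command-line tools from Python (the voice name, file path, and sample sentence are illustrative, and we may swap espeak out later):

```python
# Rough shape of the text -> wav -> speaker step using the espeak CLI and
# aplay on the Pi; voice name, file path, and sample text are illustrative,
# and espeak may be replaced by a nicer-sounding library later.
import subprocess

def speak_page(text, wav_path="page.wav", voice="en-us"):
    subprocess.run(["espeak", "-v", voice, "-w", wav_path, text], check=True)
    subprocess.run(["aplay", wav_path], check=True)  # play through the configured speaker

speak_page("Once upon a time there was a mouse named Stuart.")
```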

This week I plan to work with my partners to get my code and the current wiring of devices integrated together nicely for our demo (both in software, with Celine’s text-detection algorithms, and in hardware, with Indu’s stand and page-turning arms), and to make my code more modular so that post-demo we’ll have an easier time fine-tuning how to operate and time each device.

Please see this video for my progress this week: https://youtu.be/b0slTXKf1DU

Weekly Status Report #6: 10/20 – 10/27

Celine:

This week I worked on preparing material for our mid-semester demo and was able to test performance on warped vs. unwarped pages. To show the necessity of the dewarping step, I have below a comparison of the text recognized from a slightly curved page vs. the text from a relatively flat page.

The left column is a curved page, and the right is a relatively flat page. The source pages look like so:

Thus I believe it is beneficial to pursue dewarping as a strategy for improving OCR. I was able to test dewarping of the left image using a program I found online, and it produced text like so:

While it still isn’t perfect, this dewarping definitely improved the performance. I did some research and found that Python has some autocorrect packages we can use to check the outputs that Tesseract produces. I’ve also concluded that the structure of our device needs to include some lighting, as this greatly enhances the program’s ability to threshold and process the pages.

This coming week I will complete my own dewarping program and start looking at implementing autocorrect!

Effie:

This was an exciting week! Following up on setting up the Pi last week (formatting the SD card, installing Raspbian, and registering the Pi on CMU WiFi), getting some books to try out from the library, and working with Indu on talking through the design for the stand and wheel arms she’ll be building (and going on adventures to find scrap wood!), this week I had fun getting things moving! As more parts came in, I worked on soldering the motor-hat pins and connecting the hat to the Pi to drive a stepper motor (for the conveyor belt) and the “teensy” DC motor (for the wheel). Additionally, I found some drivers online and tweaked them to operate a servo. I am able to drive and control the two motors, the servo, and the camera all independently. It is possible we might need to buy a separate hat for the servos (I’m not sure they can run on the same pins used by the motors)… I hope I’ll know by next week. Last week we had connected all the camera extension parts together only to find out that our 8MP camera wasn’t working since the connector soldering was messed up, but thankfully Sam was able to fix our camera! So now we are able to get great pictures! I am working on writing a script to automatically take and save pictures at predetermined time intervals (to then send off to Celine’s code for processing to text). I also met up this week with Greg Armstrong in the Robotics Institute, who gave me valuable advice on how to potentially operate our wheel arms.
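The capture script itself is short; a sketch using the picamera library is below, with the resolution, interval, and file names as placeholders until capture is synced with the page-flipping timing.

```python
# Sketch of the timed-capture script using the picamera library; resolution,
# interval, and file name pattern are placeholders until capture is synced
# with the page-flipping loop.
import time
from picamera import PiCamera

camera = PiCamera()
camera.resolution = (3280, 2464)   # full 8MP frame
time.sleep(2)                      # let exposure/white balance settle

for i in range(10):
    camera.capture("page_set_{:03d}.jpg".format(i))
    time.sleep(15)                 # placeholder interval between page sets
```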

So next week I want to work on a few things: I plan on integrating my code to speak to several of the components at once, to work with Indu on physically connecting components together for a prototype of the arm she is designing, and to attempt to figure out how to motorize the camera we bought to programmatically zoom/pan – though I fear it won’t be possible (and might not be necessary anyway), but it would be cool if I could!

 

Indu:

This week I primarily worked on building out the page-turning device. I spent a few hours with a mechanical engineering student, Krishna Dave, and talked through what I had drawn up for the design of the entire device, to ensure I was thinking about everything properly and had it mapped out well. From there I started constructing the turning part with the gear. I went to the Maker Space and talked with an employee for a while about the best way for me to mount the motor hub into the gear (the hub’s diameter is larger than the gear’s diameter). Since I do not have machine shop training, I left those parts with the employee so he can provide this service when possible.

I was originally planning on building the stand as well, but my mechanical engineering friend suggested I use a tripod to mount the camera instead of building another part of the system, at least for now, since a tripod will be height-adjustable. Based on this advice I went to Hunt Library, looked at their assortment of tripods, and found one that may work for the design. I have yet to test this with the camera, as I need to mount the camera on wood before I can mount it to the tripod, which will be done this weekend.

I plan on continuing to build the device, as more thought still needs to go into the wheel part of the page-separation device; after talking with my mechanical engineering friend, it seems that we should focus less on trying to have the wheel work by gravity.

Weekly Status Report #5: 10/13 – 10/20

Celine:

This week, my groupmates and I wrote up our design paper together. I was sick for most of this week, so I have not been able to make significant progress. The progress that I did make was testing out text detection using a pre-trained EAST text detector convolutional neural network, in order to try to segment text from page sets with images. The results were not what I wanted, though, so for this problem I will need to try something else. For now, the data I have tried so far has worked well using pytesseract, as it will ignore the illustration on the page (please see the second figure below).
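For reference, the pytesseract call itself is minimal; a sketch of how a test image gets run through it (the file name is a placeholder):

```python
# Minimal sketch of running a test page photo through pytesseract; the file
# name is a placeholder.
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open("stuart_little_page.jpg"))
print(text)
```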

I am in the middle of trying to implement some text/page skew correction to try to improve the output from using pytesseract. When I input an image that is clean and not skewed, with high resolution, pytesseract works very well:

However, when used on an actual image that I took with my phone, I get results like this (see linked image for better resolution):

I noticed that “Stuart” is recognized incorrectly when it is along the curved part of the page, but recognized correctly when it is on the flatter portion of the page, so I am hoping that some skew correction will improve recognition.

I hope to have this skew correction completed this weekend, and in the coming week will implement some image processing such as binarization to see if that will improve accuracy as well. During this coming week I hope to have a working python script that takes an image of a page set against a black background and have it perform with better accuracy than it does in the image shown above.

Effie:

 

 

Indu:

This week I worked more on the design of the page-turning device, specifically how the wheel and gear should connect to the motor in order for everything to work for turning the page. I made drawings of a potential design that includes a pivoting mechanism, so that the device can be lifted up while the page is being turned.

This Wednesday our Raspberry Pi came in, so Effie and I spent the majority of class time setting it up and further discussing how we think the Pi will be used to operate all the different components of the device, as it will involve both of us integrating the mechanics of the device with the Pi. Effie also went to the library and got us our test base for the books. We all spent the rest of the week trying to use the Arducam with the Pi in order to take pictures of the books, but kept getting errors that the camera was undetected, so we think the Arducam may be faulty. Celine contacted Arducam to ask about the issue we were having, so hopefully we get a helpful response soon.

In terms of next steps, next week I will work on building a mock version of the stand and the page-turning device so that Effie and I can connect various parts of our device (e.g. the wheel for page separating, the gear for page turning) to the Pi in an attempt to make each part work individually. We would also like to test other page-turning methods; as we stated earlier, while we think the conveyor chain method is the gentlest, we want to test this to actually know for sure.

Weekly Status Report #4: 10/6 – 10/13

Celine:

The beginning of this week I mostly worked on the design presentation slides with Effie and Indu. For the computer vision work, I made progress getting Tesseract installed and getting it to work on some unprocessed images I just got off of the internet. My teammates and I have been working on the design paper too, which we will finish tomorrow.

Next week I plan to get into being able to segment a page/set of pages from the background and look into ways to improve Tesseract’s accuracy.

Effie:

This week I worked on my slides for the in-class presentation and several parts of the design paper. I met with a friend who had done a similar page-turning project to get advice on how to build a page-lifting mechanism with a servo arm and motor wheel, which was very helpful.

This coming week we plan to finish the design paper, and hopefully the Pi will come in soon so I can get going on setting it up and connecting devices.

Indu:

This week I worked on the Design Presentation with Celine and Effie. It involved me thinking more about what our final device should look like. I also spent a significant amount of time working on a drawing of the whole device. As we only have one of our conveyor chain gears, I spoke with a Mechanical Engineering friend and we may laser cut another gear this weekend. I also worked significantly on our Design Report and helped Celine further flesh out the block diagram so that more of the technical specifications are included.

Weekly Status Report #3: 9/29 – 10/6

Celine:

This week I finished setting up OpenCV on my laptop and was able to run OpenCV with both Python and C++ programs. In addition to the setup, I further researched some methods we will need to perform image processing on document images: document alignment using edge detection, which can be done with imutils and OpenCV, and image binarization by reading the image as grayscale and then applying a threshold. There is also the problem of segmenting apart the two pages in the image, so I am thinking of summing the image columns into a histogram to do this. For the gear replication, we were not able to work on it this week, but we have decided that we will laser cut the gear instead of 3D printing it, as it will be faster and produce a smoother gear. Finally, I wrote up a draft spreadsheet detailing what needs to be done in each component of the project.
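A sketch of those two ideas together is below: binarize the grayscale page photo with a threshold, then sum each column into a histogram and split at the column with the least ink near the middle (the gutter between the two pages). The Otsu threshold and the middle-third search window are assumptions to be tuned on real photos.

```python
# Sketch of the binarization + column-histogram idea for splitting the two
# pages apart; the threshold choice and the middle-third search window are
# assumptions to be tuned, and the file name is a placeholder.
import cv2
import numpy as np

img = cv2.imread("page_set.jpg", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

column_ink = binary.sum(axis=0)            # how much "text" each column holds
w = binary.shape[1]
middle = slice(w // 3, 2 * w // 3)         # the gutter should sit near the center
split_col = w // 3 + int(np.argmin(column_ink[middle]))

left_page, right_page = img[:, :split_col], img[:, split_col:]
```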

This coming week, I will be working on the design presentation with Indu and Effie. I will also implement the above mentioned image processing methods on some experimental data. I will also get the gear designed and cut with Indu.

Effie:

This week I (hopefully) finalized our parts list and ordered the motors, servo, Pi + SD card, Pi hat (to connect to the motors), extensions, and wheels, and mapped out how we will connect all the pins. While we await the parts’ arrival next week, we will work on the design proposal, research how to operate the motors with their respective drivers, and code up some PWM logic – so we’ll be ready to hit the ground running when they come in.

Indu:

This week we got our initial parts, which were mainly the Arducam, gear, and conveyor chain. After heavily analyzing all the parts we received on Monday with Celine, we worked on further fleshing out what each part of our block diagram will consist of and involve.

This week I worked with Effie on further planning out how all the other parts will work together, such as the Raspberry Pi and the motors for our wheel and conveyor chain. We also decided to buy a servo motor as a backup method of turning the page, in case the conveyor chain method does not work the way we thought. This involved working with Effie to make sure each part we ordered can be integrated with all the others. It also involved me working on a rough sketch of what the whole design should look like in order to test the page-turning part of the project. I want to make sure that the dimensions of the stand and conveyor chain will be practical for a binder, as that is what we want to test with first.

This coming week I will be working with my team on the design presentation. I will also be working with Celine on designing and cutting the other gear for the conveyor chain part of the design. If we receive our parts this coming week, then hopefully I will be able to build a prototype of the page-turning device.

Weekly Status Report #2: 9/22-9/29

Celine:

The first half of this week I spent researching cameras that can interface with a Raspberry Pi, and ordering the camera I found to be sufficient for our initial experimentation, other parts we as a team found necessary for an initial implementation, and the connecting cable for the camera. I found 8MP and 14MP Arducams, both of which can connect to Raspberry Pis. However, the 14MP camera requires a USB interface board in order to connect to the Pi. As the most recent USB version implemented for this camera is USB 3, and no Raspberry Pi currently supports USB 3, I decided that it would be better to stick with the 8MP camera for now. After we buy the Raspberry Pi and gauge our budget, we can determine if a camera upgrade is worth it. Additionally, this week I have been setting up OpenCV on my personal laptop. I will be testing the setup this weekend.

Next week, I hope to have tested OpenCV together with a pre-trained Tesseract model. I also hope to have laser cut a duplicate gear to complement the one Indu ordered this week. We will be aiming to have the chain belt mechanism implemented this week, as we have already ordered the necessary parts for it.

Effie:

This week I looked further into some hats and began designing how the pin connections will need to be laid out to properly connect the Pi to the encoder motors, servos, camera, speaker, and mic, and I looked up details of how to install drivers. We ordered some more parts, and this week we plan to order the remainder so we can get to testing initial ideas.

Indu:

This week I continued researching the methods used in page turning devices and worked on deciding what parts would be used for testing purposes. I also ordered some of these parts, and have an initial idea of the next set of parts I am planning to order next week, such as the motor that will drive the device.

I have only initial calculations for the page-turning device, such as the conveyor chain being around 3 feet long in the final device. I need to work further on these calculations to better design a schematic for the project. That is what I plan on doing next week. I also want to actually start building a test device next week with the parts that Celine and I have ordered.

Introduction and Project Summary

Project Presentation Immediate Alert Cystem (Team 9) NarrAUTOr Proposal

NarrAUTOr is a reading robot assistant able to help in a variety of situations, whether it is used by a child, a student, or a person with a disability. It incorporates mechatronics, hardware, image processing, and computer vision in order to turn the pages of and read aloud the book you would like to hear.

The device itself will include multiple components controlled by a portable computing unit: two page-scrunching mechanisms that use friction to create a loop at the base of the page, where the page connects to the spine of the book, corresponding to the direction (forward or backward) the user would like to flip; a page-flipping mechanism that flips the scrunched page; a camera sitting atop a stand that points down at the book and snaps a picture of it; and a computer vision program that takes the picture of the open pages, translates the image to text, and reads out the text.

While there are existing devices with similar functions to the one we are proposing, our implementation takes the best features of the ones we have seen and combines them into one. The feature that sets our project apart is an algorithm that determines when, after scanning a set of pages, to turn to the next set, so that there is no lag in the reading of the book to the user. Another useful feature is that we will be able to flip pages backwards, so that users can review what was previously read; to prevent the need to re-scan pages as the user goes backwards, we will enable NarrAUTOr to save the last few audio files produced from previous pages.

In addition to this post, we have uploaded our initial project report. It is important to note that this report’s content is actually different from the project description given in the previous paragraph. This is because our team decided to pivot from that project idea to this new one last Friday. Our initial project idea, which we proposed to the course staff, was a real-time bus detection system that would use computer vision on camera data along Forbes Avenue to find a Pittsburgh Port Authority (PA) bus (specifically any of the 61s) approaching a nearby bus stop and notify users that it was coming within a few minutes. Some of the reasons we pivoted from this idea are noted in the following bullet points:

   – We learned that data like this was already being collected by CMU’s Traffic21 group, so we contacted them to see their data, which they told us we could use for this project.

   – We spoke with one of the researchers, saw their data, and went to see the cameras at their location on Friday, September 14th.

   – The camera data was not at the resolution we needed (it was too low); while we were able to detect a PA bus, we were not able to distinguish which route it was (e.g. a 61A).

   – Apparently the cameras were only allowed to be on the open WiFi, which sometimes lost connection, and they had to constantly make sure that the cameras had power.

   – While we would have been able to use Traffic21’s data, after seeing it and their cameras, we determined that we would need different cameras, and more of them since we wanted to be able to track buses in advance.

   – The cameras they used, including the setup, cost around $200 per camera (we believe), so if we wanted 3 extra cameras we would have already exceeded our budget in that area with nothing left to spare for other items.

   – Overall, while the idea seemed cool, and we were glad to have learned of a group on campus that is doing something similar, we decided that it was not a good idea to implement this semester.

 

Due to this pivot, we have also uploaded our Project Proposal Presentation, which was given on Wednesday, September 19th to the 18-500 class, and details the idea for NarrAUTOr.