This week, I built an iOS app in Xcode that lets users upload a PDF file, sends the PDF to a Flask server, receives the extracted text back from the server, and speaks that text aloud. I also built the associated Flask server in Python. The server includes Python code that extracts the information from a single slide, since only one slide needs to be spoken aloud at a time. In the final app, the screen description will not be printed on screen; it is shown now only for testing.
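The single-slide step on the server can be illustrated with a minimal sketch. This is not the actual server code: the function names `select_slide_text` and `build_payload`, the 1-indexed slide number, and the shape of the JSON-style payload are all hypothetical. The sketch also assumes that a PDF library has already produced one text string per page, and it leaves that extraction step and the Flask route out.

```python
from typing import Dict, Sequence, Union


def select_slide_text(page_texts: Sequence[str], slide_number: int) -> str:
    """Return the cleaned text of one slide (hypothetical helper).

    Assumption: `page_texts` holds one already-extracted string per PDF
    page, in order, and `slide_number` is 1-indexed.
    """
    if not 1 <= slide_number <= len(page_texts):
        raise ValueError(
            f"slide_number {slide_number} is outside 1..{len(page_texts)}"
        )
    raw = page_texts[slide_number - 1]
    # Collapse line breaks and runs of spaces left over from PDF extraction
    # so that a text-to-speech engine receives one continuous string.
    return " ".join(raw.split())


def build_payload(
    page_texts: Sequence[str], slide_number: int
) -> Dict[str, Union[int, str]]:
    """Build the dictionary a route handler could return as JSON (assumed shape)."""
    return {
        "slide": slide_number,
        "total_slides": len(page_texts),
        "text": select_slide_text(page_texts, slide_number),
    }
```

For example, `build_payload(["Intro\n  slide", "Results   table"], 2)` returns a dictionary whose `text` field contains only the second slide's text, which is all the app would need to speak at that moment.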
Here are images of the app running on an iPhone 14 Pro simulator:
One issue I ran into was that I could not test the haptics and accessibility features I implemented, or the text-to-speech. These cannot be tested on a simulator, and to feel the physical vibrations from the haptics, I need to run the app on my personal phone. To test anything on my personal phone (rather than a built-in simulator), I need an Apple Developer account. I have emailed professors and faculty who know about iOS development (such as Prof. Larry Heimann, who teaches an iOS development course at CMU) and am waiting on a response. Apart from that, the application appears to be finished and working for the time being. The major remaining addition is the logic in the Python code running on the Flask server, which I will implement after the ML model is completed.
I am still ahead of schedule, although I had expected to make some progress on gathering training data for the ML model this week. I was unable to do that because the bugs in my Swift code took a long time to fix.
Next week, I will focus primarily on gathering a large portion of the data we need to train the slide and graph recognition models, and I will also spend a significant amount of time working on my design presentation.