Ray’s Status Report for 10/21/2023

For the past two weeks, my team and I worked on the design review of our project to finalize the application structure and implementation design. We wrote the design review document together and discussed how each subcomponent of our project should work. I worked specifically on finalizing the user interface section.

To accomplish the above task, I shifted my focus from Tkinter to Kivy, which is the new package we chose for implementing our application interface. I learned the basics of initiating and configuring widgets and wrote kv language files to simplify the design process. Using the skills I learned, I created prototype pages for the main menu and the pose selection page. Below are the prototypes I created. (Consider this my answer to the ABET question.)

For the application to run as expected, I have to get used to Kivy’s screen manager functionality to switch between different pages. I am currently working on it, and I plan to have a functional application frame ready next week. The live camera embedded in the training page is also something I need to look into, and I will start working on it as soon as the previous task is mostly completed.
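As a rough idea of what the page switching could look like, here is a minimal sketch using Kivy’s ScreenManager with two screens. The screen names ("menu", "pose_select"), the class names, and the button labels are placeholders I made up for illustration and are not our actual prototype. Kivy is imported inside run() so that the layout string can be read or inspected on a machine without Kivy installed.

```python
# Minimal two-page sketch. Class names, screen names, and labels are
# hypothetical placeholders, not the real prototype.
KV = """
<MenuScreen>:
    BoxLayout:
        orientation: "vertical"
        Label:
            text: "Main Menu"
        Button:
            text: "Choose a pose"
            # root is this Screen; manager is the ScreenManager that owns it
            on_release: root.manager.current = "pose_select"

<PoseSelectScreen>:
    BoxLayout:
        orientation: "vertical"
        Label:
            text: "Pose Selection"
        Button:
            text: "Back to menu"
            on_release: root.manager.current = "menu"

ScreenManager:
    MenuScreen:
        name: "menu"
    PoseSelectScreen:
        name: "pose_select"
"""


def run():
    """Build the app from the kv string above and start the Kivy event loop."""
    # Imported lazily so that this module can be loaded without Kivy present.
    from kivy.app import App
    from kivy.lang import Builder
    from kivy.uix.screenmanager import Screen

    # Empty subclasses; their layout comes from the <MenuScreen> and
    # <PoseSelectScreen> rules in the kv string.
    class MenuScreen(Screen):
        pass

    class PoseSelectScreen(Screen):
        pass

    class PrototypeApp(App):
        def build(self):
            # load_string returns the root widget (the ScreenManager)
            return Builder.load_string(KV)

    PrototypeApp().run()
```

Calling run() opens a window on the menu screen, and each button sets the manager’s current screen by name. Keeping the layout in kv and the screen classes nearly empty should make it easy to add the training page later as one more named screen.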

I am still trying to configure OpenPose on Windows; if necessary, I will ask Eric about the build process and make sure OpenPose runs on my computer by next week.

Overall, I am on schedule for this week. Still, starting next week, I have to pay more attention to coordinating with my teammates, since the integration process of our application should begin soon.

Team Status Report for 10/21/2023

For the week prior to the fall break, our group mainly focused on finishing and polishing the Design Review document. We split up the work evenly, with every team member writing the part covering the subsystem they will be responsible for in later development. We were able to find existing journals and research to justify our design requirements, provide details for our system specs, and think of alternative approaches in case some parts of the system go wrong. Writing the design document let us explain many details that we could not cover in our design review presentation due to time constraints (we needed to fit the contents of two presentations into one, since we redesigned our project after the proposal presentation). It gives the faculty a better picture of the project and helped clarify details among group members.

While enjoying our fall break, we also made some progress in implementing some of the subsystems.

Thanks to Hongzhe (Eric), the OpenPose posture detection is now working with imported images (handpicked high-resolution frames from online video sources), which we are going to use as reference postures and as the baseline for evaluating user poses. As we described in our design document, these images will be transformed into JSON files containing the 2D key points of each posture, which will be passed into the comparison algorithm to calculate the differences.
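To make the JSON hand-off more concrete, here is a small sketch of how the key points could be read and turned into a joint angle. It assumes OpenPose’s standard JSON output, where each person has a flat "pose_keypoints_2d" list of x, y, confidence triples, and it assumes the BODY_25 keypoint ordering (2 = right shoulder, 3 = right elbow, 4 = right wrist). The sample numbers are made up, and the angle calculation is only one possible comparison measure, not necessarily the one our algorithm will use.

```python
import json
import math

# Assumed BODY_25 indices for the right arm.
R_SHOULDER, R_ELBOW, R_WRIST = 2, 3, 4


def load_keypoints(json_text, person=0):
    """Return a list of (x, y, confidence) tuples for one detected person."""
    data = json.loads(json_text)
    flat = data["people"][person]["pose_keypoints_2d"]
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("two key points coincide; angle is undefined")
    cosine = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


# Made-up sample containing only the first five key points.
sample_json = json.dumps({
    "people": [{
        "pose_keypoints_2d": [
            0.5, -1.0, 0.9,   # 0: nose
            0.5,  0.0, 0.9,   # 1: neck
            0.0,  0.0, 0.9,   # 2: right shoulder
            1.0,  0.0, 0.9,   # 3: right elbow
            1.0,  1.0, 0.9,   # 4: right wrist
        ]
    }]
})

keypoints = load_keypoints(sample_json)
elbow_angle = joint_angle(keypoints[R_SHOULDER], keypoints[R_ELBOW], keypoints[R_WRIST])
```

Computing the same angle for the reference image and for the user’s frame would give two numbers whose difference can be compared against a threshold. The confidence value is carried along so that low-confidence key points can be skipped later.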

Shiheng worked on the Text-to-Speech engine, which takes in text instructions generated by the comparison algorithm and passes them into the voice engine to generate real-time instructions. The time required for the voice engine to generate output files is low (< 0.5 seconds for an average-length instruction), and the output voices are clear and easy to understand. We will continue to look into voice engine outputs and determine the best way to present these vocal instructions to users.
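The step between the comparison algorithm and the voice engine could be sketched roughly as below: an angle difference is turned into a short sentence that is then handed to the voice engine. The function name, the 5-degree tolerance, and the rule that a larger reference angle means "raise" are all assumptions for illustration, and the voice engine call itself is left out.

```python
def make_instruction(body_part, reference_deg, user_deg, tolerance_deg=5.0):
    """Return a spoken-style correction, or None if the pose is close enough.

    Hypothetical helper: the names, the tolerance, and the raise/lower rule
    are illustrative assumptions, not the project's final design.
    """
    diff = reference_deg - user_deg
    if abs(diff) <= tolerance_deg:
        return None  # within tolerance, nothing to say
    verb = "Raise" if diff > 0 else "Lower"
    return f"{verb} your {body_part} by {round(abs(diff))} degrees"


# Example: the reference pose has the arm at 90 degrees, the user is at 70.
message = make_instruction("arm", reference_deg=90, user_deg=70)
```

Returning None when the difference is within tolerance avoids generating audio for corrections the user would not notice, which also helps keep the feedback real-time.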

On the frontend, Jerry evaluated several Python packages with UI quality and user experience in mind. He determined that the Kivy package provides many widgets that are easy to implement and a much better interface than Tkinter, which we originally planned to use.

Ray focused on learning the kv language and creating the prototype frontend application. He created prototype pages for the main menu and the pose selection page. He is also working on the screen manager feature of Kivy to support page switching in the application and experimenting with interfacing OpenPose data with the Kivy UI.

Shiheng’s Status Report for 10/21/2023

I mainly contributed to the Design Requirement and Tradeoff parts of the Design Review document. Starting from what we had in the design review slides and the notes shared among us, I was able to quantify and justify the requirements we encountered in our project. The tradeoff part also plays a vital role in the design document, as it explains the decisions we made in the past weeks and why we think they are in our best interest.

For the past week, I mostly researched my part, the TTS engine, and its applications to our project. During the implementation, I ran into many compatibility problems involving the Python version, package dependencies, and output support. After trying out a few different Python versions and attempting to install the TTS package on my laptop, I determined that Python 3.10 was the best fit, as it supports all the packages the TTS engine needs, with most of them up to date. Other versions had either reached the end of their life cycle or had issues supporting the latest packages that the TTS engine requires.

With the package successfully installed and all requirements fulfilled, I generated a few .wav files for demonstration purposes. The .wav files sound OK and can be generated locally from the command prompt using the default voice engine provided. I’ll continue to research different voice engines to make sure that the best among them is picked, and to account for users who might want to pick a different voice for their instructions. I will continue to work on this part and begin to integrate the voice function into our project once our pipeline is mostly done.

Here’s a sample message using the default engine:

“Raise your arm by 20 degrees”

 

ABET #7 Question:

I have looked into tools like Text-to-Speech engines and tried to understand the logic and algorithms behind them with different voice models.

The additional knowledge I needed includes English pronunciation and syllables, and what speed the voice engine should use to speak the words. Those parameters need to be adjusted to meet our design requirements and provide efficient feedback to the user.

I also looked into various online forums and design documents about the installation and learned about similar problems that other users previously had when installing older versions of the package. I learned a lot more about Python packages related to machine learning and voice generation.