Ray’s Status Report for 09/30/2023

This week, a new member joined our team. We held two or three meetings to discuss the division of labor and the technical details of the project, and made solid progress on our implementation design. On my part, I researched the API structure of OpenPose and the packages for implementing our user interface.

The OpenPose API supports both Python and C++ for data transfer. After discussing with my teammates (specifically Jerry), we decided to use Python since it offers more functionality for implementing the user interface. The most time-consuming computation in our system happens inside OpenPose, while the API is only responsible for reading its outputs, so choosing Python should not cost much in runtime efficiency.

The package that I plan to use for the user interface is Tkinter. From my research, it has great potential as the tool for implementing our system: it supports graphics and offers a wide range of GUI elements. (Below is the tutorial I’m watching.)

I was taught briefly how OpenPose works in Computer Graphics (15-462) and had the chance to read about it in the past. However, I have not integrated the model into a full system before, so this past week I read about OpenPose’s API usage on its GitHub page. The official OpenPose repository has some helpful tutorials to refer to. I read through some of them but have not yet had the chance to run them. I plan to install and build the OpenPose project on my laptop this weekend.

I am back on schedule this week. My plan for next week is to get familiar with the Tkinter package and the OpenPose API so that I can start creating a prototype UI for our system. I look forward to seeing how it will turn out!
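As a rough idea of what a first Tkinter prototype could look like, here is a minimal sketch that draws a chain of pose keypoints on a canvas. The function names (`to_canvas`, `build_window`), the normalized [0, 1] coordinate convention, and the window layout are all assumptions made for illustration; they are not part of our actual design or of the OpenPose API.

```python
def to_canvas(point, canvas_size, margin=20):
    """Map a normalized (x, y) point in [0, 1] to canvas pixel coordinates.

    The normalized-coordinate convention is an assumption for this sketch.
    """
    x, y = point
    width, height = canvas_size
    return (margin + x * (width - 2 * margin),
            margin + y * (height - 2 * margin))


def build_window(keypoints, canvas_size=(400, 400)):
    """Create a window that shows the keypoints joined in order (hypothetical layout)."""
    # Imported here so that to_canvas can be used on machines without a display.
    import tkinter as tk

    root = tk.Tk()
    root.title("Pose preview")

    canvas = tk.Canvas(root, width=canvas_size[0], height=canvas_size[1], bg="white")
    canvas.pack()

    pixels = [to_canvas(p, canvas_size) for p in keypoints]

    # Draw a line between each pair of consecutive keypoints.
    for (x1, y1), (x2, y2) in zip(pixels, pixels[1:]):
        canvas.create_line(x1, y1, x2, y2, width=2)

    # Draw each keypoint as a small filled circle.
    for x, y in pixels:
        canvas.create_oval(x - 4, y - 4, x + 4, y + 4, fill="red")

    tk.Button(root, text="Close", command=root.destroy).pack()
    return root
```

To try it on a machine with a display, something like `build_window([(0.5, 0.1), (0.5, 0.5), (0.3, 0.9)]).mainloop()` should open the window.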

 

Team Status Report for 09/30/2023

Week of 09/30/2023: A week of Research and Findings

We have a new team member, Jerry Feng, who joined us after our new proposal was approved on Monday. The main focuses of the week were researching different parts of the project and reintroducing the project to the group.

During the Monday and Wednesday meetups, Shiheng, Ray, and Eric (Hongzhe) gave Jerry a proper introduction to our Taichi project, discussed its background, and explained why we chose it. In addition, Ray and Eric discussed the OpenPose algorithm and the existing pipeline from our original plan with Jerry in depth, while Shiheng focused on the comparison algorithm and how the voiceover should be implemented in parallel with the development of the posture pipelines. While meeting with Professor Bryon and Eshita, we brainstormed how Jerry could integrate into our existing framework and decided that having him develop an alternative pipeline allowing customization would best let Jerry and the rest of the team work in parallel.

After transitioning smoothly into our new workflow, we spent most of our time researching our own parts of the project. We ran into various compatibility issues when setting up environments on our own laptops, but fortunately they were all resolved by the end of the week through team discussions and online research, and we all came away with a good understanding of how to implement the project. Everyone on the team has narrowed down the programming language, packages, and algorithms for their part, with choices justified on grounds including, but not limited to, compatibility, offline support, efficiency, and ease of use.

For the rest of the week, we spent the majority of our time working on the design review slides and replanning our project quantitatively around the new pipeline owned by Jerry. We refined our proposal slides to include more quantitative performance metrics, specified measures for dealing with pipeline failures, and brainstormed various test cases for future verification. Additionally, we created a new Gantt chart to include current work and reorganized the tasks, since Ray and Jerry now have some overlapping work they can collaborate on during the semester.

The Gantt chart is attached below to show our progress:

Shiheng’s Status Report for 09/30/2023

During the week of 9/30, my primary focus was on researching and understanding critical design tradeoffs related to our project. This entailed two key aspects: evaluating text-to-speech (TTS) options within the Python development environment and gaining insights into the implementation of cosine similarity-based posture analysis. Each TTS option had its own set of pros and cons, with considerations such as internet accessibility, voice quality, and language support. I also delved into the idea of cosine similarity and its application in posture analysis, with a keen eye on setting an appropriate similarity threshold. These efforts paved the way for informed design decisions in the upcoming phases of our project.

 

In Python, I examined three TTS solutions: the gTTS API, pyttsx3, and Mozilla TTS. The gTTS API offers flexibility in preprocessing and tokenizing text and supports multiple languages with customizable accents. However, it requires internet access because it calls an online API. Conversely, pyttsx3 provides greater customization options but lacks the naturalness of gTTS. Mozilla TTS, while high-quality and offline-capable, requires research into voice training and a manual choice of voice engine. These assessments gave us a comprehensive understanding of the TTS tools, and I determined that Mozilla TTS is the best option of the three. I also made backup plans in case we move to C++ and found TTS engines that fit that approach.

In parallel, I delved into the mathematical underpinnings of cosine similarity for posture analysis. It offers scale invariance and compares vectors by angle, making it apt for body pose analysis. Setting a similarity threshold, possibly in the range of 80% to 90%, emerged as a key design decision. This threshold will determine whether two postures are judged sufficiently similar. By thoroughly understanding these design tradeoffs, we are better equipped to make informed choices in developing our posture analysis system, balancing accuracy and flexibility to accommodate varying body sizes and orientations.
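To make the idea concrete, here is a minimal sketch of one way a cosine-similarity comparison between two poses could work. It is not our final algorithm. The limb list `LIMBS`, the per-limb averaging, and the 0.85 threshold (one value inside the 80% to 90% range mentioned above) are assumptions chosen for illustration, and the keypoint indices do not follow OpenPose's actual numbering.

```python
import math

# Hypothetical limbs: pairs of keypoint indices joined by a body segment.
# These indices are illustrative and do not match OpenPose's keypoint order.
LIMBS = [(0, 1), (1, 2), (2, 3)]


def cosine_similarity(u, v):
    """Return cos(angle) between vectors u and v, or 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def limb_vectors(keypoints, limbs=LIMBS):
    """Turn a list of (x, y) keypoints into one direction vector per limb."""
    return [(keypoints[j][0] - keypoints[i][0],
             keypoints[j][1] - keypoints[i][1]) for i, j in limbs]


def pose_similarity(user, reference, limbs=LIMBS):
    """Average the per-limb cosine similarity between two poses."""
    sims = [cosine_similarity(u, r)
            for u, r in zip(limb_vectors(user, limbs), limb_vectors(reference, limbs))]
    return sum(sims) / len(sims)


def poses_match(user, reference, threshold=0.85):
    """Apply the (assumed) similarity threshold to decide whether the poses match."""
    return pose_similarity(user, reference) >= threshold
```

Because only the direction of each limb vector matters, a pose and a uniformly scaled copy of it score as identical, which is the scale invariance described above. For example, `poses_match([(0, 0), (0, 2), (2, 2), (2, 4)], [(0, 0), (0, 1), (1, 1), (1, 2)])` compares a pose against a version twice its size.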

The comprehensive evaluation of TTS tools has provided insights into the advantages and disadvantages of each option, enabling us to make an informed choice aligning with our project’s goals. These efforts represent significant progress toward the successful execution of our project, empowering us to make well-informed design decisions moving forward.

ABET: For cosine similarity, I drew on the concepts of vector comparison and algebra from 18-290 and 18-202.

For Python coding, I drew on previous CS courses and my personal experience with Python. I researched TTS packages this week by reading GitHub developer docs and developer websites.

Jerry’s Status Report for 09/30/2023

1. During this week, I was only able to join the team on Wednesday (9/27), so I spent Wednesday discussing with everyone how to integrate me into the team and reworking the division of labor. I also researched my team’s options for file storage and decided we would go with a simple directory system and use JSON files to store the user’s pose coordinates and reference pose coordinates, for ease of interfacing with OpenPose, as it already uses a JSON format in its APIs. I also looked into running OpenPose on my local machine and found that I would have to run it on my CPU rather than my graphics card, since my graphics card unfortunately does not support CUDA. Additionally, I made an official system diagram for our presentation and contributed to the design presentation slides.

2. I believe that our progress on the project is on schedule, as I am in a good position to finish the design presentation slides today and then move forward with running OpenPose and getting into the meat of the project.

3. Next week, I plan to get OpenPose up and running on my computer. I also plan to help write the design report and coordinate with Ray on some of the specifics of the UI design so users can easily upload and access custom image sequences for their own training purposes.

4. Using JSON files and working with APIs is something I learned over the summer at my internship, and our idea of breaking the project into modules for ease of organization and implementation was heavily emphasized in Structure and Design of Digital Systems (18-240).
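The directory-plus-JSON storage idea from item 1 could be sketched roughly as follows. This is an illustration under assumptions, not our settled file format: the helper names `save_pose` and `load_pose` and the one-file-per-pose layout are hypothetical, and the `people` / `pose_keypoints_2d` structure (a flat list of x, y, confidence values) is modeled on the JSON files OpenPose writes, simplified to a single person.

```python
import json
from pathlib import Path


def save_pose(directory, name, keypoints):
    """Write one pose to <directory>/<name>.json and return the file path.

    keypoints is a list of (x, y, confidence) tuples. They are flattened into
    a single list, mirroring the layout OpenPose uses in its JSON output.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    flat = [value for point in keypoints for value in point]
    path = directory / f"{name}.json"
    with path.open("w") as f:
        json.dump({"people": [{"pose_keypoints_2d": flat}]}, f)
    return path


def load_pose(path):
    """Read a pose file back into a list of (x, y, confidence) tuples.

    Only the first detected person is used; an empty list means no one was found.
    """
    with Path(path).open() as f:
        data = json.load(f)
    people = data.get("people", [])
    if not people:
        return []
    flat = people[0]["pose_keypoints_2d"]
    # Regroup the flat list into triples of (x, y, confidence).
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat) - 2, 3)]
```

With this layout, user poses and reference poses could simply live in separate subdirectories (for example `poses/user/` and `poses/reference/`, both hypothetical names), and the same loader would read either kind.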