Shiheng’s Status Report for 10/28/2023

My work this week mainly focused on the comparison algorithm. Using the JSON data generated from OpenPose thanks to Eric, I was able to craft my Python script for comparing two different postures. There was some confusion at the beginning, since the output contained more fields than I had expected, but it was cleared up after communicating with Eric and reading the OpenPose documentation. The keypoints were narrowed down, and points representing positions like the eyes and ears were eliminated to improve the accuracy of judging overall body posture.

Only points 0-14 are used in judging posture right now, for efficiency; I’ll see how certain keypoints (on the feet) affect the correctness of the posture judgment in the testing that follows the implementation.

Using packages like numpy and json, I was able to pull files from the locally designated folder and compare the sets of data within. I added a few processing steps, including the keypoint reduction mentioned above and reforming vectors from the given limb datapoints. The reformed ‘limbs’ were then passed into the cosine similarity comparison. The algorithm responded quickly, and the results were easy to understand. How we want to display the results, and through what mechanism, is still to be determined, so below is a raw output from my algorithm showcasing one example comparison based on two trials of the same image, where I made some minor changes to the position of the upper limb in the second image.
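Before the raw output, here is a minimal sketch of how those processing steps fit together in the script; the file names, the BODY_25 limb pairs, and the helper names are illustrative assumptions rather than the exact code:

import json
import numpy as np

# Limb segments built from OpenPose keypoints 0-14, assuming BODY_25 indexing
# (adjust the pairs if the 18-point COCO model is used instead).
LIMB_PAIRS = [
    (2, 3), (3, 4),      # right upper arm, right forearm
    (5, 6), (6, 7),      # left upper arm, left forearm
    (9, 10), (10, 11),   # right thigh, right calf
    (12, 13), (13, 14),  # left thigh, left calf
]

def load_keypoints(path):
    """Read one OpenPose JSON file and return an array of (x, y) keypoints 0-14."""
    with open(path) as f:
        data = json.load(f)
    flat = data["people"][0]["pose_keypoints_2d"]   # [x0, y0, c0, x1, y1, c1, ...]
    return np.array(flat).reshape(-1, 3)[:15, :2]   # drop confidence, keep points 0-14

def limb_similarities(reference, user):
    """Cosine similarity of each reformed limb vector between two postures."""
    scores = {}
    for a, b in LIMB_PAIRS:
        u = reference[b] - reference[a]
        v = user[b] - user[a]
        scores[(a, b)] = float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    return scores

reference = load_keypoints("poses/reference_keypoints.json")   # hypothetical file names
user = load_keypoints("poses/user_keypoints.json")
for limb, score in limb_similarities(reference, user).items():
    print(f"limb {limb}: cosine similarity = {score:.3f}")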

The next steps will be to connect the comparison algorithm with the TTS engine to invoke vocal instructions and then to move on to the integration steps accordingly.

Team Status Report for 10/28/2023

During the past week, our team has been diligently working on various aspects of our project and collaborating to work our magic.

 

Ray and Jerry are working on the UI design using Kivy, covering functions like image uploading and display. They will continue to cooperate with the other members of the team to integrate functions like OpenPose and the voice engine into the application. This will play a pivotal role in ensuring a user-friendly and visually appealing experience, compared to the more naive Tkinter implementation.

 

Eric has successfully integrated OpenPose into our system, enabling it to accept image inputs and generate JSON outputs for Shiheng’s comparison algorithm. This is a key step in our project, as it provides the foundation for the comparison algorithm and for more detailed instructions on body postures. We’ll look at how we want to display OpenPose feedback integrated with camera images to the user in the following weeks.

 

Shiheng has implemented a cosine similarity algorithm for comparing the key points that represent body posture. This algorithm will allow us to quantify the similarity between different body postures, providing a strong basis for evaluation and for issuing instructions. We can analyze and compare the alignment and positioning of individual limbs, offering a more detailed assessment of body posture.

 

We also discussed ethics issues after we worked out more project details at our Friday meeting, during which we first exchanged opinions about the assigned reading and our answers concerning the project, then went on to debate privacy, autonomy, and the potential for misuse of the project.

 

Our team is making solid progress, and we’re on track to deliver our comprehensive Taichi instructor system. We’ll continue working on these components, address the feedback from our design review report, and aim to achieve our project milestones in the coming weeks.

Team Status Report for 10/21/2023

For the week prior to fall break, our group mainly focused on finishing and polishing the Design Review document. We split the work evenly, with every team member taking on part of the writing and filling in the subsystem they are responsible for in the upcoming development. We were able to find existing journals and research to justify our design requirements, provide details for our system specs, and think of alternative approaches in case some parts of the system go wrong. Writing the design document let us explain many details we could not cover in our design review presentation due to time constraints (we had to fit the contents of two presentations into one after redesigning our project following the proposal presentation), giving the faculty a better picture of the project and helping clarify details among group members.

While enjoying our fall break, we also made some progress in implementing some of the subsystems.

Thanks to Hongzhe (Eric), OpenPose posture detection is now working with imported images (handpicked high-resolution frames from online video sources), which we are going to use as reference postures and as the baseline for evaluating user poses. As described in our design document, these images are transformed into JSON files containing 2D posture keypoints, which are then passed into the comparison algorithm to calculate the differences.

Shiheng worked on the Text-to-Speech engine, which takes text instructions generated by the comparison algorithm and passes them to the voice engine to produce real-time spoken instructions. The time required for the voice engine to generate output files is low (< 0.5 seconds for an average-length instruction), and the output voices are clear and easy to understand. We will continue to look into voice engine outputs and determine the best way to present these vocal instructions to users.

On the frontend, Jerry investigated several Python packages with better UI and user experience in mind. He determined that the Kivy package will provide us with many widgets that can be implemented easily and a much better interface than Tkinter, which we had originally planned to use.

Ray focused on learning the Kivy language and creating the prototype frontend application. He created prototype pages for the main menu and the pose selection page. He is also working with the screen manager feature of Kivy to support page switching in the application and experimenting with interfacing OpenPose data with the Kivy UI.
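For reference, the ScreenManager pattern Ray is using can be sketched roughly as below; the screen names and widgets are placeholders, since the real prototype is built with the kv language and richer layouts:

from kivy.app import App
from kivy.uix.button import Button
from kivy.uix.screenmanager import ScreenManager, Screen

class MainMenuScreen(Screen):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        button = Button(text="Select a pose")
        # Switch to the pose selection page when the button is released.
        button.bind(on_release=lambda *_: setattr(self.manager, "current", "pose_select"))
        self.add_widget(button)

class PoseSelectScreen(Screen):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        button = Button(text="Back to menu")
        button.bind(on_release=lambda *_: setattr(self.manager, "current", "menu"))
        self.add_widget(button)

class TaichiInstructorApp(App):
    def build(self):
        manager = ScreenManager()
        manager.add_widget(MainMenuScreen(name="menu"))
        manager.add_widget(PoseSelectScreen(name="pose_select"))
        return manager

if __name__ == "__main__":
    TaichiInstructorApp().run()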

Shiheng’s Status Report for 10/21/2023

I mainly contributed to the Design Requirements and Tradeoffs part of the Design Review document. Starting from what we had in the design review slides and the notes shared among us, I was able to quantify and justify the requirements we encountered in our project. The tradeoff part also plays a vital role in the design document, as it explains the decisions we made in the past weeks and why we think they are in our best interest.

For the past week, I mostly researched my part, the TTS engine, and its application to our project. During the implementation, I discovered many compatibility problems involving the Python version, package dependencies, and output support. After trying out a few different Python versions and attempting to install the TTS package on my laptop, I determined that Python 3.10 was the best fit, as it supports all the necessary packages inside the TTS engine with most packages up to date. Other versions either had reached the end of their life cycle or had issues supporting the latest packages that the TTS engine requires.

With the package successfully installed and all requirements fulfilled, I generated a few .wav files for demonstration purposes. The .wav files sound acceptable and can be generated locally from the command prompt using the default voice engine provided. I’ll continue to research different voice engines to make sure the best among them is picked, and to account for users who might want to pick different voices while being instructed. I will continue to work on this part and begin to integrate the voice function into our project once our pipeline is mostly done.

Here’s a sample message using the default engine:

“Raise your arm by 20 degrees”
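For reference, generating that clip through the TTS package’s Python API looks roughly like the snippet below; the model name is my assumption of the default English model, not necessarily the engine we will finally pick:

from TTS.api import TTS  # the TTS package installed under Python 3.10

# Assumed default English model; any model listed by `tts --list_models` could be substituted.
engine = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")
engine.tts_to_file(text="Raise your arm by 20 degrees", file_path="instruction.wav")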

 

ABET #7 Question:

I have looked into tools like text-to-speech engines and tried to understand the logic and algorithms behind their different voice models.

Additional knowledge includes understanding English pronunciation and syllables, and deciding what speed the voice engine should use to speak the words. These parameters need to be adjusted to meet our design requirements and provide efficient feedback to the user.

I also looked into various online forums and design documents for the installation and learned about similar problems other users had previously run into when installing older versions of the package. I learned a lot more about Python packages related to machine learning and voice generation.

Team Status Report for 10/07/2023

Week of 10/07/2023: Continuing to revise the design and begin initial project development.

On Wednesday, we finished our design review presentation, and we continued discussing our design during our Thursday night meeting. We were joined by Professor Tamal, who gave us valuable feedback on our current project and on how we could improve our MVP framework to include more functionality. One idea we adopted was evaluating users’ performance over a short period of time, acknowledging that they need to set up and transition into a position; this eliminates the chance that someone doing a completely different posture accidentally gets a high score because of the particular frames our system grabs, and it provides an internal evaluation/scoring system for body postures. We still need to polish these ideas before next Friday, when the design report is due, and by then we need to quantify these thresholds and reflect the changes in our report.
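One way to realize that idea, sketched here purely for illustration (the window length and threshold are placeholders we still need to quantify in the design report), is to average the per-frame similarity score over a short window before judging the pose:

from collections import deque

WINDOW_FRAMES = 30   # placeholder: roughly one to two seconds of frames
MIN_SCORE = 0.85     # placeholder threshold within our 80-90% similarity range

recent_scores = deque(maxlen=WINDOW_FRAMES)

def update_and_evaluate(frame_similarity):
    """Accumulate per-frame similarity and only judge once the window is full,
    so a single lucky or unlucky frame cannot decide the evaluation."""
    recent_scores.append(frame_similarity)
    if len(recent_scores) < WINDOW_FRAMES:
        return None                      # user is still transitioning into the pose
    average = sum(recent_scores) / len(recent_scores)
    return average >= MIN_SCORE, average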

On Friday morning, we received detailed feedback on our design presentation, thanks to the hardworking faculty. We believe we gave a detailed presentation, but it seems we still need to clarify some issues that were not showcased clearly within it.

The team worked throughout the week to perfect the details and narrow down how we want to approach this project, and we determined that minor changes need to be made to our schedule. We have begun coding and gradually incorporating systems into our project; the OpenPose algorithm is already working ahead of schedule thanks to the extra effort Hongzhe (Eric) put in to configure and run the system.

We’ll continue to work with faculty members in the following week on our design report, start data collection using poses handpicked online from Taichi professionals, integrate the OpenPose API, and begin developing the comparison algorithm. We currently have no plans to order any hardware equipment. Please stay tuned for our next update and the design documentation.

 

ABET question:

1. Engineering Ethics: We highly respect the privacy of users, since cameras are necessary to capture body posture. Because the camera could collect more personal information than just posture, we plan to make this a local application with no need to exchange information with a cloud server. All collected data will be stored and evaluated locally without an internet connection, and users can choose to turn off the camera during use if they feel it is necessary (e.g., reviewing their posture, attending to other business, or choosing not to practice).
2. Math model for comparing body postures: Using the idea of cosine similarity, we compare body postures captured from users against the standard presets provided by Taichi professionals. To account for differences in people’s heights and shapes, we measure a person’s joint angles rather than approximating similarity from the absolute positions of joints. A normalization step can also be applied to the postures to account for magnitude differences in the vectors caused by varied body sizes (see the formula sketched after this list).
3. Machine Learning model in OpenPose: In the posture recognition application OpenPose, we use a trained convolutional neural network to recognize core body coordinates from the input video/image. Convolutional neural networks rely on batch processing, matrix dot products, and techniques such as pooling and regularization to avoid outliers and overfitting. They draw on mathematical principles such as logarithmic calculation in convolution and activation functions, and use differential calculus to train the network parameters during backpropagation.
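For reference, the comparison in item 2 reduces to the standard cosine similarity formula, optionally combined with a reference-length normalization; the choice of reference points below (hip center, shoulder-to-shoulder distance) is an assumption still to be finalized:

\[
\operatorname{sim}(\mathbf{u},\mathbf{v}) = \frac{\mathbf{u}\cdot\mathbf{v}}{\lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert},
\qquad
\hat{\mathbf{p}}_i = \frac{\mathbf{p}_i-\mathbf{p}_{\mathrm{hip}}}{\lVert\mathbf{p}_{\mathrm{Lshoulder}}-\mathbf{p}_{\mathrm{Rshoulder}}\rVert}
\]

Here \(\mathbf{u}\) and \(\mathbf{v}\) are matched limb vectors from the reference and user postures, and each keypoint \(\mathbf{p}_i\) is re-expressed relative to a central landmark and scaled by the shoulder distance so that the comparison is independent of body size.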

Shiheng’s Status Report for 10/07/2023

This week, I focused more on implementing the comparison algorithm and on normalizing body sizes so the results account for differences in body size. Following our Gantt chart, here’s my progress:

Progress:

Since our pipeline from OpenPose is still under development, I constructed some datasets myself (purely random) to test my comparison algorithm using cosine similarity. Cosine similarity measures the angular similarity between vectors, making it ideal for assessing body orientations. Additionally, I will explore techniques to normalize body sizes to enhance the accuracy of these posture comparisons in the following week.

To facilitate comparison, each body posture is transformed into a vector within a multi-dimensional space. Each dimension within this space corresponds to a specific key point (in our case, a joint) detected by OpenPose. For instance, if the OpenPose output I receive consists of 18 absolute positions, each posture is represented as an 18-dimensional vector.

The implementation requires packages including numpy, a standard Python environment, and VS Code for development. I used Python 3.11.5 in this case, since 3.12 was released just a few days ago and could have compatibility issues with package support. I’ll make sure to keep targeting the latest version that the packages support and are optimized for.

 

Implementation of absolute position (planned for next week):

To account for differences in body sizes and variations in the distances between body joints, it is imperative to normalize the posture vectors. The idea I have now is to normalize every person to my body size, which is around 5’10” and 170 lbs (to be justified next week with the other members of the group). This will be an add-on to the cosine comparison idea to determine the absolute position of users. Using absolute position eliminates the possibility that the user is doing a completely different posture yet scores a high similarity due to the nature of cosine similarity. The normalization process involves dividing the coordinates of key points by a reference length, typically the distance between two consistent points (the two shoulders, the ratio between upper and lower body, or the ratio between calf and thigh). This procedure scales all joints proportionally to the reference length, facilitating a relatively standard comparison.
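A minimal sketch of that reference-length normalization, assuming OpenPose indices 2 and 5 for the right and left shoulders (the choice of origin and reference length is illustrative and still needs to be agreed on):

import numpy as np

R_SHOULDER, L_SHOULDER = 2, 5   # assumed OpenPose shoulder keypoint indices

def normalize_pose(keypoints):
    """Shift the pose so the shoulder midpoint is the origin and scale by the
    shoulder-to-shoulder distance used as the reference length.
    `keypoints` is an (N, 2) array of (x, y) positions from OpenPose."""
    reference_length = np.linalg.norm(keypoints[R_SHOULDER] - keypoints[L_SHOULDER])
    if reference_length == 0:
        raise ValueError("Shoulders coincide in this frame; cannot normalize.")
    origin = (keypoints[R_SHOULDER] + keypoints[L_SHOULDER]) / 2.0
    return (keypoints - origin) / reference_length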

 

Team Status Report for 09/30/2023

Week of 09/30/2023: A week of Research and Findings

We have a new team member, Jerry Feng, joining us after our new proposal was approved on Monday. Researching different parts of the project and reintroducing the project to group members were the main focuses of the week.

During the Monday and Wednesday meetups, Shiheng, Ray, and Eric (Hongzhe) gave Jerry a proper introduction to our Taichi project, discussed its background, and explained why we chose it. In addition, Ray and Eric discussed the OpenPose algorithm and the existing pipeline in our original plan with Jerry in depth, while Shiheng focused more on the comparison algorithm and how the voiceover should be implemented in parallel with the development of the posture pipelines. While meeting with Professor Bryon and Eshita, we brainstormed how Jerry could integrate into our existing framework and decided that developing an alternative pipeline allowing customization would best serve Jerry’s and the rest of the team’s interest in working in parallel.

After transitioning smoothly into our new workflow, we spent most of our time researching our own parts of the project. We discovered various compatibility issues when trying to set up environments on our laptops, but fortunately they were all resolved by the end of the week through our discussions and online research, leaving all of us with a good understanding of how to implement the project. Everyone on the team has done solid research on narrowing down the ideal programming language, packages, and algorithms, justified from various aspects including, but not limited to, compatibility, offline support, efficiency, and ease of use.

For the rest of the week, we spent the majority of our time working on the design review slides and replanning our project quantitatively around the new pipeline Jerry owns. We refined our proposal slides to include more quantitative values for measuring our performance, specified measures for dealing with pipeline failures, and brainstormed various test cases for future verification purposes. Additionally, we worked on creating a new Gantt chart to include current work and reorganized the tasks, as Ray and Jerry now have some overlap they can collaborate on during the semester.

The Gantt chart is attached below to show our progress:

Shiheng’s Status Report for 09/30/2023

During the week of 9/30, my primary focus was on researching and understanding critical design tradeoffs related to our project. This entailed two key aspects: evaluating text-to-speech (TTS) options within the Python development environment and gaining insights into the implementation of cosine similarity-based posture analysis. Each option had its unique set of pros and cons, with considerations such as internet accessibility, voice quality, and language support. Furthermore, I delved into the idea of cosine similarity and its application in posture analysis, with a keen eye on setting an appropriate similarity threshold. These endeavors paved the way for informed design decisions in the upcoming phases of our project.

 

In the realm of Python, I examined three TTS solutions: the gTTS API, pyttsx3, and Mozilla TTS. The gTTS API offers flexibility in preprocessing and tokenizing text and supports multiple languages with customizable accents; however, it requires internet access due to its API nature. Conversely, pyttsx3 provides greater customization options but lacks the naturalness of gTTS. Mozilla TTS, while high-quality and offline-capable, requires research into voice training and manual selection of a voice engine. These assessments gave us a comprehensive understanding of the TTS tools, and I determined that Mozilla TTS is the best option among them. I also made backup plans for the C++ case and found TTS engines that fit that approach.
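For comparison, this is roughly what the two lighter-weight options look like in use; gTTS sends the text to Google’s service (hence the internet requirement), while pyttsx3 drives a local engine (a sketch, not our final code):

from gtts import gTTS   # online: calls Google's TTS service, requires internet
import pyttsx3          # offline: wraps a locally installed speech engine

text = "Raise your arm by 20 degrees"

# gTTS: multiple languages and accents, but needs a network connection.
gTTS(text=text, lang="en").save("instruction_gtts.mp3")

# pyttsx3: fully offline and configurable, but less natural-sounding.
engine = pyttsx3.init()
engine.setProperty("rate", 150)                      # speaking rate in words per minute
engine.save_to_file(text, "instruction_pyttsx3.wav")
engine.runAndWait()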

In parallel, I delved into the mathematical underpinnings of cosine similarity for posture analysis. It offers scale invariance and angular similarity comparison, making it apt for body pose analysis. The critical decision to set a similarity threshold, possibly ranging from 80% to 90%, emerged as a key design consideration. This threshold will be pivotal in assessing whether two postures are sufficiently similar or dissimilar. By thoroughly understanding these design tradeoffs, we are better equipped to make informed choices in developing our posture analysis system, balancing accuracy and flexibility to accommodate varying body sizes and orientations.

The comprehensive evaluation of TTS tools has provided insights into the advantages and disadvantages of each option, enabling us to make an informed choice aligning with our project’s goals. These efforts represent significant progress toward the successful execution of our project, empowering us to make well-informed design decisions moving forward.

ABET: For cosine similarity, I drew on concepts from 18-290 and 18-202 in terms of vector comparison and algebra.

From a Python coding perspective, I drew on previous CS courses and my personal experience with Python. I researched TTS packages this week by looking at GitHub development docs and developer websites covering those concepts.

Shiheng’s Status Report for 09/23/2023

After an intense discussion with Professor Tamal during the Friday meeting, we discovered that the justification for our initial plan, in terms of cost and applicability, was not strong or viable. Understanding the difficulties we would face using OpenPose for patient behavior detection, I contributed to changing the topic to a Taichi instructor. Referencing yoga instructor applications available online, we decided to follow a similar approach but carry it out differently, since Taichi concentrates more on the flow of body motion than on static data points. We can control the cost easily, with justification, by using OpenPose to monitor the larger motions of Taichi with a single-camera setup that provides real-time evaluation of body posture. This contrasts with the original plan focused on patients, where we had overlooked the class of bedridden patients who make continuous minor movements rather than the large motions healthy people perform.

Gathering data and organizing it has been my main progress in the past few days since the project change. Using Ray’s reference body positions, I identified some body positions that are concise enough for beginners to learn and could likely be identified by the OpenPose algorithm. I reckon these positions will be clearly identifiable with low-cost cameras and will meet the requirements of the Raspberry Pi we intend to use in the project.

Though we are behind schedule, we have advanced greatly in the data collection and classification work that hindered us on the previous project. I believe we can catch up in the following weeks and make substantial progress on our project.