Ray’s Status Report for 10/21/2023

For the past two weeks, my team and I worked on the design review of our project to finalize the application structure and implementation design. We wrote the design review document together and discussed how each subcomponent of our project should work. I worked specifically on finalizing the user interface section.

To accomplish the above task, I shifted my focus from Tkinter to Kivy, the new package we chose for implementing our application interface. I learned the basics of instantiating and configuring widgets and wrote kv language files to simplify the design process. Using these skills, I created prototype pages for the main menu and the pose selection page. Below are the prototypes I created. (Consider this my answer to the ABET question.)
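To illustrate the workflow (this is a minimal sketch, not our actual layout; the widget labels are placeholders), a page can be described in the kv language and loaded from Python like this:

from kivy.app import App
from kivy.lang import Builder

# kv markup describing a simple vertical menu layout
KV = """
BoxLayout:
    orientation: "vertical"
    Label:
        text: "Tai Chi Trainer"
    Button:
        text: "Start Training"
    Button:
        text: "Select Pose"
"""

class MenuApp(App):
    def build(self):
        return Builder.load_string(KV)

if __name__ == "__main__":
    MenuApp().run()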

For the application to run as expected, I need to get comfortable with Kivy's screen manager functionality for switching between pages. I am currently working on it and plan to have a functional application frame next week. The live camera embedded in the training page is also something I need to look into, and I will start working on it as soon as the previous task is mostly complete.
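As a starting point, here is a minimal sketch of how Kivy's ScreenManager can switch between two pages; the screen names and button text are placeholders rather than our final page structure:

from kivy.app import App
from kivy.uix.button import Button
from kivy.uix.screenmanager import Screen, ScreenManager

class MainMenu(Screen):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        button = Button(text="Go to pose selection")
        # Switch pages by setting the manager's current screen name
        button.bind(on_release=lambda *_: setattr(self.manager, "current", "select"))
        self.add_widget(button)

class PoseSelection(Screen):
    pass

class FrameApp(App):
    def build(self):
        sm = ScreenManager()
        sm.add_widget(MainMenu(name="menu"))
        sm.add_widget(PoseSelection(name="select"))
        return sm

if __name__ == "__main__":
    FrameApp().run()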

I am still trying to configure OpenPose on Windows; if necessary, I will ask Eric about the build process and make sure OpenPose runs on my computer by next week.

I am overall on schedule for this week. Still, starting next week, I will need to pay more attention to cooperating with my teammates, since the integration process of our application should begin soon.

Team Status Report for 10/21/2023

For the week prior to fall break, our group mainly focused on finishing and polishing the Design Review document. We split up the work evenly, with every team member taking on part of the writing and filling in the subsystem they are responsible for in the upcoming development. We were able to find existing journals and research to justify our design requirements, provide details for our system specs, and think of alternative approaches in case some parts of the system go wrong. Writing the design document let us explain many details we could not cover in our design review presentation due to time constraints (we had to fit the contents of two presentations into one after redesigning our project following the proposal presentation), giving the faculty a better picture of the project and helping clarify details among group members.

While enjoying our fall break, we also made some progress in implementing some of the subsystems.

Thanks to Hongzhe (Eric), OpenPose posture detection is now working with imported images (high-resolution frames handpicked from online video sources), which we are going to use as reference postures and as the baseline for evaluating user poses. As described in our design document, these images are transformed into JSON files containing 2D posture keypoints, which are then passed into the comparison algorithm to calculate differences.

Shiheng worked on the Text-to-Speech engine, which takes in text instructions generated by the comparison algorithm and passes them into the voice engine to generate real-time instructions. The time required for the voice engine to generate output files is low (under 0.5 seconds for an average-length instruction), and the output voices are clear and easy to understand. We will continue to look into voice engine outputs and determine the best way to present these vocal instructions to users.

On the frontend, Jerry evaluated several Python packages for a better UI and user experience. He determined that the Kivy package provides many widgets that are easy to implement and a much better interface than Tkinter, which is what we originally planned to use.

Ray focused on learning the Kivy language and creating the prototype frontend application. He created prototype pages for the main menu and the pose selection page. He is also working on Kivy's screen manager feature to support page switching in the application and experimenting with interfacing OpenPose data with the Kivy UI.

Jerry’s Status Report for 10/21/2023

Over the past two weeks, I contributed to writing the “Introduction” and “Use-Case Requirements” sections of the design report, as well as the section on my custom pose/pose sequence pipeline. I did some design work on what I wanted the UI to look like, sketched a basic flowchart for how the image upload should proceed, and added the pictures to the design report.

I also researched academic papers on how accurate we can expect our joint-angle-based system to be for pose detection and whether it would translate well to different body types. I did not find definitive evidence that joint-angle-based pose detection translates well to different body types, just that it is a commonly used method for pose detection. I did find one study by De Gama et al. that showed 100% accuracy for pose detection among a small sample size using a joint-angle-based pose detection system. Based on my findings, we can have some confidence in the accuracy of our joint-angle-based approach to pose detection.

I also looked into several packages other than Tkinter to use for the frontend of our app. Ultimately, I brought the wxPython and Kivy packages forward for the team to consider. After some debate, we decided to switch from Tkinter to Kivy for our frontend development. What won us over to Kivy was its continuous support from professional developers and its compatibility with Python 3+ versions.

For the next week, I plan to build the frontend of my part of the app so users can upload custom images. I will also discuss with Ray how we plan to integrate our code, and work with Hongzhe to get OpenPose up and running on my machine as well.

ABET question #7:

The new tool I am looking into learning is the “Kivy” Python package, which will allow us to make a much more sophisticated frontend for our application. In particular, I am looking at the various Kivy widgets pertaining to my pipeline, such as the FileChooserListView widget.
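As a rough sketch of how FileChooserListView could support image upload (the file filters and labels here are illustrative, not our final design):

from kivy.app import App
from kivy.uix.boxlayout import BoxLayout
from kivy.uix.filechooser import FileChooserListView
from kivy.uix.label import Label

class UploadSketchApp(App):
    def build(self):
        root = BoxLayout(orientation="vertical")
        self.status = Label(text="Pick an image", size_hint_y=0.1)
        chooser = FileChooserListView(filters=["*.png", "*.jpg", "*.jpeg"])
        # Fires whenever the user's selection changes
        chooser.bind(selection=self.on_select)
        root.add_widget(chooser)
        root.add_widget(self.status)
        return root

    def on_select(self, chooser, selection):
        if selection:
            self.status.text = "Selected: " + selection[0]

if __name__ == "__main__":
    UploadSketchApp().run()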

Hongzhe’s Status Report for 10/21/2023

For the past two weeks, including fall break, we developed our ideas in greater depth and detail through the design review document. Each member is also making progress practicing the technology they are responsible for.

Personally, I made the outline of the design review document, listing the key points we discussed with the faculty so that the document and its content are well structured. I was also in charge of filling in certain portions of the document, mostly the overall architecture and the summary.

I have also pushed OpenPose usage forward. I succeeded in using the compiled OpenPose executable to process reference image files of Tai Chi poses and generate JSON output; I will iterate on this next week to process all reference images. Below is a sample JSON output. I also tried to enable the Python feature of OpenPose. While the Python support builds, the sample Python program cannot be executed. I will dig into this issue more, and we always have the backup option of using the executable from the C++ compilation instead of the Python OpenPose library.
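For reference, a minimal sketch of how the executable can be invoked from Python; the paths are placeholders for our local build, and the flags follow the OpenPose demo documentation:

import subprocess

# Hypothetical paths; adjust to the local OpenPose build layout.
subprocess.run([
    r"bin\OpenPoseDemo.exe",            # compiled OpenPose executable
    "--image_dir", r"poses\reference",  # folder of reference pose images
    "--write_json", r"output\json",     # write one keypoint JSON per image
    "--display", "0",                   # no GUI window
    "--render_pose", "0",               # skip rendering for speed
], check=True)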

ABET: The new tool I learned is certainly OpenPose. I learned everything from scratch, from understanding its overall architecture to setting up the current environment on Windows. Given that the software has not been updated for years, I also learned a lot about gathering resources from the internet and the GitHub page to resolve missing components and incompatible module versions.


{
    "version": 1.3,
    "people": [
        {
            "person_id": [
                -1
            ],
            "pose_keypoints_2d": [
                411.349,
                275.523,
               … (ignored for view length)
                875.764,
                0.543042
            ],
            "face_keypoints_2d": [],
            "hand_left_keypoints_2d": [],
            "hand_right_keypoints_2d": [],
            "pose_keypoints_3d": [],
            "face_keypoints_3d": [],
            "hand_left_keypoints_3d": [],
            "hand_right_keypoints_3d": []
        }
    ]
}
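For the comparison algorithm, each file can be reduced to (x, y, confidence) triples, since pose_keypoints_2d is a flat list. A minimal parsing sketch (the file path is hypothetical):

import json

def load_keypoints(path):
    """Return (x, y, confidence) triples for the first detected person."""
    with open(path) as f:
        data = json.load(f)
    flat = data["people"][0]["pose_keypoints_2d"]
    # Keypoints are stored flat: x0, y0, c0, x1, y1, c1, ...
    return [(flat[i], flat[i + 1], flat[i + 2]) for i in range(0, len(flat), 3)]

keypoints = load_keypoints("pose_keypoints.json")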

Shiheng’s Status Report for 10/21/2023

I mainly contributed to the Design Requirement and Tradeoff parts of the Design Review document. Starting from what we had in the design review slides and the notes shared among us, I was able to quantify and justify the requirements we encountered in our project. The tradeoff part also plays a vital role in the design document, as it explains the decisions we made in the past weeks and why we think they are in our best interest.

For the past week, I mostly researched my part, the TTS engine, and its application to our project. During implementation, I discovered many compatibility problems involving the Python version, package compatibility, and output support. After trying out a few different Python versions and attempting to install the TTS package on my laptop, I determined that Python 3.10 was the best fit, as it supports all the necessary packages inside the TTS engine with most packages up to date. Other versions either reached the end of their life cycle or had issues supporting the latest packages the TTS engine requires.

With the package successfully installed and all requirements fulfilled, I generated a few .wav files for demonstration purposes. The .wav files sound fine and can be generated and played locally from the command prompt using the default voice engine provided. I'll continue to research different voice engines to make sure the best among them is picked, and consider that users might want to pick different voices while being instructed. I will continue to work on this part and begin integrating this voice function into our project once our pipeline is mostly done.

Here’s a sample message using the default engine:

“Raise your arm by 20 degrees”
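For illustration, a minimal sketch of generating that message as a .wav file, assuming the pip-installable “TTS” (Coqui) package; the model name is illustrative:

from TTS.api import TTS

# Load a pretrained English voice model (model name illustrative)
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")

# Write the instruction to a local .wav file
tts.tts_to_file(text="Raise your arm by 20 degrees",
                file_path="instruction.wav")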


ABET #7 Question:

I have looked into tools like Text-to-Speech engines and tried to understand the logic and algorithms behind different voice models.

Additional knowledge includes understanding pronunciation and syllables in English, and what speed the voice engine should use to speak the words. Those parameters need to be adjusted to meet our design requirements and provide efficient feedback to the user.

I also looked into various online forums and design documents for the installation and learned about similar problems other users previously had installing older versions of the package. I learned a lot more about Python packages related to machine learning and voice generation.

Ray’s Status Report for 09/30/2023

We started on our implementation process this week, and everyone got to work on their respective sections. I started learning and writing the UI for our system with Tkinter.

I got through many of the UI elements over the week, and I plan to approach the image display, video display, and database functionalities tomorrow. Below is the video I will refer to.

I also tried to get OpenPose to work on my laptop, but I am still getting errors related to CMake versions. My plan is to get OpenPose running on my laptop as soon as possible, hopefully in the next two days.

I also looked through the whole_body_from_image.py and keypoints_from_image.py examples in the official OpenPose repository. Based on the whole-body example, displaying a sequence of poses in order can be realized by reading each image, setting up the pose data points in Shiheng's cost functions, and then waiting for the user to input a correct pose. The timing could be an issue to implement, and I might need to look into Python functionality to realize it in our system. Meanwhile, static pose evaluation can be realized with the following code as a reference:
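(A sketch modeled on the keypoints_from_image.py example; the model folder and image path are placeholders, and the binding details can differ between OpenPose versions.)

import cv2
import pyopenpose as op

# Configure and start the OpenPose wrapper (model path is a placeholder)
params = {"model_folder": "models/"}
opWrapper = op.WrapperPython()
opWrapper.configure(params)
opWrapper.start()

# Run pose estimation on a single image
datum = op.Datum()
datum.cvInputData = cv2.imread("reference_pose.jpg")
opWrapper.emplaceAndPop(op.VectorDatum([datum]))

# poseKeypoints is an array of shape: people x keypoints x (x, y, confidence)
print(datum.poseKeypoints)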

Still, through discussions on the design review we presented this week, we noticed aspects of our project that require more detailed consideration. In particular, the way we convey our pose detection results needs to be intuitive enough for our users. Also, the time interval between verbal instructions should be different from the interval between pose detections. I will take these into account when designing the user interface next week and reflect them in the design report that is due next week.

My progress for this week is still on schedule, though work for next week might be heavier than in regular weeks. I plan to work some time during fall break if the schedule for next week turns out to be too packed, but I will put priority on planning out how the API and UI should communicate and deciding on a design for the UI.

Jerry’s Status Report for 10/7/2023

  1.  This week I spent time trying to get OpenPose to build. I was unsuccessful because files needed to install dependencies were missing from the release of OpenPose I tried to install. To be more specific, Caffe is a necessary dependency, and I had to manually install Caffe into the OpenPose directory structure. The build_win.cmd file fails to run because I do not have “ninja” installed. I tried to set the “WITH_NINJA” flag to 0 so CMake would build Caffe using the Visual Studio C++ compiler instead of Ninja, but when I do this and try to run the file, the flag somehow gets set back to 1. This is a very mystifying issue, and I plan to ask Eric about it tomorrow to see if he ran into the same thing. In addition, this week's ABET question prompted me to think more about user privacy, which led me to look into options and techniques for sanitizing user input in Python, as well as analysis tools for finding potential security issues. The main vulnerability I am worried about is that User A has images stored on the app, and User B logs into the app and injects malicious code through a field of user input (such as how long to record the user for). I think this is a reasonable worry, since our system takes a decent amount of user input, so input sanitization is a very sensible idea. So far the Bandit tool looks promising: it is distributed on PyPI, so it can be downloaded through pip, and it has customization options to ignore certain levels of security issues, allowing us to focus on the most critical ones. For input sanitization I am still looking at options and need to discuss them with Sirui tomorrow, but as of right now regular expressions seem promising because they can filter out certain items in user input, and the re module is part of Python's standard library, so there is nothing to install (see the sketch after this list). I was also involved in team meetings where we clarified parts of how our design would work, specifically how we were going to evaluate users' poses.
  2. My progress is on time with the group schedule.
  3. The deliverables I hope to complete by the end of next week are to help Ray get a basic UI design out for the design report and to get OpenPose to build. I also plan to look into and discuss more advanced tools for the UI instead of just using Tkinter.
  4. In terms of engineering principles that our team will use to develop solutions, I personally focused on the principle of ethics, as I want to make sure this app protects users' privacy. Especially with our app, where pictures are taken of people in incorrect poses, this might be embarrassing for some people and something they wish to keep private. Toward this end, I looked into incorporating input sanitization into our app to prevent “injection” attacks, as we will be taking in input from the user. Additionally, I looked into tools the team might want to incorporate into the testing process.
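As a concrete illustration of the input sanitization mentioned in point 1, here is a minimal sketch of validating a user-supplied recording duration with a regular expression; the helper name and limits are hypothetical:

import re

DURATION_RE = re.compile(r"^\d{1,3}$")  # digits only: whole seconds

def parse_record_duration(raw: str) -> int:
    """Validate a user-supplied recording duration before using it."""
    raw = raw.strip()
    if not DURATION_RE.fullmatch(raw):
        raise ValueError("Duration must be a whole number of seconds")
    seconds = int(raw)
    if not 1 <= seconds <= 600:
        raise ValueError("Duration must be between 1 and 600 seconds")
    return seconds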

Team Status Report for 10/07/2023

Week 10/7/2023: Continuing to revise design and initial project development.

On Wednesday, we finished our design review presentation, and we continued discussing our design during our Thursday night meeting. We were joined by Professor Tamal, who gave us valuable feedback on our current project and on how we could improve our MVP framework to include more functionality. One idea we adopted was evaluating users' performance over a short period of time, acknowledging that users need to set up and transition into a position. This eliminates the chance that someone doing a completely different posture accidentally gets a high score because of the frames our system happens to grab, and it provides an internal evaluation/scoring system for body postures. We still need to polish these ideas before next Friday, when the design report is due, by which point we need to quantify these thresholds and reflect the changes in our report.

On Friday morning, we received detailed feedback on our design presentation thanks to the hardworking faculty. We believe we gave a detailed presentation, but it seems we still need to clarify some issues that were not showcased clearly within it.

The team worked throughout the week to perfect the details and narrow down how we want to approach this project, and we determined that minor changes need to be made to our schedule. We have begun coding and incorporating systems gradually into our project; the OpenPose algorithms are already working ahead of schedule thanks to the extra effort Hongzhe (Eric) put in to configure and run the system.

In the following week, we'll continue to work with faculty members on our design report, start data collection using poses we handpicked online from Taichi professionals, integrate the OpenPose API, and start developing the comparison algorithm. We currently have no plans to order any hardware equipment. Please stay tuned for our next update and the design documentation.


ABET question:

  1.  Engineering Ethics: We highly respect the privacy of users, given the necessity of using cameras to capture body posture. Since the camera could collect more personal information than posture alone, we planned to make this a local application with no need to exchange information with a cloud server. All collected data will be stored and evaluated locally without an internet connection, and users can choose to turn off the camera during use of the application whenever they feel it necessary (e.g., reviewing posture, attending to other business, or not wanting to practice).
  2.  Math model in comparing body postures: Using the idea of cosine similarity, we attempt to compare body postures captured from users to the standard presets provided by Taichi professionals. To account for differences in people's heights and shapes, we directly measure a person's joint angles instead of approximating similarity from the absolute positions of joints. A normalization step can also be applied to postures to account for magnitude differences in the vectors for varied body sizes.
  3.  Machine Learning model in OpenPose: In the posture recognition application OpenPose, we use a trained convolutional neural network to recognize the core body coordinates from the input video/image. Convolutional neural networks are built on batch processing, matrix dot products, and techniques such as pooling and regularization to avoid outliers and overfitting. They rely on mathematical operations such as convolution and nonlinear activation functions, and on differential calculus for training the network parameters during backpropagation.

Shiheng’s Status Report for 10/07/2023

This week, I focused on implementing the comparison algorithm and on normalizing body sizes for better results that account for differences in body size. Following our Gantt chart, here is my progress:

Progress:

Since our pipeline from OpenPose is still under development, I constructed some datasets myself (purely random) to test my comparison algorithm using cosine similarity. Cosine similarity measures the angular similarity between vectors, making it ideal for assessing body orientations. Additionally, I will explore techniques to normalize body sizes to enhance the accuracy of these posture comparisons in the following week.

To facilitate comparison, each body posture is transformed into a vector within a multi-dimensional space, where each dimension corresponds to a specific key point (in our case, a joint) detected by OpenPose. For instance, if the OpenPose output consists of 18 absolute positions, each posture is represented as an 18-dimensional vector.

The implementation requires the numpy package, a normal Python environment, and VS Code for development. I used Python 3.11.5 in this case, since 3.12 was released just a few days ago and could have compatibility issues with package support; other versions either reached end of life or lack support for the latest packages. I'll make sure to keep targeting the latest version for optimization and package support.
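A minimal sketch of the cosine similarity test with random stand-in vectors (the dimension matches the 18 positions mentioned above; the data is placeholder until the OpenPose pipeline is ready):

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two posture vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Purely random stand-in data for testing the comparison algorithm
rng = np.random.default_rng(0)
reference = rng.random(18)
user_pose = rng.random(18)
print(cosine_similarity(reference, user_pose))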


Implementation of absolute position (planned for next week):

To account for differences in body sizes and variations in the distances between body joints, it is imperative to normalize the posture vectors. The idea I have now is to normalize every person to my body size, which is around 5’10” and 170 lbs (to be justified next week with the other members of the group). This will be an add-on to the cosine comparison idea to determine users' absolute positions. Using absolute position eliminates the possibility that a user doing a completely different posture scores a high similarity due to the nature of cosine similarity. The normalization process involves dividing the coordinates of key points by a reference length, typically the distance between two consistent points (the two shoulders, the ratio between upper and lower body, or the ratio between calf and thigh). This scales all joints proportionally to the reference length, facilitating a relatively standard comparison.
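A minimal sketch of this normalization step, using the shoulder-to-shoulder distance as the reference length; the keypoint indices are hypothetical and depend on the OpenPose output ordering:

import numpy as np

def normalize_pose(keypoints: np.ndarray,
                   left_shoulder: int = 5,
                   right_shoulder: int = 2) -> np.ndarray:
    """Scale an (N, 2) array of joint positions by the shoulder distance."""
    reference = np.linalg.norm(keypoints[left_shoulder] - keypoints[right_shoulder])
    return keypoints / reference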


Hongzhe’s Status Report for 10/07/2023

For this week, we are starting to make progress on the project’s detailed technical design in parallel. Specifically, I am pushing forward on the OpenPose application.

As mentioned last week, I was able to get OpenPose compiled on Mac with C++, but not to run. I first spent a couple of hours debugging the library files on Mac, then decided to move to Windows, given that more developers use OpenPose on Windows and the system is more customizable. I was finally able to run the sample real-time OpenPose application on Windows, as shown in the image below. For future students interested in using OpenPose on Windows, here are the necessary steps:

  • Download the NVIDIA driver and CUDA support if you want to use GPU mode
  • Download the OpenPose models manually and copy them into the models folder
  • Download the third-party modules and copy them into the 3rdparty/windows folder
  • Update PyBind11 in the 3rdparty folder to the newest version to get rid of the compilation error
  • Download CMake and Visual Studio 2019 Enterprise
  • For detailed information, go through the installation guide on the OpenPose GitHub page

I also represented the team in the Design Presentation, talking about our new project in general and in detail. Personally, I think my work is in good shape and on pace.

For the next week, I hope to make OpenPose more stable, given that it currently crashes sometimes on incremental builds. I will also research the Python APIs and how to export OpenPose into custom applications.