E4: Final Project Video

Description: 18-500 ECE Capstone Final Project Video

Technical Note to our Professors: The app is almost fully integrated, with two of the three components already combined and the last nearly there. Each component works completely on its own, and the output of each component is the only input required by the next: the outputs of the song selection algorithm and the step detector are the only inputs needed by the time-modification algorithm, and the output of the warping algorithm is the music to be played. Although the system does not yet run in real time, proof of concept has been established, and with a little extra time it could be completed.
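
To make that dataflow concrete, here is a minimal interface sketch of the chain. Every name in it is hypothetical (none of this is our actual code); it only illustrates that each stage's output is the next stage's only input.

#include <string>
#include <vector>

// Hypothetical interface sketch of the three-stage pipeline (names invented
// for illustration; they do not come from our codebase).
struct SongChoice { std::string path; double original_bpm; };

SongChoice selectSong() { return {"IntroEyeoftheTiger1.wav", 140.0}; }  // stage 1 stub
double detectStepsPerMinute() { return 160.0; }                         // stage 2 stub

// Stage 3: the warping algorithm consumes exactly the two outputs above and
// produces the audio to be played (stubbed here as an empty buffer).
std::vector<float> warpToTempo(const SongChoice&, double) { return {}; }

int main() {
    SongChoice song = selectSong();                // song selection output...
    double cadence = detectStepsPerMinute();       // ...plus step-detector output...
    std::vector<float> playback = warpToTempo(song, cadence);  // ...feed the warper,
    (void)playback;  // ...whose output is handed to audio playback.
    return 0;
}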

Aarushi’s Status Report for 4/26

  • Worked on adjusting the DTCWT PVOC's matrix dimensions for each level of the transform. The goal was for the DTCWT PVOC to time-stretch and shrink signals by any factor; the current implementation only allows time stretching by 1/x, where x is a whole number (see the hop-size sketch after this list). Spoke to Jeffrey Livingston about his implementation.
  • Adapted the DTCWT PVOC so that it can be called from the function that performs signal pre-processing for the app.
  • Ran numerous experiments on the STFT PVOC and DTCWT PVOC to compare their performance; the STFT version is by far faster and more accurate. The experiments varied the choice of music and the magnitude of the tempo change.
  • Ran numerous experiments on the STFT PVOC alone with varied tempo changes, to compare against our system requirements for music tempo range and computational speed. I am not yet sure how to interpret the results.
  • Completed final presentation slide deck
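
On the stretch-factor limitation above: in a hop-based vocoder, an arbitrary ratio is typically realized by fixing the synthesis hop and scaling the analysis hop, and the reciprocal-of-a-whole-number case is easy precisely because frames line up without interpolation. A minimal sketch of that arithmetic (illustrative names only; this is not our DTCWT code):

// How a tempo ratio maps to analysis/synthesis hop sizes in a hop-based
// vocoder. ratio = desired_bpm / original_bpm, so ratio > 1 speeds music up.
#include <cstdio>

struct Hops { double analysis; int synthesis; };

Hops hopsForRatio(double ratio, int frame_size = 1024) {
    int synthesis_hop = frame_size / 4;            // fixed 75% output overlap
    double analysis_hop = synthesis_hop * ratio;   // fractional in general!
    return {analysis_hop, synthesis_hop};
}

int main() {
    // ratio = 0.5 (stretch by 1/2): the analysis hop stays an integer, so
    // frames can be read off directly - the easy case. ratio = 160/140:
    // the hop is fractional, so analysis frames (and their phases) must
    // be interpolated - the hard case.
    double ratios[] = {0.5, 160.0 / 140.0};
    for (double r : ratios) {
        Hops h = hopsForRatio(r);
        std::printf("ratio %.4f -> analysis hop %.2f, synthesis hop %d\n",
                    r, h.analysis, h.synthesis);
    }
    return 0;
}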

Team Status Report for 4/18

This week we focused on integrating our independent components.

The MATLAB code was ported to C++, and documentation for how to integrate the C++ files/functions was created. Upon integration, we ran into issues with (1) typing input variables passed from Java to C++ and (2) translating some MATLAB functions, namely audioread, into Java functionality. The latter is to be handled by the Android OS. However, passing a signal from Java into C++ creates its own challenge of typing and storing the variables.
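
To make issue (1) concrete, here is a sketch of what the Java-to-C++ boundary looks like through JNI (the high-level approach described in Mayur's report). The package/class names and the timeStretch stub are invented for illustration; only the JNI calls themselves are real API.

// Hypothetical JNI bridge for handing a Java float[] signal to ported C++ code.
// Assumed Java declaration: public static native float[] warp(float[] signal, double ratio);
#include <jni.h>
#include <cstddef>
#include <vector>

// Stand-in for the ported vocoder; the real code lives in the generated C++ files.
static std::vector<float> timeStretch(const std::vector<float>& x, double /*ratio*/) {
    return x;  // placeholder pass-through
}

extern "C" JNIEXPORT jfloatArray JNICALL
Java_com_example_runapp_Warper_warp(JNIEnv* env, jclass,
                                    jfloatArray jsignal, jdouble ratio) {
    // A Java float[] arrives as an opaque jfloatArray; its contents must be
    // copied into a C++ container before the ported code can use them.
    jsize n = env->GetArrayLength(jsignal);
    std::vector<float> signal(static_cast<std::size_t>(n));
    env->GetFloatArrayRegion(jsignal, 0, n, signal.data());

    std::vector<float> out = timeStretch(signal, ratio);

    // The result must likewise be copied back into a freshly allocated Java array.
    jfloatArray jout = env->NewFloatArray(static_cast<jsize>(out.size()));
    env->SetFloatArrayRegion(jout, 0, static_cast<jsize>(out.size()), out.data());
    return jout;
}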

To work around this, we are considering writing a C++ function to “audioread” WAV files ourselves. That implementation, however, would require our Java functionality that initiates music modification every 60 seconds to be rewritten and integrated in C++. In parallel, we are experimenting with performing “audioread” in Java through the Android OS and storing, typing, and transforming the signal matrices as necessary.
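
For scale, a hand-rolled “audioread” for the simplest case is sketched below: canonical 16-bit PCM WAV only, with no compressed formats or unusual chunk layouts handled. This is a feasibility sketch, not a committed implementation.

// Minimal reader for canonical 16-bit PCM WAV files: returns samples scaled
// to [-1, 1) plus the sample rate. Skips chunks other than "fmt " and "data".
#include <cstdint>
#include <cstring>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

std::vector<float> audioreadWav(const std::string& path, uint32_t& sample_rate) {
    std::ifstream f(path, std::ios::binary);
    char riff[4], wave[4];
    uint32_t riff_size = 0;
    f.read(riff, 4); f.read(reinterpret_cast<char*>(&riff_size), 4); f.read(wave, 4);
    if (!f || std::memcmp(riff, "RIFF", 4) || std::memcmp(wave, "WAVE", 4))
        throw std::runtime_error("not a RIFF/WAVE file");

    uint16_t channels = 0, bits = 0;
    std::vector<float> samples;
    char id[4]; uint32_t size = 0;
    while (f.read(id, 4) && f.read(reinterpret_cast<char*>(&size), 4)) {
        if (!std::memcmp(id, "fmt ", 4)) {
            uint16_t format = 0;
            f.read(reinterpret_cast<char*>(&format), 2);
            f.read(reinterpret_cast<char*>(&channels), 2);
            f.read(reinterpret_cast<char*>(&sample_rate), 4);
            f.ignore(6);                      // byte rate + block align
            f.read(reinterpret_cast<char*>(&bits), 2);
            f.ignore(size - 16);              // any fmt extension bytes
            if (format != 1 || bits != 16)
                throw std::runtime_error("only 16-bit PCM handled");
        } else if (!std::memcmp(id, "data", 4)) {
            std::vector<int16_t> raw(size / 2);
            f.read(reinterpret_cast<char*>(raw.data()), size);
            samples.reserve(raw.size());
            for (int16_t s : raw) samples.push_back(s / 32768.0f);
            break;
        } else {
            f.ignore(size + (size & 1));      // skip unknown chunk (word-aligned)
        }
    }
    return samples;  // interleaved across `channels` if the file is stereo
}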

Details about the high-level approach for JNI usage can be found in Mayur’s status report, and details about the time-warping integration and audioread function can be found in Aarushi’s.

Aarushi’s Status Report for 4/18

I completed porting the MATLAB code for the phase vocoder, STFT, and ISTFT files to C++; I had to refactor parts of the original MATLAB code and change variable types for the conversion to compile successfully. I also created documentation for how the C++ files should be integrated and used. This documentation is for communication between our team members, for later reference for myself, and for reference for any future user:

This is for the case where we modify a song only once, not every minute. It will be easier to extend to every minute afterward.
 
I think the Android class we need is AudioTrack (linked).
 
What you want to do is create a function as follows (written in pseudocode that mixes Python, Java, MATLAB, and comments):
def TSM(song_name, original_bpm, desired_bpm):
    ratio = desired_bpm / original_bpm
    n = 1024  # FFT size for the phase vocoder
    # audioread is a MATLAB-only function; it doesn't port to C++,
    # based on my Google searches so far.
    # We want an analogous Java function native to Android's OS, like the AudioTrack fns.
    [original_signal, sampling_rate] = audioread(song_name)
    # The audioread call can be broken up into:
    #   sampling_rate = audioTrack.getSampleRate() for song_name
    #   reading the song must output a matrix/array of the audio signal:
    #   read song_name into original_signal (not sure which fn in AudioTrack does this; this link may help)
    modified_signal = pvoc(original_signal, ratio, n)
    return (modified_signal, sampling_rate)

# Calling the function:
desired_bpm = 160   # equivalent to the running pace
original_bpm = 140  # average "Eye of the Tiger" BPM, found online
playback_song, sampling_rate = TSM("IntroEyeoftheTiger1.wav", original_bpm, desired_bpm)
# play playback_song through audioTrack.play()
So far, in actually working on integration, it appears we may not be able to use AudioTrack. One idea we are tossing around is writing “audioread” ourselves in C++. I am skeptical about this because MATLAB Coder generates C++ for MATLAB's built-in functions, yet it does not do so for audioread. Coder omitting such a basic-seeming functionality leads me to believe that what we are searching for is NOT basic functionality, which is why the blog posts I have read suggest performing “audioread” from the integrated device's OS. However, that method also involves complicated typing and passing of signal matrices from Java to C++. Neither method is ideal, and both will need continued experimentation.

(DTCWT is still a work in progress.)

Team Status Report for 4/12

After our demo, our team decided to finish up the loose ends on our individual components. This included (1) porting the song selection algorithm from Python to Java, (2) porting the audio modification component to C++, and (3) writing code that breaks a song into 60-second chunks before warping (a sketch of the chunking follows below). This would allow us to start integration the following week.
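
The chunking in step (3) is mostly arithmetic on sample counts; a minimal sketch (the function name and buffer layout are illustrative, not from our code):

// Split an audio signal into 60-second chunks (the last chunk may be shorter),
// so each chunk can be warped independently as the runner's cadence changes.
#include <algorithm>
#include <cstddef>
#include <vector>

std::vector<std::vector<float>> chunkBySeconds(const std::vector<float>& signal,
                                               unsigned sample_rate,
                                               unsigned seconds = 60) {
    const std::size_t chunk_len =
        static_cast<std::size_t>(sample_rate) * seconds;  // samples per chunk
    std::vector<std::vector<float>> chunks;
    for (std::size_t start = 0; start < signal.size(); start += chunk_len) {
        const std::size_t end = std::min(start + chunk_len, signal.size());
        chunks.emplace_back(signal.begin() + start, signal.begin() + end);
    }
    return chunks;
}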

We made this decision based on our conversation with Professor Sullivan and Jens, as they heavily re-emphasized our goals for the end of the project. Additionally, we realized that our progress had fallen behind our Gantt chart: according to the plan, we would have completed integration this week and moved on to extended features in our following two weeks of slack time.

Thus, we will take what we have so far, integrate it, continue advancing our individual parts in the remainder of our slack time, and then integrate again.

Aarushi’s Status Report for 4/12

After our demo, our team decided that ensuring we could integrate was our highest priority, considering our timeline. According to our Gantt chart, we were to be done integrating by the end of this week, and we are grateful for having budgeted sufficient slack time to work through the project. Thus, along with working on the ethics assignment this week, I focused my energy on porting my MATLAB code to C++, which is essential for integration.

Aarushi’s Status Report for 4/4

I had the phase vocoder based on an STFT working on a monophonic signal last week. This week I tried playing a snippet of a polyphonic song that might be used while running: the intro of “Eye of the Tiger”. This did not directly work. The main problem was that the song's signal, as an array, was too long to play in MATLAB Online. Despite having completed my advanced signals classes with the online version, it became clear I needed to install the software on my laptop so that I could actually test songs. I started this process but ran into issues: my Mac (8 GB) has been low on storage/disk space for three years now, and even after spending a few hours moving large folders and files to online storage, I still did not have enough space, so I was unable to install the program. However, I have a new laptop coming within the week, and I will be able to test full songs in MATLAB on that.

In the meantime, I still wanted to make relevant progress this week, so I passed a smaller snippet of the polyphonic signal through the STFT-based phase vocoder.

I also deconstructed and reconstructed a signal with the DTCWT, having done the same with the DWT last week. The difference is that the DWT is shift-variant: shifting the input slightly can change its coefficients drastically. The DTCWT suffers drastically less shift variance. I tested this and confirmed the concept; a small demonstration of the DWT's shift sensitivity is sketched below.
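
As a concrete illustration of the DWT's shift sensitivity, using the simplest wavelet ('db1', i.e. Haar) rather than our actual DTCWT code: a step edge that is invisible in the level-1 detail band becomes prominent after a one-sample shift. The DTCWT's second tree is designed precisely to smooth out swings like this.

// Level-1 Haar ('db1') detail coefficients: d[k] = (x[2k] - x[2k+1]) / sqrt(2).
#include <cmath>
#include <cstdio>
#include <vector>

std::vector<double> haarDetail(const std::vector<double>& x) {
    std::vector<double> d(x.size() / 2);
    for (std::size_t k = 0; k < d.size(); ++k)
        d[k] = (x[2 * k] - x[2 * k + 1]) / std::sqrt(2.0);
    return d;
}

double energy(const std::vector<double>& v) {
    double e = 0;
    for (double c : v) e += c * c;
    return e;
}

int main() {
    const int N = 64;
    std::vector<double> x(N, 0.0), xs(N, 0.0);
    for (int n = 32; n < N; ++n) x[n] = 1.0;   // step edge at an even index
    for (int n = 33; n < N; ++n) xs[n] = 1.0;  // the same edge, shifted one sample
    // Prints 0.0 and 0.5: the edge vanishes from the detail band in one
    // alignment and is prominent in the other - the DWT is shift-variant.
    std::printf("detail energy: %.1f (original) vs %.1f (shifted)\n",
                energy(haarDetail(x)), energy(haarDetail(xs)));
    return 0;
}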


Team Status Report for 3/28


Risk Management Update (most remains the same)

We have built risk management into our project in a couple of ways. First, in terms of the schedule, as we mentioned before, we added slack time to make sure we can account for any issues we run into as we develop our project. This allows us to work out those issues without running out of time at the end of the semester.

From the design side, we have backups for potential problems that may come up while working on the project, with four specific cases laid out earlier for our metrics. The main risk factor is the Dual-Tree Complex Wavelet Transform phase vocoder: if it does not work, we will fall back on the STFT-based phase vocoder that we know works well for music. We may attempt to implement our own or use a library from GitHub. We have done our research and are putting most of our time into this aspect of the project, since it is both the focus that differentiates us from other apps like this and the primary risk factor.

The second-biggest risk factor is the accuracy of step detection on smartphones. Our testing has shown that the phone meets our accuracy requirement, so hopefully this will not be an issue; but if we find during implementation and testing that the accuracy is worse than we thought, we will order a pedometer and collect the data from it instead.

The only change in our risk management is to account for lost time and the lost ability to work together physically, which may hinder integration and full-system testing. We had chosen the Samsung S9 as our base test device because it had sufficient step-count accuracy; in our new situation, Akash is the only member with access to this device, and he was not one of the initial test subjects for our running data. Our discussion with our professor, Professor Sullivan, and our TA, Jens, helped generate a reasonable backup plan: we will each write our individual parts, test them independently, and create deliverables that convey their functionality. These deliverables are a new addition to the project, accounting for the case that we are not able to integrate the individual components. We will still aim to integrate the components, but treat that as a challenge and stretch goal.

The smaller risk factors involve the timing of the application: widening the timing windows for our refresh rate, and minimizing the time the app takes to start when it is first opened.

In terms of budget, we have not run into any issues yet, and do not expect to, since we already have most everything we need.

Aarushi’s Status Report for 3/28

In working on the time-scale audio modification component for this project, Prof. Sullivan suggested I start in MATLAB and write a program that can read an audio file, break the signal into its wavelet components, and reconstruct the signal from those components. I played around with this. With wavelet decomposition at levels 2 and 3, the reconstructed sound was faster, thinner, and considerably higher-pitched; decomposition at level 1, however, produced a reconstructed signal that sounded the same as the original. This was true for both the ‘db2’ and ‘db1’ wavelet types; we will be using ‘db2’, and our design report details why we chose this wavelet. I completed this task this week. A sketch of the level-1 analysis/synthesis round trip appears below.
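
To show the structure in code form, here is a sketch using the simplest wavelet ('db1'/Haar, whose filters are only two taps; 'db2' uses longer filters but the same analysis/synthesis pattern). It is not a transcription of my MATLAB program: a single-level decompose-and-reconstruct round trip that recovers the input exactly, matching the level-1 result described above.

// Round trip through a single-level Haar ('db1') filter bank: the average
// (approximation) and difference (detail) bands reconstruct the input exactly.
#include <cassert>
#include <cmath>
#include <vector>

void haarAnalyze(const std::vector<double>& x,
                 std::vector<double>& approx, std::vector<double>& detail) {
    const double s = std::sqrt(2.0);
    approx.resize(x.size() / 2);
    detail.resize(x.size() / 2);
    for (std::size_t k = 0; k < approx.size(); ++k) {
        approx[k] = (x[2 * k] + x[2 * k + 1]) / s;
        detail[k] = (x[2 * k] - x[2 * k + 1]) / s;
    }
}

std::vector<double> haarSynthesize(const std::vector<double>& approx,
                                   const std::vector<double>& detail) {
    const double s = std::sqrt(2.0);
    std::vector<double> x(approx.size() * 2);
    for (std::size_t k = 0; k < approx.size(); ++k) {
        x[2 * k]     = (approx[k] + detail[k]) / s;
        x[2 * k + 1] = (approx[k] - detail[k]) / s;
    }
    return x;
}

int main() {
    std::vector<double> x = {0.1, -0.4, 0.8, 0.3, -0.2, 0.5, 0.0, -0.9};
    std::vector<double> a, d;
    haarAnalyze(x, a, d);
    std::vector<double> y = haarSynthesize(a, d);
    for (std::size_t n = 0; n < x.size(); ++n)
        assert(std::abs(x[n] - y[n]) < 1e-12);  // perfect reconstruction
    return 0;
}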

In addition, I worked on an STFT-based phase vocoder in MATLAB. I was able to successfully test time-scale audio modification through this method with simple, single-layer musical signals, and there were no interfering audio artifacts after modification. Having this done means (1) I can use this phase vocoder implementation as an example of how to integrate the DTCWT in place of the STFT, and (2) the backup plan for how we would modify the signals is complete in MATLAB. The core per-frame step of such a vocoder is sketched below.
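
For reference, the heart of a classic STFT phase vocoder is the per-bin phase update between frames. The sketch below gives just that step in the standard textbook formulation (it is not a transcription of my MATLAB code) and assumes the analysis phases per frame have already been extracted from an STFT. Resynthesis then pairs these phases with the analysis magnitudes and overlap-adds ISTFT frames a synthesis hop apart.

// Phase-propagation step of an STFT phase vocoder: estimate each bin's true
// instantaneous frequency from consecutive analysis frames (hop Ha apart),
// then advance the synthesis phase by that frequency over the new hop Hs.
#include <cmath>
#include <cstddef>
#include <vector>

const double kPi = 3.14159265358979323846;

// Wrap an angle into (-pi, pi].
double princarg(double phi) {
    return phi - 2.0 * kPi * std::round(phi / (2.0 * kPi));
}

// prev/cur: analysis phases of two consecutive frames, one entry per FFT bin.
// synth: accumulated synthesis phases, updated in place.
void propagatePhases(const std::vector<double>& prev,
                     const std::vector<double>& cur,
                     std::vector<double>& synth,
                     int fft_size, int Ha, int Hs) {
    for (std::size_t k = 0; k < cur.size(); ++k) {
        double omega = 2.0 * kPi * static_cast<double>(k) / fft_size;  // bin center frequency
        double deviation = princarg(cur[k] - prev[k] - omega * Ha);    // measured vs nominal advance
        double inst_freq = omega + deviation / Ha;                     // true frequency in this bin
        synth[k] = princarg(synth[k] + inst_freq * Hs);                // advance at the new rate
    }
}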

Next steps are to test these methods on more complex signals and to implement the phase vocoder with the DTCWT.