This week we made a lot of progress. I worked closely with Aleks to fully integrate the STT scoring into our webapp. I made some minor tweaks to the scoring to improve Deepgram’s performance on singing. The improvement was not dramatic, but it showed promise, and if I have more time I will go back and make more changes before the demo. Aleks and I worked together to decide on the UI and the actual score system, arriving at a mostly final version, and we finished up the full integration.

More on my own, I have made large strides on the ESP32 functionality for receiving the Bluetooth signals from the Arduino. At the moment it can receive both audio and accelerometer data, but we are having issues actually playing the audio. The general idea (hardware-wise) seems to be working, but we are currently adjusting the sample rates and sending rates between the two systems, as well as the delays in writing to the DAC on the ESP32 side.

My current hypothesis is that the issue relates to the buffering, since both sides have a buffer. On the Arduino side, we accumulate a set number of samples and then send them as one block. The ESP32 receives the block, queues it into its own buffer, and writes samples out to the DAC on a timer. This is where the issue comes into play: either we are sending too many samples at once, or we are writing the samples to the DAC too quickly. We are continuing to troubleshoot this now, but it is the final major roadblock in our project, so we are almost at the end.
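To make the buffering hypothesis more concrete, here is a minimal sketch of the receive-and-drain pattern on the ESP32 side, assuming classic Bluetooth SPP via BluetoothSerial, 8-bit samples, the DAC on GPIO25, and an 8 kHz playback rate; the pin, rates, buffer size, and device name are illustrative rather than our actual configuration.

```cpp
// Minimal sketch (assumptions: Bluetooth SPP, 8-bit samples, DAC on GPIO25,
// 8 kHz playback). Values are illustrative, not our actual configuration.
#include <Arduino.h>
#include "BluetoothSerial.h"

BluetoothSerial SerialBT;

constexpr uint32_t SAMPLE_RATE_HZ   = 8000;                       // assumed playback rate
constexpr uint32_t SAMPLE_PERIOD_US = 1000000UL / SAMPLE_RATE_HZ;
constexpr size_t   BUF_SIZE         = 2048;                       // ring buffer capacity

volatile uint8_t ring[BUF_SIZE];
volatile size_t  head = 0, tail = 0;                              // producer / consumer indices

void setup() {
  SerialBT.begin("KaraokeESP32");                                 // advertise as an SPP device
}

void loop() {
  // Producer: drain incoming Bluetooth bytes into the ring buffer.
  while (SerialBT.available()) {
    size_t next = (head + 1) % BUF_SIZE;
    if (next == tail) break;                                      // buffer full: stop until drained
    ring[head] = (uint8_t)SerialBT.read();
    head = next;
  }

  // Consumer: write one sample to the DAC every SAMPLE_PERIOD_US.
  static uint32_t lastWrite = 0;
  uint32_t now = micros();
  if (now - lastWrite >= SAMPLE_PERIOD_US && tail != head) {
    dacWrite(25, ring[tail]);                                     // 8-bit DAC output on GPIO25
    tail = (tail + 1) % BUF_SIZE;
    lastWrite = now;
  }
}
```

The point this illustrates is that the Arduino’s effective sending rate and the ESP32’s DAC write rate have to match: if blocks arrive faster than the DAC drains them, the ring buffer overflows and samples are dropped; if they arrive slower, the output underruns, and either case would distort playback.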
Final Presentation Slides
Status Report 4/19
This week, I was able to get the first functional iteration of the left-right subtraction working. After some testing, I found that the vocal is attenuated almost, but not quite, as much as I had originally hoped; however, the residual is acceptable and is overpowered by the instrumental track. This update means that our UI is now capable of playing Spotify tracks through the vocal removal system. In the coming weeks I will continue making tweaks to make the overall output volume significantly louder, as the current system does not output a strong enough signal.
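For context on why the subtraction removes the vocal and why the attenuation falls a little short of ideal, consider a simple model of a stereo mix in which the lead vocal is panned to the center:

$$L(t) = I_L(t) + V(t), \qquad R(t) = I_R(t) + V(t), \qquad L(t) - R(t) = I_L(t) - I_R(t).$$

A perfectly center-panned vocal cancels completely while off-center instruments survive. In practice the vocal is never exactly identical in both channels (stereo reverb and mastering effects), so a small residual remains, which is consistent with the partial attenuation described above.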
The largest remaining task for me is integrating the Bluetooth receiver system and connecting the microphone signal to the speaker.
Hugo Status Report 4/12
Over the last two weeks I have made a lot of good progress toward finalizing our hardware. After all of the parts arrived last week, I built the first full prototype of the filter and ran into a lot of issues with it. The sound quality was very poor, with mostly crackling and static at the speaker output. I looked into possible causes and decided that the most important thing to fix was the loose connections. Because I am using a breadboard for prototyping, I did not have the ideal protoboard pin connections for the input jacks I ordered, so I had originally just tried to tie the wires on; those connections were extremely poor, and I believe they were a large part of the problem. I have now soldered the wires and made firm connections, which should give a big boost in performance.
Additionally, I realized another obvious issue: I was powering my setup from a single 9V battery, giving only +Vdd and ground. The op-amps require both +Vdd and -Vdd, so the negative half of the audio output was being cut off. I am now looking into two possible solutions: using two batteries with their junction as ground, or creating a virtual ground by splitting the 9V. I am currently leaning toward the former and hope to test it this week. I have also started building the circuitry for connecting the microphone input into the circuit. There are some worries about how well the ESP32 Bluetooth part we are using will work, but I have found some examples online of using it to build systems like Bluetooth speakers, so I am feeling better about the forecast for that.
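On the virtual-ground option, one common approach (just a sketch, not a committed design) is a resistor-divider rail splitter:

$$V_{\text{virtual}} = 9\,\text{V} \cdot \frac{R_2}{R_1 + R_2} = 4.5\,\text{V} \quad \text{for } R_1 = R_2,$$

which gives the op-amps roughly ±4.5 V around the new signal ground. The midpoint usually needs to be buffered (for example with an op-amp follower) or heavily decoupled so it stays stiff when the audio signal draws current.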
The Bluetooth receiver and microphone path will be the first major part of implementation for my work, as the vocal removal is already connected to our webapp.
Hugo Status Report 3/22
Team Status Report 3/15
What are the most significant risks that could jeopardize the success of the project? How are these risks being managed? What contingency plans are ready?
We still have concerns about getting our components and pieces delivered on time, but we have now placed orders for most of our fundamentals, so we feel on track to overcome this block. A new risk comes from recent changes to our design: we scrapped our original scoring idea, so we are a little behind schedule again and working to get back up to speed. As for contingency, we have our simplest scoring method ready to drop in if the new approach does not work.
Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)? Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward?
We have adapted our scoring system from the original method, which would have executed the scoring primarily via hardware by subtracting the final speaker output from the original music track. We changed it because, per advice from Professor Sullivan, that scoring would have been inaccurate and not particularly useful: even the natural differences in people’s voices would cause unpredictable differences in the output signal. The only cost is a small additional latency from using a software speech-to-text system, and this will not significantly affect our ability to provide a response in real time.
Status Report 3/15
Accomplishments this week:
This week, we sought to address our number one concern: not having our parts on time. I ordered most of my crucial hardware components, mainly the speaker and splitter wires I needed to start building the filter system. In addition, after some feedback from Prof. Sullivan, I reassessed our options for scoring the user’s audio. Originally, we had a feedback system that would subtract the final combined output from the original song. Because this would be overly complicated and provide very poor quality feedback, I looked into doing it entirely on the software side. I helped pivot our design to a speech-to-text system that compares the lyrics the user sings against the song’s actual lyrics, which we will now use for scoring instead.
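As an illustration of the kind of comparison this enables (not our final scoring formula), the transcript returned by the speech-to-text service can be scored against the reference lyrics with a word-level edit distance; the function and variable names here are hypothetical.

```cpp
// Hypothetical sketch: score sung lyrics against reference lyrics using a
// word-level edit distance (lower distance -> higher score). Assumes the STT
// transcript has already been fetched as a plain string.
#include <algorithm>
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Lowercase and split a line of lyrics into words, dropping punctuation.
static std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> words;
    std::istringstream in(text);
    std::string w;
    while (in >> w) {
        std::string clean;
        for (char c : w)
            if (std::isalnum(static_cast<unsigned char>(c)))
                clean += std::tolower(static_cast<unsigned char>(c));
        if (!clean.empty()) words.push_back(clean);
    }
    return words;
}

// Classic Levenshtein distance over words (insertions, deletions, substitutions).
static size_t editDistance(const std::vector<std::string>& a,
                           const std::vector<std::string>& b) {
    std::vector<std::vector<size_t>> d(a.size() + 1,
                                       std::vector<size_t>(b.size() + 1));
    for (size_t i = 0; i <= a.size(); ++i) d[i][0] = i;
    for (size_t j = 0; j <= b.size(); ++j) d[0][j] = j;
    for (size_t i = 1; i <= a.size(); ++i)
        for (size_t j = 1; j <= b.size(); ++j)
            d[i][j] = std::min({d[i - 1][j] + 1,
                                d[i][j - 1] + 1,
                                d[i - 1][j - 1] + (a[i - 1] != b[j - 1] ? 1u : 0u)});
    return d[a.size()][b.size()];
}

// Map the distance to a 0-100 score relative to the reference length.
double lyricScore(const std::string& reference, const std::string& transcript) {
    auto ref = tokenize(reference);
    auto hyp = tokenize(transcript);
    if (ref.empty()) return 0.0;
    double err = static_cast<double>(editDistance(ref, hyp)) / ref.size();
    return 100.0 * std::max(0.0, 1.0 - err);
}

int main() {
    // One substituted word out of five gives a score of roughly 80.
    std::cout << lyricScore("never gonna give you up",
                            "never gonna give you off") << "\n";
}
```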
Schedule Update:
I am still behind schedule because we have not prototyped or built anything yet. The Gantt chart says I should be wrapping up most of the vocal removal and scoring work by now. However, we will redistribute the remaining work; since all of the design work is laid out and most of it has been tested, I should be able to catch up quickly once I start on the real prototypes for these parts.
Next week:
I will start by trying to source op amps and other fundamental components for breadboarding the filter. Once I know whether this is possible, I will order the components on Monday in order to keep making progress. By the end of next week, I want to have either prototyped our filter or built the first iteration of our scoring.
Design Presentation
D1 Team Status Report 2/15
What are the most significant risks that could jeopardize the success of the project? How are these risks being managed? What contingency plans are ready?
Our biggest risks at the moment come from our signal processing for scoring and from our ability to do vocal removal with low latency. Scoring is risky because, having recently moved away from pitch detection, we are now developing new ideas for metrics; if we cannot select one and start testing it as soon as possible, we risk missing out on a major component of our gamification. For the vocal removal, we have confirmed that it is possible with low latency via software, but ideally we still need it to work as a hardware system. Both risks are being managed by quick decision making: we are finalizing design choices right now and hope to get to a testing phase and prototype these pieces as soon as possible. As for contingency, for the scoring we have a range of options, including some extremely easy (but unideal) workarounds. For the vocal removal, there is always the option of using an AI model, which is proven to work.
Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)? Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward?
The primary change is switching from a bandpass / bandstop filter to the subtraction method that takes advantage of stereo output. There is not much surface-level cost, but it does limit our potential music library to songs that exist in stereo format. This will simply require a small check against the Spotify API before allowing the user to confirm a song.
Additional Questions:
Hugo’s Status Report 2/15
Continuing from where I left off last week, this week I spent my time looking into specific methods for implementing the system. Because I am our primary audio processing lead, my work revolved around fleshing out not only the vocal removal, but also the full outline of how our data is passed around and what kind of processing we need to do.

First, after our proposal presentation, we had a slight change of path with regards to how we want to do scoring for our game. Originally, we had intended to work with pitch detection, but we now want something that more accurately captures the karaoke experience for the average user. We came up with a series of new metrics and strategies, and are continuing to analyze them and pick a specific plan.

Then, I took time to investigate the vocal removal aspect. Because this is currently such a fundamental part of the project, it is imperative that it works as expected. I used MATLAB to run tests by passing through audio files to see the effects of our original bandpass and bandstop filtering idea. In the end, this was not effective: the bandpass could produce a weak signal that almost isolated the vocal (often leaving in percussion), but the bandstop filter was next to useless at removing the vocals from the backing track. So, I moved on to testing methods that take advantage of audio in stereo format, canceling the vocals by subtracting the left and right channels. This did provide favorable results while still allowing us to work with a hardware system, as we had originally hoped.

Finally, I took time to read into the actual wiring and built up a design for splitting the two audio channels, passing them through our subtractor, adding in the microphone input, and outputting to a speaker. I also took some time to look at possible speaker options and assessed whether this is a part we want to allocate a substantial amount of the budget to, as high sound quality is crucial for the product to meet our user requirements.
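For reference, the stereo-subtraction test boils down to the following operation. This is a minimal sketch with assumed 16-bit interleaved PCM; the real experiments were done in MATLAB, and the eventual implementation is analog hardware, so this is only the same operation expressed in code.

```cpp
// Minimal sketch of the left-minus-right vocal removal test described above.
// Assumes 16-bit signed PCM, interleaved stereo (L, R, L, R, ...).
#include <cstdint>
#include <vector>

// Subtract the right channel from the left to cancel center-panned content
// (typically the lead vocal), producing a mono "karaoke" track.
std::vector<int16_t> removeCenterVocal(const std::vector<int16_t>& interleaved) {
    std::vector<int16_t> mono;
    mono.reserve(interleaved.size() / 2);
    for (size_t i = 0; i + 1 < interleaved.size(); i += 2) {
        int32_t diff = static_cast<int32_t>(interleaved[i])      // left sample
                     - static_cast<int32_t>(interleaved[i + 1]); // right sample
        diff /= 2;                                               // halve to stay in range
        if (diff > INT16_MAX) diff = INT16_MAX;                  // clamp just in case
        if (diff < INT16_MIN) diff = INT16_MIN;
        mono.push_back(static_cast<int16_t>(diff));
    }
    return mono;
}
```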