This week, I connected a live pipe between the Scarlett audio interface and Aubio to generate a real-time stream of pitch values. From initial testing, I worked out a mapping for the pitch value output: middle C (C4) maps to a value of 60, and each semitone up or down changes the value by one, which matches MIDI note numbering. I did notice some potential issues during the live audio testing.
- Any elongated or plosive consonants (“ssss”, “ttt”, “pppp”) caused a spike into the 100+ range, which is outside the bounds we are considering for expected pitch, since it would put the sound in the 7th octave or higher (at or beyond the top of a piano's range). Accounting for this is important, but since these values are so high, it might be easiest to simply ignore values outside the expected range.
- Chords behave abnormally. When two or more notes were played simultaneously, the output value was either lower than any of the pitches played or simply read “0.0”, which is the default no-input or error output. I believe there is a specific way to handle chords, but this requires further digging into the documentation.
- Speaking seems to consistently generate an output of “0.0”, which is good. However, some quick transitions from speaking to singing yielded mixed results: sometimes the pitch detection worked immediately, and other times it took a second to kick in.
- Lastly, the pitch value has a fractional part that indicates how far the detected pitch is from the nearest semitone, in hundredths of a semitone (cents). Treating any value within +/- 0.5 of a whole number as that note will be important. Vibrato varies from person to person, but mine, at least, seemed to stay within that tolerance, which is a good thing.
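The mapping and the filtering ideas above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the function name `classify` and the range bounds `MIN_PITCH` and `MAX_PITCH` are hypothetical, and the bounds in particular are assumed values that would need tuning to the singers and instruments involved.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Assumed expected range (hypothetical values): C2 (36) up to C7 (96).
MIN_PITCH = 36.0
MAX_PITCH = 96.0


def classify(value, lo=MIN_PITCH, hi=MAX_PITCH):
    """Map a pitch value (60 = C4) to (note_name, cents_offset).

    Returns None for readings that should be ignored: the "0.0"
    no-input value and anything outside the expected range, such as
    the 100+ spikes caused by consonants.
    """
    if value == 0.0 or not (lo <= value <= hi):
        return None
    # Round half up so every value within +/- 0.5 snaps to one note.
    nearest = math.floor(value + 0.5)
    cents = round((value - nearest) * 100)
    name = NOTE_NAMES[nearest % 12]
    octave = nearest // 12 - 1  # with this convention, 60 -> octave 4
    return f"{name}{octave}", cents


if __name__ == "__main__":
    for reading in (60.0, 60.25, 59.8, 0.0, 104.3):
        print(reading, classify(reading))
```

Dropping out-of-range readings entirely is the simplest option. Chords are not handled here, since a single pitch value cannot describe more than one note.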
Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?
I am just about on schedule. This is an ongoing learning process, but being able to sing or play live and get a pitch value output is promising.
What deliverables do you hope to complete in the next week?
I hope to make the output more robust by averaging the pitch over a short duration, producing a pitch event list that is easier for the timing algorithm to parse.
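One possible shape for that averaging step is sketched below, assuming the pitch stream arrives as one value per fixed-length hop. The function name `pitch_events`, the window size, the range bounds, and the `min_valid` threshold are all hypothetical choices made for illustration, not settled parts of the design.

```python
from statistics import mean


def pitch_events(pitches, hop_s, frames_per_window=8,
                 lo=36.0, hi=96.0, min_valid=4):
    """Average pitch readings over short fixed windows.

    pitches: one pitch value per hop (60 = C4, 0.0 = no pitch).
    hop_s:   assumed time between consecutive readings, in seconds.
    Returns a list of (window_start_seconds, mean_pitch) events.
    Windows with fewer than min_valid usable readings are skipped.
    """
    events = []
    for start in range(0, len(pitches), frames_per_window):
        chunk = pitches[start:start + frames_per_window]
        # Ignore the no-input value and out-of-range spikes.
        valid = [p for p in chunk if p != 0.0 and lo <= p <= hi]
        if len(valid) >= min_valid:
            events.append((start * hop_s, mean(valid)))
    return events


if __name__ == "__main__":
    stream = [60.1, 59.9, 60.0, 60.0,   # sung C4 with slight wobble
              0.0, 104.0, 0.0, 0.0,     # speech plus a consonant spike
              62.0, 62.0, 62.0, 62.0]   # sung D4
    for event in pitch_events(stream, hop_s=0.01,
                              frames_per_window=4, min_valid=3):
        print(event)
```

Requiring a minimum number of valid readings per window is one way to keep a single stray value from becoming an event. A median could be swapped in for the mean if outliers inside the expected range turn out to be a problem.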