This week, I conducted considerable research into features of a singer’s vocal performance that can be used to discriminate between good and bad singing. I came across a few papers that discuss how to do this, most importantly this one and this one.
In the first paper, the authors describe 12 desirable characteristics of good singing, as identified by experts in the field; they adapt existing methods that measure those traits and aggregate them into a metric they call the Perceptual Evaluation of Singing Quality (PESnQ). However, to compute this measure, a singer’s performance must be compared against an exemplary performance, which is out of scope for our project. Most of the literature on singer evaluation follows this same methodology of comparing a performance against a reference template.
The second paper, on the other hand, details a method for automatic singing skill evaluation using only two of the features described in the first paper: pitch interval accuracy and vibrato. Since our project now aims to improve specific aspects of a user’s voice through tailored exercises, we found exercises for improving a singer’s vibrato, but I could not see a way to use the calculated vibrato feature, as detailed in the paper, to show that the user’s singing has improved over time. Therefore, I decided that the only aspects of singing we will focus on are whether the user’s pitch is on key, whether the user is straining their voice on a particular note, and whether the user is struggling to transition from one note to the next.
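To make the “on key” idea a little more concrete, one simple way to quantify it is to measure how many cents a detected pitch deviates from the nearest equal-tempered note. This is just the standard equal-temperament formula, not something taken from either paper, and the 30-cent threshold in the example below is an arbitrary placeholder:

```python
import numpy as np

def cents_off_key(freq_hz, a4_hz=440.0):
    """How far a detected frequency is from the nearest equal-tempered note,
    in cents (100 cents = one semitone). A minimal sketch of one possible
    "on key" check, not the method from either paper."""
    midi = 69 + 12 * np.log2(freq_hz / a4_hz)   # fractional MIDI note number
    deviation = (midi - round(midi)) * 100.0    # cents away from nearest note
    return deviation

# Example: flag anything more than 30 cents sharp or flat as off key
print(abs(cents_off_key(452.0)) > 30)   # ~47 cents sharp of A4 -> True
```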
In my literature review, I also decided that we will use the Yin fundamental frequency estimator, as described here, to estimate the user’s pitch. The algorithm is based on the autocorrelation method for pitch detection, but adds several post-processing steps that dramatically improve pitch accuracy. I found an implementation of this algorithm online which I plan to use instead of implementing it myself.
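To make sure I understand what the algorithm is doing before relying on third-party code, I wrote up a rough sketch of its core steps (difference function, cumulative mean normalization, absolute threshold, and parabolic interpolation). This is only a simplified illustration and omits some refinements from the paper, such as the best-local-estimate step, so it is not the implementation we will actually use:

```python
import numpy as np

def yin_pitch(frame, sample_rate, fmin=65.0, fmax=1050.0, threshold=0.15):
    """Simplified Yin-style pitch estimate for a single audio frame (sketch)."""
    tau_min = int(sample_rate / fmax)
    tau_max = int(sample_rate / fmin)
    frame = np.asarray(frame, dtype=float)

    # Difference function d(tau) = sum_t (x[t] - x[t+tau])^2
    d = np.zeros(tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff = frame[:-tau] - frame[tau:]
        d[tau] = np.dot(diff, diff)

    # Cumulative mean normalized difference function
    cmndf = np.ones(tau_max + 1)
    running_sum = 0.0
    for tau in range(1, tau_max + 1):
        running_sum += d[tau]
        cmndf[tau] = d[tau] * tau / running_sum if running_sum > 0 else 1.0

    # First lag below the absolute threshold, else the global minimum
    candidates = np.where(cmndf[tau_min:tau_max] < threshold)[0]
    if candidates.size:
        tau = int(candidates[0]) + tau_min
    else:
        tau = int(np.argmin(cmndf[tau_min:tau_max])) + tau_min

    # Parabolic interpolation around the chosen lag for sub-sample accuracy
    if 1 < tau < tau_max:
        a, b, c = cmndf[tau - 1], cmndf[tau], cmndf[tau + 1]
        denom = a - 2 * b + c
        if denom != 0:
            tau = tau + 0.5 * (a - c) / denom

    return sample_rate / tau
```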
This upcoming week, I will test the Yin PDA implementation against randomly generated pure tones within the range of tones we are considering (C2-C6). I also started working on a clap detector this week, which I plan to complete next week.
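Here is a rough sketch of what that test could look like, assuming the simplified yin_pitch sketch above (or whichever implementation we end up using); C2 is roughly 65.4 Hz and C6 roughly 1046.5 Hz:

```python
import numpy as np

def note_to_hz(midi_note):
    """Equal-tempered frequency for a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)

def test_pure_tones(estimator, sample_rate=44100, n_trials=100, tol_cents=10.0):
    """Feed random pure tones between C2 (MIDI 36) and C6 (MIDI 84) to a pitch
    estimator and report the fraction that land within tol_cents of the truth."""
    rng = np.random.default_rng(0)
    passed = 0
    for _ in range(n_trials):
        f_true = note_to_hz(rng.uniform(36, 84))            # anywhere in C2-C6
        t = np.arange(int(0.1 * sample_rate)) / sample_rate  # 100 ms test tone
        tone = np.sin(2 * np.pi * f_true * t)
        f_est = estimator(tone, sample_rate)
        error_cents = 1200 * np.log2(f_est / f_true)
        passed += abs(error_cents) <= tol_cents
    return passed / n_trials

# e.g. print(test_pure_tones(yin_pitch))  # using the sketch above
```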