This week I again worked on improving the accuracy of my CNN. I expanded the training set to include both training set A and training set B, versus only set A last week. On initial testing, accuracy actually dropped to a consistent 71%, down from 79% last week. When I analyzed the testing data more thoroughly, I saw that the sounds vary in length from 5 seconds to 120 seconds, which is a huge variation. This affects the spectrograms: because a spectrogram compares frequency to time, I have to specify a segment duration when computing it. If a sound is shorter than the segment duration, then portions of the spectrogram will be blank. Therefore, to improve the data, Eri and I decided to shorten all the sound files to 5 seconds and test with these results before researching a more complicated LSTM method. We are also planning to use Eri’s Shannon Expansion method to reduce noise in the samples. Upon testing her code, we noticed that some samples weren’t normalized to be centered around the zero axis, which threw off the Shannon expansion calculations. Therefore, we also worked on normalizing the data so that it is centered around 0. Tests have to be run tomorrow to see whether accuracy improves before our mid-semester presentations.
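As a rough sketch of the two preprocessing steps described above (cutting every sound to 5 seconds and centering it around 0), something like the following could work. This is not our actual code: the function names, the choice to keep the first 5 seconds of each file, and centering by subtracting the mean are all assumptions made for illustration.

```python
import numpy as np


def trim_to_duration(samples, sample_rate, duration_s=5.0):
    """Keep only the first `duration_s` seconds of a 1-D signal.

    Assumption: the start of the recording is the part worth keeping.
    Signals shorter than `duration_s` are returned unchanged.
    """
    n_keep = int(round(duration_s * sample_rate))
    return np.asarray(samples)[:n_keep]


def center_around_zero(samples):
    """Remove a constant offset so the signal is centered on the zero axis.

    Assumption: "normalizing" here means subtracting the mean.
    """
    samples = np.asarray(samples, dtype=np.float64)
    if samples.size == 0:
        return samples
    return samples - samples.mean()


def preprocess(samples, sample_rate, duration_s=5.0):
    """Trim first, then center, so the offset is measured on the kept part."""
    trimmed = trim_to_duration(samples, sample_rate, duration_s)
    return center_around_zero(trimmed)


# Synthetic example: a 12-second, 50 Hz tone sitting on a 0.3 offset.
sample_rate = 1000
t = np.arange(12 * sample_rate) / sample_rate
signal = 0.3 + np.sin(2 * np.pi * 50 * t)

clip = preprocess(signal, sample_rate)
```

Trimming before centering means the offset is estimated only from the 5 seconds that actually reach the spectrogram step, rather than from audio that gets thrown away.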
We will be on schedule once we test the newly processed data.
Next week’s deliverables will depend on the results we get this week, but if the preprocessing needs further improvement, we will try to use LSTMs instead of just segmenting to a strict 5 seconds.