For this week, I worked on improving the accuracy of my CNN by training it on more data. I found the PhysioNet/CinC Challenge 2016 dataset (https://physionet.org/physiobank/database/challenge/2016/), which contains a total of 3,126 heart sound recordings. My previous dataset had only a little over a hundred instances, so I had manually preprocessed the data to split it into training/validation/testing sets; with a much larger dataset, I had to write Python scripts to do the split for me this time.

I first tested on the training-a folder, which contains 409 audio files, so the training, validation, and testing sets each hold around 136 recordings, a huge increase from the prior 60. When trained on this data, my CNN now consistently reaches about 79% accuracy, which is a big improvement and close to our goal of 85%. However, because the indices are randomized at initialization, the accuracy occasionally drops to 29%, which definitely has to be fixed. I need to improve the initialization of my CNN to prevent this.

I also plan to train on the full dataset in the coming week. The reason I tested on only the A set is that the description isn't very clear about the differences between the sets, so I'm not sure whether the audio in set A and set B was recorded differently. This is something I definitely have to look into so I can increase the amount of data I can train my network on. I am currently behind schedule because I had to switch datasets midway through, but I should be back on schedule next week once I finish training my network on the entire dataset.
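
As a reference for the splitting step, here is a minimal sketch of the kind of Python script I mean: it shuffles the training-a recordings and copies them into three roughly equal folders, mirroring the ~136/136/137 split above. The directory paths are placeholders, and label bookkeeping is left out.

```python
import random
import shutil
from pathlib import Path

# Illustrative paths -- adjust to wherever training-a is unpacked.
SOURCE_DIR = Path("physionet2016/training-a")
OUTPUT_DIR = Path("data/splits")


def split_folder(source_dir, output_dir, seed=0):
    """Shuffle the .wav recordings and copy them into three
    roughly equal train/validation/test folders."""
    wav_files = sorted(source_dir.glob("*.wav"))
    random.Random(seed).shuffle(wav_files)

    third = len(wav_files) // 3
    buckets = {
        "train": wav_files[:third],
        "validation": wav_files[third:2 * third],
        "test": wav_files[2 * third:],
    }

    for split, files in buckets.items():
        dest = output_dir / split
        dest.mkdir(parents=True, exist_ok=True)
        for wav in files:
            shutil.copy2(wav, dest / wav.name)


if __name__ == "__main__":
    split_folder(SOURCE_DIR, OUTPUT_DIR)
```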
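
For the occasional 29% runs, one thing that would at least make the runs comparable while I debug the initialization is pinning every source of randomness (the shuffled indices and the weight initialization) before training. A minimal sketch, assuming a TensorFlow 2 / Keras setup; other frameworks have analogous seed calls:

```python
import random

import numpy as np
import tensorflow as tf


def set_global_seed(seed=42):
    """Pin the sources of randomness (shuffled indices, weight
    initialization, dropout) so successive runs are comparable."""
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)  # seeds TensorFlow ops, including Keras weight initializers


# Call once, before the indices are shuffled and the model is built.
set_global_seed(42)
```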