Earlier in the week I did further research into alternative OMR technology, since I was having trouble building a custom version of Audiveris on Mac. Because the default output is an MXL file rather than XML, I wanted to edit the source code and build it to suit my needs. I figured out all the necessary modifications but ran into a dependency issue: Audiveris requires an older version of an OCR (optical character recognition) library called Tesseract. Since my Mac has an M2 chip, it was practically impossible to get hold of that older version for this machine. This was as far as I could get:
I was able to download the relevant jar files and made sure to specify the classpath to link them, but it seems that I also need the dynamically linked native libraries, which look to be practically impossible to get for this platform.
This led me to look at an alternative Python-based OMR solution, Oemer, which essentially runs a pretrained ML model on a PDF of music. It was very simple to use; however, runs take a few minutes to complete, and after converting the XML back to PDF form I was very dissatisfied with its accuracy on the Charlie Brown example from the Team Status Report (probably around 50%).
Last week, I mentioned that Audiveris ran fairly well on Windows, though it output MXL files, which was annoying because they read like compressed binary. I eventually discovered that these MXL files are just zipped XML files, and unzipping a few kB per page would hardly be expensive relative to the parsing time requirements that we set.
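To make the unzipping step concrete, here is a minimal sketch in Python of pulling the score XML out of an MXL archive. It relies on the MusicXML convention that a compressed file carries a META-INF/container.xml entry whose rootfile element names the main score. The function name extract_score_xml is my own placeholder, not something from Audiveris, and this has not been run against our actual Audiveris output yet.

```python
import zipfile
import xml.etree.ElementTree as ET


def extract_score_xml(mxl_source):
    """Return the raw bytes of the main score XML inside a compressed .mxl file.

    mxl_source may be a file path or a binary file-like object.
    (Hypothetical helper name, used for illustration only.)
    """
    with zipfile.ZipFile(mxl_source) as archive:
        # META-INF/container.xml lists the main score file in the archive.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        # Match on the local tag name so a namespaced container still works.
        rootfile = next(
            (el for el in container.iter() if el.tag.split("}")[-1] == "rootfile"),
            None,
        )
        if rootfile is None:
            raise ValueError("container.xml does not list a rootfile")
        return archive.read(rootfile.attrib["full-path"])
```

The returned bytes can be handed straight to an XML parser, so nothing has to be written to disk between the OMR step and the parsing step.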
Eventually, I will write a bash script to run the OMR, callable by the GUI application from our Proposal. The only thing to keep in mind is that it will have to use Windows commands (yikes). This is a sample of what the commands would look like.
Running the OMR:
This generates the MXL in the default output directory (there is another parameter that can be used to specify a different directory). It also produces a log of what was executed:
If you check the timestamps in the log, you will see that it took roughly 11 seconds to parse the single page, which is very reasonable and should not be too cumbersome for our end user.
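For the eventual pipeline, the run and its timing could also be wrapped programmatically. The sketch below is in Python rather than the shell script I plan to write, and it is not the exact command from my run. The -batch, -export, and -output options and the "--" separator are taken from my reading of the Audiveris command-line documentation and should be checked against the installed version's -help output. The launcher path, the file names, and both function names are hypothetical.

```python
import subprocess
import time


def build_audiveris_command(launcher, score, output_dir=None):
    """Build the argument list for a batch Audiveris run that exports MusicXML.

    launcher is the path to the Audiveris start script (an assumption: the
    real location depends on how Audiveris was installed).
    """
    command = [str(launcher), "-batch", "-export"]
    if output_dir is not None:
        # Optional override of the default output directory.
        command += ["-output", str(output_dir)]
    # "--" separates the options from the input file(s).
    command += ["--", str(score)]
    return command


def run_and_time(command):
    """Run a command and return (exit code, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(command, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed
```

A call such as run_and_time(build_audiveris_command("Audiveris.bat", "page1.pdf", "out")) would then give an elapsed time to compare against our parsing time requirement, without having to read timestamps out of the log by hand.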
Previously, I was running the OMR solely from the Audiveris GUI, which, though pretty, would not be ideal for our pipeline app.
Audiveris GUI build:
Next week I will integrate the file generation and unzipping into a preliminary version of the bash script mentioned earlier. I also hope to test the OMR on more music scores to come up with a numeric metric for comparison with our goals. My current progress is good and is on schedule.