Rahul’s Status Report for 2/18

Since our OMR solution will run on Windows, this week I put some work into setting up the shell scripts and the Python code to call them. While our main hub application development won’t start for a few weeks, I still wanted to build a skeleton of functionality for calling the OMR without the default Audiveris GUI, one that could be modified later on. For this I had to learn some features of the PowerShell (.ps1) scripting language by consulting Stack Overflow. Though the syntax is not as kind as bash, the operations remain the same, and I was able to have Python execute the script via the os module. I also recalled some libraries from 15-112 for opening file paths through a GUI dialog and decided to incorporate those into the skeleton code, as this will improve our UX come app design time.
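A minimal sketch of that skeleton is below. The script name `run_omr.ps1` is a placeholder, the file picker is the same tkinter approach from 15-112, and I show `subprocess` in place of bare `os.system` since it handles arguments more safely:

```python
import subprocess


def build_omr_command(script_path: str, pdf_path: str) -> list[str]:
    # -ExecutionPolicy Bypass lets an unsigned .ps1 run; -File forwards args
    return ["powershell.exe", "-ExecutionPolicy", "Bypass",
            "-File", script_path, pdf_path]


def pick_score() -> str:
    # 15-112-style native file picker so the user never types a path by hand
    from tkinter import Tk, filedialog
    root = Tk()
    root.withdraw()  # hide the empty root window behind the dialog
    path = filedialog.askopenfilename(filetypes=[("Music scores", "*.pdf")])
    root.destroy()
    return path


# In the hub app, the flow would be roughly:
#   subprocess.run(build_omr_command("run_omr.ps1", pick_score()), check=True)
```

Keeping the command construction separate from the dialog means the hub app can later swap the GUI picker for a drag-and-drop path without touching the OMR call.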

I have also spent time preparing for the design review presentation next week, as I will be delivering the presentation on behalf of my team. In an effort to expand sections of our block diagram, I felt it best to segment our project into three phases: a transcription phase, a scheduling phase, and an execution phase.

As will appear in our presentation:

I hope this will provide our audiences some clarity on the technical uncertainties of our project. In doing this, I uncovered that our note scheduling was defined rather weakly and deserves more planning time. As a group we knew that converting music scores to MusicXML format was the way to go and that the Raspberry Pi could take it from there. After generating the XML with Audiveris and trying to move forward with its output, however, we realized how much extraneous information there is just in the readable XML. This led me to do some digging on open-source XML “condensing” code, so that the output could be organized into data structures that are more easily accessible and operable by our (to be determined) mode of scheduling. Fortunately, I found that MIT has poured years of experience and expertise into developing music21, a Python module for importing music file formats and converting them into data structures that can be easily traversed or manipulated, with support for exporting different file types or playing the imported source directly (plus, they have awesome documentation). Since the Raspberry Pi will be switching the solenoids on and off from a Python script, I can foresee having music21 preprocess the XML as an important intermediate step.
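To make the “condensing” idea concrete, here is a stdlib-only sketch on a toy MusicXML fragment. This is illustrative rather than our planned implementation: music21 does the same job far more robustly (rests, chords, ties, and voices), which is exactly why we would lean on it.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written MusicXML fragment; real Audiveris output carries far
# more markup (layout, stems, beams) than the scheduler will ever need.
SAMPLE = """<?xml version="1.0"?>
<score-partwise>
  <part id="P1">
    <measure number="1">
      <note><pitch><step>C</step><octave>4</octave></pitch><duration>4</duration></note>
      <note><pitch><step>E</step><octave>4</octave></pitch><duration>4</duration></note>
    </measure>
  </part>
</score-partwise>"""


def condense(xml_text):
    """Strip MusicXML down to a flat list of (pitch, duration) tuples."""
    root = ET.fromstring(xml_text)
    notes = []
    for note in root.iter("note"):
        pitch = note.find("pitch")
        name = pitch.findtext("step") + pitch.findtext("octave")
        notes.append((name, int(note.findtext("duration"))))
    return notes


print(condense(SAMPLE))  # → [('C4', 4), ('E4', 4)]
```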

In terms of staying on schedule, I needed to configure the OMR to output XML/MIDI. I consider this accomplished, since MIDI turned out not to be strictly necessary (and I found there are many resources available for XML-to-MIDI conversion anyway). Since music21 can play back our XML, it will also facilitate our sound quality testing. Next week, I will work with Nora and Aden on formalizing our scheduling to determine most if not all of the necessary transformations of the transcribed XML. Hopefully, I may also get to writing a portion of the corresponding code.



Rahul’s Status Report for 2/11

I did further research into alternative OMR technology earlier in the week, as I was having trouble building a custom version of Audiveris on Mac. Since the default output is an MXL file and not XML, I wanted to edit the source code to build it to my needs. I figured out all the modifications necessary to make; however, I ran into a dependency issue. Audiveris requires an older version of an OCR (optical character recognition) library called Tesseract. Since my Mac has an M2 chip, it was practically impossible to get hold of the older version of this software for Apple silicon. This was as far as I could get:

I was able to download the relevant JAR files and made sure to specify the classpath to link them, but it seems that I also need the native dynamic libraries, which will be impossible to get.

This led me to look into an alternative Python-based OMR solution, Oemer, which essentially runs a pretrained ML model on a PDF of music. The simplicity of usage was great; however, runs take a few minutes to complete, and upon converting the XML back to PDF form I was very dissatisfied with its accuracy on the Charlie Brown example from the Team Status Report (roughly 50%).

Last week, I mentioned how Audiveris was able to run fairly well on Windows, though it was outputting MXL files, which was annoying as they read like compressed binary. I eventually discovered that these MXL files are just zipped XML files, and unzipping a few kB per page would hardly be expensive for meeting the parsing time requirements that we set.
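Since MXL is just a zip archive, the unzip step can be sketched in a few lines of Python. The member-selection logic here is a simplification: a real MXL’s `META-INF/container.xml` names the root score file, which a production version should honor rather than grabbing the first `.xml` member.

```python
import io
import zipfile


def mxl_to_xml(mxl_bytes: bytes) -> str:
    """Treat an .mxl file as a zip and pull out its MusicXML payload."""
    with zipfile.ZipFile(io.BytesIO(mxl_bytes)) as zf:
        for name in zf.namelist():
            # Skip zip metadata; take the first score member (simplified)
            if name.endswith(".xml") and not name.startswith("META-INF"):
                return zf.read(name).decode("utf-8")
    raise ValueError("no MusicXML member found in archive")


# Build a toy .mxl in memory to demonstrate the round trip
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("score.xml", "<score-partwise/>")
print(mxl_to_xml(buf.getvalue()))  # → <score-partwise/>
```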

Eventually, I will write a bash script to run the OMR, callable by the GUI application from our Proposal. The only thing to keep in mind is that it will have to use Windows commands (yikes). This is a sample of what the commands would look like.

Running the OMR:

This is able to generate the MXL in the default output directory (though there is another parameter that can be used to specify the directory). It also produces a log of what was executed:

If you check the timestamps in the log, you will see this took roughly 11 seconds to parse the single page, which is very reasonable and should not be too cumbersome for our end user.

Previously, I was solely running the OMR from the Audiveris GUI, which, though pretty, would not be ideal for our pipeline app.

Audiveris GUI build:

Next week I will integrate the file generation and unzipping into a preliminary version of the bash script mentioned earlier. I also hope to test the OMR on more music scores to come up with a numeric metric for comparison with our goals. My current progress is good and is on schedule.

Team Status Report for 2/11

This week, we presented our project proposal to faculty and fellow teams. We received some insightful questions that we should keep in mind for our design. For instance, one question asked how we would interpret key words in music such as “rubato” that are not associated with specific tempo values. 

Our project includes safety, cultural, and economic considerations. In regard to safety, our project will likely deal with a large amount of power due to the high current required by the solenoids. Limiting this power is important for our safety as well as the user’s. Culturally, we recognize that our sheet music parsing is centered on Western styles of music, which limits the styles of music we can play. Economically, our project aims to lower the cost associated with piano accompaniment: since hiring a professional piano accompanist is expensive, we can provide a more affordable alternative.

We have also begun the process of looking ahead for parts that will be needed. Since we have uncertainties about power consumption, we will be ordering and testing different solenoids for our key press mechanism. The biggest risk associated with our project so far comes with the issue of powering our worst-case number of solenoids at once. To mitigate this risk, we have considered reducing the max number of keys pressed to three at once. This would reduce the current draw to below the max current of our existing power supply while still allowing us to have a polyphonic system that can cover a three-note chord.
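The budget behind the “three keys at once” mitigation can be sketched as a one-line calculation. The figures below are placeholders, not measurements; we will not know the real per-solenoid draw until the ordered parts arrive:

```python
def max_simultaneous_keys(supply_max_amps: float, solenoid_draw_amps: float) -> int:
    """How many solenoids can fire at once without exceeding the supply."""
    return int(supply_max_amps // solenoid_draw_amps)


# Placeholder figures -- we have not yet measured our solenoids
print(max_simultaneous_keys(10.0, 3.0))  # → 3
```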

Our identified OMR solution is looking pretty good so far. We have decided to go with Audiveris, and here is a sample of the transcription.

Original Music Pdf file:

Corresponding Transcribed XML:

Rahul’s Status Report for 2/5

This week, I worked with Nora and Aden to finalize proposal slides. In particular, I contributed mostly to the technical challenges, UI application mockup, and software solution approach slides. I constructed my portion of the Gantt chart schedule and worked with Nora to settle dependencies in our timelines between production of XML data and processing of this data. I also wrangled with the initial setup for our website.

I have done some digging for good OMR tools and have played around with the following. The first project I configured on my Mac, but certain dependencies are unavailable. Another project was research-backed with a pretrained ML model. I was able to build and run this ML software; however, I found its accuracy to be around 70–80%, which is a little low for our project standards. Additionally, it was only capable of producing a monophonic interpretation of a music score. A third project I looked at was Audiveris. After installing the relevant JDK toolkits, I was able to get it working on Windows. I think this may be the way to go. It seems to have a robust note tracking algorithm in play; however, it is only capable of converting scores to MXL format. Hence, I will need to do further research on how to convert this MXL format into XML or equivalent.

I believe I am on schedule at the moment. I will have to drop the small possibility of manually building and training an OMR machine learning model; just from the work and documentation that I have read, such a task is a capstone project in itself. Ideally, within a week’s time I will have determined the best OMR solution for our project and have it up and running. To reiterate from the proposal, this solution will parse the notes into an XML/JSON-style structure with >95% accuracy.