Rahul’s Status Report for 4/29

For the final week of classes, I accomplished the tasks I outlined for myself last week: building a cache prompt accessible from the UI, enabling a disconnection mechanism from the rest of the system, and creating disappearing messages that give the user informational feedback on application events. The cache prompt works by checking whether the music score the user is inputting has already been processed. If so, the user is asked whether they would like to use the existing XML or create a new one (which would take a while). Since Audiveris takes several seconds to translate each input page of music into XML, this saves the end user significant computation time. If no matching XML is found in the cache, the prompt is not displayed and the application goes straight to running an OMR job. A minimal sketch of the lookup is below.
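
Conceptually, the check is a small lookup before any OMR work kicks off. A minimal sketch, assuming the cache is keyed by a hash of the input file (the actual app may key it differently, and the folder name is a placeholder):

```python
import hashlib
import os

CACHE_DIR = "cache"  # placeholder for the app's cache folder

def cached_xml_path(score_path):
    """Return the previously generated XML for this score, or None."""
    with open(score_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()[:16]
    candidate = os.path.join(CACHE_DIR, digest + ".xml")
    return candidate if os.path.exists(candidate) else None

existing = cached_xml_path("score.pdf")
if existing is None:
    print("no cache hit -- go straight to the OMR job")
else:
    print("cache hit -- prompt the user about reusing:", existing)
```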

Connecting to the device is done through a large sky-blue button in the app, but if there are complications, such as the USB cable coming loose, the application should recognize and report them. The print statements I previously had for device connection, and the Python exceptions raised when the connection broke, have now been replaced by color-coded disappearing messages that show as an alert at the top of the app. Additionally, any misinputs, such as non-PDF/PNG files or files the OMR fails to process, are also reported in this fading alert style. Lastly, for completeness, the “connect to device” button now becomes a “disconnect from device” button once connected. This was more complex under the hood, as I had to clean up the communication threads each time the USB port was closed and then spawn new threads on reconnection, so as not to run into problems writing to or reading from a closed port; see the sketch below.
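
Under stated assumptions (pyserial, a placeholder COM port, and a single reader thread standing in for the app's reader/writer pair), the teardown-and-reconnect pattern looks roughly like this:

```python
import threading
import serial  # pyserial

class DeviceLink:
    """Minimal sketch: stop the reader thread before closing the port."""

    def __init__(self, port="COM3"):  # port name is a placeholder
        self.port = port
        self.ser = None
        self.reader = None
        self.stop_event = threading.Event()

    def connect(self):
        self.stop_event.clear()
        self.ser = serial.Serial(self.port, 9600, timeout=1)
        self.reader = threading.Thread(target=self._read_loop, daemon=True)
        self.reader.start()

    def disconnect(self):
        self.stop_event.set()  # ask the reader to exit its loop
        self.reader.join()     # wait, so nothing reads from a closed port
        self.ser.close()       # only now is it safe to close the port

    def _read_loop(self):
        while not self.stop_event.is_set():
            self.ser.readline()  # timeout=1 keeps this loop responsive
```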

Here is a view of the essentially finished accompanyBot GUI, showcasing the new cache message along with a fading alert at the top:

The simplicity of the updated display hides all the complexity from the end user, which was the intended goal of this design. To verify the new additions, I tried connecting and disconnecting an Arduino multiple times, sometimes by unplugging it and sometimes through the software button. Everything there runs smoothly. Performance tests integrated with the notes scheduler and OMR software were detailed in previous status reports.

That's all for my work on this project. Thanks for reading.

Rahul’s Status Report for 4/22

Since there was a gap from the last post, I completed a good amount of the work pertaining to my last two tasks in the updated Gantt chart. This includes finishing the communication protocols between subsystems and running some tests against the project requirements. In the last report, the missing piece of intercommunication was the SSH/SCP mechanism for relaying XML files from the user application to the notes scheduler RPi. This has now been resolved with no issues, assuming both devices are present and connected to the campus network. I had to research how to specify a timeout parameter for the SCP connection launched from PowerShell and incorporated that into the design as well; otherwise, the application would freeze for far too long while trying to connect over SSH. Additionally, setting up SCP required both Nora and me to store the appropriate keys in the ssh_config to avoid manual password entry every time an XML file is sent. A sketch of the transfer call is below.
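
The transfer itself reduces to an scp call with ssh's standard ConnectTimeout option; a sketch with placeholder host, user, and paths:

```python
import subprocess

# ConnectTimeout is a standard ssh/scp option; the host, user, and paths
# here are placeholders for our actual setup
result = subprocess.run(
    ["scp", "-o", "ConnectTimeout=5",
     "cache/song.xml", "pi@raspberrypi.local:/home/pi/scores/"],
    capture_output=True, text=True, timeout=15,
)
if result.returncode != 0:
    print("transfer failed:", result.stderr)  # surface this in the GUI
```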

Regarding testing, I am mostly done with the performance tests. I made both qualitative and quantitative measurements of Audiveris's note-reading accuracy. The quantitative measurements involved counting the actual number of note heads and rests in a piece, generating the XML from the score, reconverting the XML back into sheet music, counting the number of correctly placed notes/rests, and taking the ratio of correct markings to total markings. For the most part, the results meet the design requirement of 95% average accuracy for basic to moderate difficulty pieces. This is how the OMR testing results look:

Medium pieces had four-note chords in one of the two piano staves, and Hard pieces tended to be more of a solo (this is supposed to be the “accompany”Bot). These pieces were tested just to find the limits of the OMR solution. I also tried taking a picture of a sheet of music with my phone in okay lighting, and the OMR had around 50% accuracy, but this too falls outside the appropriate inputs the accompanyBot should receive. Note that the first two pieces had too many notes to count manually, so I instead took the fraction of playback time during which the song sounded correct through MIDI. Sometimes MIDI playback would expose singing parts generated in the XML as opposed to only piano parts. This would not be problematic for the solenoid device, which ignores the type of part; however, it is an issue for the sample audio that Tom suggested I incorporate. I tried correcting the XML by finding and replacing all singing parts with piano, but as the XML file gets longer, this job requires significant time and slows down the app. Ultimately, the digital player is just a secondary feature and plays more of a debugging role for the user in case something goes wrong once the solenoids start playing.
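
For reference, a cheaper approach than a full-text find-and-replace would be to rewrite only the per-part instrument declarations in the MusicXML header, since that is where playback programs look up the sound. This is just a sketch of that idea, not what the app currently does, with placeholder paths:

```python
import xml.etree.ElementTree as ET

# MusicXML declares each part's instrument once in the <score-part> header,
# so rewriting those few elements avoids scanning the whole note body
tree = ET.parse("cache/song.xml")
for name in tree.getroot().iter("part-name"):
    name.text = "Piano"
for program in tree.getroot().iter("midi-program"):
    program.text = "1"  # General MIDI program 1 = acoustic grand piano
tree.write("cache/song_piano.xml")
```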

Additionally, Nora and I ran a latency test from application to scheduler and back, sending play and pause commands while the solenoids were active. Through our tests we determined the maximum full-cycle communication time did not exceed 72 ms, which successfully meets our 100 ms limit. The measurement itself reduces to timing a command/acknowledgement round trip, roughly as sketched below.
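
A sketch of one round-trip sample (pyserial; the port, baud rate, and command framing are placeholders for our actual protocol):

```python
import time
import serial  # pyserial

ser = serial.Serial("COM3", 115200, timeout=1)  # port/baud are placeholders
t0 = time.perf_counter()
ser.write(b"PAUSE\n")   # command format is illustrative
ack = ser.readline()    # scheduler echoes an acknowledgement
print(f"round trip: {(time.perf_counter() - t0) * 1000:.1f} ms, ack={ack!r}")
```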

Regarding bug testing, I realized my code had too many print statements for a finished project, and that the serial port opened by the GUI was never being closed. I want to add features to the GUI that replace console logs with more user-friendly text or buttons, and introduce a disconnection mechanism so the user can disconnect from the UI without unplugging the device. This is by no means a setback, as I have had plenty of time to work on the app and believe I can develop these features quickly over the final week of classes.

Overall I believe my progress is on schedule. I’m looking forward to finishing my last tasks and operating the project on an actual piano.

Rahul’s Status Report for 4/8

This week I accomplished end-to-end connection of the accompanyBot with all of my team members. This was very exciting! I also discovered a bug while doing this: if the user opens the insert-file menu while Audiveris is processing another file and then hits cancel on the new file menu, the program crashes. This has now been resolved. Another issue that I am aware of but cannot resolve arises if Audiveris is busy processing and the user wishes to cancel the process, or simply closes the app preemptively: the Audiveris process will continue running until it fully finishes. Considering that this OMR software is built upon very old toolchains, and that I already tried many options to get it configured for macOS at the beginning of the semester, I will not dig into how to send it a SIGKILL or similar termination signal. The easy workaround is to manually close the window that appears, which terminates the process.

Earlier in the week, the interim demo took place. After seeing the application, my team's course advisor Tom suggested that I add a digital player to demo the parsed music as MIDI output before the accompanyBot starts to play. I addressed this suggestion and implemented the digital player, which first converts the XML into MIDI and then plays it in an endless loop; the loop can be paused and unpaused via the spacebar. The core of the player boils down to something like the sketch below.
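
A minimal sketch of that convert-then-loop flow (paths are placeholders, and the real app hooks the spacebar into its existing event loop):

```python
import pygame
from music21 import converter

# convert the cached MusicXML to a MIDI file
converter.parse("cache/song.xml").write("midi", fp="cache/song.mid")

pygame.init()
screen = pygame.display.set_mode((200, 100))  # keyboard events need a window
pygame.mixer.music.load("cache/song.mid")
pygame.mixer.music.play(loops=-1)             # endless loop, as in the app

paused, running = False, True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
        elif event.type == pygame.KEYDOWN and event.key == pygame.K_SPACE:
            paused = not paused
            if paused:
                pygame.mixer.music.pause()
            else:
                pygame.mixer.music.unpause()
```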

Here is a demonstration of it playing and pausing (there should be audio playing along with the video):

In terms of completing my side of the serial communication I set up sender code and patched up the receiver thread code to get measure data from the notes scheduler running on the RPi. I also eliminated one piece of data that was initially being sent by the RPi. Previously, the number of measures in the song was getting determined by music21 and sent over the serial connection to the GUI app. I decided to process the number of measures algorithmically from the raw xml structure.
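
Since MusicXML nests measures directly under each part, the count falls out of a simple traversal (a sketch; the helper name is mine):

```python
import xml.etree.ElementTree as ET

def count_measures(xml_path):
    """Count a song's measures straight from the MusicXML file."""
    root = ET.parse(xml_path).getroot()        # <score-partwise> root
    first_part = root.find("part")             # all parts share a measure count
    return len(first_part.findall("measure"))
```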

Overall, the project is in a good position. I was slightly worried last week that I would fall behind, but after getting things working I believe I am on schedule again. Going into next week, I will have some more time to confirm the workings of the communication between subsystems. In particular, I need to set up the SSH command to deliver the XML file from the GUI to the RPi. As mentioned earlier, this is necessary to avoid a one-time heavy load over the serial connection. If this gets done quickly, I can begin stress testing early and identifying problems that the GUI or notes scheduler may need fixed.

The tests on the application side will cover many variations of input music quality, randomized button presses meant to break the application, manual disconnection of the serial cable while the app is still running, and so on. (I think I have tried the last one already: since I use threads for communication, they terminate with an error printed to the console, but the main app thread keeps running fine, which seems like reasonable behavior.) I anticipate the app will crash when tests fail, and I will analyze the error logs closely to determine the point(s) of failure. I will also perform actions that spawn and terminate different threads while watching the task manager to make sure all Python processes get cleaned up when they should. Part of the effect on the use-case requirements is maintaining the functionality and latency requirements between subsystems. In terms of design, the GUI should be user friendly, so any odd errors that prevent intended usage must be addressed.


Rahul’s Status Report for 4/1

Early in the week I worked with Nora to devise data structures for our notes scheduler. The primary functionality, we determined, would be to take one measure of notes at a time and play through it sequentially unless a measure-change or pause signal arrives from the app. I then proceeded to work out the communication methodology between the RPi and the Windows computer. Unfortunately, my team did not have a USB-to-USB-C cable on hand, so I placed an order request for one. While waiting, I was able to borrow one from Alex, my group's TA. I connected the RPi to the Windows computer, but this amounted to nothing, since the RPi is not a peripheral and is not recognized automatically; we would need a UART adapter to facilitate the data communication. To avoid ordering more parts and dealing with UART, I decided to just plug an Arduino into my laptop and test serial communication from Python to the Arduino. I had to relearn some Arduino intricacies but got a reasonable proof of concept working.

The accompanyBot GUI application will have to spawn a thread from the start that communicates with the hardware. This communication thread and the app's main thread should both be allowed to modify specific mutable objects, such as the playing-state variable or the current measure number. I ran a test to verify this functionality, and it was successful.

In the print log from the test, the list [“nothing”, “here”] gets modified to [“winner”, “here”] by the communication thread; the data “winner” comes from the Arduino, which was configured to send it only after receiving the “hello mr arduino” message. The main thread prints the list, while the communication thread prints the sent and received messages.

The experienced engineer might be thinking right now that I will have a lot of debugging to do if I do not mention using locks for the shared data. Though I did not try that in this test (it was not necessary), I will keep mutex locks in mind for integration.
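
For concreteness, the shape of that test, with the mutex folded in, might look like the following (the port name and reply handling are stand-ins for the actual test code):

```python
import threading
import time
import serial  # pyserial

shared = ["nothing", "here"]  # mutable object visible to both threads
lock = threading.Lock()       # the mutex mentioned above

def comm_thread(port):
    ser = serial.Serial(port, 9600, timeout=2)  # port name is a placeholder
    ser.write(b"hello mr arduino\n")
    reply = ser.readline().decode().strip()     # Arduino answers "winner"
    with lock:
        shared[0] = reply                       # list becomes ["winner", "here"]
    ser.close()

t = threading.Thread(target=comm_thread, args=("COM3",), daemon=True)
t.start()
while t.is_alive():
    with lock:
        print(shared)  # main thread watches the mutation appear
    time.sleep(0.5)
```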

The next steps will be to implement all of the communication signals that will be necessary. The variety needed may change as we flesh out our app, but for now I can get started on the essential ones. That is what I will work on going into next week; in general, the weeks to come will all be integration work. My team and I are facing some design changes that we will outline in the team status report, so our Gantt schedule will likely be volatile in the coming weeks. I feel I am slightly behind schedule, but I am sure I can complete the necessary work without cause for concern. We set up slack time for exactly this reason and hopefully will have a working product by the final demo.


Rahul’s Status Report for 3/25

For this week, I had to build out a lot of the framework for the GUI application. Everything I planned was successfully implemented in Python (alongside the Audiveris and PowerShell scripts I had written last week and earlier in the semester).

This included adding a variable tempo display; controlling the tempo with key presses or mouse clicks on the bi-directional vertical arrow buttons; displaying the pause button when the play button is clicked and vice versa; adding a current-measure number display; creating a slider that controls the current measure (with equally spaced stops at each integer measure, based on the total number of measures in the piece); and allowing the measure number to be fine-tuned up or down with the bi-directional horizontal arrow buttons.

I also successfully integrated Audiveris into the application by making a separate thread call to the OMR program after the file-opening thread returns the path of the selected file. Since users may select invalid files, I implemented some basic error handling here as well. Once the OMR parses the music score into XML, it stores it in a cache folder, and the application renders the PNG of the score above the slider. I discovered that PDFs are slightly more complicated than PNGs, as they consist of multiple pages. Nonetheless, I did some digging and came across the pdf2image module for Python. Since the app runs on Windows, I had to install and configure an additional library known as Poppler to get pdf2image working properly. With that done, I was able to display PDF music scores whenever the user's input was a PDF, which we deemed valid input in our proposal and design review. At the moment I always display the first page; once the notes-scheduling portion of the accompanyBot is integrated, I will ensure the proper page is displayed at all times. The conversion boils down to a couple of lines, sketched below.
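
The pdf2image portion is only a couple of lines; a sketch, with the Poppler install path as a placeholder:

```python
from pdf2image import convert_from_path

# poppler_path points at the Windows Poppler binaries (install path varies)
pages = convert_from_path("score.pdf", dpi=400,
                          poppler_path=r"C:\poppler\Library\bin")
pages[0].save("cache/score_page1.png")  # only the first page, for now
```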

This is a short demonstration of what is possible from the GUI as of March 25, 2023.

I can also attempt to offer a more formal definition of what valid input is. In previous work, my group and I just said no handwritten or low-resolution scores. Quantitatively speaking, I would say input scores need to be generated at 400 DPI or higher; otherwise, Audiveris begins to have problems assigning note heads and rests to the staves.

I ran into a bug where pygame could not display some of the lower-resolution scores that Audiveris was just barely able to process (roughly 240 DPI). I will look into this, but I did get these pages off the internet, and it seems that, to crack down on copyright infringement, a lot of free content is only made available at lower resolutions.

My goal for the week was to finish as much of the app as possible that does not require integrating with the Raspberry Pi, and I think I accomplished that. My schedule for next week says I should be helping with hardware construction and circuitry. Considering that my strengths are in software, and that we are behind schedule on the notes-scheduling portion of the accompanyBot, I will shift my time to helping get that done. Nora is working to finish it, so if she has it working early in the week, I will instead integrate it with the GUI application I have spent the last two weeks building. The schedule will be updated accordingly.

Rahul’s Status Report for 3/18

The earlier portion of this week was spent on the ethics assignment for this course, analyzing our project through an ethical lens. After that, I implemented more of the UI diagram from my previous status report in our Python GUI application. Some details have been omitted, and some will be added during integration, but here is the current boot-up view:

You might notice that last week's application view was running on macOS, while this week's has a Windows-style app window. Since the app is designed for Windows, I have been making the effort to continue development on Windows. Also, I left out the fast-forward and rewind red arrows and replaced them with forward and back buttons to give the user measure-level control; FF and RW just didn't make sense for the accompanyBot. The only UI I have not developed yet is the measure display and the slider. This should be simple, especially in comparison with the features intended for next week, which I detail below.

Another change was using the PIL library for image resizing instead of pygame's internal resizing function; image quality is now very much retained when displaying the music score. Since I am at the stage of implementing UI changes alongside integration components, I decided to introduce one piece of functionality: the import PDF/PNG button now works on click. Initially, I had planned to use a Python module for displaying a file-import GUI, but I realized this would be computationally expensive, and if it ran in the foreground the user would not be able to perform other UI actions (say, pausing the player robot immediately). For this reason, I dug further into PowerShell development to figure out how to launch the Windows file-open dialog and return the name of the selected file. From there, I modified the event-loop code for the pygame graphics shown above to use multithreading: it launches the PowerShell job, keeps the GUI responsive, and rejects requests to import more files while the file-open dialog is already running. A sketch of the thread-plus-dialog pattern is below.
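
A condensed sketch of that pattern, with the dialog expressed as an inline PowerShell command rather than the app's actual .ps1 file:

```python
import subprocess
import threading

# opens the native Windows file dialog and prints the chosen path
PS_CMD = (
    "Add-Type -AssemblyName System.Windows.Forms; "
    "$d = New-Object System.Windows.Forms.OpenFileDialog; "
    "$d.Filter = 'Scores (*.pdf;*.png)|*.pdf;*.png'; "
    "if ($d.ShowDialog() -eq 'OK') { Write-Output $d.FileName }"
)

def open_file_dialog(on_done):
    result = subprocess.run(["powershell", "-Command", PS_CMD],
                            capture_output=True, text=True)
    on_done(result.stdout.strip())  # empty string if the user cancelled

# the pygame event loop starts this thread and keeps rendering meanwhile
threading.Thread(target=open_file_dialog, args=(print,), daemon=True).start()
```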

As of now, once the user selects their file, the Python code just prints the returned path; but given the work I have already pushed to our team GitHub for spawning an OMR job from Python with a file string as input, this should be a straightforward integration. I may also want to run the OMR job on its own thread.

According to our team Gantt chart, I am on schedule. Tomorrow I should finish the home-screen UI, which is pretty much done, and by next week all user actions should be implemented, including the OMR integration. I have been on schedule every week so far, but since next week's task is so large, I will have to work longer to get it done and will still likely fall short of finishing (getting the file-insertion view working properly took a few hours alone). Ultimately, I will get as much done as I can and catch up as necessary.

Rahul’s Status Report for 3/11

Most of my time on the project this week was spent writing the design review report with my team. In addition to the paper, my task in the Gantt chart was to create the UI mockup for the accompanyBot application. Early on, I had made an initial design for the proposal; I decided to modify it to make the colors more aesthetically pleasing and the functionality more feasible.

For reference, this initial mockup had highlighted the note or rest that was currently being played or in queue on the accompanyBot device:

This functionality would require additional machine learning and synchronization between the subsystems that would deviate from the rest of the project. Additionally, the design lacked the tempo modifier specified in our use-case requirements.

For our design review report, I updated our UI model to this version:

The new design incorporates a dark theme, which puts less strain on users' eyes; the tempo modifier, which will be dynamic and prevent input speeds that are too fast; and a current-measure display to replace the highlighted note(s) of the previous design. The current-measure display is feasible because music21 allows image generation from segments of measures of the original score. So, for a score with 100 measures, the system will create and store 100 small measure images in local memory; a sketch of that generation loop is below.
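
A sketch of the generation loop, assuming MuseScore is installed and configured as music21's PNG renderer (paths and file naming are placeholders):

```python
from music21 import converter

score = converter.parse("score.xml")
total = len(score.parts[0].getElementsByClass("Measure"))
for i in range(1, total + 1):
    # render measure i of every part as its own small image
    score.measures(i, i).write("musicxml.png", fp=f"cache/measure_{i}.png")
```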

Inevitably, more changes will likely be made to the front-end design come implementation. To account for this, I decided to start learning the pygame documentation and begin implementing. For now, I have developed a proof-of-concept display:

Now I know where to look to create the precise visuals, such as rounded corners or underlined fonts. My main concern with visuals is that pygame's image rescaler very much degrades overall image quality, resulting in missing music staff lines when the score is displayed. I will likely delegate rescaling and other expensive operations to be performed outside pygame; one likely fix is resampling through PIL, sketched below.
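
One way this could look: do a high-quality resample in PIL, then hand the result to pygame (the target size here is illustrative):

```python
import pygame
from PIL import Image

# high-quality downscale in PIL, then convert to a pygame surface
img = Image.open("score_page1.png").convert("RGB")
img = img.resize((800, 1035), Image.LANCZOS)
surface = pygame.image.fromstring(img.tobytes(), img.size, "RGB")
```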

I will have to clean up the app code into modular sections to continue development, but so far I am in a good position to finish the bulk of the application by next week. Once we start integrating, my team and I will have to figure out PyUSB and our data-transmission protocol between subsystems, so that I can send signals to the RPi, and Nora and Aden can implement the reception of those signals to complete the control of the accompanyBot.


Rahul’s Status Report for 2/25

I delivered my team's design review presentation this week and, overall, conveyed the design requirements and status of our project well. Analyzing the feedback, I see we need to bring more of the use-case requirements to light. I will work with the team to ensure Nora emphasizes the use-case metrics in our final presentation.

Last week, I talked about the preliminary code skeleton, which I have since expanded with some modifications and additions. The PowerShell script that calls Audiveris now executes in the foreground, so completion of the OMR job can be detected. Following this, I modified the Python script to unzip the generated MXL file after the PowerShell job completes. For testing purposes, I also added a function that runs the whole pipeline: reading music, unzipping, loading the XML into Python data structures, and playing the music through the computer speakers. This was possible after more research into the music21 library and its formats and syntax. The unzipping step is tiny, as sketched below.
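
Since an MXL file is just a zip archive wrapping the MusicXML, the extraction is a few lines (paths are placeholders):

```python
import zipfile

with zipfile.ZipFile("output/score.mxl") as z:
    # skip the META-INF container entry and pull out the score itself
    inner = [n for n in z.namelist() if not n.startswith("META-INF")][0]
    z.extract(inner, "output/")
```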

I have also come to better understand a portion of how to create our notes-scheduling algorithm. Once music21 has loaded the MusicXML file into a stream, it separates the notes into parts; for piano music, a bass clef and a treble clef. Within each of these parts, I can access an array of measures, each of which contains an array of notes, rests, or chords (which are themselves essentially arrays of notes). More work will have to be done during integration to set up our own player that lines each note up with the appropriate solenoid, but otherwise, whatever note(s) the application is sitting on in the music21 structure should be directing some solenoid's GPIO signal to be on. The traversal looks roughly like the sketch below.
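
A sketch of that traversal against the music21 API (the print stands in for driving a solenoid):

```python
from music21 import converter, note, chord

score = converter.parse("song.xml")
for part in score.parts:                          # e.g. treble and bass staves
    for measure in part.getElementsByClass("Measure"):
        for element in measure.notesAndRests:
            if isinstance(element, chord.Chord):
                pitches = [p.nameWithOctave for p in element.pitches]
            elif isinstance(element, note.Note):
                pitches = [element.pitch.nameWithOctave]
            else:                                 # a Rest
                pitches = []
            # eventually: map each pitch to a solenoid GPIO line
            print(measure.number, pitches, element.quarterLength)
```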

Our team has also migrated code to GitHub. All of my contributions are pushed to my fork of the team repo; this lets us verify modifications by inspecting each other's commits before merging. Overall, I am on schedule. My task for the week was to “modify XML/MIDI output to integration specs,” and I accomplished this with my preliminary music21 code. Next up, I will diagram the front-end layout of the application. If time permits (though I will probably dedicate the rest of my time to writing the design report with my team), I should also research which framework is best for implementing the application. At the moment, pygame seems like a reasonable choice for meeting our design requirements (especially the 150 ms latency target).



Rahul’s Status Report for 2/18

Since our OMR solution will run on Windows, this week I put some work into setting up the shell scripts and the Python code that calls them. While our main hub-application development won't start for a few weeks, I still wanted to build a skeleton for calling the OMR without the default Audiveris GUI that could be modified later on. For this I had to learn some features of the .ps1 (Windows PowerShell) scripting language by consulting Stack Overflow. Though the syntax is not as kind as bash, the operations remain the same, and I was able to execute the script from Python via the os module. I recalled some libraries from 15-112 for opening file paths through a GUI and decided to incorporate those into the skeleton code, as this will improve our UX come app-design time. The glue is only a couple of lines, sketched below.
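
A rough sketch, assuming the 15-112-style file opener is tkinter's dialog and using a placeholder wrapper-script name:

```python
import os
from tkinter import Tk, filedialog

Tk().withdraw()  # hide the empty root window behind the dialog
path = filedialog.askopenfilename(filetypes=[("Scores", "*.pdf *.png")])
if path:
    # hand the chosen score off to the PowerShell wrapper around Audiveris
    os.system(f'powershell -ExecutionPolicy Bypass -File run_omr.ps1 "{path}"')
```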

I have also spent time preparing for the design review presentation next week, as I will be delivering it on behalf of my team. In the effort to expand sections of our block diagram, I felt it best to segment our project into three phases: a transcription phase, a scheduling phase, and an execution phase.

As will appear in our presentation:

I hope this will give our audience(s) some clarity on the technical uncertainties of our project. By doing this, I uncovered that our notes scheduling was defined rather weakly and deserves more planning time. As a group, we knew that converting music scores to MusicXML format was the way to go and that the Raspberry Pi could take it from there. After generating the XML with Audiveris and trying to move forward with its output, we realized how much extraneous information there is even in the readable XML. This led me to dig into open-source XML “condensing” code, so that the data could be organized into structures more easily accessible and operable by our (to-be-determined) mode of scheduling. Fortunately, I found that MIT has poured years of experience and expertise into developing music21, a Python module for importing music file formats into data structures that can be easily traversed or manipulated, exporting different file types, and playing imported sources directly (plus, it has awesome documentation). Considering that the Raspberry Pi will be switching the solenoids on and off from a Python script, I can foresee having music21 preprocess the XML as an important intermediate step.

In terms of staying on schedule, I needed to configure the OMR to output XML/MIDI. I consider this accomplished, since MIDI turned out not to be strictly necessary (plus, I found there are many resources available for XML-to-MIDI conversion). Since music21 can play back our XML, our sound-quality testing will be facilitated as such. Next week, I will work with Nora and Aden to better formalize our scheduling and determine most, if not all, of the necessary transformations of the transcribed XML. Hopefully, I may also get to writing a portion of the corresponding code.



Rahul’s Status Report for 2/11

I did further research into alternative OMR technology earlier in the week, as I was having trouble building a custom version of Audiveris on Mac. Since the default output is an MXL file and not XML, I wanted to edit the source code to build it to my needs. I figured out all of the necessary modifications; however, I ran into a dependency issue. Audiveris requires an older version of an OCR (optical character recognition) library called Tesseract, and since my Mac has an M2 (Apple silicon) chip, it was practically impossible to get hold of the older version of this software for M2. This was as far as I could get:

I was able to download the relevant JAR files and made sure to specify the classpath to link them, but it seems I need the dynamic libraries as well, which will be impossible to get.

This led me to look for an alternative, Python-based OMR solution, Oemer, which essentially runs a pretrained ML model on a PDF of music. Its simplicity of usage was great; however, runs take a few minutes to complete, and upon reconverting the XML back to PDF form I was very dissatisfied with its accuracy on the Charlie Brown example from the team status report (probably around 50%).

Last week, I mentioned that Audiveris ran fairly well on Windows, though it was outputting MXL files, which was annoying since they read like compressed binary. I eventually discovered that these MXL files are just zipped XML files, and unzipping a few kB per page is hardly expensive against the parsing-time requirements we set.

Eventually, I will write a shell script to run the OMR, callable by the GUI application from our proposal. The only thing to keep in mind is that it will have to use Windows commands (yikes). Below is a sample of what the commands would look like.

Running the OMR:
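
Roughly, the invocation has this shape when driven from Python (the -batch and -export flags are from the Audiveris CLI; the install and score paths are placeholders):

```python
import subprocess

subprocess.run([
    r"C:\Program Files\Audiveris\bin\Audiveris.bat",
    "-batch",               # run headless, without the GUI
    "-export",              # write the MXL score
    r"C:\scores\page1.png",
])
```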

This is able to generate the MXL in the default output directory (though there is another parameter that can be used to specify the directory). It also produces a log of what was executed:

If you check the timestamps of the log, you will see this took roughly 11 seconds to parse the single page, which is very reasonable and should not be too cumbersome for our end user.

Previously, I was running the OMR solely from the Audiveris GUI, which, though pretty, would not be ideal for our pipelined app.

Audiveris GUI build:

Next week I will integrate the file generation and unzipping into a preliminary version of the script mentioned earlier. I also hope to test the OMR on more music scores to produce a numeric metric for comparison against our goals. My current progress is good and on schedule.