Neeraj’s Status Report for 3/16/24 – Team E6: TransLingualVisionary

My main update is regarding the RNN architecture. As mentioned last week, most of this week was spent exploring MUSE-RNN, its capabilities, and whether it can be applied to our current architecture. I found an existing MATLAB implementation at https://github.com/MUSE-RNN/Share. However, a majority of this code is P-code, meaning that it is obfuscated and we can only interact with the model itself, not read or modify its source. From testing it out, it does seem to work when given an appropriately structured .mat file as a dataset. However, I believe that writing a script to convert our dataset into that format is not a viable use of our time, especially considering that MATLAB might not be the best language for the rest of our code. As such, I see three options: move forward with the basic RNN that we have, use the existing MUSE-RNN implementation and accept the possible drawbacks of MATLAB as a language, or develop or find a new model that could also work. As of right now, I believe the best option is the first one, but I have also found another model to explore called spoter, a transformer that has been used for a task very similar to our use case, as we can see here: https://github.com/matyasbohacek/spoter. I am interested in looking into this and possibly building a transformer with a similar structure, since this code also assumes pose estimation inputs, meaning that it would translate cleanly into our current pipeline.

On the LLM side, there has been a good amount of progress, as I have experimented with different ideas from various papers and articles. In particular, from this site (https://humanloop.com/blog/prompt-engineering-101), I found that prompts which include examples and are concise and structured tend to be more effective, so that is the style I have been working with. I am planning on bringing this up during our group’s working session later, as I want to solidify our prompt so that we can dedicate more time to everything else.
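To make the "examples plus concise structure" idea concrete, here is a minimal sketch of how such a prompt could be assembled. Everything in it is an assumption for illustration: the function name `build_prompt`, the instruction wording, the `Input:`/`Output:` layout, and the example pairs are hypothetical placeholders, not our actual prompt or data.

```python
# Hypothetical example pairs (classifier-style word sequence -> English sentence).
# These are placeholders; real pairs would come from our own test sentences.
EXAMPLES = [
    ("ME STORE GO", "I am going to the store."),
    ("YOU NAME WHAT", "What is your name?"),
]


def build_prompt(examples, query,
                 instruction="Rewrite the input words as one natural English sentence."):
    """Assemble a short few-shot prompt: one instruction line, then
    worked examples, then the new input left open for the model."""
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")  # blank line keeps each example visually separate
    lines.append(f"Input: {query}")
    lines.append("Output:")  # the model is expected to complete this line
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(EXAMPLES, "BOOK WHERE"))
```

Keeping the examples in a plain list should make it easy to swap in new pairs during the working session and compare how the output changes.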

I want to settle the model decision and finish the classification model as soon as possible, so that is my first priority. The LLM prompt is also a priority, but I want to finish it with the rest of the team, as I believe it can be completed relatively quickly with the three of us working together, especially since it is something we can test quickly once we gather a few example sentences.

I am currently a bit behind schedule, as the RNN work is taking more time than I anticipated, especially since I am considering several different models. However, this is offset by the LLM prompt generation, which should take far less time than we had originally anticipated. As a result, we can adjust the schedule to dedicate more time to the RNN rather than to the LLM prompting.
