Personal Accomplishments
This week I started programming the project's natural language processing system using Python’s spaCy library. I used the built-in en_core_web_sm pipeline for basic parsing with its tokenizer, tagger, parser, and NER components. I experimented with two approaches to grammar rule matching. The DependencyMatcher can extract the menu item and its quantity even when several words separate them, as in “a splendidly delicious hamburger”, but it is complicated to write rules for words that have no direct dependency relationship. If the dependency parse does not link the quantity word to the menu item word, the DependencyMatcher cannot identify them as a pair. The token-based Matcher works better in this situation, since its rules can be based on part-of-speech tags or other token properties instead. However, the token-based Matcher is weaker at identifying relationships between different parts of the sentence and might require more edge-case handling. Based on these findings, I will attempt to combine both matchers into a more comprehensive algorithm.
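To make the comparison concrete, here is a minimal sketch of both matchers applied to the example phrase. It is not the project's actual rule set: the pattern contents, the `ORDER_ITEM` label, and the function names are illustrative assumptions, and the parse is annotated by hand so the snippet runs without downloading en_core_web_sm (real model output may differ).

```python
import spacy
from spacy.matcher import DependencyMatcher, Matcher
from spacy.tokens import Doc

nlp = spacy.blank("en")  # vocab only; no trained model is loaded

# Hand-annotated parse of the example phrase (an assumption, not model output).
# heads holds the absolute index of each token's head; the root points to itself.
doc = Doc(
    nlp.vocab,
    words=["a", "splendidly", "delicious", "hamburger"],
    pos=["DET", "ADV", "ADJ", "NOUN"],
    heads=[3, 2, 3, 3],
    deps=["det", "advmod", "amod", "ROOT"],
)

# Dependency rule: a noun (the item) that directly governs a determiner
# or numeric modifier (the quantity), however many words sit in between.
DEP_PATTERN = [
    {"RIGHT_ID": "item", "RIGHT_ATTRS": {"POS": "NOUN"}},
    {
        "LEFT_ID": "item",
        "REL_OP": ">",
        "RIGHT_ID": "quantity",
        "RIGHT_ATTRS": {"DEP": {"IN": ["det", "nummod"]}},
    },
]

# Token rule: quantity word, any run of adverbs/adjectives, then a noun.
TOKEN_PATTERN = [
    {"POS": {"IN": ["DET", "NUM"]}},
    {"POS": {"IN": ["ADV", "ADJ"]}, "OP": "*"},
    {"POS": "NOUN"},
]


def pair_by_dependency(doc):
    """Return (quantity, item) pairs found through the dependency tree."""
    matcher = DependencyMatcher(doc.vocab)
    matcher.add("ORDER_ITEM", [DEP_PATTERN])
    # token_ids follow the pattern order: [item, quantity]
    return [(doc[q].text, doc[i].text) for _, (i, q) in matcher(doc)]


def pair_by_tokens(doc):
    """Return (quantity, item) pairs found through the linear token sequence."""
    matcher = Matcher(doc.vocab)
    matcher.add("ORDER_ITEM", [TOKEN_PATTERN])
    # Each match is a span; its first token is the quantity, its last the item.
    return [(doc[start].text, doc[end - 1].text) for _, start, end in matcher(doc)]


print(pair_by_dependency(doc))
print(pair_by_tokens(doc))
```

The dependency rule depends on the parser attaching the quantity word to the item, while the token rule depends only on word order and part-of-speech tags, which is why the two can fail on different inputs.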
Schedule
I’m slightly behind our original schedule: I haven’t made significant progress on the natural language processing algorithm because I spent a large amount of time familiarizing myself with spaCy and the matchers. Our Gantt chart has been updated to accommodate this. Since the database has not been established yet either, the integration between the database and the natural language processing system can be pushed back until after spring break to allow more time for developing both tools. I will spend more time on the algorithm next week and have an MVP version ready by the end of spring break at the latest.
Plans for Next Week
I plan to refine my natural language processing algorithm while working on the design report with my teammates. By the end of next week, my algorithm should be able to parse the user input in the following situations:
- When there are words between the quantity and the item name (e.g. “a beautifully packaged cheeseburger”)
- When the user attempts to change the order using easily detectable keywords (e.g. “remove the diet coke” / “I wanna add another cheeseburger”)
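As a rough illustration of the second situation, keyword detection could start as a plain lookup before any spaCy rules are involved. This is only a sketch under assumptions: the keyword lists, the action names, and the `detect_action` function are hypothetical placeholders, not the planned implementation.

```python
import re

# Hypothetical keyword lists; the real vocabulary would come from user testing.
ACTION_KEYWORDS = {
    "remove": ("remove", "delete", "cancel"),
    "add": ("add", "another"),
}


def detect_action(utterance):
    """Return the first action whose keyword appears in the utterance, else None."""
    words = re.findall(r"[a-z']+", utterance.lower())
    for action, keywords in ACTION_KEYWORDS.items():
        if any(word in keywords for word in words):
            return action
    return None


print(detect_action("remove the diet coke"))
print(detect_action("I wanna add another cheeseburger"))
```

An utterance with no keyword returns None, which could then fall through to the regular item and quantity matching.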