On Monday, Professor Low told us that we needed to act on a contingency plan in case I could not get all of the convolutional operations done before the demo on Monday, April 20. He was absolutely right, and I’ve rescoped the hardware portion to cover only Linear Layers, which we had already implemented. I’m disappointed that I wasn’t able to implement my specifications for the convolutional layers (which include the Max Pooling and Flatten operations), but I seriously underestimated the amount of time they would take, and convolution support is not essential to the core goal of the project: implementing a fast hardware architecture for training neural networks.
Accomplishments
The hardware architecture is complete up to the Data Pipeline Router, which interfaces with the SPI bus that Jared is working on. At this point, the top-level module drives signals to the Model Manager; the Model Manager exposes memory handles to the FPU; the FPU drives the memory port managers in the MMU; and the MMU multiplexes between a single-cycle on-chip memory and a simulated off-chip SDRAM, which stalls for a number of cycles before servicing each request. We’re currently wiring these signals into the Data Pipeline Router, which will read packets and drive the proper signals without needing a testbench.
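To make the memory path concrete, here is a minimal Python behavioral sketch of how the MMU multiplexes the two memories, assuming a simple address-range split and a fixed stall count. The class names, the split point, and the 8-cycle stall are all illustrative placeholders, not our actual SystemVerilog modules or parameters.

```python
class OnChipMemory:
    """Single-cycle memory: every request is serviced on the next cycle."""
    def __init__(self, size):
        self.data = [0] * size

    def read(self, addr):
        return self.data[addr], 1  # (value, cycles of latency)


class SimulatedSDRAM:
    """Off-chip SDRAM model: stalls for some cycles before servicing a request.
    The stall count here is a placeholder, not a measured latency."""
    def __init__(self, size, stall_cycles=8):
        self.data = [0] * size
        self.stall_cycles = stall_cycles

    def read(self, addr):
        return self.data[addr], self.stall_cycles


class MMU:
    """Multiplexes requests onto on-chip memory or SDRAM by address range."""
    def __init__(self, on_chip, sdram, on_chip_limit):
        self.on_chip = on_chip
        self.sdram = sdram
        self.on_chip_limit = on_chip_limit

    def read(self, addr):
        if addr < self.on_chip_limit:
            return self.on_chip.read(addr)
        return self.sdram.read(addr - self.on_chip_limit)


mmu = MMU(OnChipMemory(1024), SimulatedSDRAM(4096, stall_cycles=8), 1024)
print(mmu.read(10))    # on-chip: serviced in 1 cycle
print(mmu.read(2000))  # SDRAM: stalls for 8 cycles before servicing
```

In the real design this arbitration happens in the memory port managers rather than in a single function, but the sketch captures the latency asymmetry the FPU has to tolerate.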
Schedule & Accomplishments for Next Week
Now that we’re not implementing convolutional layers, we need a benchmark suite of models built from the layers we do support. We will be putting this suite together over the next week so we can measure how fast our hardware implementation can train them.
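As a rough sketch of what that suite might look like, the snippet below defines a few fully-connected (Linear-Layer-only) models as layer-width lists and counts their parameters. The names and widths are hypothetical placeholders, not our final benchmark configurations.

```python
# Hypothetical benchmark configurations: stacks of Linear Layers only,
# described by their layer widths. These are illustrative, not final.
BENCHMARK_MODELS = {
    "mlp_small":  [784, 32, 10],
    "mlp_medium": [784, 128, 64, 10],
    "mlp_large":  [784, 256, 256, 128, 10],
}

def parameter_count(widths):
    """Weights plus biases for a stack of Linear Layers."""
    return sum(w_in * w_out + w_out for w_in, w_out in zip(widths, widths[1:]))

for name, widths in BENCHMARK_MODELS.items():
    print(f"{name}: {parameter_count(widths):,} parameters")
```

Having a range of model sizes should let us see how training throughput scales as the working set moves from on-chip memory into the (slower) SDRAM.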