Since last week I have an implementation of FSRCNN that runs faster than SRCNN, though it is still slow. One optimisation I tested was using fixed weights instead of weights stored in host-side memory and mapped to the kernel. This gave a decent improvement in latency, but not enough to meet our initial specifications. Porting and integrating with the host code has introduced further slowdowns. I am trying to remedy this with a multi-kernel approach, which should be finished by tonight. For the coming week I will focus on writing the paper, making the video, and building a narrative to sell what we have, as we aren’t in a position schedule-wise to attempt more optimisations, even though that’s what I would like to do.
Project-management-wise, on Tuesday I also helped Josh practice for the Wednesday presentation.