Implementing Intelligent Caching in Clipper: Low Latency Machine Learning
Kiran Pandit, Ian Van Stralen

ABSTRACT
Using machine learning effectively in modern distributed applications requires serving predictions at low latency. Clipper is a prediction serving system that can deploy different models and serve machine learning queries at low latency. We evaluate Clipper's default query caching mechanism. We also propose and test a way to improve this mechanism through intelligent caching, in which memory allocations are dynamically resized based on hit rates. We find that cache usage and caching policy both affect the overall latency of requests.
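The Python sketch below shows one possible reading of "dynamically resizable memory allocations based on hit rates". It splits a fixed total cache budget across several per-model LRU caches in proportion to each cache's observed hit rate. The class `LRUCache`, the function `rebalance`, the proportional-to-hit-rate rule, and the `min_capacity` floor are all hypothetical and chosen only for illustration. They are not Clipper's API, and they are not necessarily the allocation policy evaluated in this paper.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical LRU cache whose capacity (in entries) can change at runtime."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # insertion order doubles as recency order
        self.hits = 0
        self.lookups = 0

    def get(self, key):
        # Count every lookup so the hit rate can drive later resizing.
        self.lookups += 1
        if key in self.data:
            self.data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.data[key]
        return None

    def put(self, key, value):
        if self.capacity <= 0:
            return
        self.data[key] = value
        self.data.move_to_end(key)
        self._evict()

    def resize(self, capacity):
        # Shrinking evicts least recently used entries until the cache fits.
        self.capacity = capacity
        self._evict()

    def _evict(self):
        while len(self.data) > self.capacity:
            self.data.popitem(last=False)  # drop the least recently used entry

    def hit_rate(self):
        return self.hits / self.lookups if self.lookups else 0.0

    def reset_stats(self):
        self.hits = 0
        self.lookups = 0


def rebalance(caches, total_capacity, min_capacity=1):
    """Hypothetical policy: give each cache min_capacity entries, then split the
    remaining budget in proportion to hit rate (evenly if no cache has hits)."""
    names = list(caches)
    spare = total_capacity - min_capacity * len(names)
    if spare < 0:
        raise ValueError("total_capacity is too small for the minimum allocation")
    rates = {n: caches[n].hit_rate() for n in names}
    total_rate = sum(rates.values())
    new_caps = {}
    for n in names:
        share = rates[n] / total_rate if total_rate > 0 else 1 / len(names)
        # int() rounds down, so the sum never exceeds total_capacity.
        new_caps[n] = min_capacity + int(spare * share)
    for n in names:
        caches[n].resize(new_caps[n])
        caches[n].reset_stats()  # start a fresh measurement window
    return new_caps
```

Calling `rebalance` periodically, with counters reset each time, lets the allocation follow shifts in query mix across models. A proportional rule is only one option. A policy could instead weight caches by the marginal hit-rate gain per added entry.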