Implementing Intelligent Caching in Clipper: Low-latency machine learning

Kiran Pandit 
Ian Van Stralen

ABSTRACT

Effectively using machine learning in modern distributed applications
requires serving predictions at low latencies. Clipper is a prediction
serving system which can deploy different models and serve machine
learning queries with low latencies. We evaluate Clipper’s default
query caching mechanism. Additionally, we propose and test a system
for improving this mechanism via intelligent caching through
dynamically resizable memory allocations based on hit rates. We find
that cache usage and cache policy both affect the overall latency of
requests.