Introduction and Project Summary

A predictive maintenance system for liquid-cooled processors.

Modern liquid-cooled CPUs and GPUs are typically protected by fixed temperature thresholds. When a threshold is exceeded, the cooling system ramps up fans or pumps, and the processor may throttle or even shut down to avoid damage. Because these responses are purely reactive, the processor may already have sustained thermal stress that reduces performance or causes permanent damage by the time they trigger.

AnomAIy is a predictive maintenance system designed to prevent this outcome. Instead of relying on static thresholds, AnomAIy learns the normal patterns of heat transfer and cooling response, then detects early deviations that signal potential problems. This enables the system to issue alerts with enough margin for safe user intervention.

To develop and test the system, we use high-power resistors mounted to water blocks to simulate CPU/GPU heat loads inside a liquid cooling loop. By collecting temperature sensor data under varying loads and environmental conditions, we build models that distinguish normal cooling behavior from abnormal trends using statistical methods, machine learning, or both.
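
As a rough illustration of the statistical end of that range, the sketch below learns a baseline from normal-operation data and raises an alert when readings drift away from it for several consecutive samples. It is a minimal example under stated assumptions, not the AnomAIy model itself: the monitored signal (water-block temperature minus coolant temperature), the function names fit_baseline and first_alert, the 3-sigma limit, the persistence count, and all sample values are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def fit_baseline(normal_deltas):
    """Learn the mean and sample standard deviation of the
    block-minus-coolant temperature difference (deg C) from
    readings taken during normal operation."""
    return mean(normal_deltas), stdev(normal_deltas)


def first_alert(deltas, baseline, z_limit=3.0, persist=3):
    """Return the index at which `persist` consecutive readings have
    deviated from the baseline mean by more than `z_limit` standard
    deviations, or None if that never happens."""
    mu, sigma = baseline
    run = 0  # length of the current streak of out-of-range readings
    for i, delta in enumerate(deltas):
        if abs(delta - mu) > z_limit * sigma:
            run += 1
            if run >= persist:
                return i
        else:
            run = 0  # a single in-range reading resets the streak
    return None


# Made-up values for illustration only, not measurements from the test loop.
normal = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]
baseline = fit_baseline(normal)

# A slow upward drift, as might occur if heat transfer were degrading.
drifting = [10.0, 10.1, 10.9, 11.2, 11.6, 12.0]
alert_index = first_alert(drifting, baseline)
print(alert_index)
```

Requiring several consecutive out-of-range samples is one simple way to avoid alerting on a single noisy reading. A drift of this size would sit far below a typical fixed shutdown threshold, which is the margin the predictive approach is meant to provide.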

The result is a platform that not only visualizes real-time coolant and simulated CPU/GPU temperatures, but also delivers clear, actionable notifications when unusual thermal behavior is detected. AnomAIy makes liquid-cooled systems, from gaming PCs to AI racks, safer, smarter, and more reliable.