Concepts
Core Idea
Kubernetes is designed to keep workloads alive. Zombie Mode reverses that: it safely scales workloads down when you don’t need them.
Scope of Control
Zombie Mode targets deployments (and by extension, pods/replicas). Any workload with the ZombieMode label can be put to sleep on a schedule.
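As a rough illustration of label-based targeting, the sketch below filters a list of deployments down to those carrying the label. It is not Zombie Mode's implementation: the dictionary shape, the `eligible_deployments` helper, and the use of `ZombieMode` as a label key with a `"true"` value are assumptions made for the example.

```python
# Hypothetical label key: the text only says "the ZombieMode label".
ZOMBIE_LABEL = "ZombieMode"


def eligible_deployments(deployments):
    """Return the names of deployments that carry the ZombieMode label."""
    return [
        d["name"]
        for d in deployments
        if ZOMBIE_LABEL in d.get("labels", {})
    ]


# Made-up deployments, purely for illustration.
deployments = [
    {"name": "checkout", "labels": {"ZombieMode": "true", "team": "payments"}},
    {"name": "auth", "labels": {"team": "identity"}},
    {"name": "reports", "labels": {"ZombieMode": "true"}},
]

print(eligible_deployments(deployments))  # ['checkout', 'reports']
```

Here `auth` has no label, so it is never considered for sleep and keeps running regardless of any schedule.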
Fleet-Aware
Unlike single-cloud schedulers, Zombie Mode manages workloads across all your clusters, no matter the provider or geography.
Granular Control
1. Infrastructure layer (Cluster or Node level): Changes to nodes (the underlying infrastructure) are a by-product of Zombie Mode in action. On auto-scaled clusters, Zombie Mode reduces the need for a high (static) node count. Because the sleeping apps/pods are "dead," they no longer need memory or CPU, so the cluster's autoscaler can remove the excess nodes. For example, a node may be removed while the workloads/deployments still exist in that cluster. When the schedule turns the apps back on, Kubernetes reschedules the pods and nodes are added back to fit the demand. This is the root of both the savings and the safety of Zombie Mode.
2. Workload level (Pod or Container):
Zombie Mode does its work here. It makes a workload "die" by removing replicas (pods) from all deployments in the namespaces you choose, on the schedule you define, and respawns them correctly when it's time to turn apps back on.
3. Application level (Service / Process):
Application developers: this is where you live. At this level, Zombie Mode only means that an affected app inside the container will NOT be running. If you have a URL pointing to the app, you'll receive a 503, because the Kubernetes ingresses and services still exist but your app does not; it was removed. Your services and processes will reappear on schedule as if they had never been gone.
4. Code or Module level: The finest level of granularity. Zombie Mode never interacts at this level. Your code, configs, secrets, and everything else that keeps that code running properly are never touched or modified by Zombie Mode.
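The workload-level behavior in item 2 (replicas removed, then respawned at the original count) can be sketched in a few lines. This is a simplified model, not Zombie Mode's code: the `Deployment` class, the `saved_replicas` bookkeeping field, and the `sleep`/`wake` helpers are hypothetical names, and how the real product remembers the original replica count is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Deployment:
    name: str
    replicas: int
    # Hypothetical bookkeeping: the replica count to restore on wake.
    saved_replicas: Optional[int] = None


def sleep(dep: Deployment, baseline: int = 0) -> None:
    """Scale down to the baseline, remembering the awake replica count."""
    if dep.saved_replicas is None:
        # Only record the count on the first call, so sleeping twice
        # does not overwrite the original value with the baseline.
        dep.saved_replicas = dep.replicas
    dep.replicas = baseline


def wake(dep: Deployment) -> None:
    """Restore the remembered replica count, if the deployment is asleep."""
    if dep.saved_replicas is not None:
        dep.replicas = dep.saved_replicas
        dep.saved_replicas = None


api = Deployment(name="api", replicas=3)
sleep(api)
print(api.replicas)  # 0
wake(api)
print(api.replicas)  # 3
```

Only the replica count changes in this model. The deployment object itself persists, which mirrors why services, ingresses, configs, and secrets remain in place while the app is asleep.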
Terminology
- Asleep: Replicas scaled to zero (or to a minimal baseline).
- Awake: Replicas restored to normal.
- Schedule: A recurring 7-day calendar controlling awake/asleep states.
- Override: A manual wake-up outside the schedule.
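To make the Schedule and Override terms concrete, here is a minimal sketch of how a recurring 7-day calendar might be evaluated. The data layout (weekday mapped to awake windows in whole hours), the `desired_state` function, and the `override_awake` flag are assumptions for illustration, not Zombie Mode's actual schedule format.

```python
from datetime import datetime

# Hypothetical weekly schedule: weekday (0 = Monday) -> awake windows
# as (start_hour, end_hour) pairs. Here: awake 08:00-18:00 on weekdays.
SCHEDULE = {day: [(8, 18)] for day in range(5)}
SCHEDULE[5] = []  # Saturday: asleep all day
SCHEDULE[6] = []  # Sunday: asleep all day


def desired_state(schedule, now, override_awake=False):
    """Return "awake" or "asleep" for the given moment."""
    if override_awake:
        # A manual override wakes the workload regardless of the calendar.
        return "awake"
    for start_hour, end_hour in schedule.get(now.weekday(), []):
        if start_hour <= now.hour < end_hour:
            return "awake"
    return "asleep"


monday_morning = datetime(2024, 1, 1, 9, 0)   # 1 January 2024 was a Monday
saturday_morning = datetime(2024, 1, 6, 9, 0)

print(desired_state(SCHEDULE, monday_morning))                         # awake
print(desired_state(SCHEDULE, saturday_morning))                       # asleep
print(desired_state(SCHEDULE, saturday_morning, override_awake=True))  # awake
```

Because the calendar is keyed by weekday rather than by date, the same pattern repeats every week without further configuration.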