Kubernetes v1.36: Smarter Controller Caches and Real-Time Insight

2026-05-04 00:17:41

Welcome to our deep dive into Kubernetes v1.36's latest features! We'll explore how the new release tackles a sneaky problem called staleness in controllers. Staleness can make controllers act on outdated information, causing missed actions or incorrect decisions. v1.36 brings powerful improvements to client-go (the library controllers use) and the most popular controllers in kube-controller-manager. These changes not only prevent stale data but also give you better visibility into what's happening. Let's answer your burning questions about these updates.

What is staleness in Kubernetes controllers?

Staleness describes a situation where a controller has an outdated view of the cluster. Controllers keep a local cache so they can respond quickly without hitting the API server on every action. This cache is filled by watching the API server for object changes. But if the cache falls behind – for example, after a controller restart or an API server outage – the controller might see old data. It might then take an incorrect action (like scaling a Deployment up when it should scale down) or do nothing when it should act. Staleness is often invisible until something breaks in production, making it a subtle but dangerous issue.
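To make the scaling example concrete, here is a minimal sketch of a controller computing a scale decision from a stale view versus the live state. The `deployment` type and `scaleDelta` function are hypothetical and written only for illustration; resource versions are modeled as plain integers, which is a simplification.

```go
package main

import "fmt"

// deployment is a hypothetical, simplified stand-in for a cached object.
// Real resource versions are strings; an int is used here for brevity.
type deployment struct {
	resourceVersion int
	desiredReplicas int
}

// scaleDelta returns how many replicas a naive controller would add
// (positive) or remove (negative) given the view it has of the object.
func scaleDelta(view deployment, runningReplicas int) int {
	return view.desiredReplicas - runningReplicas
}

func main() {
	// The API server already holds a newer version asking for fewer replicas.
	server := deployment{resourceVersion: 12, desiredReplicas: 2}
	// The controller's cache fell behind and still holds the older version.
	cached := deployment{resourceVersion: 9, desiredReplicas: 5}
	running := 3

	fmt.Println("delta from stale cache:", scaleDelta(cached, running))
	fmt.Println("delta from live state:", scaleDelta(server, running))
}
```

Acting on the cached view would add replicas, while the live state calls for removing one.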

How does staleness affect controller behavior?

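Staleness can cause several problems, chief among them a controller taking an incorrect action based on outdated data or failing to act when it should.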

These problems often stem from assumptions developers make about cache consistency – assumptions that break when events come out of order or the cache rebuilds slowly.

What causes controller cache to become stale?

Several scenarios can leave a controller's cache outdated:

  1. Controller restarts: The cache must be rebuilt from scratch by listing and watching objects. During this window, the cache is empty or partially filled.
  2. API server outages: If the API server is unreachable, the watch connection breaks and the cache stops updating. Old data remains until the connection resumes.
  3. Out-of-order events: When the informer receives events in a different order than they happened, the cache can temporarily reflect an impossible state (e.g., a pod appearing before its namespace).
  4. High churn: In busy clusters, many changes can overwhelm the watch stream, causing it to fall behind and drop events, leading to a stale cache.

What improvements does v1.36 bring?

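Kubernetes v1.36 focuses on two areas: mitigating staleness, through changes to client-go and to the most widely used controllers in kube-controller-manager, and improving observability into cache freshness.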

What is atomic FIFO and how does it help?

Atomic FIFO is a new mode for the queue that processes events in batches as a single, indivisible operation. The existing FIFO queue allowed events to be added individually as they arrived. If events came out of order (e.g., the initial list arrived as separate events), the queue might process a partial set before the rest arrived. This could cause the controller to see an inconsistent cache snapshot.

With atomic FIFO, all events from a batch (like the initial list or a set of related updates) are held until the batch is complete, then applied atomically. After that, the cache reflects a consistent, up-to-date state. This reduces the chance that a controller will act on stale or partial information. The feature gate is named AtomicFIFO and is opt-in initially; you can enable it to get these benefits now.
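The difference between piecemeal and batched application can be sketched with a toy cache. This is not the client-go FIFO or store implementation; the `event` and `store` types and their methods are hypothetical, and the sketch only illustrates why applying a batch under a single lock hides partial states from readers.

```go
package main

import (
	"fmt"
	"sync"
)

// event is a hypothetical, simplified change notification.
type event struct {
	key   string
	value string
}

// store is a toy cache; it is not the client-go store or FIFO.
type store struct {
	mu    sync.RWMutex
	items map[string]string
}

func newStore() *store { return &store{items: map[string]string{}} }

// applyOne makes a single event visible immediately, so a reader can
// observe the cache between two events of the same logical batch.
func (s *store) applyOne(e event) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items[e.key] = e.value
}

// applyBatch holds the write lock for the whole batch, so readers see
// either none of the batch or all of it.
func (s *store) applyBatch(batch []event) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, e := range batch {
		s.items[e.key] = e.value
	}
}

// size returns the number of objects currently visible to readers.
func (s *store) size() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.items)
}

func main() {
	initialList := []event{{"ns/a", "v1"}, {"ns/b", "v1"}, {"ns/c", "v1"}}

	// Piecemeal: after the first event, readers see a partial list.
	piecemeal := newStore()
	piecemeal.applyOne(initialList[0])
	fmt.Println("piecemeal, mid-list:", piecemeal.size())

	// Batched: readers only ever see the complete list.
	batched := newStore()
	batched.applyBatch(initialList)
	fmt.Println("batched, after apply:", batched.size())
}
```

In the piecemeal case a reader that runs mid-list sees one of three objects; in the batched case no reader can observe that intermediate state.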

How can I take advantage of these improvements?

To use the new atomic FIFO in your own controllers:

  1. Update your client-go dependency: Make sure you're using a version of client-go that includes the AtomicFIFO feature (available in Kubernetes v1.36).
  2. Enable the feature gate: When initializing your informer factory or queue, set the AtomicFIFO feature gate to true. This can be done via environment variable or controller flags.
  3. Monitor your controller: With atomic FIFO, you can also introspect the cache to see the latest resource version processed, giving you better observability into cache freshness.

For operators, simply upgrading kube-controller-manager to v1.36 will bring these benefits to built-in controllers. No configuration changes are required unless you want to opt in for custom controllers.

How does v1.36 improve observability for controllers?

Alongside staleness mitigation, v1.36 adds better observability hooks. With atomic FIFO, the queue exposes metrics and logging that let you see when batches are processed and how far behind the cache might be. You can query the latest resource version applied to the cache and compare it to the current state of the API server. This makes it easier to detect staleness before it causes harm. Previously, you had to rely on indirect signals or wait for a bug report. Now, operations teams can set up alerts for cache lag, helping them catch issues early.
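A cache-lag check of the kind described above might look like the following sketch. The `cacheLag` and `shouldAlert` functions are hypothetical, the threshold is arbitrary, and parsing resource versions as integers is a simplification made for this example; how v1.36 actually exposes the latest applied resource version is not shown here.

```go
package main

import (
	"fmt"
	"strconv"
)

// cacheLag returns how far the cached resource version trails the
// server's. Treating resource versions as integers is a simplification
// made for this sketch.
func cacheLag(cachedRV, serverRV string) (uint64, error) {
	cached, err := strconv.ParseUint(cachedRV, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("cached resource version %q: %w", cachedRV, err)
	}
	server, err := strconv.ParseUint(serverRV, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("server resource version %q: %w", serverRV, err)
	}
	if cached >= server {
		return 0, nil // the cache is caught up
	}
	return server - cached, nil
}

// shouldAlert reports whether the lag exceeds a chosen threshold.
func shouldAlert(lag, threshold uint64) bool {
	return lag > threshold
}

func main() {
	lag, err := cacheLag("1040", "1100")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("lag:", lag, "alert:", shouldAlert(lag, 50))
}
```

A count of resource versions is only a rough proxy for freshness, so in practice a threshold like this would need tuning against the cluster's normal write rate.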

These observability improvements complement the correctness fixes, giving you both a safer and more transparent controller ecosystem.