How to Dynamically Adjust Resource Allocations for Suspended Kubernetes Jobs (v1.36 Beta)

2026-05-01 05:41:36

Introduction

Kubernetes v1.36 introduces a powerful enhancement for batch and machine learning workloads: the ability to modify container resource requests and limits in the pod template of a suspended Job. Now in beta (first introduced as alpha in v1.35), this feature lets queue controllers and administrators fine-tune CPU, memory, GPU, and extended resource specifications on a Job while it's suspended, before it starts or resumes running. This means you can adapt resource allocations without deleting and recreating the Job, preserving all metadata and status.

In this step-by-step guide, you'll learn how to leverage this feature to dynamically adjust resources for suspended Jobs, ensuring efficient cluster utilization and smoother operation of resource‑intensive workloads.

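What You Need

A cluster running Kubernetes v1.36 (or v1.35 with the feature gate enabled), kubectl configured for that cluster, and permission to create and patch Jobs.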

Step-by-Step Guide

Step 1: Verify the Feature is Enabled

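In Kubernetes v1.36, this feature is beta, so it's enabled by default. It relaxes validation on the existing batch/v1 Job API rather than adding a new API version, so the practical check is that the API server reports v1.36 or later:

kubectl version

If you're on v1.35, where the feature is alpha, a cluster administrator needs to enable the MutablePodResourcesForSuspendedJobs feature gate on the API server. In v1.36, no manual action is required.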
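The version check can also be scripted. The sketch below assumes the server version string has the usual vMAJOR.MINOR.PATCH shape; the function name resize_on_by_default is hypothetical and not part of kubectl.

```shell
# Hypothetical helper: succeeds if a Kubernetes version string such as
# "v1.36.2" is v1.36 or newer, where this feature is on by default.
resize_on_by_default() {
  k8s_version="${1#v}"               # drop the leading "v"
  k8s_major="${k8s_version%%.*}"     # text before the first dot
  k8s_rest="${k8s_version#*.}"       # text after the first dot
  k8s_minor="${k8s_rest%%[!0-9]*}"   # leading digits only ("36" from "36.2")
  [ "$k8s_major" -gt 1 ] || { [ "$k8s_major" -eq 1 ] && [ "$k8s_minor" -ge 36 ]; }
}

# Usage (assumes jq is installed):
#   resize_on_by_default "$(kubectl version -o json | jq -r '.serverVersion.gitVersion')" \
#     && echo "feature enabled by default"
```

On a v1.35 cluster the function fails, which is the cue to check the feature gate instead.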

Step 2: Create a Suspended Job

Define a Job manifest with spec.suspend: true and save it as job-suspended.yaml. A Job created this way starts out suspended, so the Job controller creates no Pods and you can modify the template's resources first. Below is an example of a machine learning training Job requesting 4 GPUs:

apiVersion: batch/v1
kind: Job
metadata:
  name: ml-training-suspended
spec:
  suspend: true
  template:
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:latest
        resources:
          requests:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
          limits:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
      restartPolicy: Never

Apply it with kubectl apply -f job-suspended.yaml.
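Before changing anything, it can help to confirm that the Job really is suspended. This is a minimal sketch; the function name job_is_suspended is hypothetical, and it relies only on kubectl's jsonpath output.

```shell
# Hypothetical helper: succeeds only if the named Job has spec.suspend set to true.
# The jsonpath expression prints "true" when the field is set to true.
job_is_suspended() {
  [ "$(kubectl get job "$1" -o jsonpath='{.spec.suspend}')" = "true" ]
}

# Usage:
#   job_is_suspended ml-training-suspended && echo "safe to edit resources"
```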

Step 3: Modify Resource Requests/Limits While Suspended

Once the Job is created and in a suspended state, you can update its pod template's resources. Use kubectl edit or kubectl patch. For example, to reduce GPU count from 4 to 2 and adjust CPU/memory:

kubectl patch job ml-training-suspended --type='json' -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/example-hardware-vendor.com~1gpu", "value": "2"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/example-hardware-vendor.com~1gpu", "value": "2"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value": "4"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/memory", "value": "16Gi"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/cpu", "value": "4"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/memory", "value": "16Gi"}
]'

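Note: In a JSON Patch path, ~1 is the JSON Pointer escape for a slash, so example-hardware-vendor.com~1gpu addresses the example-hardware-vendor.com/gpu resource. Make sure the new values are valid: quantities must be non-negative, requests must not exceed limits, and the totals must fit on a node in the cluster, or the Pods will stay Pending.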
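The escaping rule for slashes in resource names is easy to get wrong by hand. The sketch below follows the JSON Pointer rules from RFC 6901; both function names are hypothetical helpers, not kubectl features.

```shell
# Hypothetical helper: escape one JSON Pointer reference token (RFC 6901).
# "~" must become "~0" first, then "/" becomes "~1".
json_pointer_escape() {
  printf '%s\n' "$1" | sed -e 's/~/~0/g' -e 's#/#~1#g'
}

# Hypothetical helper: JSON Patch path for a resource of the first container.
# $1 is "requests" or "limits", $2 is the resource name.
resource_path() {
  printf '/spec/template/spec/containers/0/resources/%s/%s\n' \
    "$1" "$(json_pointer_escape "$2")"
}

# Usage:
#   resource_path limits example-hardware-vendor.com/gpu
```

Adding --dry-run=server to the kubectl patch command lets the API server validate the change without persisting it.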

Step 4: Resume the Job

After adjusting resources, unsuspend the Job by setting spec.suspend to false:

kubectl patch job ml-training-suspended -p '{"spec":{"suspend":false}}'

The Job will start creating Pods with the updated resource specifications. You can monitor progress with kubectl get pods -w.
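In a busy namespace, a plain Pod listing shows unrelated workloads too. The sketch below resumes the Job and then lists only its own Pods, using the batch.kubernetes.io/job-name label that the Job controller sets on them; the function name resume_job is hypothetical.

```shell
# Hypothetical helper: resume a Job, then list only the Pods it owns.
resume_job() {
  kubectl patch job "$1" --type=merge -p '{"spec":{"suspend":false}}' &&
    kubectl get pods -l "batch.kubernetes.io/job-name=$1"
}

# Usage:
#   resume_job ml-training-suspended
```

Pods may take a moment to appear after the patch, so rerunning the listing with -w is a reasonable follow-up.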

Step 5: Verify Resource Allocation

Check that the running Pods reflect the new resources:

kubectl get pod ml-training-suspended-xxxxx -o jsonpath='{.spec.containers[0].resources}'

You should see the adjusted requests and limits. If a queue controller is managing the Job, it can also perform these updates automatically.
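Pod names carry a generated suffix, so selecting by label avoids looking the name up first. This is a sketch under the assumption that the trainer is the first container in the Pod; the function name job_pod_resources is hypothetical.

```shell
# Hypothetical helper: print each Pod of a Job with its first container's resources.
job_pod_resources() {
  kubectl get pods -l "batch.kubernetes.io/job-name=$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources}{"\n"}{end}'
}

# Usage:
#   job_pod_resources ml-training-suspended
```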

Tips and Best Practices

This feature gives batch and ML workloads more flexibility: resource requests and limits can be adjusted to match current cluster conditions while the Job is suspended, without deleting and recreating it.
