NodePool (nodepools)

  • Defines what instance types Karpenter can create
  • Based on a NodePool's constraints, Karpenter decides which instance type is most appropriate to launch for the pending pods

NodePools strategy

  • Single
    • A single NodePool manages compute for multiple teams and workloads
    • Use Cases
      • Mix of Graviton and x86 requirements in a single NodePool
  • Multiple
    • Isolates compute for different purposes
    • Use Cases
      • Expensive hardware
      • Security isolation
      • Team separation
      • Different AMIs
      • Tenant isolation due to noisy neighbors
  • Weighted
    • Defines an order across your NodePools so that the scheduler attempts to schedule with one NodePool before trying another
    • The weight for picking one NodePool over the others is defined by nodepool.spec.weight; higher weights are preferred
    • Use Cases
      • Prioritize Reserved Instances and Compute Savings Plans ahead of other capacity: when a discount covers a quota of specific EC2 instances, you want that discounted quota to be consumed first
      • Default cluster-wide configuration
      • Ratio split - Spot/OD, x86/Graviton
  • Multi-NodePool strategy gotchas
    • Pods do not need to specify a NodePool; one is picked automatically based on the pod's requirements. If the requirements overlap and multiple NodePools match, the first one alphabetically is used (when no weight is defined)
    • However, it is very common for a pod to target a specific NodePool (e.g., Team A may only use NodePool A). In that case it's common to define node labels in the NodePool and use them as a nodeSelector in the pod workloads (see the sketch after this list). It's also possible to enforce a specific nodeSelector with Kyverno so that each team's namespace is "tied" to a specific NodePool
    • Limits are isolated per NodePool
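
A minimal sketch of the nodeSelector pattern described above (the namespace, workload name, label value, and image are hypothetical; the team label is assumed to be set under the NodePool's spec.template.metadata.labels):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: team-a-api  # hypothetical workload name
  namespace: team-a # hypothetical team namespace
spec:
  replicas: 2
  selector:
    matchLabels:
      app: team-a-api
  template:
    metadata:
      labels:
        app: team-a-api
    spec:
      # Pods only schedule onto nodes carrying this label,
      # i.e. nodes created by Team A's NodePool
      nodeSelector:
        team: team-a
      containers:
        - name: api
          image: nginx:stable # hypothetical image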

Properties

spec.template.metadata

  • Metadata (labels and annotations) defined here is attached to the nodes this NodePool creates
  • The NodePool name is also added as a label on each created node, e.g. karpenter.sh/nodepool: default
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      annotations:
        application/name: my-app
        karpenter.sh/do-not-disrupt: "true" # prevents voluntary disruption of the nodes created by this NodePool
      labels:
        team: my-team
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]

spec.template.spec.nodeClassRef

  • Which NodeClass (the cloud-provider-specific node configuration: AMI, subnets, security groups, IAM role, etc.) this NodePool uses; a minimal EC2NodeClass sketch follows the example below
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default # must match the name of your NodeClass resource
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]

spec.template.spec.requirements

  • When a node is created, the resolved requirement key-value pairs are added as labels on the new node (e.g. the chosen instance type, zone, and capacity type), so workloads can target them with a nodeSelector or node affinity; see the sketch after the example below
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default

      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]

        - key: kubernetes.io/os
          operator: In
          values: ["linux"]

        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["p3.8xlarge", "p3.16xlarge"]

        - key: "topology.kubernetes.io/zone"
          operator: In
          values: ["us-west-2a", "us-west-2b"]

        # when both are allowed, Karpenter favors spot
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]

        - key: karpenter.k8s.aws/instance-hypervisor
          operator: In
          values: ["nitro"]

        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]

        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - m5
            - m5d
            - c5
            - c5d
            - c4
            - r4
            - p3 # gpu

        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]

        - key: karpenter.k8s.aws/instance-cpu
          operator: In
          values: ["4", "8", "16", "32"]

        - key: karpenter.k8s.aws/instance-size
          operator: NotIn
          values: ["nano","micro","small"]

      expireAfter: 720h

  limits:
    cpu: 1000
    memory: 10000Gi

  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
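
As an illustration (the pod name and image are hypothetical), a workload can pin itself to nodes created from these requirements by selecting on the labels Karpenter applies:
apiVersion: v1
kind: Pod
metadata:
  name: spot-batch-worker # hypothetical pod name
spec:
  nodeSelector:
    karpenter.sh/capacity-type: spot       # label Karpenter sets from the requirement
    karpenter.k8s.aws/instance-category: c # label Karpenter sets from the requirement
  containers:
    - name: worker
      image: busybox:1.36 # hypothetical image
      command: ["sleep", "3600"]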

spec.template.spec.taints

  • Taints defined here are applied to the created nodes, so only pods that tolerate them will be scheduled there (a matching toleration is sketched after the example below)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
      taints:
        # only gpu workloads (with the nvidia.com/gpu toleration) will run on the node
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule
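
A GPU workload would then need a matching toleration, roughly like this (the pod name and image are hypothetical; the nvidia.com/gpu resource assumes the NVIDIA device plugin is installed):
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training # hypothetical pod name
spec:
  tolerations:
    - key: nvidia.com/gpu
      operator: Equal
      value: "true"
      effect: NoSchedule
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.01-py3 # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting a GPU also keeps non-GPU nodes from being considered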

spec.template.spec.expireAfter & terminationGracePeriod

  • expireAfter specifies the maximum node lifetime; once it is reached, Karpenter voluntarily disrupts (expires) the node
  • Useful for forcing AMI refreshes or recycling nodes for security reasons
  • In contrast, terminationGracePeriod specifies the maximum time Karpenter will wait, once it has begun draining a node, before terminating it forcibly (an involuntary disruption)
  • Useful when a misconfigured PDB would otherwise block Karpenter from voluntarily disrupting a node indefinitely
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]

      # Voluntary expiration (pick one value)
      expireAfter: 720h # 30 days (default)
      # expireAfter: Never # disable expiration

      # Involuntary expiration: force termination 24h after draining begins
      terminationGracePeriod: 24h

spec.limits

  • Hard cap on the total resources, summed across all nodes in the NodePool; once a limit is reached, Karpenter stops launching new nodes for it (existing nodes are not removed)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
  limits:
    cpu: 1000
    memory: 10000Gi

spec.weight

  • Relative priority of this NodePool: when several NodePools could satisfy a pod, Karpenter considers higher-weight NodePools first (see the two-NodePool sketch after the example below)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
  weight: 60
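
For instance (the NodePool names, instance type, weights, and CPU limit are assumptions for illustration), a higher-weight NodePool can steer workloads onto reserved or discounted capacity before the default NodePool is considered:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: reserved-instances # hypothetical NodePool covering RI / Savings Plan capacity
spec:
  weight: 80 # considered before lower-weight NodePools
  limits:
    cpu: 100 # assumption: roughly the size of the discounted quota
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.xlarge"] # hypothetical instance type covered by the discount
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default # used once the reserved quota is exhausted
spec:
  weight: 10
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]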

spec.disruption

  • Tells Karpenter when and how it may disrupt nodes
  • Actions
    • Remove empty nodes
    • Remove underutilized nodes by moving their pods onto other nodes with spare capacity
    • Replace nodes with cheaper variants
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    # consolidationPolicy: WhenEmpty

    consolidateAfter: 10m # how long to wait before scaling down nodes due to low utilization (0 means right away, which can result in high node churn)

    # NodePool Disruption Budget
    # Controls how many nodes can be disrupted at a time
    # If multiple budgets are active at the same time, the most restrictive (smallest) one applies
    budgets:
      - nodes: 20% # allow up to 20% of the nodes to be disrupted at a time
      - nodes: 5 # allow up to 5 nodes to be disrupted at a time
      - nodes: 0 # no disruptions during the first 10 minutes of each day
        schedule: "@daily"
        duration: 10m
      - nodes: 0 # no disruptions for the reasons below during working hours (09:00-17:00), Monday to Friday
        schedule: "0 9 * * mon-fri"
        duration: 8h
        reasons:
          - Drifted
          - Underutilized
          - Empty