
Pod (po)

  • Kubernetes does not deploy containers directly on the worker nodes; instead, it encapsulates them in an object called a "Pod"
  • A Pod runs one or more containers (usually a main container and, optionally, sidecar containers)

Properties

spec.containers[]

  • If any of the containers fails, the Pod restarts
  • Multi-container Pod design patterns:
    • Sidecar
    • Adapter
    • Ambassador
apiVersion: v1 # access to predefined set of object types
kind: Pod # kind of object to be created
metadata: # metadata to identify the object
  name: myapp
  labels:
    app: myapp
spec: # specification about the object
  containers:
    - name: nginx-container
      image: nginx:1.20
    - name: log-agent
      image: log-agent

spec.containers[].env

  • Environment Variables
apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
    - name: postgres
      image: postgres:12
      # set envs one by one
      env:
        # from literal
        - name: MY_FAVORITE_COLOR
          value: blue
        # from secret
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: pgsecret
              key: PG_PASSWORD
        # from configmap
        - name: MY_TOP_CONFIG
          valueFrom:
            configMapKeyRef:
              name: pgconfigmap
              key: PG_CONFIG
        # from pod field (Downward API)
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name # expose a pod field as an env var

spec.containers[].envFrom

  • Environment Variables
apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
    - name: postgres
      image: postgres:12
      # set all envs from a resource
      envFrom:
        - configMapRef:
            name: simpleconfigmap
        - secretRef:
            name: simplesecret
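The referenced ConfigMap and Secret could be defined as follows (the keys and values are illustrative; Secret data must be base64-encoded):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: simpleconfigmap
data:
  PG_HOST: localhost
  PG_PORT: "5432"
---
apiVersion: v1
kind: Secret
metadata:
  name: simplesecret
type: Opaque
data:
  PG_PASSWORD: cGFzc3dvcmQ= # base64 of "password"
```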

spec.containers[].command & spec.containers[].args

apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
    - name: nginx
      image: nginx:1.20
      command: ["sleep"] # override the entrypoint from dockerfile
      args: ["10"] # override the cmd from dockerfile

spec.containers[].ports[]

  • This field is purely informational; it does not expose the port outside the Pod (for that, a Service is needed)
  • It tells Kubernetes and other tools (e.g., monitoring systems) that the application inside the container listens on the specified port
apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
    - name: postgres
      image: postgres:12
      ports:
        - containerPort: 5432

spec.containers[].volumeMounts & spec.volumes

apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  nodeName: node-name # manually schedule pod
  containers:
    - name: myapp
      image: nginx
      volumeMounts:
        - name: host-filesystem
          mountPath: /host-filesystem
        - name: postgres-data
          mountPath: /var/lib/postgresql/data # path to be mounted inside of the container
          subPath: postgres
        - name: pod-info
          mountPath: /etc/podinfo
          readOnly: false
        - name: configmap-data
          mountPath: /configmaps
        - name: secret-data
          mountPath: /secrets
  volumes:
    # from host filesystem
    - name: host-filesystem
      hostPath:
        path: /
        type: Directory # DirectoryOrCreate
    # from pvc
    - name: postgres-data
      persistentVolumeClaim:
        claimName: database-pvc
    # from aws storage
    - name: aws-volume
      awsElasticBlockStore:
        volumeID: <volume-id>
        fsType: ext4
    # from labels and annotations
    - name: pod-info
      downwardAPI:
        items:
          - path: labels
            fieldRef:
              fieldPath: metadata.labels
          - path: annotations
            fieldRef:
              fieldPath: metadata.annotations
    # from configmap
    - name: configmap-data
      configMap: # each key is created as a file
        name: simpleconfigmap
    # from secret
    - name: secret-data
      secret: # each key is created as a file
        secretName: simplesecret
        defaultMode: 420

spec.containers[].livenessProbe & spec.containers[].readinessProbe

  • Probes
  • Liveness Probe
    • Signals that a pod is in a failure state
    • If it fails, the pod is restarted
  • Readiness Probe

    • Signals that the pod is ready to accept traffic
    • If it fails, traffic is not accepted
    • Traffic is blocked by removing the pod from the Service load balancers
  • When to start probing? (Startup Probe)

  • Defined by the spec.containers[].startupProbe field (not to be confused with initialDelaySeconds, which only delays a probe's first check)
  • Signals that a pod has started
  • Liveness and readiness probes start only after the startup probe succeeds
  • Especially useful for slow-starting containers (to avoid them getting killed by the kubelet)

  • The kubelet periodically performs the probes of each pod on the node

  • Spring Boot Actuator provides built-in readiness and liveness probes

  • During a rolling update, old pods are deleted only when the new pods are ready to receive traffic
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
    - name: nginx
      image: nginx:latest

      livenessProbe:
        initialDelaySeconds: 60 # the kubelet starts probing N seconds after the container has started
        periodSeconds: 1 # how often the kubelet performs the probe
        # a probe may define only ONE handler (httpGet, exec, tcpSocket, or grpc)
        httpGet: # Option 1: HTTP check
          port: 8000
          path: /actuator/health/liveness
        # exec: # Option 2: command check
        #   command:
        #     - /bin/sh
        #     - -c
        #     - nc -z localhost 8095

      readinessProbe:
        httpGet:
          port: 8000
          path: /actuator/health/readiness
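The startup probe mentioned above is a probe of its own, set via spec.containers[].startupProbe. A minimal sketch, assuming the app exposes an Actuator-style health endpoint on port 8000:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
    - name: myapp
      image: myapp:1.0 # hypothetical image
      startupProbe:
        httpGet:
          port: 8000
          path: /actuator/health
        periodSeconds: 10
        failureThreshold: 30 # allow up to 30 * 10s = 300s for startup
      # liveness and readiness probes begin only after this probe succeeds
```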

spec.containers[].resources

  • The kube-scheduler uses the resource requests to decide on which node to place the pod
  • Default resource requests and resource limits can be defined per namespace with a LimitRange object

  • Resource Requests

  • Without a LimitRange there are no defaults; a LimitRange commonly sets e.g. 0.5 vCPU (500m) and 256Mi RAM
  • 1 CPU equals 1 AWS vCPU, 1 GCP Core, 1 Azure Core, or 1 Hyperthread
  • 1m (millicpu) is the minimum amount of CPU; it is 0.001 vCPU

  • Resource Limits

  • Without a LimitRange there are no default limits; a LimitRange commonly sets e.g. 1 vCPU (1000m) and 512Mi RAM
  • If CPU usage exceeds the limit, the container is throttled
  • If memory usage constantly exceeds the limit, the container is terminated (OOMKilled)
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: nginx-container
      image: nginx
      resources:
        requests: # the minimum resources that k8s guarantees to the container
          memory: 1Gi
          cpu: 1
        limits: # max resource usage
          memory: 2Gi
          cpu: 1200m # same as 1.2 cpu

          # gpu units
          nvidia.com/gpu: 1
          amd.com/gpu: 1
          aws.amazon.com/neuron: 1
          habana.ai/gaudi: 1
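The LimitRange object referenced above might look like this (namespace-scoped; the values are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources
spec:
  limits:
    - type: Container
      defaultRequest: # applied when a container sets no request
        cpu: 500m
        memory: 256Mi
      default: # applied when a container sets no limit
        cpu: "1"
        memory: 512Mi
```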

spec.restartPolicy

apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
    - name: postgres
      image: postgres:12
  restartPolicy: Never

spec.imagePullSecrets

  • In order to authenticate against a private container registry, a secret must be created
kubectl create secret docker-registry "regcred" \
  --docker-server "private-registry.io" \
  --docker-username "registry-user" \
  --docker-password "registry-password" \
  --docker-email "[email protected]"
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  containers:
    - name: nginx-container
      image: private-registry.io/apps/internal-app:1.0.0
  imagePullSecrets:
    - name: regcred

spec.nodeName

  • Every pod has a field called spec.nodeName
  • It is the responsibility of the scheduler to fill this field and schedule the pod, but you can also set it manually
  • The nodeName property cannot be modified after the pod has been created
  • If a pod cannot be scheduled to any node (e.g., because no scheduler is running), it remains in the Pending state
  • Another way to assign a pod to a node is by creating a Binding object
apiVersion: v1
kind: Pod
metadata:
  name: myapp-po
  labels:
    app: myapp
spec:
  containers:
    - name: nginx-container
      image: docker.io/nginx
      ports:
        - containerPort: 3000
  nodeName: node01
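The Binding object mentioned above is POSTed to the API server to assign an already-created, pending pod to a node; this is essentially what the scheduler does. A sketch, reusing the pod and node names from the example:

```yaml
apiVersion: v1
kind: Binding
metadata:
  name: myapp-po # must match the name of the pending pod
target:
  apiVersion: v1
  kind: Node
  name: node01
```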

spec.nodeSelector

  • With node selectors you can run certain workloads on certain nodes (the nodes must carry the matching labels)
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: nginx-container
      image: nginx
  # select schedulable nodes by labels
  nodeSelector:
    a-size: a-large-node

spec.affinity.nodeAffinity

  • Scheduling: the state in which the pod does not exist yet
  • Execution: the state in which a pod is running and has already been scheduled

  • Node affinity types

  • requiredDuringSchedulingIgnoredDuringExecution: if the affinity rules cannot be matched, the pod will not be scheduled
  • preferredDuringSchedulingIgnoredDuringExecution: if the affinity rules cannot be matched, the pod is scheduled anyway, preferring the node that best matches the rules
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: nginx-container
      image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions: # select nodes by labels
              - key: karpenter.sh/capacity-type
                operator: In
                values:
                  - spot
              - key: foo
                operator: NotIn
                values:
                  - small
              - key: bar
                operator: Exists

spec.affinity.podAntiAffinity

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: nginx-container
      image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: karpenter.sh/capacity-type
                operator: In
                values:
                  - spot
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        # do not co-locate with pods matching the selector on the same node
        - topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app: myapp

spec.schedulerName

  • A pod can be instructed to use a specific scheduler other than the default
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: nginx-container
      image: nginx
  schedulerName: my-custom-scheduler

spec.tolerations

  • Toleration: tolerance that a pod has for a specific node taint. If not specified, pods have no tolerations. A toleration does not guarantee that the pod will be scheduled onto the tolerated node
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: nginx-container
      image: nginx
  tolerations:
    - key: "nvidia.com/gpu" # tolerates the node taint "nvidia.com/gpu"
      operator: "Exists"
      effect: "NoSchedule"
    - key: "foo"
      operator: "Equal"
      value: "blue"
      effect: "NoSchedule"

spec.initContainers

  • initContainer: a container that runs an initial setup task to completion and then terminates

  • The task runs only once, when the pod is first created. E.g., it pulls code or a binary from a repository that will be used by the main application

  • Or it is a process that waits for an external service or database to be up before the actual application starts
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
    - name: myapp-container
      image: busybox:1.28
      command: ["sh", "-c", "echo 'The app is running!' && sleep 3600"]
  initContainers:
    # each initContainer is executed in sequence
    # if any container execution fails, the whole pod is restarted (and start the init containers all over)
    # While init containers are running, the pod shows the status "Init:0/1"
    - name: init-myservice-pullcode
      image: alpine/git # busybox does not include git
      command:
        [
          "sh",
          "-c",
          "git clone https://github.com/foo/bar.git",
        ]
    - name: init-db
      image: busybox:1.28
      command:
        [
          "sh",
          "-c",
          "until nslookup mydb; do echo 'Waiting for mydb'; sleep 2; done;",
        ]
    - name: init-db2
      image: busybox:1.31
      command:
        [
          "sh",
          "-c",
          'echo -e "Checking for the availability of MySQL Server deployment"; while ! nc -z mysql 3306; do sleep 1; printf "-"; done; echo -e "  >> MySQL DB Server has started";',
        ]
  • Each init container runs one at a time, in sequential order (Init:0/3)
  • If any init container fails to complete, Kubernetes restarts the Pod repeatedly until the init container succeeds

spec.securityContext

  • Can be configured at the container level or the Pod level
  • Container settings override the Pod settings

  • runAsUser: ID of the user to run as

  • capabilities: Linux capabilities that can be added or removed
apiVersion: v1
kind: Pod
metadata:
  name: myos
spec:
  containers:
    - name: ubuntu
      image: ubuntu
      command: ["sleep", "3600"]
      securityContext:
        runAsUser: 0 # override user 1000, and run as user 0 (root)
        capabilities: # capabilities can only be set in the container-level securityContext
          add: ["MAC_ADMIN"]
  securityContext:
    runAsUser: 1000 # will be overridden

spec.topologySpreadConstraints

  • Describes how a group of pods ought to spread across topology domains (e.g., Availability Zones)
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  topologySpreadConstraints:
    - topologyKey: "topology.kubernetes.io/zone" # try to spread pods across more than one zone (as defined by this label)
      maxSkew: 1
      whenUnsatisfiable: ScheduleAnyway

      # only nodes that have this label
      labelSelector:
        matchLabels:
          mylabel: foo