Pod (po)
- Kubernetes does not deploy containers directly on the worker nodes; instead it encapsulates them in an object named "Pod"
- A pod runs one or more containers (usually a main container and optionally side-car containers)
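Handy imperative commands for working with pods (pod name and image are examples):
kubectl run myapp --image=nginx:1.20 # create a single-container pod
kubectl get po # list pods ("po" is the short name)
kubectl describe po myapp # inspect state, events, probes, etc.
kubectl delete po myapp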
Properties
spec.containers[]
- If any of the containers fails, the Pod is restarted (according to its restartPolicy)
- Multi-container Pod Design Patterns
Sidecar
Adapter
Ambassador
apiVersion: v1 # access to a predefined set of object types
kind: Pod # kind of object to be created
metadata: # metadata to identify the object
  name: myapp
  labels:
    app: myapp
spec: # specification of the object
  containers:
    - name: nginx-container
      image: nginx:1.20
    - name: log-agent
      image: log-agent
spec.containers[].env
- Environment Variables
apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
    - name: postgres
      image: postgres:12
      # set envs one by one
      env:
        # from literal
        - name: MY_FAVORITE_COLOR
          value: blue
        # from secret
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: pgsecret
              key: PG_PASSWORD
        # from configmap
        - name: MY_TOP_CONFIG
          valueFrom:
            configMapKeyRef:
              name: pgconfigmap
              key: PG_CONFIG
        # from field
        - name: MY_FAVORITE_ANIMAL
          valueFrom:
            fieldRef:
              # map a single (hypothetical) label into an env; env fieldRefs need a specific
              # key, whole metadata.labels is only supported by downward API volumes
              fieldPath: metadata.labels['animal']
spec.containers[].envFrom
- Environment Variables
apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
    - name: postgres
      image: postgres:12
      # set all envs from a resource
      envFrom:
        - configMapRef:
            name: simpleconfigmap
        - secretRef:
            name: simplesecret
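The referenced ConfigMap and Secret could be created imperatively; a sketch with assumed keys and values:
kubectl create configmap simpleconfigmap --from-literal=MY_KEY=my-value
kubectl create secret generic simplesecret --from-literal=MY_SECRET=my-secret-value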
spec.containers[].command & spec.containers[].args
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
    - name: nginx
      image: nginx:1.20
      command: ["sleep"] # overrides the ENTRYPOINT from the Dockerfile
      args: ["10"] # overrides the CMD from the Dockerfile
spec.containers[].ports[]
- This field is primarily informational; it does not expose the port outside the Pod (for that a Service is needed, sketched after the example below)
- It tells Kubernetes and other tools (e.g., monitoring systems) that the application inside the container listens on the specified port
apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
    - name: postgres
      image: postgres:12
      ports:
        - containerPort: 5432
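A minimal Service sketch that would actually expose this port, assuming the pod above also carried an app: postgres label:
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  selector:
    app: postgres # assumed label on the pod above
  ports:
    - port: 5432 # port exposed by the service
      targetPort: 5432 # containerPort of the pod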
spec.containers[].volumeMounts & spec.volumes
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  nodeName: node-name # manually schedule the pod
  containers:
    - name: myapp
      image: nginx
      volumeMounts:
        - name: host-filesystem
          mountPath: /host-filesystem
        - name: postgres-data
          mountPath: /var/lib/postgresql/data # path to be mounted inside the container
          subPath: postgres
        - name: pod-info
          mountPath: /etc/podinfo
          readOnly: false
        - name: configmap-data
          mountPath: /configmaps
        - name: secret-data
          mountPath: /secrets
  volumes:
    # from the host filesystem
    - name: host-filesystem
      hostPath:
        path: /
        type: Directory # or DirectoryOrCreate
    # from a pvc
    - name: postgres-data
      persistentVolumeClaim:
        claimName: database-pvc
    # from aws storage
    - name: aws-volume
      awsElasticBlockStore:
        volumeID: <volume-id>
        fsType: ext4
    # from labels and annotations
    - name: pod-info
      downwardAPI:
        items:
          - path: labels
            fieldRef:
              fieldPath: metadata.labels
          - path: annotations
            fieldRef:
              fieldPath: metadata.annotations
    # from a configmap
    - name: configmap-data
      configMap: # each key is created as a file
        name: simpleconfigmap
    # from a secret
    - name: secret-data
      secret: # each key is created as a file
        secretName: simplesecret
        defaultMode: 420
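The database-pvc claim referenced above could look like this (a minimal sketch; access mode and size are assumptions):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-pvc
spec:
  accessModes:
    - ReadWriteOnce # volume is mounted read-write by a single node
  resources:
    requests:
      storage: 10Gi # assumed size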
spec.containers[].livenessProbe & spec.containers[].readinessProbe
- Probes
  - Liveness Probe
    - Signals that a pod is in a failure state
    - If it fails, the pod is restarted
  - Readiness Probe
    - Signals that the pod is ready to accept traffic
    - If it fails, traffic is not accepted
    - Traffic is blocked by removing the pod from the service load balancers
  - When to start probing? (Startup Probe)
    - A startupProbe (or, more simply, an initialDelaySeconds on the other probes) signals that a pod has started
    - Liveness and readiness probes start only after the startup probe succeeds (see the sketch after the example below)
    - Especially useful for slow-starting containers (to avoid them getting killed by the kubelet)
  - The kubelet periodically performs the probes of each pod on its node
  - Spring Boot Actuator provides built-in readiness and liveness endpoints
- Old pods will be deleted only when the new pods are ready to receive traffic
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
    - name: nginx
      image: nginx:latest
      livenessProbe:
        initialDelaySeconds: 60 # the kubelet starts probing N seconds after the container has started
        periodSeconds: 1 # how often the kubelet performs the probe
        httpGet: # Option 1
          port: 8000
          path: /actuator/health/liveness
        # exec: # Option 2 (a probe accepts only one handler, so pick either httpGet or exec)
        #   command:
        #     - /bin/sh
        #     - -c
        #     - nc -z localhost 8095
      readinessProbe:
        httpGet:
          port: 8000
          path: /actuator/health/readiness
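The startup probe mentioned above is a field of its own; a minimal sketch (thresholds are assumptions) that gives a slow-starting container up to 30 x 10s = 300s before liveness/readiness checks begin:
startupProbe: # sits next to livenessProbe/readinessProbe in spec.containers[]
  httpGet:
    port: 8000
    path: /actuator/health/liveness
  failureThreshold: 30 # assumed: tolerate up to 30 failed attempts
  periodSeconds: 10 # assumed: probe every 10 seconds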
spec.containers[].resources
- The kube-scheduler uses the resource information to decide on which node to place the pod
- Default resource requests and limits come from a LimitRange object (sketched after the example below); without a LimitRange there are no defaults
- Resource Requests
  - Typical LimitRange defaults: 0.5 vCPU (500m), 256 Mi RAM
  - 1 CPU equals: 1 AWS vCPU, 1 GCP Core, 1 Azure Core, 1 Hyperthread
  - 1m (millicpu) is the minimum amount of CPU; it is 0.001 vCPU
- Resource Limits
  - Typical LimitRange defaults: 1 vCPU (1000m), 512 Mi RAM
  - If CPU usage exceeds the limit, the container is throttled
  - If memory usage constantly exceeds the limit, the container is terminated (OOMKilled)
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: nginx-container
      image: nginx
      resources:
        requests: # the minimum resources that k8s will reserve for the container
          memory: 1Gi
          cpu: 1
        limits: # max resource usage
          memory: 2Gi
          cpu: 1200m # same as 1.2 cpu
          # gpu units (pick the one matching the hardware; do not request them all)
          nvidia.com/gpu: 1
          amd.com/gpu: 1
          aws.amazon.com/neuron: 1
          habana.ai/gaudi: 1
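A LimitRange providing the defaults mentioned above could look like this (a minimal sketch; it applies to containers created in its namespace):
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources
spec:
  limits:
    - type: Container
      defaultRequest: # default resource requests
        cpu: 500m
        memory: 256Mi
      default: # default resource limits
        cpu: 1
        memory: 512Mi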
spec.restartPolicy
apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
    - name: postgres
      image: postgres:12
  restartPolicy: Never # Always (default) | OnFailure | Never
spec.imagePullSecrets
- In order to authenticate against a private container registry, a secret must be created
kubectl create secret docker-registry "regcred" \
--docker-server "private-registry.io" \
--docker-username "registry-user" \
--docker-password "registry-password" \
--docker-email "[email protected]"
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  containers:
    - name: nginx-container
      image: private-registry.io/apps/internal-app:1.0.0
  imagePullSecrets:
    - name: regcred
spec.nodeName
- Every pod has a field called spec.nodeName
- It is the responsibility of the scheduler to fill this field and schedule the pod, but you can also do it manually
- The nodeName property cannot be modified after the pod has been created
- If a pod cannot be scheduled to any node because no scheduler is present, the pod remains in Pending state
- Another way to schedule a pod to a node is to create a Binding object (see the sketch after the example below)
apiVersion: v1
kind: Pod
metadata:
  name: myapp-po
  labels:
    app: myapp
spec:
  containers:
    - name: nginx-container
      image: docker.io/nginx
      ports:
        - containerPort: 3000
  nodeName: node01
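The Binding alternative mentioned above looks like this minimal sketch; it is POSTed to the pod's binding API rather than applied like a regular object:
apiVersion: v1
kind: Binding
metadata:
  name: myapp-po # pod to bind
target:
  apiVersion: v1
  kind: Node
  name: node01 # node to bind the pod to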
spec.nodeSelector
- With node selectors you can run certain workloads on certain nodes (the nodes must carry the matching labels, see the command after the example below)
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: nginx-container
      image: nginx
  # select schedulable nodes by labels
  nodeSelector:
    a-size: a-large-node
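For the selector above to match anything, a node must carry that label (node name assumed):
kubectl label nodes node01 a-size=a-large-node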
spec.affinity.nodeAffinity
- Scheduling: the state in which the pod does not exist yet
- Execution: the state in which the pod is running and has already been scheduled
- Node affinity types
  - requiredDuringSchedulingIgnoredDuringExecution: if the affinity rules cannot be matched, the pod will not be scheduled
  - preferredDuringSchedulingIgnoredDuringExecution: if the affinity rules cannot be matched, the pod will be scheduled anyway, to the node that "violates less" the rules (see the sketch after the example below)
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: nginx-container
      image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions: # select nodes by labels
              - key: karpenter.sh/capacity-type
                operator: In
                values:
                  - spot
              - key: foo
                operator: NotIn
                values:
                  - small
              - key: bar
                operator: Exists
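A sketch of the preferred variant, which weighs rules from 1 to 100 instead of hard-requiring them (same capacity-type label assumed):
affinity: # sits in spec, as above
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100 # nodes matching higher-weight rules are favored
        preference:
          matchExpressions:
            - key: karpenter.sh/capacity-type
              operator: In
              values:
                - spot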
spec.affinity.podAntiAffinity
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp # label assumed so the anti-affinity rule below has something to match
spec:
  containers:
    - name: nginx-container
      image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: karpenter.sh/capacity-type
                operator: In
                values:
                  - spot
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        # do not co-locate two pods labeled app=myapp on the same node
        - topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app: myapp
spec.schedulerName
- A pod can be instructed to use a specific scheduler other than the default
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: nginx-container
      image: nginx
  schedulerName: my-custom-scheduler
spec.tolerations
- Toleration: tolerance that a pod has to a specific node taint. If not specified, pods have no tolerations. A toleration does not guarantee that the pod will be scheduled onto the tolerated node (see the taint commands after the example below)
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: nginx-container
      image: nginx
  tolerations:
    - key: "nvidia.com/gpu" # tolerates the node taint "nvidia.com/gpu"
      operator: "Exists"
      effect: "NoSchedule"
    - key: "foo"
      operator: "Equal"
      value: "blue"
      effect: "NoSchedule"
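The taints tolerated above could be applied like this (node name assumed):
kubectl taint nodes node01 nvidia.com/gpu=:NoSchedule
kubectl taint nodes node01 foo=blue:NoSchedule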
spec.initContainers
- initContainer: a container that runs an initial setup task to completion and then terminates
- Such a task runs only once, when the pod is first created. E.g., pulling code or a binary from a repository that will be used by the main application
- Or a process that waits for an external service or database to be up before the actual application starts
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
    - name: myapp-container
      image: busybox:1.28
      command: ["sh", "-c", "echo 'The app is running!' && sleep 3600"]
  initContainers:
    # each initContainer is executed in sequence
    # if any init container fails, the whole pod is restarted (and the init containers start all over)
    # while init containers are running, the pod shows a status like "Init:0/3"
    - name: init-myservice-pullcode
      image: alpine/git # busybox does not ship git, so an image that does is needed
      command: ["sh", "-c", "git clone https://github.com/foo/bar.git"]
    - name: init-db
      image: busybox:1.28
      command: ["sh", "-c", "until nslookup mydb; do echo 'Waiting for mydb'; sleep 2; done;"]
    - name: init-db2
      image: busybox:1.31
      command: ["sh", "-c", 'echo -e "Checking for the availability of MySQL Server deployment"; while ! nc -z mysql 3306; do sleep 1; printf "-"; done; echo -e " >> MySQL DB Server has started";']
- Each init container is run one at a time, in sequential order (Init:0/3)
- If any of the initContainers fails to complete, Kubernetes restarts the Pod repeatedly until the init container succeeds
spec.securityContext
- Can be configured at Container level or at Pod level
- Container settings override the Pod settings
- runAsUser: ID of the user to run as
- capabilities: Linux capabilities to add or remove (container level only)
apiVersion: v1
kind: Pod
metadata:
  name: myos
spec:
  containers:
    - name: ubuntu
      image: ubuntu
      command: ["sleep", "3600"]
      securityContext:
        runAsUser: 0 # overrides the pod-level user 1000, runs as user 0 (root)
        capabilities: # capabilities apply only in the container-level securityContext
          add: ["MAC_ADMIN"]
  securityContext:
    runAsUser: 1000 # will be overridden
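To verify which user the container ended up running as (pod from the example above):
kubectl exec myos -- id # expect uid=0(root) because of the container-level override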
spec.topologySpreadConstraints
- Describes how a group of pods ought to spread across topology domains (e.g., Availability Zones)
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
  labels:
    mylabel: foo # label assumed so the constraint below counts this pod too
spec:
  containers:
    - name: nginx-container
      image: nginx
  topologySpreadConstraints:
    - topologyKey: "topology.kubernetes.io/zone" # spread across zones (as defined by this node label)
      maxSkew: 1 # max allowed difference in matching-pod count between any two zones
      whenUnsatisfiable: ScheduleAnyway # or DoNotSchedule
      # count only the pods that match these labels when computing the skew
      labelSelector:
        matchLabels:
          mylabel: foo