How to upgrade Kubernetes version on EKS with Terraform and no downtime?

This procedure is based on these articles:

Requirements:

  • this procedure is aimed at a Terraform-deployed EKS cluster, so we need Terraform, kubectl access to the cluster, and access to the AWS account

Important Note:

  • k8s versions can be upgraded only one minor version at a time (e.g. 1.11 to 1.12, or 1.12 to 1.13)
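
To confirm where you are starting from, you can query the current control-plane version with the AWS CLI, e.g. (cluster name taken from the example below):

aws eks describe-cluster --name mycluster-k8s --query 'cluster.version' --output text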

These are the steps:

  • Modify the cluster version
  • Update kube-proxy, CoreDNS and Amazon VPC CNI
  • Selecting a New AMI
  • Turn off autoscaler && Creating a New Worker Group
  • Draining Your Old Nodes
  • Cleanup

The Terraform file

I’m using this module to create the cluster:

module "eks_cluster" {
  source          = "terraform-aws-modules/eks/aws"
  version         = "2.2.0"

  cluster_name    = "mycluster-k8s"
  subnets         = ["${concat(module.vpc.public_subnets, module.vpc.private_subnets)}"]
  vpc_id          = "${module.vpc.vpc_id}"
  manage_aws_auth = true
  cluster_version = "1.13"
  worker_group_count = "2"

  worker_groups = [
    {
      name = "worker_group_0"
      instance_type = "t3.medium"
      asg_max_size  = 6
      asg_min_size  = 1
      asg_desired_capacity = 3
      enabled_metrics = "GroupMinSize,GroupMaxSize,GroupDesiredCapacity,GroupInServiceInstances,GroupPendingInstances,GroupTerminatingInstances,GroupStandbyInstances,GroupTotalInstances"
      enable_monitoring = true
      protect_from_scale_in = false
      autoscaling_enabled = true
      ami_id = "ami-08198f90fe8bc57f0"
    },
    {
      name = "worker_group_1"
      instance_type = "t3.medium"
      asg_max_size  = 0
      asg_min_size  = 0
      asg_desired_capacity = 0
      autoscaling_enabled = true
      enabled_metrics = "GroupMinSize,GroupMaxSize,GroupDesiredCapacity,GroupInServiceInstances,GroupPendingInstances,GroupTerminatingInstances,GroupStandbyInstances,GroupTotalInstances"
      enable_monitoring = true
      protect_from_scale_in = true
      ami_id = "ami-08198f90fe8bc57f0"
    }
  ]

  map_users =     [
    {
      user_arn = "arn:aws:iam::111111111111:user/user@email.com"
      username = "user@email.com"
      group    = "system:masters"
    }
  ]

  map_users_count = "1"

  tags = {
    Terraform = "true"
    Environment = "${var.environment}"
    Project     = "${var.project_name}"
    Component   = "${var.component_label_eks}"
  }
}


Insights from this file:

  • Used version 2.2.0 of module EKS https://github.com/terraform-aws-modules/terraform-aws-eks
  • This is only part of the Terraform file; as you can see, there are references to other modules (e.g. "${module.vpc.vpc_id}")
  • Kubernetes version 1.13 is set here
  • Two worker node groups are defined (one has 0 instances; we will use that group to do the migration with no downtime, more on this soon)

Modify cluster version

On the Terraform file main.tf, search for this element:

module "eks_cluster" {

Check that the key cluster_version exists. If it doesn't, add a line like this one:

cluster_version = "1.13"

Change “1.13” to the version you want.

If you already have this line, just modify the version, keeping in mind that you can change only one minor version at a time.

To avoid the worker group instances being replaced due to AMI changes, pin the AMI you are currently using. To do this, set the AMI in the worker groups: check what your current AMI is, then search for this element:

module "eks_cluster" {

…then this sub element:

 worker_groups = [

and modify it with your current image:

ami_id = "ami-08571c6cee1adbb62"

You can also add a name to your worker group so you can identify it easily.
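
Before applying, it can help to run a plan and confirm that only the control plane (cluster_version) is going to change and that no worker group instances are scheduled for replacement. A minimal sketch, assuming the module is named eks_cluster as in the example above:

terraform plan -target=module.eks_cluster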

Apply the changes. The control-plane upgrade can take around 20 minutes.

You can check the change in the AWS console.
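
If you prefer the CLI to the console, you can run the same check there (cluster name taken from the example; the status typically shows UPDATING while the upgrade runs and ACTIVE once it has finished):

aws eks describe-cluster --name mycluster-k8s --query 'cluster.status' --output text
aws eks describe-cluster --name mycluster-k8s --query 'cluster.version' --output text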

Note: Wait until this step has finished.

Update kube-proxy, CoreDNS and Amazon VPC CNI

Now, you need to update versions for these components according to the following table:

K8s version    1.13             1.14
DNS            CoreDNS 1.2.6    CoreDNS 1.6.6
Kube-proxy     1.13.7           1.14.9

For more information see this link https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html

Regarding the Amazon VPC CNI plug-in, the latest available CNI version is recommended (at the time of writing, 1.5.5).

Kube-proxy

Kube-proxy can be updated by patching the DaemonSet with the newer image version:

kubectl patch daemonset kube-proxy -n kube-system -p '{"spec": {"template": {"spec": {"containers": [{"image": "602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/kube-proxy:v1.13.7","name":"kube-proxy"}]}}}}'

This example is for version 1.13.7; adjust the image tag to match your target Kubernetes version.
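
To verify the patch took effect, you can check the image the DaemonSet is now running and watch the rollout:

kubectl get daemonset kube-proxy -n kube-system -o jsonpath='{.spec.template.spec.containers[0].image}'
kubectl rollout status daemonset/kube-proxy -n kube-system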

CoreDNS

Newer versions of EKS use CoreDNS as the DNS provider. To check if your cluster is using CoreDNS, run the following command:

kubectl get pod -n kube-system -l k8s-app=kube-dns

The pods in the output will start with coredns in the name if they are using CoreDNS. If your cluster is not running CoreDNS, follow the Amazon-provided instructions on this page to install CoreDNS at the correct version.

If your cluster was previously running CoreDNS, update it to the latest version for your version of Kubernetes in the table above. These are the steps:

  • Check the current version of your cluster’s coredns deployment.
kubectl describe deployment coredns --namespace kube-system | grep Image | cut -d "/" -f 3

If you are going from a CoreDNS version earlier than 1.5.0 to 1.5.0 or later, do the following:

  • Edit the coredns’ configmap:
kubectl edit configmap coredns -n kube-system
  • Replace proxy in the following line with forward. Save the file and exit the editor:
proxy . /etc/resolv.conf

Now you can upgrade the image in a safe way:

kubectl set image --namespace kube-system deployment.apps/coredns coredns=602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.2.6

This example is for version 1.2.6; adjust the tag to match the table above.

Replace the region in the URL with your own.
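
To confirm the new image rolled out cleanly, you can watch the deployment and re-check the image tag:

kubectl rollout status deployment/coredns -n kube-system
kubectl describe deployment coredns --namespace kube-system | grep Image | cut -d "/" -f 3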

Amazon VPC CNI

The Amazon VPC CNI is not explicitly tied to certain versions of Kubernetes, but it's recommended that you upgrade it to version 1.5.5 when you upgrade your Kubernetes version. To check your current version of the CNI, use:

kubectl describe daemonset aws-node -n kube-system | grep Image | cut -d "/" -f 2

If your version is lower than the latest version (currently 1.5.5), run the following command to update the DaemonSet to the newest configuration:

kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/master/config/v1.5/aws-k8s-cni.yaml
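
You can then watch the DaemonSet roll out (assuming it uses a RollingUpdate strategy) and re-check its version:

kubectl rollout status daemonset/aws-node -n kube-system
kubectl describe daemonset aws-node -n kube-system | grep Image | cut -d "/" -f 2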

Selecting Current and New AMIs

To know what AMI your instances are using right now:

  • Go to EC2 Dashboard
  • Identify one of your worker nodes
  • Get the instance id
  • Get data for the instance with this command:
aws ec2 describe-instances --filters "Name=instance-id,Values=i-0434c94e3b14849d1"

Replace the instance ID with the one you just got from the dashboard.

  • Look for the key ImageId and get the AMI value (or query it directly from the CLI, as shown below).
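
If you prefer a one-liner, the same lookup can be done with a JMESPath query (instance ID taken from the example above):

aws ec2 describe-instances --instance-ids i-0434c94e3b14849d1 --query 'Reservations[].Instances[].ImageId' --output text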

To know the AMI you must use for the new workers:

  • Go to the EC2 service
  • Go to AMIs
  • Search for AMIs named amazon-eks-node-1.13 and owned by AWS (change the version to the one you are updating to)
  • Choose the newest, or query it from the CLI as sketched below.
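
Alternatively, AWS publishes the recommended EKS-optimized AMI ID as an SSM parameter, so you can query it from the CLI instead of browsing the console (change 1.13 to the version you are upgrading to; the parameter should be available for recent EKS versions):

aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.13/amazon-linux-2/recommended/image_id --query 'Parameter.Value' --output text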

Turn off autoscaler && Set the New Worker Group

Locate under:

module "eks_cluster" {

…this sub element in your main.tf:

worker_groups = [

If you have already run this process, you probably have two worker node groups, one with zero desired capacity (as shown in the Terraform file included above).

If you didn't, add one element to your worker node groups array:

{
 name = "worker_group_1"
 instance_type = "t3.medium"
 asg_max_size = 6
 asg_min_size = 1
 asg_desired_capacity = 2
 autoscaling_enabled = true
 enabled_metrics = "GroupMinSize,GroupMaxSize,GroupDesiredCapacity,GroupInServiceInstances,GroupPendingInstances,GroupTerminatingInstances,GroupStandbyInstances,GroupTotalInstances"
 enable_monitoring = true
 protect_from_scale_in = true
 ami_id = "ami-01e370f796735b244" 
}

Don't forget that elements in the array are comma-separated; check the sample above.

Here you must set your new image and a name different from the one you have already set, e.g.:

 worker_groups = [
 {
 name = "worker_group_0"
 instance_type = "t3.medium"
 asg_max_size = 6
 asg_min_size = 1
 asg_desired_capacity = 2
 autoscaling_enabled = true
 enabled_metrics = "GroupMinSize,GroupMaxSize,GroupDesiredCapacity,GroupInServiceInstances,GroupPendingInstances,GroupTerminatingInstances,GroupStandbyInstances,GroupTotalInstances"
 enable_monitoring = true
 protect_from_scale_in = true
 ami_id = "ami-08571c6cee1adbb62"
 },
 {
 name = "worker_group_1"
 instance_type = "t3.medium"
 asg_max_size = 6
 asg_min_size = 1
 asg_desired_capacity = 2
 autoscaling_enabled = true
 enabled_metrics = "GroupMinSize,GroupMaxSize,GroupDesiredCapacity,GroupInServiceInstances,GroupPendingInstances,GroupTerminatingInstances,GroupStandbyInstances,GroupTotalInstances"
 enable_monitoring = true
 protect_from_scale_in = true
 ami_id = "ami-01e370f796735b244"
 } 
]

Note: Add the new group at the end since the order matters.

Note: Check that asg_max_size matches the current worker node group's value so it won't scale up unexpectedly.

If you previously had only one group, remember that under:

module "eks_cluster" {

…you must add this variable (or set it to your number of worker node groups):

worker_group_count = "2"

If the cluster autoscaler is deployed, scale it down to avoid conflicts with scaling actions:

kubectl scale deployments/cluster-autoscaler --replicas=0 -n kube-system

Apply terraform changes.

Then wait for your AutoScaling group to have the desired capacity. You can check that the workers are created and ready by using:

kubectl get nodes

You should see workers with different versions of Kubernetes:

NAME STATUS ROLES AGE VERSION 
ip-10-25-1-115.ec2.internal Ready <none> 107d v1.11.9 
ip-10-25-1-245.ec2.internal Ready <none> 10m v1.12.10-eks-ffbd96 
ip-10-25-1-250.ec2.internal Ready <none> 36m v1.11.10-eks-f12431 
ip-10-25-101-36.ec2.internal Ready <none> 83d v1.11.9 
ip-10-25-101-85.ec2.internal Ready <none> 192d v1.11.5 
ip-10-25-102-21.ec2.internal Ready <none> 41d v1.11.9 
ip-10-25-2-128.ec2.internal Ready <none> 10m v1.12.10-eks-ffbd96

If you have issues with nodes not being created, check the Auto Scaling group for this worker group. Sometimes Terraform can't update the desired capacity of the ASG. If that is the case, open the console, find your ASG, edit it, and set the desired capacity. Then wait for the nodes to be created.
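
As an alternative to the console, the desired capacity can be inspected and set with the AWS CLI (the ASG name below is a placeholder; use the one Terraform created for your new worker group):

aws autoscaling describe-auto-scaling-groups --query 'AutoScalingGroups[].[AutoScalingGroupName,MinSize,DesiredCapacity,MaxSize]' --output table
aws autoscaling set-desired-capacity --auto-scaling-group-name <your-new-worker-asg-name> --desired-capacity 2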

Draining Your Old Nodes

Now that our new nodes are running, we need to move our pods to the new ones.

The first step is to list the old nodes, filtering by the old Kubernetes version (v1.11 in the example output above):

kubectl get nodes | grep "v1.11" | awk '{print $1}'

…to run kubectl taint nodes on each old node to prevent new pods from being scheduled on them, e.g.:

kubectl taint nodes ip-10-25-1-115.ec2.internal key=value:NoSchedule 
kubectl taint nodes ip-10-25-1-250.ec2.internal key=value:NoSchedule 
kubectl taint nodes ip-10-25-101-36.ec2.internal key=value:NoSchedule 
kubectl taint nodes ip-10-25-101-85.ec2.internal key=value:NoSchedule 
kubectl taint nodes ip-10-25-102-21.ec2.internal key=value:NoSchedule
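
If you have many old nodes, the same tainting can be scripted from the node listing used above (a sketch, again filtering by the old version):

kubectl get nodes | grep "v1.11" | awk '{print $1}' | xargs -I {} kubectl taint nodes {} key=value:NoSchedule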

Before draining the old nodes, we will scale up the deployments. Since the old nodes are tainted with NoSchedule, the new pods will be created on the new nodes. This way we avoid downtime. To do this, follow these steps.

Get the deployments of your interest:

kubectl get deployments -n <yournamespace>

Let’s say you have this output:

NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
kubeapp-canary        1/1     1            1           142m

Now, scale up your deployment by one pod (in this case there is one pod ready out of 1 required, per the READY column, so we ask for 2):

kubectl scale deployments/kubeapp-canary --replicas=2 -n <yournamespace>

If you check your pods this way:

kubectl get po -n <yournamespace> -o wide

…you should see the new pod was scheduled on one of the new nodes.

Do the same for each deployment you need to protect from downtime; see the sketch after this paragraph if you have many of them. You should end up with the new pods scheduled on the new nodes.
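
If there are several deployments in the namespace, a small loop can bump each one by a single replica. This is only a sketch; skip any deployments you don't want to touch:

NS=<yournamespace>
for d in $(kubectl get deploy -n "$NS" -o name); do
  current=$(kubectl get "$d" -n "$NS" -o jsonpath='{.spec.replicas}')
  kubectl scale "$d" -n "$NS" --replicas=$((current + 1))   # add one pod to each deployment
done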

Now we will drain the old nodes, and force the pods to move to new nodes. I recommend doing this one node at a time to ensure that everything goes smoothly, especially in a production cluster:

kubectl drain ip-10-25-1-115.ec2.internal --ignore-daemonsets --delete-local-data

Repeat this command for each node, waiting at each step for the node to finish draining.

You can check on the progress between drain calls to make sure that pods are being scheduled onto the new nodes successfully by using:

kubectl get pods -A -o wide

or, to watch the pods on a given node, refreshing the list every 5 seconds:

while true; do kubectl get po -o wide --all-namespaces | grep ip-10-25-101-85.ec2.internal; echo " *******************" ; echo " "; sleep 5; done

Note: You may run into issues with StatefulSet pods. This is why it is important to ensure your new worker groups have the same config as your old ones. Persistent Volumes backed by EBS can only be attached in the Availability Zone they were created in, so the new workers need to run in the same AZs as the old ones.
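
To check which Availability Zone each EBS-backed Persistent Volume is pinned to, you can list the PVs with their zone label (assuming the volumes carry the failure-domain.beta.kubernetes.io/zone label, which was standard on clusters of this vintage):

kubectl get pv -L failure-domain.beta.kubernetes.io/zone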

Cleanup

Once you have confirmed that all non-DaemonSet pods are running on the new nodes, you can terminate the old worker group.

Since the eks module uses an ordered array of worker group config objects in the worker_groups key, you cannot just delete the old config.

If you do, Terraform will see this change and assume that the order must have changed and try to recreate the AutoScaling groups.

Instead, we should recognize that this will not be the last time we will do an upgrade, and that empty AutoScaling groups are free. So we will keep the old worker group configuration and just run 0 capacity in it like so:

 {
 name = "worker_group_0"
 instance_type = "t3.medium"
 asg_max_size = 0
 asg_min_size = 0
 asg_desired_capacity = 0
 ami_id = "ami-08571c6cee1adbb62"
 },

Now the next time we upgrade, we can put the new config in this worker group and easily spin up workers without worrying about the Terraform state.

Apply the changes:

terraform apply

If Terraform can't modify the desired capacity of the ASG, change it in the console (or via the CLI, as shown earlier).

If needed, scale your deployments back down.

Scale the cluster autoscaler back up:

kubectl scale deployments/cluster-autoscaler --replicas=1 -n kube-system

Note on helm upgrades

If, after the upgrade, helm upgrade fails with a message about immutable fields, like this one:

Error: UPGRADE FAILED: Deployment.apps "datadog-kube-state-metrics" is invalid: [spec.template.metadata.labels: Invalid value: map[string]string{"app.kubernetes.io/instance":"datadog", "app.kubernetes.io/name":"kube-state-metrics"}: `selector` does not match template `labels`, spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app":"kube-state-metrics", "release":"datadog", "app.kubernetes.io/name":"kube-state-metrics"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable]

…you should try adding the --force flag to your helm upgrade command.
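
For example, a sketch assuming a release called datadog installed from the old stable/datadog chart with your usual values file:

helm upgrade datadog stable/datadog --force -f values.yaml

Keep in mind that --force recreates the affected resources, so the component may briefly restart.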

I want to say hi to Fort Commander, creator and mentor of RicardoForm jewels.
