= Kubernetes Configuration & Best Practices
include::subdocs/_init.adoc[]
ifndef::imagesdir[:imagesdir: images]

image::turnoff-cloud-autoscaling.png[{half-width}]

== Installation

=== Google Kubernetes Engine (GKE)

Create an account on GCP and follow any tutorial, for example link:https://www.youtube.com/watch?v=ZpbXSdzp_vo[this video workshop].

=== AWS / EKS

Have an admin account on AWS.

==== Manual installation

[TIP]
====
* Good to do once to understand every step
* You can switch the menu language at the bottom left of any page. Select English.
====

.Follow `Getting Started with Amazon EKS` in the link:https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html[Official documentation] to:
* Create your Amazon EKS Service Role
* Create your Amazon EKS Cluster VPC
* Install and Configure kubectl for Amazon EKS
* Download and Install the Latest AWS CLI
* Create Your Amazon EKS Cluster
* Configure kubectl for Amazon EKS
* Launch and Configure Amazon EKS Worker Nodes

==== Terraform installation

TIP: To destroy the cluster: `terraform destroy --force`

If needed, install Terraform using link:https://askubuntu.com/questions/983351/how-to-install-terraform-in-ubuntu[this guide]:

[source,shell]
cd /tmp/
wget https://releases.hashicorp.com/terraform/0.12.6/terraform_0.12.6_linux_amd64.zip
unzip terraform_0.12.6_linux_amd64.zip
sudo mv terraform /usr/local/bin/
terraform --version

===== Official repository

Using the link:https://github.com/terraform-aws-modules/terraform-aws-eks[official terraform/eks repo] as a module, without cloning it.

* Create the main config file

.main.tf
[%collapsible]
====
[source,json]
------
include::../samples/kubernetes/eks.main.tf[]
------
====

* Create a minimal output file

.outputs.tf
[%collapsible]
====
[source,json]
------
include::../samples/kubernetes/eks.outputs.tf[]
------
====

* If you configured autoscaling, create the associated config file

.autoscaler.yml
[%collapsible]
====
[source,yaml]
------
include::../samples/kubernetes/eks.autoscaler.yml[]
------
====

* If not already done, configure the AWS CLI

[source,shell]
aws configure

* Create the cluster. It should take about 10 minutes.

[source,shell]
terraform init
terraform apply

* Configure kubeconfig
** Either with the terraform output

[source,shell]
terraform output kubeconfig > ~/.kube/my-sweet-cluster
export KUBECONFIG=~/.kube/my-sweet-cluster

** Or with `aws eks`

[source,shell]
aws eks update-kubeconfig --name my-sweet-cluster

* If you configured it to use the autoscaler
** Install and initialize Helm as described in <>.
** Apply the helm chart

[source,shell]
helm install stable/cluster-autoscaler --values=autoscaler.yml --name cas --namespace kube-system

** Test the autoscaler
*** Scale up

[source,shell]
kubectl run example --image=nginx --port=80 --replicas=50
kubectl logs -l "app.kubernetes.io/instance=cas" -f
kubectl get nodes -w

*** Scale down

[source,shell]
kubectl delete deployment example
kubectl logs -l "app.kubernetes.io/instance=cas" -f

After 10 minutes (by default), the cluster should scale down and you should see

----
I0423 12:18:52.539729       1 scale_down.go:600] ip-10-0-3-163.eu-north-1.compute.internal was unneeded for 10m8.928762095s
I0423 12:18:52.539815       1 scale_down.go:600] ip-10-0-3-149.eu-north-1.compute.internal was unneeded for 10m8.928762095s
I0423 12:18:52.539884       1 scale_down.go:600] ip-10-0-1-206.eu-north-1.compute.internal was unneeded for 10m8.928762095s
I0423 12:18:52.539947       1 scale_down.go:600] ip-10-0-1-222.eu-north-1.compute.internal was unneeded for 10m8.928762095s
I0423 12:18:52.540077       1 scale_down.go:819] Scale-down: removing empty node ip-10-0-3-163.eu-north-1.compute.internal
I0423 12:18:52.540190       1 scale_down.go:819] Scale-down: removing empty node ip-10-0-3-149.eu-north-1.compute.internal
I0423 12:18:52.540261       1 scale_down.go:819] Scale-down: removing empty node ip-10-0-1-206.eu-north-1.compute.internal
I0423 12:18:52.540331       1 scale_down.go:819] Scale-down: removing empty node ip-10-0-1-222.eu-north-1.compute.internal
----

===== Alternative repository

TIP: Smaller option list, easier to understand

* Clone link:https://github.com/WesleyCharlesBlake/terraform-aws-eks[this working repo]

[source,shell]
git clone https://github.com/WesleyCharlesBlake/terraform-aws-eks.git

* Check that link:https://github.com/WesleyCharlesBlake/terraform-aws-eks/pull/16[this issue] has been merged; if not, apply the changes locally
* Create a configuration file at the repository root and change the values if needed. See the default values in `variables.tf` or on the GitHub page.

.terraform.tfvars
[source,ini]
cluster-name = "my-sweet-cluster"
k8s-version = "1.12"
aws-region = "eu-west-1"
node-instance-type = "t2.medium"
desired-capacity = "1"
max-size = "5"
min-size = "0"

* Configure AWS to the right account

[source,shell]
pip install --upgrade awscli
aws configure

* Create the cluster. It should take about 10 minutes.

[source,shell]
terraform apply

* Configure kubeconfig

[source,shell]
terraform output kubeconfig > ~/.kube/trekea-cluster
export KUBECONFIG=~/.kube/trekea-cluster

or

[source,shell]
aws eks update-kubeconfig --name my-sweet-cluster

* Make the workers join the cluster and watch

[source,shell]
terraform output config-map > config-map-aws-auth.yaml
kubectl apply -f config-map-aws-auth.yaml
kubectl get nodes --watch

==== Administration

[[user-installation]]
===== Initial admin

For a user / maintainer of the cluster, here are the prerequisites:

* Install link:https://kubernetes.io/docs/tasks/tools/install-kubectl/[kubectl]
* Install link:https://docs.aws.amazon.com/eks/latest/userguide/install-aws-iam-authenticator.html[aws-iam-authenticator]
* Install link:https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html[AWS CLI]
* Check the installation

[source,shell]
----
kubectl version
aws-iam-authenticator help
aws --version
----

* Configure AWS and kubectl

[source,shell]
----
aws iam create-access-key --user-name <my-user>
aws configure
aws eks --region <region> update-kubeconfig --name <cluster-name>
----

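A quick way to verify that this configuration works end to end is a couple of read-only commands (nothing here is specific to this guide):

[source,shell]
----
# confirm which IAM identity the AWS CLI is using
aws sts get-caller-identity
# confirm kubectl reaches the cluster with that identity
kubectl get nodes
kubectl get svc --all-namespaces
----
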
===== Additional admins

To grant cluster rights to admins other than the cluster creator, do the following.

====== IAM

Go to AWS console / IAM to give EKS/ECR admin rights to the user(s).

.Policies
Default policies do not cover EKS and ECR admin usage (!), so we create some custom policies.

* Create the EKS policy with btn:[Policies] -> btn:[Create policy]
** Service = `EKS`
** Action = `All EKS actions`
** Resources = `All resources`
* Click btn:[Review policy]
** Name = `EKS-admin`
* Create the ECR policy with btn:[Policies] -> btn:[Create policy]
** Service = `ECR`
** Action = `All ECR actions`
** Resources = `All resources`
** Name = `ECR-admin`

.Groups
* Create a group with btn:[Groups] -> btn:[Create New Group]
** Group Name = `MyDevTeam`
** Choose the policies:
*** `IAMSelfManageServiceSpecificCredentials`
*** `IAMFullAccess`
*** `IAMUserChangePassword`
*** `IAMUserSSHKeys`
*** `EKS-admin`
*** `ECR-admin`
** Click btn:[Next] then btn:[Create Group]

.Users
If needed, create the user in AWS console / IAM. Copy the `userarn` for the next step.

Attach the group to the user by clicking on the user -> btn:[Groups] -> btn:[Add user to groups] -> select `MyDevTeam` -> btn:[Add to Groups].

====== Configmap

The current admin has to update the configmap with the new user.

[source,shell]
kubectl edit -n kube-system configmap/aws-auth

[source,yaml]
----
apiVersion: v1
data:
  mapRoles: |
    - rolearn: arn:aws:iam:::role/adx-worker-nodes-NodeInstanceRole-XXXXX
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
  mapUsers: |
    - userarn: arn:aws:iam:::user/my-user
      username: my-user
      groups:
        - system:masters
    - userarn: arn:aws:iam:::user/another-user <1>
      username: another-user
      groups:
        - system:masters
kind: ConfigMap
[...]
----
<1> Add user(s) to the list in `mapUsers`

WARNING: The modification is asynchronous; you may have to wait a few minutes for it to be taken into account.

====== Tools configuration

Now that you have the rights, you can perform the operations described in <<user-installation>>.

==== Kubernetes dashboard

Follow the link:https://docs.aws.amazon.com/eks/latest/userguide/dashboard-tutorial.html[official guide]:

[source,shell]
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/heapster.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/influxdb.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/rbac/heapster-rbac.yaml
kubectl apply -f https://raw.githubusercontent.com/ericdahl/hello-eks/master/k8s/dashboard/eks-admin-service-account.yaml
kubectl apply -f https://raw.githubusercontent.com/ericdahl/hello-eks/master/k8s/dashboard/eks-admin-cluster-role-binding.yaml
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep eks-admin | awk '{print $1}')

This gets you a token to connect to a link:http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/login[local redirection] of the dashboard once you run

[source,shell]
kubectl proxy

image::k8s-dashboard.jpg[]

===== CPU / RAM

You may not have CPU and RAM metrics on EKS as of late 2018 / early 2019; see link:https://github.com/awslabs/amazon-eks-ami/issues/128[this issue] for an explanation.

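Before applying the fix below, a quick way to see whether metrics are flowing at all is `kubectl top`, which relies on the heapster deployment installed above; if it returns an error, continue with the following steps:

[source,shell]
----
kubectl top nodes
kubectl top pods -n kube-system
----
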
* Check the heapster logs

[source,shell]
kubectl get all
kubectl logs heapster-***
E1228 12:13:05.074233 1 manager.go:101] Error in scraping containers from kubelet:10.0.30.39:10255: failed to get all container stats from Kubelet URL "http://10.0.30.39:10255/stats/container/": Post http://10.0.30.39:10255/stats/container/: dial tcp 10.0.30.39:10255: getsockopt: connection refused

* Fix the deployment by editing the yml to add some extra parameters to `--source`

[source,shell]
kubectl edit deployment heapster

----
spec:
  template:
    spec:
      containers:
      - command:
        - /heapster
        - --source=kubernetes:https://kubernetes.default?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250&insecure=true <1>
        - --sink=influxdb:http://monitoring-influxdb.kube-system.svc:8086
        name: heapster
----
<1> Extra parameters added to `--source`

* Add some ClusterRole & ClusterRoleBinding by creating this file

.heapster-node-stats.yml
[source,yaml]
----
include::subdocs/heapster-node-stats.yml[]
----

* And applying it

[source,shell]
kubectl apply -f heapster-node-stats.yml

The error logs should stop. After a few minutes, with enough data, the dashboard should show CPU / RAM in the `All` namespace list. Otherwise, delete the `kubernetes-dashboard-*` pod to restart it.

==== CI pipelines

Full automation requires a technical CI user that is able to interact with the cluster. For this:

* Create a technical user in IAM
* Install, configure and use `AWS CLI`, `aws-iam-authenticator` and `kubectl` in your pipeline

===== Bitbucket

For Bitbucket, we use an AWS docker image. First add some configuration by going to your repository -> btn:[Settings] -> btn:[Repository variables]

----
AWS_ACCOUNT_ID          777777777777
AWS_DEFAULT_REGION      eu-west-3
AWS_ACCESS_KEY_ID       XXXXXXX
AWS_SECRET_ACCESS_KEY   tPbfRc/wx3JmPp6XXXXXXXty2yFJ6wl4rZ0B/Q
----

Then define the pipeline

.bitbucket-pipelines.yml
[source,yaml]
----
- step: &deploy-to-develop-k8s
    name: Deploy to Develop on Kubernetes cluster
    image: atlassian/pipelines-awscli
    max-time: 5
    services:
      - docker
    script:
      - export REPOSITORY_URL=${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_DEFAULT_REGION}.amazonaws.com
      # Download the necessary tools to deploy to kubernetes
      - apk add --no-cache curl
      - curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
      - chmod +x ./kubectl
      - mv ./kubectl /usr/local/bin/kubectl
      # Download aws-iam-authenticator
      - curl -o aws-iam-authenticator https://amazon-eks.s3-us-west-2.amazonaws.com/1.10.3/2018-07-26/bin/linux/amd64/aws-iam-authenticator
      - chmod +x ./aws-iam-authenticator
      - mkdir $HOME/bin && cp ./aws-iam-authenticator $HOME/bin/aws-iam-authenticator && export PATH=$HOME/bin:$PATH
      - echo 'export PATH=$HOME/bin:$PATH' >> ~/.bashrc
      - aws eks update-kubeconfig --name ${KUBERNETES_CLUSTER_NAME}
      - kubectl set image deployment/api-dpl api=${REPOSITORY_URL}/adx/adx-api:develop
      - kubectl set image deployment/client-dpl client=${REPOSITORY_URL}/adx/adx-client:develop
----

=== AWS / Kubespray / Terraform

==== Prerequisites

* Install pip3 (Python 3) or pip (Python 2)

[source,shell]
sudo apt update
sudo apt install python3-pip
pip3 --version

* Clone the kubespray repository

[source,shell]
git clone https://github.com/kubernetes-sigs/kubespray.git

* Install the requirements using pip3

[source,shell]
pip3 install -r requirements.txt

* This installs Ansible but not Terraform, which we will use to generate the `hosts.ini` file used by Ansible.

Let's install it following link:https://askubuntu.com/questions/983351/how-to-install-terraform-in-ubuntu[this guide].

[source,shell]
cd /tmp/
wget https://releases.hashicorp.com/terraform/0.11.11/terraform_0.11.11_linux_amd64.zip
unzip terraform_0.11.11_linux_amd64.zip
sudo mv terraform /usr/local/bin/
terraform --version

* Create a key pair in the AWS Console
** btn:[Services] -> btn:[Network & Security] -> btn:[Key Pairs] -> btn:[Create Key Pair]
** Save it and change its rights

[source,shell]
cp my-private-key.pem ~/.ssh
chmod 700 ~/.ssh/my-private-key.pem

==== Terraform: EC2 servers and hosts.ini creation

We will use Terraform to generate the `hosts.ini` file.

* Go to your cloned kubespray project
* Create the file `credentials.tfvars`

.contrib/terraform/aws/credentials.tfvars
[source,ini]
#AWS Access Key
AWS_ACCESS_KEY_ID = "XXXXXXXXX"
#AWS Secret Key
AWS_SECRET_ACCESS_KEY = "YYYYYYYYYYYYYYYYYYY"
#EC2 SSH Key Name
AWS_SSH_KEY_NAME = "my-key-pair-name"
#AWS Region
AWS_DEFAULT_REGION = "eu-west-3"

* Copy the file `terraform.tfvars.example` to create one that suits your needs

.contrib/terraform/aws/terraform.tfvars
[source,ini]
----
#Global Vars
aws_cluster_name = "mycluster"

#VPC Vars
aws_vpc_cidr_block = "10.250.192.0/18"
aws_cidr_subnets_private = ["10.250.192.0/20","10.250.208.0/20"]
aws_cidr_subnets_public = ["10.250.224.0/20","10.250.240.0/20"]

#Bastion Host
aws_bastion_size = "t2.medium"

#Kubernetes Cluster
aws_kube_master_num = 1
aws_kube_master_size = "t2.medium"
aws_etcd_num = 1
aws_etcd_size = "t2.medium"
aws_kube_worker_num = 1
aws_kube_worker_size = "t2.medium"

#Settings AWS ELB
aws_elb_api_port = 6443
k8s_secure_api_port = 6443
kube_insecure_apiserver_address = "0.0.0.0"

default_tags = {
  Env = "mycluster"
  App = "mycluster-my-app"
  Product = "kubernetes"
}

inventory_file = "../../../inventory/mycluster/hosts.ini"
----

* Optional: if you want Ubuntu/Debian images for your cluster (instead of CoreOS), change this in `variables.tf`

.contrib/terraform/aws/variables.tf
[source,json]
----
data "aws_ami" "distro" {
  most_recent = true

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-*"]
    # values = ["debian-stretch-hvm-x86_64-gp2-2018-11-10-63975-572488bb-fc09-4638-8628-e1e1d26436f4-ami-0f4768a55eaaac3d7.4"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["099720109477"] #ubuntu
  # owners = ["679593333241"] #debian
}
----

* Initialize the inventory folder (where `hosts.ini` will be written in the next step)

[source,shell]
mv inventory/sample inventory/mycluster

* Initialize Terraform and apply

[source,shell]
cd contrib/terraform/aws
terraform init
terraform plan --var-file=credentials.tfvars
terraform apply --var-file=credentials.tfvars

* Edit the generated `hosts.ini` file to put the internal DNS names instead of the generated names (see the names in the AWS console)

.inventory/mycluster/hosts.ini
[source,ini]
----
[all]
ip-10-250-206-126.eu-west-3.compute.internal ansible_host=10.250.206.126
ip-10-250-201-250.eu-west-3.compute.internal ansible_host=10.250.201.250
ip-10-250-204-239.eu-west-3.compute.internal ansible_host=10.250.204.239
bastion ansible_host=35.180.250.230
bastion ansible_host=35.180.55.194

[bastion]
bastion ansible_host=35.180.250.230
bastion ansible_host=35.180.55.194

[kube-master]
ip-10-250-206-126.eu-west-3.compute.internal

[kube-node]
ip-10-250-201-250.eu-west-3.compute.internal

[etcd]
ip-10-250-204-239.eu-west-3.compute.internal

[k8s-cluster:children]
kube-node
kube-master

[k8s-cluster:vars]
apiserver_loadbalancer_domain_name="kubernetes-elb-mycluster-*****.eu-west-3.elb.amazonaws.com"
----

NOTE: Yes, there are 2 bastions with the same name; you can change the names if you want both to be configured.

* Optional: to restart from scratch (helpful when you have messed things up with Ansible)

[source,shell]
terraform destroy --var-file=credentials.tfvars
terraform apply --var-file=credentials.tfvars

==== Ansible: Cluster creation and configuration

* Change some configuration files

.inventory/mycluster/group_vars/all/all.yml
[source,yaml]
cloud_provider: aws

.inventory/mycluster/group_vars/k8s-cluster/k8s-cluster.yml
[source,yaml]
----
cluster_name: mycluster
# Make a copy of kubeconfig on the host that runs Ansible in {{ inventory_dir }}/artifacts
kubeconfig_localhost: true
# Download kubectl onto the host that runs Ansible in {{ bin_dir }}
kubectl_localhost: true
----

* Go to the project root directory and launch Ansible

[source,shell]
cd [..]/kubespray
ansible-playbook -vvvv -i ./inventory/mycluster/hosts.ini ./cluster.yml -e ansible_user=core -b --become-user=root --flush-cache --private-key=~/.ssh/my-private-key.pem -e ansible_ssh_private_key_file=~/.ssh/my-private-key.pem 2>&1 | tee "ansible_$(date +"%Y-%m-%d_%I-%M-%p").log"

[TIP]
====
* When using the Ubuntu image, change this parameter: `-e ansible_user=ubuntu`
* You can drop the end (starting with `2>&`) if you don't want to save the output to a file
====

* With the minimal configuration used above, the playbook should take about 10 minutes. Use this time to check operations on the master

[source,shell]
ssh -F ./ssh-bastion.conf core@<master-ip> -i ~/.ssh/my-private-key.pem
journalctl -f

* When finished, check the cluster by connecting to the master

[source,shell]
ssh -F ./ssh-bastion.conf core@<master-ip> -i ~/.ssh/my-private-key.pem
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get nodes

==== Kubectl configuration behind a bastion

* Get the master and bastion IPs from `./inventory/mycluster/hosts.ini` or `./ssh-bastion.conf`
* Forward the cluster API port to your local machine

[source,shell]
ssh -L 6443:<master-ip>:6443 ubuntu@<bastion-ip> -N -i ~/.ssh/traffic_staging_k8s_fr.pem

* Use the `admin.conf` generated by the Ansible playbook as a kube config file, replacing the master IP by `localhost`. Suggested approach: copy it and link the copy to the default kube config path. Then it won't be overwritten by a playbook run.

[source,shell]
cp ./inventory/mycluster/artifacts/admin.conf ./inventory/mycluster/artifacts/kubeconfig
mv ~/.kube/config ~/.kube/config.old
sudo ln -s ./inventory/mycluster/artifacts/kubeconfig ~/.kube/config

TIP: An outdated kubeconfig can lead to the following error when using `kubectl`: `Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")`

==== Prepare bastion for CI usage

You can use the bastion as a server for the CI to interact with the cluster.

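As an illustration of the end goal, a CI job could eventually drive a deployment through the bastion with a single remote command, along these lines (`BASTION_USER`, `BASTION_IP`, `CI_SSH_KEY` and the image reference are placeholder names to adapt to your pipeline); the steps below prepare the bastion for this:

[source,shell]
----
# run a deployment update remotely on the bastion from the CI runner
ssh -i "$CI_SSH_KEY" "$BASTION_USER@$BASTION_IP" \
  "kubectl set image deployment/api-dpl api=$REPOSITORY_URL/adx/adx-api:develop"
----
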
// TODO put a link to a Gitlab CI script using the bastion

* Connect to the bastion

[source,shell]
ssh-add ~/.ssh/cluster-ssh-key.pem
ssh BASTION_USER@BASTION_IP

* Install docker

[source,shell]
sudo apt-get remove docker docker-engine docker.io containerd runc
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg2 software-properties-common
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io

* Install kubectl

[source,shell]
sudo apt-get update && sudo apt-get install -y apt-transport-https
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubectl

* Create a kubeconfig with the content of `admin.conf`

[source,shell]
mkdir .kube
vi .kube/config

* Add the CI runner's public ssh key to `authorized_keys`

[source,shell]
vi ~/.ssh/authorized_keys

* Download/create the key file to be used to connect to a cluster node

[source,shell]
vi ~/.ssh/cluster-key.pem
chmod 400 ~/.ssh/cluster-key.pem

* Test the ssh connection to the node (and accept the new host)

[source,shell]
ssh admin@$MASTER_IP -i ~/.ssh/cluster-key.pem

== Tips

=== Kubectl autocompletion

You can add autocompletion to kubectl commands.

[source,shell]
sudo apt-get install bash-completion
kubectl completion bash | sudo tee /etc/bash_completion.d/kubectl

=== Kubectl aliases

link:https://containerized.me/600-kubectl-aliases-for-devops-ninjas/[This article] mentions link:https://github.com/ahmetb/kubectl-aliases/blob/master/.kubectl_aliases[this repo] for an alias list. Let's use link:https://github.com/jessestuart/kubectl-aliases[this fork] with more aliases.

* Download the alias file

[source,shell]
curl -o ~/.kubectl_aliases https://raw.githubusercontent.com/jessestuart/kubectl-aliases/master/.kubectl_aliases

* Replace `kubectl` with `kctl`, to use the aliases together with the print function below

[source,shell]
sed -i 's/kubectl/kctl/g' ~/.kubectl_aliases

* Add it to `.bashrc`, with the command print and some custom aliases

[source,shell]
----
cat <<'EOT' >> ~/.bashrc
[ -f ~/.kubectl_aliases ] && source ~/.kubectl_aliases
# k8s change namespace
alias kn='kctl config set-context $(kubectl config current-context) --namespace'
# k8s get events sorted by last time
alias kgel='kubectl get events --sort-by=.lastTimestamp'
# k8s get events sorted by creation time
alias kgec='kubectl get events --sort-by=.metadata.creationTimestamp'
# print kubectl command and then execute it
function kctl() { echo "+ kubectl $@"; command kubectl $@; }
EOT
----

=== Latest kubectl patch

To get the latest kubectl patch for a Kubernetes major.minor version, visit the link:https://github.com/kubernetes/kubectl[release page of the kubectl GitHub repo].

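It can also be queried from the same release bucket the install scripts above rely on, assuming a version file exists for your minor release:

[source,shell]
----
# latest patch of a given minor release, e.g. 1.12
curl -s https://storage.googleapis.com/kubernetes-release/release/stable-1.12.txt
# latest stable release overall
curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt
----
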
=== Tools

==== K9s: terminal UI

image:https://raw.githubusercontent.com/derailed/k9s/master/assets/screen_po.png[]

* Installation

[source,shell]
cd /tmp/
curl -L https://github.com/derailed/k9s/releases/download/0.5.2/k9s_0.5.2_Linux_x86_64.tar.gz | tar zx
sudo mv k9s /usr/bin/

==== Kubectx: context switching

ifndef::backend-pdf[]
* Demo

image:https://raw.githubusercontent.com/ahmetb/kubectx/master/img/kubectx-demo.gif[{3q-width}]
endif::[]

* Installation

[source,shell]
sudo curl -o /usr/bin/kubectx https://raw.githubusercontent.com/ahmetb/kubectx/master/kubectx
sudo chmod +x /usr/bin/kubectx

* Autocompletion

.bash
[source,shell]
sudo apt-get install bash-completion
sudo curl -o /etc/bash_completion.d/kubectx https://raw.githubusercontent.com/ahmetb/kubectx/master/completion/kubectx.bash

==== Kubens: namespace switching

ifndef::backend-pdf[]
* Demo

image:https://raw.githubusercontent.com/ahmetb/kubectx/master/img/kubens-demo.gif[{3q-width}]
endif::[]

* Installation

[source,shell]
sudo curl -o /usr/bin/kubens https://raw.githubusercontent.com/ahmetb/kubectx/master/kubens
sudo chmod +x /usr/bin/kubens

* Autocompletion

WARNING: Not working under AWS

[source,shell]
sudo apt-get install bash-completion
sudo curl -o /etc/bash_completion.d/kubens https://raw.githubusercontent.com/ahmetb/kubectx/master/completion/kubens.bash

==== Kubeval: manifest validator

link:https://github.com/instrumenta/kubeval[kubeval] is a tool for validating Kubernetes YAML or JSON configuration files.

[source,shell]
cd /tmp
wget https://github.com/instrumenta/kubeval/releases/latest/download/kubeval-linux-amd64.tar.gz
tar xf kubeval-linux-amd64.tar.gz
sudo cp kubeval /usr/bin

==== Stern: multi-pod tailing

* Installation

[source,shell]
sudo curl -L -o /usr/bin/stern https://github.com/wercker/stern/releases/download/1.10.0/stern_linux_amd64
sudo chmod +x /usr/bin/stern

* Example

[source,shell]
stern --all-namespaces . --since 1m

==== Kubectl-debug: sidecar debugging pod

ifndef::backend-pdf[]
* Demo

image:https://raw.githubusercontent.com/aylei/kubectl-debug/master/docs/kube-debug.gif[{3q-width}]
endif::[]

link:https://github.com/aylei/kubectl-debug[kubectl-debug] is an out-of-tree solution for troubleshooting running pods: it allows you to run a new container in a running pod for debugging purposes. Follow the link:https://github.com/aylei/kubectl-debug#install-the-kubectl-debug-plugin[official instructions] to install it.

==== Microk8s

link:https://microk8s.io/[Microk8s] is like link:https://kubernetes.io/docs/setup/learning-environment/minikube/[Minikube], but without a VM and without using the host Docker context.

TIP: Multiple nodes on a single host are possible, yet a bit tricky; see link:https://github.com/ubuntu/microk8s/issues/732[this issue].

* Installation

[source,shell]
sudo snap install microk8s --classic
sudo usermod -a -G microk8s $USER
su - $USER

* To use it like a regular cluster among others, add this to your `.bashrc`:

[source,shell]
export PATH=/snap/bin:$PATH
microk8s.kubectl config view --raw > $HOME/.kube/microk8s.config
export KUBECONFIG=$HOME/.kube/config
export KUBECONFIG=$KUBECONFIG:$HOME/.kube/microk8s.config

* To install the dns, storage, ingress and dashboard add-ons, and get the token:

[source,shell]
microk8s.enable dns storage ingress dashboard
token=$(microk8s.kubectl -n kube-system get secret | grep default-token | cut -d " " -f1)
microk8s.kubectl -n kube-system describe secret $token
kubectl proxy

Now you can serve the dashboard with `kubectl proxy` and then link:http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/login[browse the dashboard] using the token.

* To list the add-ons and their status

[source,shell]
microk8s.status

* To delete everything

[source,shell]
microk8s.reset

=== Useful commands

* Switch namespace

[source,shell]
kubectl config set-context $(kubectl config current-context) --namespace=kube-system

* View an applied yaml

[source,shell]
kubectl apply view-last-applied svc/client

* Get a pod's name

[source,shell]
kubectl get pods --show-labels
kubectl get pod -l app=sonarqube -o jsonpath="{.items[0].metadata.name}"

* Decode a secret

[source,shell]
kubectl get secrets/my-secret --template={{.data.password}} | base64 --decode

=== ClusterIP vs NodePort vs Ingress

link:https://medium.com/google-cloud/kubernetes-nodeport-vs-loadbalancer-vs-ingress-when-should-i-use-what-922f010849e0[Great article] on the subject.

=== Restart deployment pods

To get fresh pods you can kill them, but you can also scale the deployment down and back up.

WARNING: This induces a service interruption

[source,shell]
kubectl scale deployment kubernetes-dashboard --replicas=0
deployment.extensions "kubernetes-dashboard" scaled

[source,shell]
kubectl scale deployment kubernetes-dashboard --replicas=1
deployment.extensions "kubernetes-dashboard" scaled

=== Getting logged in to ECR for image pulling

[source,shell]
aws ecr get-login
docker login -u AWS -p <password> -e none https://<aws_account_id>.dkr.ecr.<region>.amazonaws.com

[source,shell]
docker login -u AWS -p <password> https://<aws_account_id>.dkr.ecr.<region>.amazonaws.com

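With AWS CLI v1, the two steps can usually be combined into one (the region is only an example):

[source,shell]
----
# prints a complete docker login command and executes it directly
eval $(aws ecr get-login --no-include-email --region eu-west-3)
----
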
=== Force updating image with same tag

==== Manually

TIP: The ECR secret has to have been updated within the last 12 hours

To force-update an image, if the deployment is configured with `imagePullPolicy: Always`, deleting the pod will pull the new image.

[source,shell]
kubectl get pods -w
NAME                          READY   STATUS    RESTARTS   AGE
api-dpl-6b5698b848-vddc8      1/1     Running   0          2m
client-dpl-69786bdd8f-5zcnd   1/1     Running   0          13d
db-dpl-6874657d-w6mzb         1/1     Running   0          27m
kibana-dpl-55fdf8776f-k45pm   1/1     Running   0          26m

[source,shell]
kubectl delete pod api-dpl-6b5698b848-vddc8
pod "api-dpl-6b5698b848-vddc8" deleted

[source,shell]
kubectl get pods -w
NAME                          READY   STATUS              RESTARTS   AGE
api-dpl-6b5698b848-bthbs      1/1     Running             0          13s
client-dpl-69786bdd8f-5zcnd   1/1     Running             0          13d
db-dpl-6874657d-w6mzb         1/1     Running             0          54s
kibana-dpl-55fdf8776f-k45pm   0/1     ContainerCreating   0          3s
kibana-dpl-8d76c6dd8-cmrvz    0/1     Terminating         0          14d
kibana-dpl-8d76c6dd8-cmrvz    0/1     Terminating         0          14d
kibana-dpl-8d76c6dd8-cmrvz    0/1     Terminating         0          14d
kibana-dpl-55fdf8776f-k45pm   1/1     Running             0          13s

[source,shell]
kubectl describe pod api-dpl-6b5698b848-bthbs
Events:
  Type    Reason     Age   From                            Message
  ----    ------     ----  ----                            -------
  Normal  Scheduled  38s   default-scheduler               Successfully assigned develop/api-dpl-6b5698b848-bthbs to ***.compute.internal
  Normal  Pulling    37s   kubelet, ***.compute.internal   pulling image "694430786501.dkr.ecr.eu-west-3.amazonaws.com/adx/adx-api:develop"
  Normal  Pulled     14s   kubelet, ***.compute.internal   Successfully pulled image "694430786501.dkr.ecr.eu-west-3.amazonaws.com/adx/adx-api:develop"
  Normal  Created    13s   kubelet, ***.compute.internal   Created container
  Normal  Started    13s   kubelet, ***.compute.internal   Started container

[source,shell]
kubectl logs api-dpl-6b5698b848-bthbs
2019-02-08 11:04:50.939  INFO 1 --- [main] o.s.s.concurrent.ThreadPoolTaskExecutor  : Initializing ExecutorService 'applicationTaskExecutor'
2019-02-08 11:04:51.578  INFO 1 --- [main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): XXX (http) with context path ''
2019-02-08 11:04:51.585  INFO 1 --- [main] com.biomerieux.adxapi.AdxApiApplication  : Started AdxApiApplication in 9.462 seconds (JVM running for 10.442)

==== In CI with deployment file

* Add a variable in the deployment file

.my-app.svc-dpl.yml
[source,yaml]
----
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sltback
  template:
    metadata:
      labels:
        app: sltback
        # to force rollout on apply
        commit: ${CI_COMMIT_SHA} <1>
----
<1> Unique value tied to the CI run, for example the commit ID, here using Gitlab CI variables

* During CI, replace the value

[source,shell]
apk --update add gettext # or apt-get, depending on the OS
envsubst < "my-app-with-variable.svc-dpl.yml" > "my-app.svc-dpl.yml"

* Apply the deployment file

=== Example application

Here is a sample application, link:https://github.com/reactiveops/k8s-workshop[k8s-workshop], backed by a link:https://www.youtube.com/watch?v=H-FKBoWTVws[YouTube video]. It's a scalable webapp with a Redis cluster. To have several replicas at startup, edit the autoscaling files.

[source,shell]
git clone https://github.com/reactiveops/k8s-workshop.git
kubectl create namespace k8s-workshop
kubectl label namespace k8s-workshop istio-injection=enabled
cd k8s-workshop/complete
kubectl apply -f 01_redis/
kubectl apply -f 02_webapp/
kubectl get pods --watch
NAME                             READY   STATUS    RESTARTS   AGE
redis-primary-7566957b9c-6rzb6   1/1     Running   0          4h10m
redis-replica-7fd949b9d-db2rf    1/1     Running   0          4h10m
redis-replica-7fd949b9d-rz7nb    1/1     Running   0          108m
webapp-5498668448-8hcgq          1/1     Running   0          4h10m
webapp-5498668448-ghv22          1/1     Running   0          107m
webapp-5498668448-jdx9j          1/1     Running   0          107m

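To play with scaling manually, assuming the manifests were applied into your current namespace and the deployment is indeed named `webapp` (as the pod names above suggest):

[source,shell]
----
kubectl scale deployment webapp --replicas=4
kubectl get pods --watch
----
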
== Helm

Helm helps you manage Kubernetes applications — Helm Charts help you define, install, and upgrade even the most complex Kubernetes application.

=== Installation

[source,shell]
curl -s https://raw.githubusercontent.com/helm/helm/master/scripts/get | bash

=== Initialization for current cluster

[source,shell]
kubectl create serviceaccount tiller -n kube-system
kubectl create clusterrolebinding tiller-is-admin --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account tiller --upgrade

[TIP]
====
If already initialized by mistake, use this command:

[source,shell]
helm reset --force
====

==== Incubator repository

[source,shell]
helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
helm repo update

=== Chart installation example

[source,shell]
helm install stable/my-chart --name mc

WARNING: If no name is provided, an auto-generated one is used, like `wonderful-rabbit`. It is prefixed to all objects, so it is better to keep it short by providing one.

=== Chart uninstallation

[source,shell]
helm delete --purge mc

=== Multi-tenant

By default, Helm cannot install multiple charts with the same release name, which can become a problem when having multiple environments in the same cluster. The solution is to have link:https://gist.github.com/venezia/69e9bd25d79fec9834fe1c5b589d4206[multiple Tillers], one per namespace/environment.

== NGINX Ingress Controller

[source,shell]
helm install stable/nginx-ingress --name ngc --namespace kube-system

Then get the public DNS with:

[source,shell]
kubectl get svc -n kube-system

NOTE: Works out of the box with AWS/EKS

=== Cert manager

For TLS, a certificate manager is needed. The steps are taken from the link:https://cert-manager.readthedocs.io/en/latest/getting-started/install.html#steps[official installation documentation].

* Install the manager

[source,shell]
kubectl apply -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.7/deploy/manifests/00-crds.yaml
kubectl create namespace cm
kubectl label namespace cm certmanager.k8s.io/disable-validation=true
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install --name cm --namespace cm --version v0.7.0 jetstack/cert-manager

* Check the installation using the link:https://cert-manager.readthedocs.io/en/latest/getting-started/install.html#verifying-the-installation[official procedure].

image::turnoff-serverless-economic-impact.png[{half-width}]

== Istio

Connect, secure, control, and observe services in a Kubernetes cluster.

[NOTE]
====
* Installation inspired by the link:https://istio.io/docs/setup/kubernetes/install/helm/[official documentation].
* The default Istio installation does not install all modules. Istio is installed here with the `demo` profile to have all of them. See link:https://istio.io/docs/setup/kubernetes/additional-setup/config-profiles/[profiles coverage].
====

=== Prerequisites

Have <> installed, locally and cluster side.

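With the Helm 2 / Tiller setup described above, a quick sanity check that both sides respond (`name=tiller` is the label carried by the default Tiller deployment):

[source,shell]
----
# should report both a Client and a Server (Tiller) version
helm version
kubectl -n kube-system get pods -l name=tiller
----
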
=== Installation

* Download and initialize Istio using Helm

[source,shell]
curl -L https://git.io/getLatestIstio | ISTIO_VERSION=1.1.0 sh -
cd istio-1.1.0
export PATH=$PWD/bin:$PATH
helm install install/kubernetes/helm/istio-init --name istio-init --namespace istio-system

* Check that the command below returns `53` (since cert-manager is not enabled; otherwise it would be `58`)

[source,shell]
kubectl get crds | grep 'istio.io\|certmanager.k8s.io' | wc -l

* Finish the installation with Helm

[source,shell]
helm template install/kubernetes/helm/istio --name istio --namespace istio-system --values install/kubernetes/helm/istio/values-istio-demo.yaml | kubectl apply -f -

* Check the Istio pods startup

[source,shell]
kubectl get pods -n istio-system -w
NAME                                      READY   STATUS      RESTARTS   AGE
grafana-57586c685b-m67t6                  1/1     Running     0          2d19h
istio-citadel-645ffc4999-7g9rl            1/1     Running     0          2d19h
istio-cleanup-secrets-1.1.0-ks4fb         0/1     Completed   0          2d19h
istio-egressgateway-5c7fd57fdb-spwlp      1/1     Running     0          2d19h
istio-galley-978f9447f-zj8pd              1/1     Running     0          2d19h
istio-grafana-post-install-1.1.0-wjn4m    0/1     Completed   0          2d19h
istio-ingressgateway-8ccdc79bc-p67np      1/1     Running     0          2d19h
istio-init-crd-10-t6pwq                   0/1     Completed   0          2d19h
istio-init-crd-11-j788x                   0/1     Completed   0          2d19h
istio-pilot-679c6b45b8-5fbw7              2/2     Running     0          2d19h
istio-policy-fccd56fd-8qtlb               2/2     Running     2          2d19h
istio-security-post-install-1.1.0-p9k29   0/1     Completed   0          2d19h
istio-sidecar-injector-6dcc9d5c64-szrcw   1/1     Running     0          2d19h
istio-telemetry-9bcfc78bd-mfwsh           2/2     Running     1          2d19h
istio-tracing-656f9fc99c-r9n7d            1/1     Running     0          2d19h
kiali-69d6978b45-t7tdn                    1/1     Running     0          2d19h
prometheus-66c9f5694-c548z                1/1     Running     0          2d19h

=== Activation for a namespace

Namespaces have to be explicitly labelled to be monitored by Istio

[source,shell]
kubectl label namespace k8s-workshop istio-injection=enabled

=== UIs

If your cluster is behind a bastion, forward the cluster API port to your local machine

[source,shell]
ssh -L 6443:10.250.202.142:6443 ubuntu@35.180.231.227 -N -i ~/.ssh/traffic_staging_k8s_fr.pem

==== Kiali

The Kiali project answers the questions: what microservices are part of my Istio service mesh, and how are they connected?

image::istio-kiali.jpg[{half-width}]

* forward the port

[source,shell]
kubectl -n istio-system port-forward $(kubectl -n istio-system get pod -l app=kiali -o jsonpath='{.items[0].metadata.name}') 20001:20001 &

* access the UI at http://localhost:20001

==== Jaeger / OpenTracing

Open source, end-to-end distributed tracing. Monitor and troubleshoot transactions in complex distributed systems.

image::istio-jaeger.jpg[{half-width}]

* forward the port

[source,shell]
kubectl port-forward -n istio-system $(kubectl get pod -n istio-system -l app=jaeger -o jsonpath='{.items[0].metadata.name}') 16686:16686 &

* access the UI at http://localhost:16686

==== Grafana & Prometheus

Grafana in front of Prometheus, for analytics and monitoring of the cluster.

image::istio-grafana.jpg[{half-width}]

* forward the port

[source,shell]
kubectl -n istio-system port-forward $(kubectl -n istio-system get pod -l app=grafana -o jsonpath='{.items[0].metadata.name}') 3000:3000 &

* access the UI at http://localhost:3000/dashboard/db/istio-mesh-dashboard

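Prometheus itself can be reached the same way if you want to run raw queries, `app=prometheus` being the label used by the bundled deployment. Forward the port and then open http://localhost:9090:

[source,shell]
----
kubectl -n istio-system port-forward $(kubectl -n istio-system get pod -l app=prometheus -o jsonpath='{.items[0].metadata.name}') 9090:9090 &
----
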
== Powerfulseal

link:https://github.com/bloomberg/powerfulseal[PowerfulSeal] adds chaos to your Kubernetes clusters, so that you can detect problems in your systems as early as possible. It kills targeted pods and takes VMs up and down.

WARNING: PowerfulSeal uses collected cluster IPs, so if the cluster is behind a bastion, installation and usage must happen on the bastion.

* Install PowerfulSeal

[source,shell]
sudo pip install powerfulseal

* Create a policy file. Here we focus on pod failures

.~/seal-policy.yml
[source,yaml]
----
config:
  minSecondsBetweenRuns: 1
  maxSecondsBetweenRuns: 30
nodeScenarios: []
podScenarios:
  - name: "delete random pods in default namespace"
    match:
      - namespace:
          name: "k8s-workshop"
    filters:
      - randomSample:
          size: 1
    actions:
      - kill:
          probability: 0.77
          force: true
----

* Launch PowerfulSeal

[source,shell]
seal autonomous --kubeconfig ~/.kube/config --no-cloud --inventory-kubernetes --ssh-allow-missing-host-keys --remote-user ubuntu --ssh-path-to-private-key ~/.ssh/traffic_staging_k8s_fr.pem --policy-file ~/seal-policy.yml --host 0.0.0.0 --port 30100

* Watch your pods failing and restarting

[source,shell]
kubectl get pods --watch
NAME                             READY   STATUS    RESTARTS   AGE
redis-primary-7566957b9c-6rzb6   1/1     Running   1          4h10m
redis-replica-7fd949b9d-db2rf    1/1     Running   0          4h10m
redis-replica-7fd949b9d-rz7nb    1/1     Running   0          108m
webapp-5498668448-8hcgq          1/1     Running   3          4h10m
webapp-5498668448-ghv22          1/1     Running   1          107m
webapp-5498668448-jdx9j          1/1     Running   1          107m

== Troubleshooting

=== User "system:anonymous" cannot get resource

With some scripts under AWS, you can get this error:

----
User \"system:anonymous\" cannot get resource
----

This command grants admin rights to anonymous users:

[source,shell]
kubectl create clusterrolebinding cluster-system-anonymous --clusterrole=cluster-admin --user=system:anonymous

WARNING: Do this only temporarily while installing, and only if your cluster is not accessible from the internet.

== Full project example

=== Target architecture

[plantuml,archi-doc,svg]
....
left to right direction
actor user
cloud cluster {
  frame frontend as "client\n(front-end)"
  agent backend as "api\n(back-end)"
  database elastic as "elasticsearch\n(database)"
  frame kibana as "kibana\n(database UI)"
}
cloud external {
  agent auth0
}
user --> frontend : 10080:80
frontend --> auth0
admin --> frontend : 10080:80
admin ..> kibana : " by port forwarding :5601"
frontend --> backend : :8080
backend --> elastic : :9200
backend --> auth0
kibana --> elastic : :9200
....

=== Environments

The AWS cluster will host multiple environments, so we first create and use a `develop` namespace:

[source,shell]
----
kubectl create namespace develop
kubectl config current-context
arn:aws:eks:<region>:<account-id>:cluster/<cluster-name>
----

[source,shell]
kubectl config set-context arn:aws:eks:<region>:<account-id>:cluster/<cluster-name> --namespace=develop

=== Deployments

Kubernetes deployments and services are stored in the same file for each module.

==== Elasticsearch

We start with the elasticsearch database.

.Some explanation:
* This is the OSS image, simpler, no need for X-Pack
* Note the system command in the `initContainers` section

===== Deployment file

.db.svc-dpl.yml
[%collapsible]
====
[source,yaml]
------
include::../samples/kubernetes/adx.db.svc-dpl.yml[]
------
====

===== Commands

Launch (or update) the deployment:

[source,shell]
kubectl apply -f adx.db.svc-dpl.yml
service/elasticsearch created
deployment.extensions/db-dpl created

[source,shell]
kubectl get rs
NAME                DESIRED   CURRENT   READY   AGE
db-dpl-5c767f46c7   1         1         1       32m

[source,shell]
kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
db-dpl-5c767f46c7-tkqkv   1/1     Running   0          32m

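To check that the database actually answers before moving on, you can port-forward the service and query the cluster health (the service is named `elasticsearch` in these manifests; adapt if yours differs):

[source,shell]
----
kubectl port-forward svc/elasticsearch 9200:9200 &
curl -s 'http://localhost:9200/_cluster/health?pretty'
----
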
==== Kibana

Kibana is included, only for elasticsearch administration in test environments.

.Some explanation:
* This is the OSS image, simpler, no need for X-Pack
* It will not be accessible from the external network, for security reasons

===== Deployment file

.kibana.svc-dpl.yml
[%collapsible]
====
[source,yaml]
------
include::../samples/kubernetes/adx.kibana.svc-dpl.yml[]
------
====

===== Commands

Launch (or update) the deployment:

[source,shell]
kubectl apply -f adx.kibana.svc-dpl.yml

To access the UI, we use port forwarding in a dedicated shell:

[source,shell]
kubectl port-forward svc/kibana 5601:5601

The Kibana UI is now available at http://localhost:5601

==== Api / backend

.Some explanation:
* The backend is pulled from the AWS/ECR registry

===== Prerequisites

* Get the full image name in ECR
** Go to the AWS admin UI
** Choose the zone containing your registry
** btn:[Services] -> btn:[ECR] -> api repository
** Get the `Image URI`
* Get the registry password

[source,shell]
----
aws ecr get-login
docker login -u AWS -p <password> -e none https://<aws_account_id>.dkr.ecr.<region>.amazonaws.com
----

* Create/update the secret using it

[source,shell]
----
kubectl delete secret ecr-registry-secret
kubectl create secret docker-registry ecr-registry-secret --docker-username="AWS" --docker-password="<password>" --docker-server="<aws_account_id>.dkr.ecr.<region>.amazonaws.com" --docker-email="my.email@my-provider.com"
----

It is valid for 12 hours. Now we can update the file and deploy it.

===== Deployment file

.api.svc-dpl.yml
[%collapsible]
====
[source,yaml]
------
include::../samples/kubernetes/adx.api.svc-dpl.yml[]
------
====

===== Commands

Launch (or update) the deployment:

[source,shell]
----
kubectl apply -f adx.api.svc-dpl.yml
----

==== Client / frontend

===== Prerequisites

Same as the API module.

===== Deployment file

.client.ing-svc-dpl.yml
[%collapsible]
====
[source,yaml]
------
include::../samples/kubernetes/adx.client.ing-svc-dpl.yml[]
------
====

===== Commands

Launch (or update) the deployment:

[source,shell]
----
kubectl apply -f adx.client.ing-svc-dpl.yml
----

===== Access the frontend in a browser

* Get the host/port

[source,shell]
----
kubectl get services -o wide
NAME            TYPE           CLUSTER-IP       EXTERNAL-IP                              PORT(S)           AGE   SELECTOR
adx-api         ClusterIP      10.100.78.159    <none>                                   8080/TCP          2h    app=api,group=adx
client          LoadBalancer   10.100.145.183   <elb-name>.us-east-2.elb.amazonaws.com   10080:30587/TCP   2h    app=client,group=adx
elasticsearch   ClusterIP      10.100.15.82     <none>                                   9200/TCP          23h   app=db,group=adx
kibana          ClusterIP      10.100.114.147   <none>                                   5601/TCP          23h   app=kibana,group=adx
----

* Go to http://<elb-name>.<region>.elb.amazonaws.com:10080

image::turnoff-before-devops-after-devops.png[{half-width}]

image::turnoff-adam-eve.jpg[{half-width}]

image::turnoff-enterprise-vs-startup-journey-to-cloud.png[{full-width}]