Daniel's Blog

Upgrading old rook-ceph in our colo kubernetes from 1.2.7 to 1.3.11

Our colo has been using rook-ceph for a long time. However, it has never really worked well: the throughput we get is orders of magnitude less than the speed of writing directly to the disks. Eventually we moved our production to Google Cloud Platform, but our beta sites, build servers, and test servers are all still dealing with slow disks. So the task is to upgrade the rook-ceph cluster to get access to some extra features and see if we can tweak it into working. If not, we'll either abandon rook in favor of a bare metal ceph cluster, or abandon ceph in favor of a kubernetes NFS operator.
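
For a rough sense of what "slow" means here, the comparison is just writing a big file with dd to a ceph-backed volume versus a node-local disk; both paths below are hypothetical stand-ins for a mounted ceph PVC and a local directory.

# 1 GiB of direct I/O to a ceph-backed mount (hypothetical path)
dd if=/dev/zero of=/mnt/ceph-vol/ddtest bs=1M count=1024 oflag=direct
# the same write against a node-local disk for comparison (hypothetical path)
dd if=/dev/zero of=/mnt/local-disk/ddtest bs=1M count=1024 oflag=direct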

These are my notes on upgrading from version 1.2.7 to 1.3.11, the next minor release in the upgrade path (rook only supports upgrading one minor release at a time).

Check the current version

kubectl -n rook-ceph get deployment -l rook_cluster=rook-ceph -o jsonpath='{range .items[*]}{"rook-version="}{.metadata.labels.rook-version}{"\n"}{end}' | sort | uniq

For example:

$ kubectl -n rook-ceph get deployment -l rook_cluster=rook-ceph -o jsonpath='{range .items[*]}{"rook-version="}{.metadata.labels.rook-version}{"\n"}{end}' | sort | uniq
rook-version=v1.2.7

The rook cluster is on version 1.2.7.
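
Before going further it's also worth noting the ceph version and making sure the cluster is healthy. The ceph-version label can be read the same way, and the rook toolbox (the rook-ceph-tools deployment that gets upgraded at the end of these notes, selected here by its app label) will give ceph status:

kubectl -n rook-ceph get deployment -l rook_cluster=rook-ceph -o jsonpath='{range .items[*]}{"ceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}' | sort | uniq

kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph status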

Upgrading to the next version:

Check the tags in the rook git repository to see what the next version is. This assumes a local clone of the repo; if you don't already have one, something like this will get it:
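
git clone https://github.com/rook/rook.git
cd rook

Then list the tags: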

git tag -l

For example:

$ git tag -l
v0.1.0
v0.1.1
v0.2.0
v0.2.1
v0.2.2
v0.3.0
v0.3.1
v0.4.0
v0.5.0
v0.5.1
v0.6.0
v0.6.1
v0.6.2
v0.7.0
v0.7.1
v0.8.0
v0.8.1
v0.8.2
v0.8.3
v0.9.0
v0.9.1
v0.9.2
v0.9.3
v1.0.0
v1.0.1
v1.0.2
v1.0.3
v1.0.4
v1.0.5
v1.0.6
v1.1.0
v1.1.0-beta.0
v1.1.0-beta.1
v1.1.1
v1.1.2
v1.1.3
v1.1.4
v1.1.5
v1.1.6
v1.1.7
v1.1.8
v1.1.9
v1.2.0
v1.2.0-beta.0
v1.2.0-beta.1
v1.2.1
v1.2.2
v1.2.3
v1.2.4
v1.2.5
v1.2.6
v1.2.7
v1.3.0
v1.3.0-beta.0
v1.3.0-beta.1
v1.3.1
v1.3.10
v1.3.11
v1.3.2
v1.3.3
v1.3.4
v1.3.5
v1.3.6
v1.3.7
v1.3.8
v1.3.9
v1.4.0
v1.4.0-alpha.0
v1.4.0-beta.0
v1.4.1
v1.4.2
v1.4.3
v1.4.4
v1.4.5
v1.4.6
v1.4.7
v1.4.8
v1.4.9
v1.5.0

Now check out the tag for the next version, which for us is v1.3.11:

$ git checkout tags/v1.3.11
Note: checking out 'tags/v1.3.11'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at b28b21a03 Merge pull request #6241 from travisn/backport-ci-disk
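
A quick sanity check that we're on the tag we expect:

git describe --tags

This should print v1.3.11.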

Upgrade RBAC and CRDs

Follow the upgrade guide's instructions to update the CRDs and other kubernetes objects:

https://rook.io/docs/rook/v1.3/ceph-upgrade.html#2-update-the-rbac-and-crds
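
The two upgrade manifests ship with the repo we just checked out; in the v1.3 tree they should live under the ceph examples directory, so change into it first:

cd cluster/examples/kubernetes/ceph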

$ kubectl apply -f upgrade-from-v1.2-apply.yaml -f upgrade-from-v1.2-crds.yaml
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
clusterrole.rbac.authorization.k8s.io/cephfs-external-provisioner-runner-rules configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
clusterrole.rbac.authorization.k8s.io/rbd-external-provisioner-runner-rules configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
role.rbac.authorization.k8s.io/rook-ceph-mgr configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
clusterrole.rbac.authorization.k8s.io/rook-ceph-global-rules configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
configmap/rook-ceph-operator-config configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
role.rbac.authorization.k8s.io/rook-ceph-system configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
customresourcedefinition.apiextensions.k8s.io/cephclusters.ceph.rook.io configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
customresourcedefinition.apiextensions.k8s.io/cephclients.ceph.rook.io configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
customresourcedefinition.apiextensions.k8s.io/cephfilesystems.ceph.rook.io configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
customresourcedefinition.apiextensions.k8s.io/cephnfses.ceph.rook.io configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
customresourcedefinition.apiextensions.k8s.io/cephobjectstores.ceph.rook.io configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
customresourcedefinition.apiextensions.k8s.io/cephobjectstoreusers.ceph.rook.io configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
customresourcedefinition.apiextensions.k8s.io/cephblockpools.ceph.rook.io configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
customresourcedefinition.apiextensions.k8s.io/volumes.rook.io configured

Update the rook operator

kubectl -n rook-ceph set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.3.11

For example:

$ kubectl -n rook-ceph set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.3.11
deployment.apps/rook-ceph-operator image updated
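
To confirm the operator deployment is actually pointed at the new image, reading the image back out works:

kubectl -n rook-ceph get deploy/rook-ceph-operator -o jsonpath='{.spec.template.spec.containers[0].image}'

It should now report rook/ceph:v1.3.11.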

Monitor the operator rolling out the updated deployments:

watch --exec kubectl -n rook-ceph get deployments -l rook_cluster=rook-ceph -o jsonpath='{range .items[*]}{.metadata.name}{"  \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{"  \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'

For example:

Every 2.0s: kubectl -n rook-ceph get deployments -l rook_cluster=rook-ceph -o jsonpath={range .items[*]}{.metadata.name}{"  \treq/up...  gold-1: Mon Aug 15 23:13:50 2022

rook-ceph-crashcollector-gold-1         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-crashcollector-gold-2         req/upd/avl: 1/1/       rook-version=v1.2.7
rook-ceph-crashcollector-gold-3         req/upd/avl: 1/1/       rook-version=v1.2.7
rook-ceph-crashcollector-gold-4         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-crashcollector-gold-5         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-crashcollector-gold-6         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-mds-myfs-a                    req/upd/avl: 1/1/1      rook-version=v1.2.7
rook-ceph-mds-myfs-b                    req/upd/avl: 1/1/1      rook-version=v1.2.7
rook-ceph-mgr-a                         req/upd/avl: 1/1/       rook-version=v1.2.7
rook-ceph-mon-a                         req/upd/avl: 1/1/       rook-version=v1.2.7
rook-ceph-mon-h                         req/upd/avl: 1/1/1      rook-version=v1.2.7
rook-ceph-mon-i                         req/upd/avl: 1/1/1      rook-version=v1.2.7
rook-ceph-mon-j                         req/upd/avl: 1/1/1      rook-version=v1.2.7
rook-ceph-osd-1                         req/upd/avl: 1/1/       rook-version=v1.2.7
rook-ceph-osd-2                         req/upd/avl: 1/1/       rook-version=v1.2.7
rook-ceph-osd-3                         req/upd/avl: 1/1/       rook-version=v1.2.7
rook-ceph-osd-5                         req/upd/avl: 1/1/       rook-version=v1.2.7
rook-ceph-osd-6                         req/upd/avl: 1/1/1      rook-version=v1.2.7
rook-ceph-osd-9                         req/upd/avl: 1/1/1      rook-version=v1.2.7

Eventually:

kubectl -n rook-ceph get deployments -l rook_cluster=rook-ceph -o jsonpath={range .items[*]}{.metadata.name}{"  \treq/up...  gold-1: Tue Aug 16 00:26:23 2022

rook-ceph-crashcollector-gold-1         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-crashcollector-gold-2         req/upd/avl: 1/1/       rook-version=v1.2.7
rook-ceph-crashcollector-gold-3         req/upd/avl: 1/1/       rook-version=v1.2.7
rook-ceph-crashcollector-gold-4         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-crashcollector-gold-5         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-crashcollector-gold-6         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-mds-myfs-a    req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-mds-myfs-b    req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-mgr-a         req/upd/avl: 1/1/       rook-version=v1.3.11
rook-ceph-mon-h         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-mon-i         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-mon-k         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-osd-1         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-osd-2         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-osd-3         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-osd-5         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-osd-6         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-osd-9         req/upd/avl: 1/1/1      rook-version=v1.3.11

The two crashcollectors stuck on v1.2.7 are from unresponsive nodes that have been cordoned. Those nodes need to be removed from the cluster, but we decided to attempt the rook upgrade first.

Since the nodes are out of service anyway, the stale crashcollector deployments were simply deleted:

$ kubectl delete deployment rook-ceph-crashcollector-gold-2
deployment.apps "rook-ceph-crashcollector-gold-2" deleted

$ kubectl delete deployment rook-ceph-crashcollector-gold-3
deployment.apps "rook-ceph-crashcollector-gold-3" deleted

Now it all looks good:

kubectl -n rook-ceph get deployments -l rook_cluster=rook-ceph -o jsonpath={range .items[*]}{.metadata.name}{"  \treq/up...  gold-1: Tue Aug 16 00:29:23 2022

rook-ceph-crashcollector-gold-1         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-crashcollector-gold-4         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-crashcollector-gold-5         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-crashcollector-gold-6         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-mds-myfs-a    req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-mds-myfs-b    req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-mgr-a         req/upd/avl: 1/1/       rook-version=v1.3.11
rook-ceph-mon-h         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-mon-i         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-mon-k         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-osd-1         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-osd-2         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-osd-3         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-osd-5         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-osd-6         req/upd/avl: 1/1/1      rook-version=v1.3.11
rook-ceph-osd-9         req/upd/avl: 1/1/1      rook-version=v1.3.11

Check all deployments to ensure everything is upgraded:

kubectl -n rook-ceph get deployment -l rook_cluster=rook-ceph -o jsonpath='{range .items[*]}{"rook-version="}{.metadata.labels.rook-version}{"\n"}{end}' | sort | uniq

For example:

$ kubectl -n rook-ceph get deployment -l rook_cluster=rook-ceph -o jsonpath='{range .items[*]}{"rook-version="}{.metadata.labels.rook-version}{"\n"}{end}' | sort | uniq
rook-version=v1.3.11
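
With every deployment reporting v1.3.11, the last sanity check is that ceph itself is still healthy. From the toolbox, same pod lookup as before:

kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph status

Look for HEALTH_OK, or at least nothing worse than before the upgrade.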

Upgrade ceph tools

I don't know why the docs mark this as optional, since I need the toolbox all the time: practically every time I look at the cluster I go through it to find out what's going on. Anyway, this upgraded it:

kubectl -n rook-ceph set image deploy/rook-ceph-tools rook-ceph-tools=rook/ceph:v1.3.11
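
And to confirm the toolbox deployment picked up the new image:

kubectl -n rook-ceph get deploy/rook-ceph-tools -o jsonpath='{.spec.template.spec.containers[0].image}'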