Upgrading Rook from v1.5.12 to v1.6.11
Getting the release
Cloning the repo
git clone https://github.com/rook/rook.git
Checking for v1.6
$ git tag -l | grep 1.6
v1.1.6
v1.6.0
v1.6.0-alpha.0
v1.6.0-beta.0
v1.6.1
v1.6.10
v1.6.11
v1.6.2
v1.6.3
v1.6.4
v1.6.5
v1.6.6
v1.6.7
v1.6.8
v1.6.9
Checking out v1.6.11
$ git checkout tags/v1.6.11
Previous HEAD position was a40bfdd62 Merge pull request #8049 from travisn/release-1.5.12
HEAD is now at 4f245c3ea Merge pull request #9210 from travisn/release-1.6.11
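To double-check that the working tree is on the right tag (not part of the original steps, just a sanity check):
$ git describe --tags
This should print v1.6.11.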
Upgrading to v1.6
Instructions are here: https://rook.io/docs/rook/v1.6/ceph-upgrade.html
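The guide also recommends verifying the cluster is healthy before starting; a minimal check from the toolbox (assuming the stock rook-ceph-tools deployment) might look like:
$ kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status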
Updating the common resources and Custom Resource Definitions (CRDs)
kubectl apply -f common.yaml
kubectl replace -f crds.yaml
kubectl apply -f crds.yaml
The CRDs had to be replaced rather than just applied because our cluster was originally created with Rook v1.2.
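For reference, common.yaml and crds.yaml are the copies shipped in the v1.6.11 checkout; in that release they live under cluster/examples/kubernetes/ceph, so the commands above would have been run from somewhere like:
$ cd rook/cluster/examples/kubernetes/ceph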
Updating the Rook operator version:
kubectl -n rook-ceph set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.6.11
The new operator pod was created properly:
rook-ceph-operator-67cd46f7c7-bwnf5 1/1 Running 0 5h50m
rook-ceph-operator-7c8f969cb7-bdnxw 0/1 ContainerCreating 0 7s
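A hedged convenience, not from the original steps: the operator rollout can also be waited on explicitly before watching the cluster deployments:
$ kubectl -n rook-ceph rollout status deploy/rook-ceph-operator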
And it began processing the upgrade:
watch --exec kubectl -n rook-ceph get deployments -l rook_cluster=rook-ceph -o jsonpath='{range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{" \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'
Every 2.0s: kubectl -n rook-ceph get deployments -l rook_cluster=rook-ceph -o jsonpath={range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.u... gold-1: Wed Aug 17 03:54:57 2022
rook-ceph-crashcollector-gold-1 req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-crashcollector-gold-4 req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-crashcollector-gold-5 req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-crashcollector-gold-6 req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-mds-myfs-a req/upd/avl: 1/1/1 rook-version=v1.5.12
rook-ceph-mds-myfs-b req/upd/avl: 1/1/1 rook-version=v1.5.12
rook-ceph-mgr-a req/upd/avl: 1/1/1 rook-version=v1.5.12
rook-ceph-mon-h req/upd/avl: 1/1/1 rook-version=v1.5.12
rook-ceph-mon-i req/upd/avl: 1/1/1 rook-version=v1.5.12
rook-ceph-mon-k req/upd/avl: 1/1/1 rook-version=v1.5.12
rook-ceph-osd-1 req/upd/avl: 1/1/1 rook-version=v1.5.12
rook-ceph-osd-2 req/upd/avl: 1/1/1 rook-version=v1.5.12
rook-ceph-osd-3 req/upd/avl: 1/1/1 rook-version=v1.5.12
rook-ceph-osd-5 req/upd/avl: 1/1/1 rook-version=v1.5.12
rook-ceph-osd-6 req/upd/avl: 1/1/1 rook-version=v1.5.12
rook-ceph-osd-9 req/upd/avl: 1/1/1 rook-version=v1.5.12
After about 10 minutes everything was updated:
Every 2.0s: kubectl -n rook-ceph get deployments -l rook_cluster=rook-ceph -o jsonpath={range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.u... gold-1: Wed Aug 17 04:05:21 2022
rook-ceph-crashcollector-gold-1 req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-crashcollector-gold-4 req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-crashcollector-gold-5 req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-crashcollector-gold-6 req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-mds-myfs-a req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-mds-myfs-b req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-mgr-a req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-mon-h req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-mon-i req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-mon-k req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-osd-1 req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-osd-2 req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-osd-3 req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-osd-5 req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-osd-6 req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-osd-9 req/upd/avl: 1/1/1 rook-version=v1.6.11
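The upgrade guide also suggests confirming that only a single rook-version remains across the cluster deployments; roughly (treat the exact jsonpath as an approximation of the guide's):
$ kubectl -n rook-ceph get deployments -l rook_cluster=rook-ceph -o jsonpath='{range .items[*]}{"rook-version="}{.metadata.labels.rook-version}{"\n"}{end}' | sort | uniq
Once converged this should print a single rook-version=v1.6.11 line.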
Upgrading the toolbox
I just had to delete the deployment and reapply the yaml:
$ kubectl delete deployment rook-ceph-tools
deployment.apps "rook-ceph-tools" deleted
$ kubectl apply -f toolbox.yaml
deployment.apps/rook-ceph-tools created
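To confirm the new toolbox pod came up (app=rook-ceph-tools is the label used by the stock toolbox.yaml):
$ kubectl get pods -l app=rook-ceph-tools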
Error in one pod
Checking the pods after the update, I found one was in Init:CrashLoopBackOff:
rook-ceph-osd-1-768b4d5cb8-xvdwg 1/1 Running 0 44m 10.233.90.37 gold-6 <none> <none>
rook-ceph-osd-2-54fb868d5c-9m5qx 0/1 Init:CrashLoopBackOff 10 31m 10.233.97.118 gold-1 <none> <none>
rook-ceph-osd-3-56f6dbbfbd-sc7bs 1/1 Running 0 44m 10.233.90.23 gold-6 <none> <none>
rook-ceph-osd-5-5db775d797-j9gss 1/1 Running 0 42m 10.233.97.170 gold-1 <none> <none>
rook-ceph-osd-6-679bf6779-4zbdr 1/1 Running 0 42m 10.233.99.68 gold-4 <none> <none>
rook-ceph-osd-9-85987bcb4c-n564h 1/1 Running 0 43m 10.233.99.166 gold-4 <none> <none>
This indicates that the Init Container is failing.
Checking its logs shows this:
$ kubectl logs rook-ceph-osd-2-54fb868d5c-9m5qx activate
+ OSD_ID=2
+ OSD_UUID=64af65b8-7306-46fb-af0e-9d14157ab122
+ OSD_STORE_FLAG=--bluestore
+ OSD_DATA_DIR=/var/lib/ceph/osd/ceph-2
+ CV_MODE=lvm
+ DEVICE=/dev/ceph-98eeb77a-7962-45cb-868d-2fb0f73c793d/osd-data-195e8630-6c2a-410a-a457-dca290ec0e30
+ ceph -n client.admin auth get-or-create osd.2 mon 'allow profile osd' mgr 'allow profile osd' osd 'allow *' -k /etc/ceph/admin-keyring-store/keyring
Error EINVAL: key for osd.2 exists but cap mgr does not match
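(The activate init container name used in the logs command above can be listed from the pod spec if it isn't known; a hedged example with the same pod name:)
$ kubectl get pod rook-ceph-osd-2-54fb868d5c-9m5qx -o jsonpath='{.spec.initContainers[*].name}'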
Using the ceph toolbox to check the caps, I found this:
$ ceph auth ls
mds.myfs-a
key: <hidden>
caps: [mds] allow
caps: [mon] allow profile mds
caps: [osd] allow *
mds.myfs-b
key: <hidden>
caps: [mds] allow
caps: [mon] allow profile mds
caps: [osd] allow *
osd.0
key: <hidden>
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
osd.1
key: <hidden>
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
osd.10
key: <hidden>
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
osd.11
key: <hidden>
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
osd.12
key: <hidden>
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
osd.2
key: <hidden>
caps: [mon] allow profile osd
caps: [osd] allow *
osd.3
key: <hidden>
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
osd.4
key: <hidden>
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
osd.5
key: <hidden>
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
osd.6
key: <hidden>
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
osd.7
key: <hidden>
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
osd.8
key: <hidden>
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
osd.9
key: <hidden>
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
client.admin
key: <hidden>
caps: [mds] allow *
caps: [mgr] allow *
caps: [mon] allow *
caps: [osd] allow *
client.bootstrap-mds
key: <hidden>
caps: [mon] allow profile bootstrap-mds
client.bootstrap-mgr
key: <hidden>
caps: [mon] allow profile bootstrap-mgr
client.bootstrap-osd
key: <hidden>
caps: [mon] allow profile bootstrap-osd
client.bootstrap-rbd
key: <hidden>
caps: [mon] allow profile bootstrap-rbd
client.bootstrap-rbd-mirror
key: <hidden>
caps: [mon] allow profile bootstrap-rbd-mirror
client.bootstrap-rgw
key: <hidden>
caps: [mon] allow profile bootstrap-rgw
client.crash
key: <hidden>
caps: [mgr] allow rw
caps: [mon] allow profile crash
client.csi-cephfs-node
key: <hidden>
caps: [mds] allow rw
caps: [mgr] allow rw
caps: [mon] allow r
caps: [osd] allow rw tag cephfs *=*
client.csi-cephfs-provisioner
key: <hidden>
caps: [mgr] allow rw
caps: [mon] allow r
caps: [osd] allow rw tag cephfs metadata=*
client.csi-rbd-node
key: <hidden>
caps: [mgr] allow rw
caps: [mon] profile rbd
caps: [osd] profile rbd
client.csi-rbd-provisioner
key: <hidden>
caps: [mgr] allow rw
caps: [mon] profile rbd
caps: [osd] profile rbd
mgr.a
key: <hidden>
caps: [mds] allow *
caps: [mon] allow profile mgr
caps: [osd] allow *
installed auth entries:
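Instead of scanning the whole listing, a single entry can also be fetched directly from the toolbox; a hedged shortcut:
$ ceph auth get osd.2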
The relevant difference is:
osd.2
key: <hidden>
caps: [mon] allow profile osd
caps: [osd] allow *
osd.3
key: <hidden>
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
So the mgr cap is missing for osd.2.
To add it I ran:
ceph auth caps osd.2 mon 'allow profile osd' mgr 'allow profile osd' osd 'allow *'
Now it shows this:
$ ceph auth ls
...
osd.2
key: <hidden>
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
...
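Had more OSDs been missing the mgr cap, the same fix could have been applied in a loop from the toolbox; a sketch I did not need here:
for id in $(ceph osd ls); do
  ceph auth caps osd.$id mon 'allow profile osd' mgr 'allow profile osd' osd 'allow *'
done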
Fixing the cap brought the osd-2 pod back into a running state:
$ kubectl get pods | grep osd
rook-ceph-osd-1-768b4d5cb8-xvdwg 1/1 Running 0 55m
rook-ceph-osd-2-54fb868d5c-snlwz 1/1 Running 0 35s
rook-ceph-osd-3-56f6dbbfbd-sc7bs 1/1 Running 0 54m
rook-ceph-osd-5-5db775d797-j9gss 1/1 Running 0 53m
rook-ceph-osd-6-679bf6779-4zbdr 1/1 Running 0 53m
rook-ceph-osd-9-85987bcb4c-n564h 1/1 Running 0 54m
And the cluster was healthy again:
$ kubectl exec rook-ceph-tools-6dcbb78845-6xkk8 -- ceph status
cluster:
id: 04461f64-e630-4891-bcea-0de24cf06c51
health: HEALTH_OK
services:
mon: 3 daemons, quorum h,i,k (age 56m)
mgr: a(active, since 55m)
mds: myfs:1 {0=myfs-a=up:active} 1 up:standby-replay
osd: 13 osds: 6 up (since 36s), 6 in (since 6d)
data:
pools: 4 pools, 73 pgs
objects: 6.69M objects, 2.9 TiB
usage: 8.8 TiB used, 46 TiB / 55 TiB avail
pgs: 71 active+clean
1 active+clean+wait
1 active+clean+scrubbing
io:
client: 1.6 KiB/s rd, 33 KiB/s wr, 2 op/s rd, 4 op/s wr
recovery: 553 KiB/s, 1 objects/s