Upgrading rook from v1.3.11 to v1.4.9
Okay, so after upgrading from 1.2 to 1.3, it's time to upgrade from 1.3 to 1.4. The worry is that all the liveness probes had to be removed to upgrade Ceph from Nautilus (14.2.8) to Octopus (15.2.13), so hopefully when the liveness probes get re-added by the operator, they will work for the new version of Ceph.
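Once the new operator is in place, one way to check whether it actually put the probes back is to dump the livenessProbe from one of the daemon deployments. A minimal sketch, using deployment names that show up later in this post:
# Print the livenessProbe the operator set on a mon (empty output means no probe)
kubectl -n rook-ceph get deploy rook-ceph-mon-h -o jsonpath='{.spec.template.spec.containers[0].livenessProbe}'
# Same idea for an OSD
kubectl -n rook-ceph get deploy rook-ceph-osd-1 -o jsonpath='{.spec.template.spec.containers[0].livenessProbe}'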
First, clone the rook repo:
git clone https://github.com/rook/rook.git
Then check for the 1.4 versions:
$ git tag -l | grep v1.4
v1.4.0
v1.4.0-alpha.0
v1.4.0-beta.0
v1.4.1
v1.4.2
v1.4.3
v1.4.4
v1.4.5
v1.4.6
v1.4.7
v1.4.8
v1.4.9
v1.4.9 is the latest, so let's go for that one.
$ git checkout tags/v1.4.9
Previous HEAD position was b28b21a03 Merge pull request #6241 from travisn/backport-ci-disk
HEAD is now at 3bccbc9ef Merge pull request #6963 from travisn/release-1.4.9
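Just to double-check that the right tag is checked out before running anything:
git describe --tags
# should print v1.4.9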
Running the upgrade scripts
The upgrade scripts will now be in the cluster/examples/kubernetes/ceph folder:
kubectl delete -f upgrade-from-v1.3-delete.yaml
kubectl apply -f upgrade-from-v1.3-apply.yaml -f upgrade-from-v1.3-crds.yaml
For example:
$:~/github/rook/cluster/examples/kubernetes/ceph$ kubectl delete -f upgrade-from-v1.3-delete.yaml
clusterrole.rbac.authorization.k8s.io "cephfs-csi-nodeplugin-rules" deleted
clusterrole.rbac.authorization.k8s.io "cephfs-external-provisioner-runner-rules" deleted
clusterrole.rbac.authorization.k8s.io "rbd-csi-nodeplugin-rules" deleted
clusterrole.rbac.authorization.k8s.io "rbd-external-provisioner-runner-rules" deleted
clusterrole.rbac.authorization.k8s.io "rook-ceph-cluster-mgmt-rules" deleted
clusterrole.rbac.authorization.k8s.io "rook-ceph-global-rules" deleted
clusterrole.rbac.authorization.k8s.io "rook-ceph-mgr-cluster-rules" deleted
clusterrole.rbac.authorization.k8s.io "rook-ceph-mgr-system-rules" deleted
clusterrole.rbac.authorization.k8s.io "cephfs-csi-nodeplugin" deleted
clusterrole.rbac.authorization.k8s.io "cephfs-external-provisioner-runner" deleted
clusterrole.rbac.authorization.k8s.io "rbd-csi-nodeplugin" deleted
clusterrole.rbac.authorization.k8s.io "rbd-external-provisioner-runner" deleted
clusterrole.rbac.authorization.k8s.io "rook-ceph-cluster-mgmt" deleted
clusterrole.rbac.authorization.k8s.io "rook-ceph-global" deleted
clusterrole.rbac.authorization.k8s.io "rook-ceph-mgr-cluster" deleted
clusterrole.rbac.authorization.k8s.io "rook-ceph-mgr-system" deleted
$:~/github/rook/cluster/examples/kubernetes/ceph$ kubectl apply -f upgrade-from-v1.3-apply.yaml -f upgrade-from-v1.3-crds.yaml
clusterrole.rbac.authorization.k8s.io/rook-ceph-global created
clusterrole.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt created
clusterrole.rbac.authorization.k8s.io/rook-ceph-mgr-cluster created
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
clusterrole.rbac.authorization.k8s.io/rook-ceph-object-bucket configured
clusterrole.rbac.authorization.k8s.io/rook-ceph-mgr-system created
clusterrole.rbac.authorization.k8s.io/cephfs-csi-nodeplugin created
clusterrole.rbac.authorization.k8s.io/cephfs-external-provisioner-runner created
clusterrole.rbac.authorization.k8s.io/rbd-csi-nodeplugin created
clusterrole.rbac.authorization.k8s.io/rbd-external-provisioner-runner created
serviceaccount/rook-ceph-admission-controller created
clusterrole.rbac.authorization.k8s.io/rook-ceph-admission-controller-role created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-admission-controller-rolebinding created
customresourcedefinition.apiextensions.k8s.io/cephclusters.ceph.rook.io configured
customresourcedefinition.apiextensions.k8s.io/cephclients.ceph.rook.io unchanged
customresourcedefinition.apiextensions.k8s.io/cephrbdmirrors.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephfilesystems.ceph.rook.io configured
customresourcedefinition.apiextensions.k8s.io/cephnfses.ceph.rook.io unchanged
customresourcedefinition.apiextensions.k8s.io/cephobjectstores.ceph.rook.io configured
customresourcedefinition.apiextensions.k8s.io/cephobjectstoreusers.ceph.rook.io unchanged
customresourcedefinition.apiextensions.k8s.io/cephobjectrealms.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectzonegroups.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectzones.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephblockpools.ceph.rook.io configured
customresourcedefinition.apiextensions.k8s.io/volumes.rook.io unchanged
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
customresourcedefinition.apiextensions.k8s.io/objectbuckets.objectbucket.io configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
customresourcedefinition.apiextensions.k8s.io/objectbucketclaims.objectbucket.io configured
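As a sanity check, the new v1.4 CRDs (cephrbdmirrors, cephobjectrealms, cephobjectzonegroups, cephobjectzones) should now show up alongside the existing ones:
kubectl get crd | grep ceph.rook.io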
Upgrading the operator
kubectl -n rook-ceph set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.4.9
This change triggered a new operator pod to be created:
rook-ceph-operator-7d7b96897b-mszxr 1/1 Running 0 40s
rook-ceph-operator-88cdb5f4b-k6xvw 0/1 Terminating 8 17h
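If you want to wait for the new operator to be fully up and then follow what it does, something like this works:
kubectl -n rook-ceph rollout status deploy/rook-ceph-operator
# tail the operator logs to watch it reconcile the daemons to v1.4.9
kubectl -n rook-ceph logs deploy/rook-ceph-operator -f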
And the cluster started updating quickly:
kubectl -n rook-ceph get deployments -l rook_cluster=rook-ceph -o jsonpath={range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec....
gold-1: Tue Aug 16 17:02:19 2022
rook-ceph-crashcollector-gold-1 req/upd/avl: 1/1/1 rook-version=v1.4.9
rook-ceph-crashcollector-gold-4 req/upd/avl: 1/1/1 rook-version=v1.4.9
rook-ceph-crashcollector-gold-5 req/upd/avl: 1/1/1 rook-version=v1.4.9
rook-ceph-crashcollector-gold-6 req/upd/avl: 1/1/1 rook-version=v1.4.9
rook-ceph-mds-myfs-a req/upd/avl: 1// rook-version=v1.4.9
rook-ceph-mds-myfs-b req/upd/avl: 1/1/1 rook-version=v1.3.11
rook-ceph-mgr-a req/upd/avl: 1// rook-version=v1.4.9
rook-ceph-mon-h req/upd/avl: 1/1/1 rook-version=v1.4.9
rook-ceph-mon-i req/upd/avl: 1/1/1 rook-version=v1.4.9
rook-ceph-mon-k req/upd/avl: 1/1/1 rook-version=v1.4.9
rook-ceph-osd-1 req/upd/avl: 1/1/1 rook-version=v1.3.11
rook-ceph-osd-2 req/upd/avl: 1/1/1 rook-version=v1.3.11
rook-ceph-osd-3 req/upd/avl: 1/1/1 rook-version=v1.3.11
rook-ceph-osd-5 req/upd/avl: 1/1/1 rook-version=v1.3.11
rook-ceph-osd-6 req/upd/avl: 1/1/1 rook-version=v1.3.11
rook-ceph-osd-9 req/upd/avl: 1/1/1 rook-version=v1.3.11
Finally, after a little while, everything was updated.
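A quick way to confirm that, roughly the one-liner from the Rook upgrade guide, is to check that every deployment carries the same rook-version label (the same pattern works for the ceph-version label):
kubectl -n rook-ceph get deployment -l rook_cluster=rook-ceph -o jsonpath='{range .items[*]}{"rook-version="}{.metadata.labels.rook-version}{"\n"}{end}' | sort | uniq
# expect a single line: rook-version=v1.4.9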
The only error is this one:
mons are allowing insecure global_id reclaim
I tried fixing it with this command:
ceph config set mon auth_allow_insecure_global_id_reclaim false
That resulted in the toolbox no longer being able to talk to the cluster:
[root@gold-1 /]# ceph health
[errno 5] RADOS I/O error (error connecting to the cluster)
So the cluster is running, which is good, but I can't talk to it anymore.
Even upgrading the toolbox doesn't work. I deleted the deployment and applied the latest toolbox.yaml, and this error happens:
$ ceph status
[errno 13] RADOS permission denied (error connecting to the cluster)
The ceph version in the toolbox is 15.2.8:
ceph --version
ceph version 15.2.8 (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable)
Looking here, it says it is fixed in ceph version 15.2.11:
https://docs.ceph.com/en/latest/security/CVE-2021-20288/
Fixed versions
- Pacific v16.2.1 (and later)
- Octopus v15.2.11 (and later)
- Nautilus v14.2.20 (and later)
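Ceph also allows muting the health warning for a while instead of hard-disabling insecure reclaim, which is the gentler option while older clients still need to connect. A sketch, assuming a working ceph client connection to the cluster (which the toolbox doesn't have right now):
ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1w
ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w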
Looking at the issue here:
https://github.com/rook/rook/issues/7746
It may be fixed in rook version 1.6.7.
So the next step is to upgrade from 1.4.9 to 1.5+ and then to 1.6+.
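For reference, those hops would be the same operator-image bump as above, each preceded by that release's own upgrade guide (pre-upgrade YAMLs, CRDs, etc.). The v1.5 tag below is a placeholder:
# hypothetical sketch - find the actual latest patch tag with: git tag -l | grep v1.5
kubectl -n rook-ceph set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.5.x
kubectl -n rook-ceph set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.6.7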