Daniel's Blog

Using NFS as a Storage Class for Kubernetes

These are the quick steps I went through to set up the nfs-subdir-external-provisioner for our bare metal Kubernetes cluster.

Prepare the nodes

Install the NFS client on all nodes

k8s$ sudo apt-get install nfs-common
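
The apt-get command assumes Debian-based nodes; on RPM-based distributions the equivalent package is nfs-utils:

k8s$ sudo dnf install nfs-utils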

Verify that the clients can see the server's exports

k8s$ showmount -e XXX.XXX.XXX.XXX
Export list for XXX.XXX.XXX.XXX:
/shared/folder XXX.XXX.XXX.XXX/24
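
Optionally, sanity-check the export by mounting it by hand from one of the nodes. A quick sketch, assuming the /shared/folder export shown above (depending on the export's permissions you may need root to write):

k8s$ sudo mkdir -p /mnt/nfs-test
k8s$ sudo mount -t nfs XXX.XXX.XXX.XXX:/shared/folder /mnt/nfs-test
k8s$ sudo touch /mnt/nfs-test/hello && ls /mnt/nfs-test
k8s$ sudo umount /mnt/nfs-test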

Prepare the cluster

Create a namespace to install the Helm chart into

k8s$ kubectl create namespace nfs-provisioner
k8s$ kubens nfs-provisioner
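
Note that kubens comes from the kubectx project; if you don't have it installed, the plain kubectl equivalent is:

k8s$ kubectl config set-context --current --namespace=nfs-provisioner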

Install the Helm chart

First, add the Helm repository

k8s$ helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
"nfs-subdir-external-provisioner" has been added to your repositories

Then update the Helm repository

k8s$ helm repo update nfs-subdir-external-provisioner
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "nfs-subdir-external-provisioner" chart repository
Update Complete. ⎈Happy Helming!⎈

Then list the available chart versions to verify the repository is working

k8s$ helm search repo -l nfs-subdir-external-provisioner
NAME                                                    CHART VERSION   APP VERSION     DESCRIPTION
nfs-subdir-external-provisioner/nfs-subdir-exte...      4.0.17          4.0.2           nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte...      4.0.16          4.0.2           nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte...      4.0.15          4.0.2           nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte...      4.0.14          4.0.2           nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte...      4.0.13          4.0.2           nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte...      4.0.12          4.0.2           nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte...      4.0.11          4.0.2           nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte...      4.0.10          4.0.2           nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte...      4.0.9           4.0.2           nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte...      4.0.8           4.0.2           nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte...      4.0.6           4.0.1           nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte...      4.0.5           4.0.0           nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte...      4.0.4           4.0.0           nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte...      4.0.3           4.0.0           nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte...      4.0.2           4.0.0           nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte...      4.0.1           4.0.0           nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte...      4.0.0           4.0.0           nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte...      3.0.0           3.1.0           nfs-subdir-external-provisioner is an automatic...

Then install the Helm chart

k8s$ helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
    --set nfs.server=XXX.XXX.XXX.XXX \
    --set nfs.path=/shared/folder
NAME: nfs-subdir-external-provisioner
LAST DEPLOYED: Thu Dec 29 20:05:21 2022
NAMESPACE: nfs-provisioner
STATUS: deployed
REVISION: 1
TEST SUITE: None
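
Before moving on, it's worth confirming that the provisioner pod is running and that the chart registered its storage class (named nfs-client by default; the chart exposes other values too, which helm show values will list):

k8s$ kubectl get pods -n nfs-provisioner
k8s$ kubectl get storageclass
k8s$ helm show values nfs-subdir-external-provisioner/nfs-subdir-external-provisioner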

Testing the storage

Create a pod using the storage class

Switch to another namespace to test

k8s$ kubens default

Create the test claim and pod

k8s$ kubectl create -f https://raw.githubusercontent.com/kubernetes-sigs/nfs-subdir-external-provisioner/master/deploy/test-claim.yaml -f https://raw.githubusercontent.com/kubernetes-sigs/nfs-subdir-external-provisioner/master/deploy/test-pod.yaml
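
For reference, the two manifests are tiny. Paraphrased from memory rather than copied verbatim (check the repo for the canonical versions), they amount to a 1Mi ReadWriteMany claim against the nfs-client storage class plus a busybox pod that writes a SUCCESS file into the mounted volume:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-claim
spec:
  storageClassName: nfs-client
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
---
kind: Pod
apiVersion: v1
metadata:
  name: test-pod
spec:
  restartPolicy: "Never"
  containers:
    - name: test-pod
      image: busybox
      command: ["/bin/sh", "-c", "touch /mnt/SUCCESS && exit 0 || exit 1"]
      volumeMounts:
        - name: nfs-pvc
          mountPath: /mnt
  volumes:
    - name: nfs-pvc
      persistentVolumeClaim:
        claimName: test-claim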

Check the pods

k8s$ kubectl get pods
NAME       READY   STATUS    RESTARTS   AGE
test-pod   1/1     Running   0          4s

Check the Persistent Volume Claims

k8s$ kubectl get pvc
NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
test-claim   Bound    pvc-44207790-de7c-42a3-943b-ff4f8b4da9d1   1Mi        RWX            nfs-client     17s

Check the Persistent Volume

k8s$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                             STORAGECLASS   REASON   AGE
pvc-44207790-de7c-42a3-943b-ff4f8b4da9d1   1Mi        RWX            Delete           Bound    default/test-claim                nfs-client              11s

Check the NFS file server

Check that the SUCCESS file was written by the test pod. The provisioner creates one directory per volume, named after the namespace, claim name, and volume name.

k8s$ ssh fileshare
nfs$ ls /shared/folder
default-test-claim-pvc-44207790-de7c-42a3-943b-ff4f8b4da9d1
nfs$ ls /shared/folder/default-test-claim-pvc-44207790-de7c-42a3-943b-ff4f8b4da9d1
SUCCESS

Clean up the cluster

k8s$ kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/nfs-subdir-external-provisioner/master/deploy/test-claim.yaml -f https://raw.githubusercontent.com/kubernetes-sigs/nfs-subdir-external-provisioner/master/deploy/test-pod.yaml
persistentvolumeclaim "test-claim" deleted
pod "test-pod" deleted

Check the cluster after cleanup

k8s$ kubectl get pods
No resources found in default namespace.
k8s$ kubectl get pvc
No resources found in default namespace.
k8s$ kubectl get pv
No resources found

Check the NFS server after cleanup

nfs$ ls /shared/folder/
archived-default-test-claim-pvc-44207790-de7c-42a3-943b-ff4f8b4da9d1
nfs$ ls /shared/folder/archived-default-test-claim-pvc-44207790-de7c-42a3-943b-ff4f8b4da9d1
SUCCESS

The files are retained after the Persistent Volume is destroyed (renamed with an archived- prefix) and need to be cleaned up manually.
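
That archived- prefix is the provisioner's archiveOnDelete behavior, which the chart enables by default. If you would rather have the data removed automatically when a claim is deleted, the chart exposes a value for it; a sketch, assuming the same release, namespace, and NFS settings as above:

k8s$ helm upgrade nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
    --namespace nfs-provisioner \
    --set nfs.server=XXX.XXX.XXX.XXX \
    --set nfs.path=/shared/folder \
    --set storageClass.archiveOnDelete=false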

Why we went this way

We run a test cluster on bare metal, and our containers need a way to persist data on it.

Initially this was a production cluster, but the production workloads have since moved to the cloud for easier and more reliable maintenance, since it is rare that anyone has time to maintain the bare metal cluster.

At first we used the Rook operator with a Ceph backend, and it was great. However, as hardware failed, our storage nodes and processing nodes ended up sharing the same machines, which meant that whenever a node went down it took some of our storage with it. The nodes were also built for processing rather than storage, so the disks were slow and struggled under Ceph.

As we lost servers to hardware failures, the cluster eventually went down and had to be rebuilt from the remaining hardware.

After salvaging components from the existing servers, we didn't have enough machines left for an independent storage cluster. Rather than spend more money (our IT spend was already several times higher from using cloud-provided services), we dusted off an old RAID server that shared folders out over NFS. It hadn't been used in several years, but it was still operational.

If I had more money available, I'd move all of our test software to the cloud and ditch the bare metal entirely, since the cloud is much easier to maintain when the hardware isn't on site. If we did have the hardware on site and someone to maintain it every day, bare metal servers would be cheaper than the cloud providers; to do that properly I'd want a cluster of dedicated storage machines running Ceph.

Concerns

This storage class doesn't really support volume expansion or enforce volume size limits, so there is no way to test those behaviors before production.

The storage server is a single point of failure: if it goes down, the entire cluster dies. There aren't many single points of failure in our colo system, but this is one of them.

It takes manual intervention to delete the archived data from the NFS server, though that can be a good or a bad thing.