Using NFS as a Storage Class for Kubernetes
These are the quick steps I had to go through to set up the nfs-subdir-external-provisioner for our bare-metal Kubernetes cluster.
Prepare the nodes
Install the NFS client on all nodes
k8s$ sudo apt-get install nfs-common
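If you have more than a couple of nodes, a quick loop over SSH saves repeating this on each machine. This is just a sketch; the hostnames are placeholders for your own nodes.
# run from a machine with SSH access to every node; the hostnames below are placeholders
$ for node in k8s-node-1 k8s-node-2 k8s-node-3; do ssh "$node" 'sudo apt-get install -y nfs-common'; done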
Verify the clients can see the server
k8s$ showmount -e XXX.XXX.XXX.XXX
Export list for XXX.XXX.XXX.XXX:
/shared/folder XXX.XXX.XXX.XXX/24
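For reference, that export comes from a line in /etc/exports on the fileserver. A minimal sketch, assuming the same (masked) path and subnet as above; I include no_root_squash because the provisioner creates directories as root, but treat the exact option set as an assumption to check against your own security requirements.
# /etc/exports on the NFS server (sketch; path, subnet, and options are assumptions)
/shared/folder  XXX.XXX.XXX.XXX/24(rw,sync,no_subtree_check,no_root_squash)
nfs$ sudo exportfs -ra   # re-export after editing /etc/exports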
Prepare the cluster
Create a namespace to install the Helm chart into
k8s$ kubectl create namespace nfs-provisioner
k8s$ kubens nfs-provisioner
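kubens comes from the kubectx project; if you don't have it installed, plain kubectl does the same thing:
k8s$ kubectl config set-context --current --namespace=nfs-provisioner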
Install the Helm chart
First, add the Helm repository
k8s$ helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
"nfs-subdir-external-provisioner" has been added to your repositories
Then update the Helm repository
k8s$ helm repo update nfs-subdir-external-provisioner
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "nfs-subdir-external-provisioner" chart repository
Update Complete. ⎈Happy Helming!⎈
Then list the available chart versions to verify everything looks good
k8s$ helm search repo -l nfs-subdir-external-provisioner
NAME CHART VERSION APP VERSION DESCRIPTION
nfs-subdir-external-provisioner/nfs-subdir-exte... 4.0.17 4.0.2 nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte... 4.0.16 4.0.2 nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte... 4.0.15 4.0.2 nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte... 4.0.14 4.0.2 nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte... 4.0.13 4.0.2 nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte... 4.0.12 4.0.2 nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte... 4.0.11 4.0.2 nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte... 4.0.10 4.0.2 nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte... 4.0.9 4.0.2 nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte... 4.0.8 4.0.2 nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte... 4.0.6 4.0.1 nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte... 4.0.5 4.0.0 nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte... 4.0.4 4.0.0 nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte... 4.0.3 4.0.0 nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte... 4.0.2 4.0.0 nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte... 4.0.1 4.0.0 nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte... 4.0.0 4.0.0 nfs-subdir-external-provisioner is an automatic...
nfs-subdir-external-provisioner/nfs-subdir-exte... 3.0.0 3.1.0 nfs-subdir-external-provisioner is an automatic...
Then install the Helm chart, pointing it at the NFS server and the exported path
k8s$ helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
--set nfs.server=XXX.XXX.XXX.XXX \
--set nfs.path=/shared/folder
NAME: nfs-subdir-external-provisioner
LAST DEPLOYED: Thu Dec 29 20:05:21 2022
NAMESPACE: nfs-provisioner
STATUS: deployed
REVISION: 1
TEST SUITE: None
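If you'd rather not pass everything with --set, the same install can be driven from a values file, which also makes it easy to pin the chart version and name the storage class. This is only a sketch: nfs.server and nfs.path are the chart values used above, while the storageClass keys are my assumptions about the chart's options, so confirm them with helm show values before relying on them.
# values.yaml (sketch; verify keys with: helm show values nfs-subdir-external-provisioner/nfs-subdir-external-provisioner)
nfs:
  server: XXX.XXX.XXX.XXX
  path: /shared/folder
storageClass:
  name: nfs-client          # assumed default name of the provisioned StorageClass
  archiveOnDelete: "true"   # assumed option: keep data as archived-<dir> when a claim is deleted
k8s$ helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --namespace nfs-provisioner --version 4.0.17 -f values.yaml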
Testing the storage
Create a pod using the storage class
Switch to another namespace to test
k8s$ kubens default
Create the test claim and pod
k8s$ kubectl create -f https://raw.githubusercontent.com/kubernetes-sigs/nfs-subdir-external-provisioner/master/deploy/test-claim.yaml -f https://raw.githubusercontent.com/kubernetes-sigs/nfs-subdir-external-provisioner/master/deploy/test-pod.yaml
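Those two manifests are tiny: a 1Mi ReadWriteMany claim against the nfs-client storage class, and a busybox pod that mounts it and writes a SUCCESS file. Roughly, paraphrased from the upstream repo (the linked files are authoritative and details may drift):
# paraphrase of test-claim.yaml and test-pod.yaml; check the upstream repo for the canonical versions
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-claim
spec:
  storageClassName: nfs-client
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
---
kind: Pod
apiVersion: v1
metadata:
  name: test-pod
spec:
  containers:
    - name: test-pod
      image: busybox:stable
      command: ["/bin/sh"]
      args: ["-c", "touch /mnt/SUCCESS && exit 0 || exit 1"]
      volumeMounts:
        - name: nfs-pvc
          mountPath: /mnt
  restartPolicy: "Never"
  volumes:
    - name: nfs-pvc
      persistentVolumeClaim:
        claimName: test-claim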
Check the pods
k8s$ kubectl get pods
NAME READY STATUS RESTARTS AGE
test-pod 1/1 Running 0 4s
Check the Persistent Volume Claims
k8s$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
test-claim Bound pvc-44207790-de7c-42a3-943b-ff4f8b4da9d1 1Mi RWX nfs-client 17s
Check the Persistent Volume
k8s$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-44207790-de7c-42a3-943b-ff4f8b4da9d1 1Mi RWX Delete Bound default/test-claim nfs-client 11s
Check the NFS fileserver
Check for the SUCCESS file saved by the test pod
nfs$ ssh fileshare
nfs$ ls /shared/folder
default-test-claim-pvc-44207790-de7c-42a3-943b-ff4f8b4da9d1
nfs$ ls /shared/folder/default-test-claim-pvc-44207790-de7c-42a3-943b-ff4f8b4da9d1
SUCCESS
Clean up the cluster
k8s$ kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/nfs-subdir-external-provisioner/master/deploy/test-claim.yaml -f https://raw.githubusercontent.com/kubernetes-sigs/nfs-subdir-external-provisioner/master/deploy/test-pod.yaml
persistentvolumeclaim "test-claim" deleted
pod "test-pod" deleted
Check the cluster after cleanup
k8s$ kubectl get pods
No resources found in default namespace.
k8s$ kubectl get pvc
No resources found in default namespace.
k8s$ kubectl get pv
No resources found
Check the NFS server after cleanup
nfs$ ls /shared/folder/
archived-default-test-claim-pvc-44207790-de7c-42a3-943b-ff4f8b4da9d1
nfs$ ls /shared/folder/archived-default-test-claim-pvc-44207790-de7c-42a3-943b-ff4f8b4da9d1
SUCCESS
The files are retained after the Persistent Volume is destroyed and need to be cleaned up manually.
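That archived- prefix means the provisioner renamed the claim's directory instead of deleting it. As far as I can tell this is governed by the chart's storageClass.archiveOnDelete (and storageClass.onDelete) values; that's an assumption to verify with helm show values, but if you would rather have the data removed automatically, something like this should do it:
# storageClass.archiveOnDelete is my assumption about the chart's values; confirm with helm show values first
k8s$ helm upgrade nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --namespace nfs-provisioner --reuse-values \
  --set storageClass.archiveOnDelete=false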
Why we went this way
We have a test cluster deployed on bare metal, and our containers need a way to persist data in it.
Initially, this was a production cluster, but the production workloads have since moved to the cloud for easier and more reliable maintenance, since it is very rare that someone is available to maintain the bare-metal cluster.
At first we used the Rook operator with a Ceph backend and it was great. However, as hardware failed, we ended up with our storage nodes and processing nodes on the same machines, which meant that whenever a node went down it took some of our storage with it. The nodes were also built for processing rather than storage, so the disks were slow and the hardware struggled to run Ceph on them.
As we lost servers due to hardware failures, the cluster eventually went down and had to be rebuilt with existing hardware.
After salvaging components from the existing servers, we didn't have enough machines left for an independent storage cluster. Rather than spend more money (moving to cloud-provided services would have made our IT budget several times more costly), we just dusted off an old RAID server that shares folders out over NFS. It hadn't been used in several years, but it was still operational.
If I had more money available, I'd move all of our test software to the cloud and ditch the bare metal, since the cloud is much easier to maintain when we don't have anyone on site with the hardware. If we did have the hardware on site and someone could maintain it every day, I would save money by using the bare-metal servers rather than the cloud providers. To do that, I'd need a cluster of dedicated storage machines running Ceph.
Concerns
This storage class doesn't really support volume expansion or capacity limits, so there is no way to test those behaviors before production (see the check after this list).
The storage server is a single point of failure. If it goes down, the entire cluster dies. There aren't many of those in our colo system, but this is one of them.
It takes manual intervention to delete the data from the NFS server, though that can be a good thing or a bad thing.
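One quick way to see what the provisioned class actually advertises is to dump it. Volume expansion, for instance, only works when the StorageClass sets allowVolumeExpansion: true, and I don't believe this chart does so by default (an assumption worth verifying on your own cluster).
k8s$ kubectl get storageclass nfs-client -o yaml
# look for allowVolumeExpansion; if it is absent or false, PVC resize requests will not work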