Checking Raid
There is an old data server with lots of hard drives in it, and I needed to check on it to see how the raid was doing and how it was configured.
Check the raid daemon to see if it is running
$ sudo /etc/init.d/mdadm status
[ ok ] mdadm is running.
That's good
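This box still uses an init script. On a newer system running systemd, the same check would probably look something like the following, assuming the monitor service is named mdmonitor (as it is on recent Debian):
$ sudo systemctl status mdmonitor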
Check for raid devices
Check how the raid devices are configured
$ cat /etc/mdadm/mdadm.conf
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#
# by default (built-in), scan all partitions (/proc/partitions) and all
# containers for MD superblocks. alternatively, specify devices to scan, using
# wildcards if desired.
#DEVICE partitions containers
# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes
# automatically tag new arrays as belonging to the local system
HOMEHOST <system>
# instruct the monitoring daemon where to send mail alerts
MAILADDR root
# definitions of existing MD arrays
# This file was auto-generated on Sat, 18 Jan 2014 16:16:30 -0800
# by mkconf 3.2.5-5
ARRAY /dev/md0 metadata=1.2 name=RaidZero:0 UUID=9eabb82e:90fa6184:64166a99:a695ec78
ARRAY /dev/md1 metadata=1.2 name=RaidOne:1 UUID=1d572ab6:81ce3b79:9f4fa4d5:ff1c1452
There are two devices: /dev/md0 and /dev/md1
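To double-check that the config file matches what the kernel has actually assembled, mdadm can print ARRAY lines from the live arrays. I didn't capture the output here, but it should agree with the two ARRAY lines in mdadm.conf above:
$ sudo mdadm --detail --scan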
Query the device information
$ sudo mdadm --query /dev/md0
/dev/md0: 48436.68GiB raid6 15 devices, 0 spares. Use mdadm --detail for more detail.
$ sudo mdadm --query /dev/md1
/dev/md1: 48436.68GiB raid6 15 devices, 0 spares. Use mdadm --detail for more detail.
md0 has 15 drives
md1 has 15 drives
No spares, so that's kinda bad. However, since these are raid6 arrays, each one can lose up to two drives before any data is lost.
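If a free drive were available, adding it as a hot spare is a one-liner. A sketch only, since there is no spare disk in this box and /dev/sdX is a hypothetical device name:
$ sudo mdadm /dev/md0 --add /dev/sdX   # hypothetical /dev/sdX; it becomes a spare because the array is not degraded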
$ sudo mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Sun Jan 19 00:46:49 2014
Raid Level : raid6
Array Size : 50789535888 (48436.68 GiB 52008.48 GB)
Used Dev Size : 3906887376 (3725.90 GiB 4000.65 GB)
Raid Devices : 15
Total Devices : 15
Persistence : Superblock is persistent
Update Time : Tue Dec 20 23:36:06 2022
State : clean
Active Devices : 15
Working Devices : 15
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 16K
Name : RaidZero:0
UUID : 9eabb82e:90fa6184:64166a99:a695ec78
Events : 615
Number Major Minor RaidDevice State
0 8 0 0 active sync /dev/sda
1 8 16 1 active sync /dev/sdb
2 8 32 2 active sync /dev/sdc
3 8 48 3 active sync /dev/sdd
4 8 64 4 active sync /dev/sde
5 8 80 5 active sync /dev/sdf
6 8 96 6 active sync /dev/sdg
7 8 112 7 active sync /dev/sdh
8 8 128 8 active sync /dev/sdi
9 8 144 9 active sync /dev/sdj
10 8 160 10 active sync /dev/sdk
11 8 176 11 active sync /dev/sdl
12 8 192 12 active sync /dev/sdm
13 8 208 13 active sync /dev/sdn
14 8 224 14 active sync /dev/sdo
Everything looks good!
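As a sanity check, the sizes line up with raid6 reserving two drives' worth of space for parity: 15 drives minus 2 for parity leaves 13 data drives, and 13 times the Used Dev Size (in blocks) matches the Array Size exactly:
$ echo $(( 13 * 3906887376 ))
50789535888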
$ sudo mdadm --detail /dev/md1
/dev/md1:
Version : 1.2
Creation Time : Sat Dec 13 21:47:25 2014
Raid Level : raid6
Array Size : 50789535888 (48436.68 GiB 52008.48 GB)
Used Dev Size : 3906887376 (3725.90 GiB 4000.65 GB)
Raid Devices : 15
Total Devices : 15
Persistence : Superblock is persistent
Update Time : Tue Dec 20 23:36:12 2022
State : clean
Active Devices : 15
Working Devices : 15
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 16K
Name : RaidOne:1
UUID : 1d572ab6:81ce3b79:9f4fa4d5:ff1c1452
Events : 766
Number Major Minor RaidDevice State
0 8 240 0 active sync /dev/sdp
1 65 0 1 active sync /dev/sdq
2 65 16 2 active sync /dev/sdr
3 65 32 3 active sync /dev/sds
4 65 48 4 active sync /dev/sdt
5 65 64 5 active sync /dev/sdu
6 65 80 6 active sync /dev/sdv
7 65 96 7 active sync /dev/sdw
8 65 112 8 active sync /dev/sdx
9 65 128 9 active sync /dev/sdy
10 65 144 10 active sync /dev/sdz
11 65 160 11 active sync /dev/sdaa
12 65 176 12 active sync /dev/sdab
13 65 192 13 active sync /dev/sdac
14 65 208 14 active sync /dev/sdad
Everything looks good!
Check raid health
$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md1 : active raid6 sdp[0] sdad[14] sdac[13] sdab[12] sdaa[11] sdz[10] sdy[9] sdx[8] sdw[7] sdv[6] sdu[5] sdt[4] sds[3] sdr[2] sdq[1]
50789535888 blocks super 1.2 level 6, 16k chunk, algorithm 2 [15/15] [UUUUUUUUUUUUUUU]
md0 : active raid6 sda[0] sdo[14] sdn[13] sdm[12] sdl[11] sdk[10] sdj[9] sdi[8] sdh[7] sdg[6] sdf[5] sde[4] sdd[3] sdc[2] sdb[1]
50789535888 blocks super 1.2 level 6, 16k chunk, algorithm 2 [15/15] [UUUUUUUUUUUUUUU]
unused devices: <none>
md0 has 15 drives, all of them are up
md1 has 15 drives, all of them are up
(The [15/15] [UUUUUUUUUUUUUUU] part means all 15 members are present and in sync.)
That's good
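For an even stronger health check, md can run a full consistency check (scrub) in the background. I didn't run one on this box; this is just a sketch, assuming the usual md sysfs interface:
$ echo check | sudo tee /sys/block/md0/md/sync_action
$ cat /proc/mdstat
$ cat /sys/block/md0/md/mismatch_cnt
Progress shows up in /proc/mdstat while the check runs, and mismatch_cnt should ideally be 0 when it finishes.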
Check what block devices sit on top of the raid devices
$ lsblk /dev/md0
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
md0 9:0 0 47.3T 0 raid6
|-raidlvm-lv0 (dm-0) 254:0 0 47.3T 0 lvm /mnt/raid
`-raidlvm-lv1 (dm-1) 254:1 0 47.3T 0 lvm /mnt/raid1
$ lsblk /dev/md1
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
md1 9:1 0 47.3T 0 raid6
`-raidlvm-lv1 (dm-1) 254:1 0 47.3T 0 lvm /mnt/raid1
So md0 has two logical volumes and md1 has one logical volume, but the same device (dm-1) shows up under both md devices? That's really weird. There is this note in the fstab file:
$ cat /etc/fstab
...
/dev/raidlvm/lv0 /mnt/raid ext4 defaults 1 2
/mnt/raid/imagery /nfs/folder1 none bind 0 0
# moved to raid1 /mnt/raid/folder1 /nfs/folder1 none bind 0 0
# 2nd raid
/dev/raidlvm/lv1 /mnt/folder1 ext4 defaults 1 2
/mnt/raid1/backups /nfs/folder2 none bind 0 0
/mnt/raid1/imagery /nfs/folder3 none bind 0 0
I don't know who moved this to raid1, or why. It doesn't seem possible that there was once a larger array of disks and some were taken out, and it also doesn't make sense that the md0 device shows a logical volume that lives on the md1 device.
My guess is that there was originally one raid and a second was created later. If they moved the data over, they only had to move the mount point rather than keep it in lv0. But perhaps it's just left over and needs a reboot.
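The LVM tools should settle this, since they can show which physical volumes each logical volume actually has extents on. I haven't run these on this box yet; a sketch, assuming the volume group is called raidlvm as lsblk suggests:
$ sudo pvs
$ sudo vgs
$ sudo lvs -o +devices raidlvm
If lvs shows lv1 with extents on both /dev/md0 and /dev/md1, that would explain the lsblk output: the volume group spans both arrays, and lsblk lists the logical volume under every physical volume it touches.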