RAID upgrade
For my home server, I have filesystems on LVM on RAID on partitions. Upgrading a disk is therefore rather non-trivial. The intended benefit of this setup, however, is that I can roll over the oldest disk in the array every year or so, and the whole lot grows incrementally as needed. "Live" expansion at every level means I should never have to create a new filesystem and copy data over, as I have had to in the past.
These are my notes-to-self from the time leading up to my first hardware change. Before this, all disks were identical in size. There will be no significant size benefit until the fourth (smallest) disk is upgraded. After that, every upgrade (of the smallest disk, presumably replacing it with the new 'largest') will yield a size increase, limited by the 'new' smallest (oldest) disk.
This is not optimal use of the available space on any given drive over its life. However, it is hopefully rather nice in terms of budgetary upgrade requirements! :)
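To make the "limited by the smallest disk" point concrete, here is a minimal sketch of the RAID5 arithmetic: usable space is (number of members - 1) times the smallest member. The function name `raid5_usable` and the round member sizes are made up for illustration; they are not measurements from this system.

```shell
#!/bin/sh
# Usable size of a RAID5 array: (members - 1) * smallest member.
# Sizes are whole numbers in one unit (GB here); all values are illustrative.
raid5_usable() {
    count=0
    smallest=
    for size in "$@"; do
        count=$((count + 1))
        if [ -z "$smallest" ] || [ "$size" -lt "$smallest" ]; then
            smallest=$size
        fi
    done
    echo $(( (count - 1) * smallest ))
}

raid5_usable 1000 1000 1000 1000   # four identical members
raid5_usable 1000 2000 2000 3000   # three upgraded, still held back by the old one
raid5_usable 1500 2000 2000 3000   # the last old member replaced
```

The first two calls give the same figure, which is why nothing is gained until the fourth disk is swapped.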
Pros
- Rolling upgrades are win.
- response: rolling upgrades are the planning headache
Cons
- Each size increase is dictated by the drive purchased three drives back! So 'instant embiggening' is difficult.
- response: you don't use this system unless you plan to think ahead anyway. Also, the old drive being taken out could be put into external USB caddy for additional space
- response two: the 'unused' space on each drive (ie, the size difference between it and the smallest/oldest) could be partitioned into non-raid usable emergency spaces too!
My system
Software
# cat /etc/debian_version
lenny/sid
# uname -a
Linux falcon 2.6.18-6-amd64 #1 SMP Mon Jun 16 22:30:01 UTC 2008 x86_64 GNU/Linux
# mdadm --version
mdadm - v2.5.6 - 9 November 2006
# lvm version
  LVM version:     2.02.07 (2006-07-17)
  Library version: 1.02.08 (2006-07-17)
  Driver version:  4.7.0
Setup
# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md3 : active raid5 sda4[0] sdd4[3] sdc4[2] sdb4[1]
      2637302400 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
md2 : active raid1 sdc3[0] sdd3[1]
      87891520 blocks [2/2] [UU]
md0 : active raid1 sdb2[0] sdc2[2](S) sdd2[3](S) sda2[1]
      8787456 blocks [2/2] [UU]
md1 : active raid1 sda3[0] sdb3[1]
      87891520 blocks [2/2] [UU]
unused devices: <none>
# pvs
  PV         VG        Fmt  Attr PSize  PFree
  /dev/md1   vg_home   lvm2 a-   83.82G    0
  /dev/md2   vg_home   lvm2 a-   83.82G    0
  /dev/md3   vg_shared lvm2 a-    2.46T    0
# vgs
  VG        #PV #LV #SN Attr   VSize   VFree
  vg_home     2   1   0 wz--n- 167.63G    0
  vg_shared   1   1   0 wz--n-   2.46T    0
# lvs
  LV        VG        Attr   LSize   Origin Snap%  Move Log Copy%
  lv_home   vg_home   -wi-ao 167.63G
  lv_shared vg_shared -wi-ao   2.46T
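As a sanity check on the listing above: /proc/mdstat counts in 1 KiB blocks, while pvs reports binary gigabytes and terabytes. A small sketch of the conversion follows; the helper name `kib_to_unit` is made up here, and awk is used only because plain sh arithmetic has no decimals.

```shell
#!/bin/sh
# Convert a size in 1 KiB blocks (as shown by /proc/mdstat) to a larger
# binary unit, rounded to two decimals.
# $1 = size in KiB, $2 = divisor (1048576 for GiB, 1073741824 for TiB)
kib_to_unit() {
    awk -v kib="$1" -v div="$2" 'BEGIN { printf "%.2f\n", kib / div }'
}

kib_to_unit 87891520 1048576        # md1 and md2, in GiB
kib_to_unit 2637302400 1073741824   # md3, in TiB
```

This prints 83.82 and 2.46, matching the PSize column that pvs reports for the mirrors and the raid5.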
The plan
By upgrading each disk in turn to a larger one (1.5TB, 2TB, etc.), every level can be grown in place.
I'll start with /dev/sdd and work backwards (sda and sdb are the enterprise and the quiet drive, respectively), making each partition larger as needed. Because I'm moving from MBR to GUID (GPT) partition tables, I'm no longer limited to four partitions. So instead of leaving unusable space in an oversized partition for /shared, I'll create an extra partition out of that space to use now, and repartition it into the raid5 once all drives are large enough to do so. I'll also be increasing the size of md2, so vg_home will grow too.
So - like this
- within mdadm
- Remove sdd2 from the md0 raid1 (this array has 2 spares)
- Remove sdd3 from the md2 raid1 (this has NO SPARE)
- Remove sdd4 from the md3 raid5
- Hardware setup
- Replace drive physically
- Partition
- in mdadm again
- join sdd2, sdd3 and sdd4 into their respective MD devices (sdd1 is swap, so it just needs mkswap)
- within LVM
- enlarge the respective VG and LVs in turn
- Finally, enlarge the filesystem.
- any spare partitions do "stuff" (possibly mirror and
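The mdadm part of the plan can be sketched as a dry run that only prints the commands for one disk. The helper `plan_disk_swap` and its ARRAY:PARTNUM argument format are invented for this sketch; the mapping passed for sdd is read off the /proc/mdstat listing in Setup and would differ for sda or sdb (which sit in md1, not md2).

```shell
#!/bin/sh
# Dry run only: prints the mdadm commands for swapping one disk, runs nothing.
# Usage: plan_disk_swap DISK ARRAY:PARTNUM...   (hypothetical helper)
plan_disk_swap() {
    disk=$1
    shift
    for pair in "$@"; do
        echo "mdadm --fail /dev/${pair%%:*} /dev/${disk}${pair##*:}"
    done
    for pair in "$@"; do
        echo "mdadm --remove /dev/${pair%%:*} /dev/${disk}${pair##*:}"
    done
    echo "# power off, swap the drive, power up, partition, then:"
    for pair in "$@"; do
        echo "mdadm --add /dev/${pair%%:*} /dev/${disk}${pair##*:}"
    done
}

# sdd's members according to the /proc/mdstat listing in Setup
plan_disk_swap sdd md0:2 md2:3 md3:4
```

Printing first and pasting afterwards leaves a chance to check each array name against /proc/mdstat before anything is failed.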
Implementation
Removing partitions from the RAIDs
(this is for /dev/sdd - I repeated for sdc also)
mdadm --fail /dev/md0 /dev/sdd2
mdadm --fail /dev/md2 /dev/sdd3
mdadm --fail /dev/md3 /dev/sdd4
mdadm --remove /dev/md0 /dev/sdd2
mdadm --remove /dev/md2 /dev/sdd3
mdadm --remove /dev/md3 /dev/sdd4
Note: You can watch a spare in the mirrors take over with
# cat /proc/mdstat
This process starts immediately after the --fail. No, you don't get to choose which spare will be used. There is an internal order (/proc/mdstat shows it inside the [] brackets).
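To see at a glance which members are currently spares, the "(S)" tags in /proc/mdstat can be picked out with awk. This is a small sketch; `list_spares` is a made-up helper name, and the sample input is the md0 entry from the Setup listing.

```shell
#!/bin/sh
# Print "array member" for every spare; spares are tagged "(S)" in mdstat.
# Reads the file given as an argument, or standard input if there is none.
list_spares() {
    awk '$2 == ":" {
        for (i = 3; i <= NF; i++)
            if ($i ~ /\(S\)$/) {
                name = $i
                sub(/\[.*/, "", name)   # strip the "[n](S)" suffix
                print $1, name
            }
    }' "$@"
}

# Sample input: the md0 entry from the Setup listing
list_spares <<'EOF'
md0 : active raid1 sdb2[0] sdc2[2](S) sdd2[3](S) sda2[1]
      8787456 blocks [2/2] [UU]
EOF
```

On the live system it would be called as `list_spares /proc/mdstat`, before and after the --fail, to see which spare was promoted.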
Change the drive physically
Timeframe: ~20minutes
- shut down the machine
- remove drive physically and replace with new
- power up
- partition new drive as required
In my system, the new (2TB) drives have GUID partition tables with 5 partitions each. The original 1TB drives have 4 partitions each on MBR partition tables. My Linux system (lenny/sid) handles this mix fine. I used gdisk for GUID partition table editing.
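I did the partitioning interactively in gdisk. For repeating the same layout on the next drive, a scripted equivalent could use sgdisk, gdisk's command-line sibling. The following is only a dry-run sketch: the helper `gpt_layout`, the device `/dev/sdX` and every size are placeholders rather than the layout actually used, and it assumes partition 1 is swap (type code 8200) with all others being RAID members (fd00).

```shell
#!/bin/sh
# Dry run only: prints sgdisk commands for a GPT layout, runs nothing.
# Usage: gpt_layout DEVICE SIZE...   where a SIZE of "rest" takes what is left.
# Assumption: partition 1 is swap (8200), the others are Linux RAID (fd00).
gpt_layout() {
    dev=$1
    shift
    n=0
    for size in "$@"; do
        n=$((n + 1))
        if [ "$n" -eq 1 ]; then code=8200; else code=fd00; fi
        if [ "$size" = rest ]; then end=0; else end="+$size"; fi
        echo "sgdisk --new=$n:0:$end --typecode=$n:$code $dev"
    done
    echo "sgdisk --print $dev"
}

# Placeholder device and sizes, five partitions as on the 2TB drives
gpt_layout /dev/sdX 2G 10G 180G 900G rest
```

Whatever sizes are chosen, each RAID member has to be at least as large as the existing members of the array it will join.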
Adding partitions to the RAIDs
mdadm --add /dev/md0 /dev/sdd2    # added as spare. ~instant
mdadm --add /dev/md2 /dev/sdd3    # syncing the mirror. ~20min
mdadm --add /dev/md3 /dev/sdd4    # syncing the raid5... ~3h40min
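Since the raid5 sync takes hours, it is handy to have something that can tell when all arrays have settled. A minimal sketch, assuming that a rebuilding or queued array shows a "recovery =" or "resync" status in /proc/mdstat; the function names are made up here.

```shell
#!/bin/sh
# Succeeds while any array in the given mdstat file is rebuilding or queued
# (matches "recovery = ...", "resync = ..." and "resync=DELAYED").
sync_in_progress() {
    grep -Eq '(resync|recovery) *=' "${1:-/proc/mdstat}" 2>/dev/null
}

# Poll until every array is idle; $2 is the interval in seconds (default 60).
wait_for_sync() {
    while sync_in_progress "$1"; do
        sleep "${2:-60}"
    done
}

if sync_in_progress; then
    echo "still syncing"
else
    echo "all arrays idle"
fi
```

Calling `wait_for_sync` before a reboot or before the next --grow avoids stacking one sync on top of another.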
finishing up
(note that I swapped /dev/sdc also, using the identical process as above)
- mkswap for sdc1 and sdd1, and swapon for each
- grow md2 for /home (this is due to my partitions for md2 now being doubled in size)
mdadm --grow /dev/md2 --size=max
- I ran this while sdc4 was still syncing into the raid5, so the md2 resync showed as DELAYED until the raid5 sync had finished...
- approx 20 min.
pvresize /dev/md2
- pvs now shows PV md2 for vg_home as having 96gig free :)
reboot and finish
or is it?
problems?
Here are my notes...
...md0 fails due to sdc2 and sdd2 superblocks looking far too similar! :(
...so I tried this: mdadm --zero-superblock /dev/sdd2
...and it works. but lacking that one partition now! gar!
...ok, I think it's cos whilst sdc2 and sdd2 are technically set as mirrors, they have nothing sync'd to them, no filesystem... (tune2fs was claiming it had nothing on there, even though it was spare. and still the same even after it was sync'd to the live mirror by failing the other drives! grrrrr)
...so, using mdadm --add to add the drive to md0
...and then using mdadm --grow -n 3 /dev/md0 - to make the drive an active mirror - forcing sync.
...same then for sdd2 - so it's a 4-drive LIVE mirror - no spares.
...and reboot... = no
did: mdadm --create --verbose /dev/md0 --level=mirror --raid-devices=2 /dev/sda2 /dev/sdb2
...to recreate the superblock. a mirror with no spare yet
...and sda errors. GAR!!!
...turn off and on...
...ok ok ok
I assembled from sdd2, since it was the only partition with the right uuid left,
then added sd{a,b,c}2 in turn and that gave them all the right uuid
then grew to 2 devices (ie, shrunk it from 4 active) - mdadm --grow
and then added the 2 devices back in - making them now spare, not active
and finally mdadm --assemble --scan works
let's see if it boots now!
WOOT!
yet more notes?
- what about sdc5 and sdd5
They are a mirror outside the LVM structure. Mounted as 'overflow' for shared data.
Some data shuffling will be required when all drives are ready to expand the /shared/ raid5 however!
# mdadm --create --level=mirror -n 2 /dev/md5 /dev/sdc5 /dev/sdd5
mdadm: array /dev/md5 started.
# mkfs.ext3 /dev/md5
# tune2fs -r 1 /dev/md5
...and mounted :)
Still TODO
- Extend lv_home into the new spare space, and embiggen the filesystem...
lvextend -L +50G /dev/vg_home/lv_home
resize2fs /dev/vg_home/lv_home
- script snapshots of lv_home for backup improvement? :)
Other notes
- expanding a lv snapshot is as simple as
lvresize -L+5G /dev/vg_home/homesnapshot
Nothing else needs be done.
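For the "script snapshots of lv_home" TODO, one possible shape is sketched below as a dry run that only prints the commands. The helper name, the mount point under /mnt, the /backup destination and the 5G snapshot size are all assumptions for illustration; the snapshot name homesnapshot is the one used in the lvresize example.

```shell
#!/bin/sh
# Dry run only: prints a snapshot-then-archive sequence, runs nothing.
# Usage: plan_snapshot_backup VG LV SNAPNAME SNAPSIZE   (hypothetical helper)
# /mnt/SNAPNAME and /backup are made-up paths; the snapshot size only has to
# cover the changes made to the origin while the backup runs.
plan_snapshot_backup() {
    vg=$1
    lv=$2
    snap=$3
    size=$4
    echo "lvcreate --snapshot --size $size --name $snap /dev/$vg/$lv"
    echo "mount -o ro /dev/$vg/$snap /mnt/$snap"
    echo "tar -czf /backup/$lv.tar.gz -C /mnt/$snap ."
    echo "umount /mnt/$snap"
    echo "lvremove -f /dev/$vg/$snap"
}

plan_snapshot_backup vg_home lv_home homesnapshot 5G
```

The snapshot needs free extents in vg_home, so not all of the new space should be handed to lv_home if this is the plan.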
2011 January upgrade
In which I swapped a 1TB drive to 3TB.
The partitions were created to match the existing layout, the only note being that I rounded them exactly: swap to 1GB, system to 10GB, and home to 190GB. (/home cannot embiggen until the next drive is upgraded, so that the mirror can grow. Also, it was originally 200GB, but an error in partition sizes meant it had to be reduced to create space, as the later partitions were already locked in with data.)
A RAID1-to-RAID5 conversion was attempted on the 1TB RAID1 on the two 2TB drives, but power was lost partway through reshaping to three devices and the array was lost. Fortunately its data had been backed up to the new spare TB partition on the 3TB drive, so a RAID5 is now being created from scratch, after which the data will be copied back.
Future plans
I should make /home sit on a RAID6, rather than a VG over a pair of mirrors. That would protect me from ANY two drive failures, instead of only ⅔ protection after the first failure. RAID6 should also make adding drives simpler (and can move to an odd number of drives smoothly). The downside is that the size is 2x the smallest partition, whereas the current layout gives smallest + 3rd smallest.
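The size trade-off can be put into numbers with a quick sketch. It assumes four partitions, with the two smallest paired in one mirror and the two largest in the other; the function names and example sizes are invented for illustration.

```shell
#!/bin/sh
# Compare usable /home size for a set of partitions (whole numbers, one unit).
sorted() {
    printf '%s\n' "$@" | sort -n | tr '\n' ' '
}

# RAID6 over all partitions: (count - 2) * smallest
raid6_usable() {
    set -- $(sorted "$@")
    echo $(( ($# - 2) * $1 ))
}

# Two RAID1 pairs in one VG (assumes exactly four partitions, the two
# smallest paired together): smallest + 3rd smallest
mirror_pairs_usable() {
    set -- $(sorted "$@")
    echo $(( $1 + $3 ))
}

raid6_usable 100 100 200 300          # illustrative sizes
mirror_pairs_usable 100 100 200 300
```

With identical partitions the two layouts come out the same; the gap only opens up once the partition sizes diverge.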
External reference
- This looks like it explains everything I need: https://raid.wiki.kernel.org/index.php/Growing
- http://www.arkf.net/blog/?p=47 - Converting a RAID1 to a RAID5 !!
- LVM snapshot stuff for future backups - http://www.cyberciti.biz/tips/consistent-backup-linux-logical-volume-manager-snapshots.html