RAID upgrade
For my home server, I have filesystems on LVM, on RAID, on partitions. Upgrading a disk is therefore, arguably, rather non-trivial. However, the intended benefit of this setup is that I can roll over the oldest disk in the array every year or so, and the whole lot grows incrementally as needed. "Live" expansion at every level means I should never have to create a new filesystem and copy all the data across, as in previous efforts.
These are my notes-to-self from the time leading up to my first hardware change. Prior to this, all disks are identical in size. There will be no significant size benefit until the fourth (smallest) disk is upgraded. After that, every upgrade of the smallest disk - presumably replacing it to become the new 'largest' - will yield a size increase, bounded by the limits of the 'new' smallest (oldest) disk.
This is not optimal use of the available space on any given drive over its life. However, it is hopefully rather nice in terms of budgetary upgrade requirements! :)
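To make the arithmetic concrete: a RAID5 array's usable size is (number of members - 1) times its smallest member. A toy shell function (sizes in GB; the 1000/2000 figures are illustrative stand-ins for 1TB/2TB drives) shows why the first three upgrades yield nothing and the fourth doubles the array:

```shell
# RAID5 usable capacity = (members - 1) * smallest member.
# Sizes in GB, purely illustrative.
raid5_usable() {
    min=$1
    for s in "$@"; do
        [ "$s" -lt "$min" ] && min=$s
    done
    echo $(( ($# - 1) * min ))
}

raid5_usable 1000 1000 1000 1000   # all original drives -> 3000
raid5_usable 2000 1000 1000 1000   # first upgrade: still 3000
raid5_usable 2000 2000 2000 2000   # fourth upgrade -> 6000
```

The smallest (oldest) member is the limit, which is exactly why the rolling scheme only pays off once it has been replaced.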
Pros
- Rolling upgrades are win.
- response: rolling upgrades are the planning headache
Cons
- Each drive's size increase is predicated on the drive purchased 3 drives back! So 'instant embiggening' is difficult.
- response: you don't use this system unless you plan to think ahead anyway. Also, the old drive being taken out could go into an external USB caddy for additional space
- response two: the 'unused' space on each drive (ie, the size difference between it and the smallest/oldest) could be partitioned into non-raid usable emergency spaces too!
My system
Software
# cat /etc/debian_version
lenny/sid
# uname -a
Linux falcon 2.6.18-6-amd64 #1 SMP Mon Jun 16 22:30:01 UTC 2008 x86_64 GNU/Linux
# mdadm --version
mdadm - v2.5.6 - 9 November 2006
# lvm version
  LVM version:     2.02.07 (2006-07-17)
  Library version: 1.02.08 (2006-07-17)
  Driver version:  4.7.0
Setup
# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md3 : active raid5 sda4[0] sdd4[3] sdc4[2] sdb4[1]
      2637302400 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
md2 : active raid1 sdc3[0] sdd3[1]
      87891520 blocks [2/2] [UU]
md0 : active raid1 sdb2[0] sdc2[2](S) sdd2[3](S) sda2[1]
      8787456 blocks [2/2] [UU]
md1 : active raid1 sda3[0] sdb3[1]
      87891520 blocks [2/2] [UU]
unused devices: <none>
# pvs
  PV         VG        Fmt  Attr PSize   PFree
  /dev/md1   vg_home   lvm2 a-    83.82G    0
  /dev/md2   vg_home   lvm2 a-    83.82G    0
  /dev/md3   vg_shared lvm2 a-     2.46T    0
# vgs
  VG        #PV #LV #SN Attr   VSize   VFree
  vg_home     2   1   0 wz--n- 167.63G    0
  vg_shared   1   1   0 wz--n-   2.46T    0
# lvs
  LV        VG        Attr   LSize   Origin Snap%  Move Log Copy%
  lv_home   vg_home   -wi-ao 167.63G
  lv_shared vg_shared -wi-ao   2.46T
The plan
Upgrade each disk in turn to a larger physical disk (1.5TB, 2TB, etc); all levels can then be grown and expanded.
I'll start with /dev/sdd and work backwards (sda and sdb are the enterprise drive and the quiet drive, respectively), making each partition larger as needed. Thanks to a change from MBR to GPT (EFI-type) partition tables, I'm no longer limited to four partitions - so instead of leaving unusable space in an oversized partition for /shared, I'll create an extra partition out of that space to use now, and repartition it back into /shared once all the drives can do so. I'll also be increasing the size of md2, and so vg_home will grow.
So - like this
- within mdadm
- Remove sdd2 from within the md0 raid1 (this array has 2 spares)
- Remove sdd3 from within the md2 raid1 (this has NO SPARE)
- Remove sdd4 from within the md3 raid5
- Hardware setup
- Replace drive physically
- Partition
- in mdadm again
- join sdd2, sdd3 and sdd4 into their respective MD devices (sdd1 becomes plain swap, not raid)
- within LVM
- enlarge the respective VG and LVs in turn
- Finally, enlarge the filesystem.
- any spare partitions do "stuff" (possibly a mirror)
Implementation
Removing partitions from the RAIDs
(this is for /dev/sdd - I repeated for sdc also)
mdadm --fail /dev/md0 /dev/sdd2
mdadm --fail /dev/md2 /dev/sdd3
mdadm --fail /dev/md3 /dev/sdd4
mdadm --remove /dev/md0 /dev/sdd2
mdadm --remove /dev/md2 /dev/sdd3
mdadm --remove /dev/md3 /dev/sdd4
Note: You can watch a spare in the mirrors take over with
# cat /proc/mdstat
This process starts immediately after the --fail. No, you don't get to choose which spare is used - there is an internal order (/proc/mdstat shows it in the [] brackets).
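That [] index after each member is md's internal device order, and (S) marks the spares. A quick way to pull the spares out of the noise - sketched here against a saved copy of the md0 line from above, standing in for the live /proc/mdstat:

```shell
# A saved copy of the md0 line stands in for the live /proc/mdstat.
cat <<'EOF' > /tmp/mdstat.snapshot
md0 : active raid1 sdb2[0] sdc2[2](S) sdd2[3](S) sda2[1]
EOF

# List just the spare members; [N] is md's internal device order.
grep -o '[a-z0-9]*\[[0-9]*](S)' /tmp/mdstat.snapshot
# -> sdc2[2](S) and sdd2[3](S), each on its own line
```

On the real system you'd grep /proc/mdstat directly, of course.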
Change the drive physically
Timeframe: ~20 minutes
- shut down the machine
- remove drive physically and replace with new
- powerup
- partition new drive as required
In my system, the new (2TB) drives have GUID partition tables (GPT) with 5 partitions each. The original 1TB drives have 4 partitions each on MBR partition tables. My Linux system (lenny/sid) handles this mix fine. Use gdisk for GPT editing.
Adding partitions to the RAIDs
mdadm --add /dev/md0 /dev/sdd2    # added as spare. ~instant
mdadm --add /dev/md2 /dev/sdd3    # syncing the mirror. ~20min
mdadm --add /dev/md3 /dev/sdd4    # syncing the raid5... ~3h40min
finishing up
(note that I also swapped /dev/sdc, using the identical process as above)
- mkswap for sdc1 and sdd1, and swapon for each
- grow md2 for /home (needed because the partitions backing md2 are now double their original size)
mdadm --grow /dev/md2 --size=max
- executed while sdc4 was still syncing in the raid5; this grow's resync was then DELAYED until that finished...
- approx 20 min.
pvresize /dev/md2
- pvs now shows PV md2 for vg_home as having 96gig free :)
reboot and finish
or is it?
problems?
Here are my notes...
...md0 fails due to sdc2 and sdd2 superblocks looking far too similar! :(
...so I tried this: mdadm --zero-superblock /dev/sdd2
...and it works. But I'm lacking that one partition now! Gar!
...ok, I think it's because whilst sdc2 and sdd2 are technically set as mirrors, they have nothing synced to them, no filesystem... (tune2fs was claiming there was nothing on there, even though it was a spare - and still the same even after it was synced to the live mirror by failing the other drives! Grrrrr)
...so, using mdadm --add to add the drive to md0
...and then using mdadm --grow -n 3 /dev/md0 to make the drive an active mirror, forcing a sync.
...same then for sdd2 - so it's a 4-drive LIVE mirror, no spares.
...and reboot... = no
...did: mdadm --create --verbose /dev/md0 --level=mirror --raid-devices=2 /dev/sda2 /dev/sdb2
...to recreate the superblock. A mirror with no spares yet.
...and sda errors. GAR!!!
...turn it off and on...
...ok ok ok. I assembled from sdd2, since it was the only partition with the right UUID left, then added sd{a,b,c}2 in turn (which gave them all the right UUID), then grew to 2 devices (ie, shrunk it from 4 active) with mdadm --grow, and then added the 2 devices back in - making them spares now, not active - and finally mdadm --assemble --scan works.
Let's see if it boots now! WOOT!
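In hindsight, a way to spot this confusion before rebooting is to read the superblocks straight off the suspect partitions; mdadm --examine dumps each member's UUID and state, so "far too similar" becomes something you can actually diff:

```shell
# Dump the md superblock of each suspect partition; matching UUIDs
# on members whose roles disagree is exactly the mess hit above.
mdadm --examine /dev/sdc2
mdadm --examine /dev/sdd2
```

(Requires root and the real devices, obviously - included here as a diagnostic sketch, not something that was run at the time.)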
yet more notes?
- what about sdc5 and sdd5
...a mirror? And then added to vg_shared? Or left as its own overflow VG?
...benefit of keeping it separate: we restrict the space used, which will make true expansion easier later on :)
...benefit of adding to vg_shared: having space to snapshot straight away. Problem: removing that PV later on when we wish to expand the original raid...
...otoh, a separate raid = no room for snapshots, but no pvreduce headaches...
* Make it a mirror. The space will be reclaimed later (when all four drives can expand /shared across it), so for now the simplest thing is to make it a new mirror and not put it within LVM. (Does this undermine the point of an LVM on /shared? Maybe. In the long run I want to snapshot /shared anyway. We'll see :)
# mdadm --create --level=mirror -n 2 /dev/md4 /dev/sdc5 /dev/sdd5
mdadm: array /dev/md4 started.
# mkfs.ext3 /dev/md4
# tune2fs -r 1 /dev/md4
...and mounted :)
Still TODO
- Extend lv_home into the new spare space, and embiggen the filesystem...
lvextend -L +50G /dev/vg_home/lv_home
resize2fs /dev/vg_home/lv_home
- script snapshots of lv_home for backup improvement :)
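A minimal sketch of what that snapshot script might look like. Only the vg_home/lv_home names come from this setup; the 2G snapshot size, the /mnt/snap mount point, the rsync target and the "homesnap" name are all assumptions to be adjusted:

```shell
#!/bin/sh
# Hypothetical snapshot-based backup of lv_home.
# Assumes 2G of free extents in vg_home and an existing /mnt/snap.
set -e
lvcreate --size 2G --snapshot --name homesnap /dev/vg_home/lv_home
mount -o ro /dev/vg_home/homesnap /mnt/snap
rsync -a /mnt/snap/ /shared/backup/home/   # backup target is an assumption
umount /mnt/snap
lvremove -f /dev/vg_home/homesnap
```

The point of the snapshot is that the rsync sees a frozen, consistent /home, however long the copy takes; the snapshot only needs enough space (2G here) to absorb writes made while it exists.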
Other notes
- expanding a lv snapshot is as simple as
lvresize -L+5G /dev/vg_home/homesnapshot
Nothing else needs be done.
External reference
- This looks like it explains everything I need: https://raid.wiki.kernel.org/index.php/Growing
- LVM snapshot stuff for future backups - http://www.cyberciti.biz/tips/consistent-backup-linux-logical-volume-manager-snapshots.html