RAID upgrade

From ThorxWiki
Revision as of 21:02, 17 June 2010 by Nemo (Talk | contribs)

For my home server, I have filesystems on LVM on RAID on partitions. Upgrading a disk is therefore, arguably, rather non-trivial. However, the intended benefit of this setup is that, hopefully, I can roll over the oldest disk in the array every year or so, and the whole lot grows incrementally as needed. "Live" expansion at all levels means I should never have to create a new filesystem and copy the data over, as I've had to in past upgrades.
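
In outline, "live expansion at all levels" means each layer gets its own grow step, bottom-up. A minimal sketch of that chain, using the md3/vg_shared names from my setup below (the +500G is just an illustrative amount):

mdadm --grow /dev/md3 --size=max
	let the array use the full size of its (now larger) member partitions
pvresize /dev/md3
	tell LVM that the PV underneath vg_shared grew
lvextend -L +500G /dev/vg_shared/lv_shared
	grow the LV into the newly freed extents
resize2fs /dev/vg_shared/lv_shared
	finally, grow the ext3 filesystem to fill the LV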

These are my notes-to-self as of the time leading up to my first hardware change. Prior to this all disks are identical in size. There will be no significant size benefit until the fourth disk (smallest) is upgraded. After that, every upgrade (of the smallest disk - presumably replacing it to become the new 'largest') will yield a size increase - based upon the limits set by the 'new' smallest (oldest) disk.
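
To make the "limited by the smallest member" point concrete (round illustrative numbers, not my actual partition sizes): a 4-member RAID5 gives (members - 1) × smallest member of usable space, so with three ~1TB members and one ~2TB member the array is still 3 × 1TB. Only once the last ~1TB member is replaced can it grow towards 3 × 2TB.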

This is not optimal use of the available disk space for any given drive over its life. However, it is hopefully rather nice in terms of budgetary upgrade requirements! :)

Pros

  • Rolling upgrades are win.
    • response: rolling upgrades are the planning headache

Cons

  • Each size increase is predicated on the drive purchased three drives back! So 'instant embiggening' is difficult.
    • response: you don't use this system unless you plan to think ahead anyway. Also, the old drive being taken out could be put into an external USB caddy for additional space
    • response two: the 'unused' space on each drive (ie, the size difference between it and the smallest/oldest) could be partitioned into usable non-RAID emergency space too!

My system

Software

# cat /etc/debian_version 
lenny/sid
# uname -a
Linux falcon 2.6.18-6-amd64 #1 SMP Mon Jun 16 22:30:01 UTC 2008 x86_64 GNU/Linux
# mdadm --version
mdadm - v2.5.6 - 9 November 2006
# lvm version
  LVM version:     2.02.07 (2006-07-17)
  Library version: 1.02.08 (2006-07-17)
  Driver version:  4.7.0

Setup


# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md3 : active raid5 sda4[0] sdd4[3] sdc4[2] sdb4[1]
      2637302400 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md2 : active raid1 sdc3[0] sdd3[1]
      87891520 blocks [2/2] [UU]

md0 : active raid1 sdb2[0] sdc2[2](S) sdd2[3](S) sda2[1]
      8787456 blocks [2/2] [UU]

md1 : active raid1 sda3[0] sdb3[1]
      87891520 blocks [2/2] [UU]

unused devices: <none>

# pvs
  PV         VG        Fmt  Attr PSize  PFree 
  /dev/md1   vg_home   lvm2 a-   83.82G     0 
  /dev/md2   vg_home   lvm2 a-   83.82G     0
  /dev/md3   vg_shared lvm2 a-    2.46T     0 

# vgs
  VG        #PV #LV #SN Attr   VSize   VFree 
  vg_home     2   1   0 wz--n- 167.63G     0
  vg_shared   1   1   0 wz--n-   2.46T     0

# lvs
  LV        VG        Attr   LSize   Origin Snap%  Move Log Copy% 
  lv_home   vg_home   -wi-ao 167.63G
  lv_shared vg_shared -wi-ao   2.46T

File:NemoLVM.png

The plan

Upgrade each disk in turn to a larger physical disk (1.5TB or 2TB, etc.); all levels can then be grown and expanded.

I'll start with /dev/sdd and work backwards (sda and sdb are the enterprise and the quiet drive, respectively), making each partition larger as needed. Thanks to the change from MBR to GPT (GUID/EFI-style) partition tables, I'm no longer limited to four partitions - so instead of leaving unusable space in an oversized partition on /shared, I'll create an extra partition out of that space to use now, and repartition it into /shared once all the drives have been upgraded. I'll also be increasing the size of md2, and so vg_home will increase in size.
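
For reference, the per-disk layout this works out to (reconstructed from the mdstat and pvs output above; sdX5 only exists on the new, larger drives):

sdX1		swap
sdX2		md0 (raid1)
sda3/sdb3	md1 (raid1, PV for vg_home)
sdc3/sdd3	md2 (raid1, PV for vg_home)
sdX4		md3 (raid5, PV for vg_shared)
sdc5/sdd5	leftover space on the new drives (eventually md4 - see 'yet more notes?' below)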

So - like this

  1. Within mdadm
    1. Remove sdd2 from the md0 raid1 (this array has 2 spares)
    2. Remove sdd3 from the md2 raid1 (this has NO SPARE)
    3. Remove sdd4 from the md3 raid5
  2. Hardware setup
    1. Replace the drive physically
    2. Partition the new drive
  3. In mdadm again
    1. Join sdd1, sdd2, sdd3 and sdd4 into their respective MD devices
  4. Within LVM
    1. Enlarge the respective VG and LVs in turn
  5. Finally, enlarge the filesystem.
  6. Do "stuff" with any spare partitions (possibly mirror and/or add to LVM - see 'yet more notes?' below)

Implementation

Removing partitions from the RAIDs

(this is for /dev/sdd - I repeated for sdc also)

mdadm --fail /dev/md0 /dev/sdd2
mdadm --fail /dev/md2 /dev/sdd3
mdadm --fail /dev/md3 /dev/sdd4

mdadm --remove /dev/md0 /dev/sdd2
mdadm --remove /dev/md2 /dev/sdd3
mdadm --remove /dev/md3 /dev/sdd4

Note: You can watch a spare in the mirrors take over with

# cat /proc/mdstat

This process starts immediately after the --fail. No, you don't get to choose which spare will be used - there is an internal order (mdstat shows it inside the [] brackets).
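
If you'd rather have it spelled out than decode the [] ordering, mdadm --detail lists each member's role (active sync vs spare) explicitly:

# mdadm --detail /dev/md0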

Change the drive physically

Timeframe: ~20 minutes

  • shut down the machine
  • remove drive physically and replace with new
  • powerup
  • partition new drive as required

In my system, the new (2TB) drives have GUID partition tables (GPT) with 5 partitions each. The original 1TB drives have 4 partitions each on MBR partition tables. My Linux system (lenny/sid) handles this mix fine. Use gdisk for GUID partition table editing.
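
gdisk is interactive; if your GPT fdisk install also includes sgdisk, the layout can be scripted and repeated across drives. A rough sketch only - the partition numbers match my layout, but the sizes here are placeholders rather than my real values (8200 = linux swap, fd00 = linux RAID in sgdisk's type codes):

sgdisk -n 1:0:+4G -t 1:8200 /dev/sdd
	swap
sgdisk -n 2:0:+9G -t 2:fd00 /dev/sdd
	md0 member
sgdisk -n 3:0:+170G -t 3:fd00 /dev/sdd
	md2 member
sgdisk -n 4:0:+880G -t 4:fd00 /dev/sdd
	md3 member
sgdisk -n 5:0:0 -t 5:fd00 /dev/sdd
	the leftover space (sdd5)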

Adding partitions to the RAIDs

mdadm --add /dev/md0 /dev/sdd2
	added as spare. ~instant
mdadm --add /dev/md2 /dev/sdd3
	syncing the mirror. ~20min
mdadm --add /dev/md3 /dev/sdd4
	syncing the raid5...  ~3h40min
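
Those times will vary with drive speed and system load. Progress (and the kernel's own ETA) is visible in /proc/mdstat, and if the box is otherwise idle the resync speed floor can be raised - the 50000 below is just an example value in KB/sec:

watch -n 10 cat /proc/mdstat
	percent complete and estimated finish time per array
echo 50000 > /proc/sys/dev/raid/speed_limit_min
	raise the minimum resync speed if the default is too conservative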

finishing up

(note that I swapped /dev/sdc also, using the identical process as above)

  • mkswap for sdc1 and sdd1, and swapon for each
  • grow md2 for /home (this is due to my partitions for md2 now being doubled in size)

mdadm --grow /dev/md2 --size=max

  • executed while sdc4 was still syncing into the raid5; the md2 resync was then DELAYED until that finished...
  • approx 20 min.

pvresize /dev/md2

  • pvs now shows PV md2 for vg_home as having 96gig free :)

reboot and finish

or is it?

problems?

Here are my notes...

...md0 fails due to sdc2 and sdd2 superblocks looking far too similar! :(
...so I tried this:
mdadm --zero /dev/sdd2
...and it works. but lacking that one partition now! gar!
...ok, I think it's cos whilst sdc2 and sdd2 are technically set as mirrors, they have nothing synced to them, no filesystem... (tune2fs was claiming there was nothing on there, even though it was a spare - and still the same even after it was synced to the live mirror by failing the other drives!) grrrrr
...so, using mdadm --add to add the drive to the md0
...and then using mdadm --grow -n 3 /dev/md0 - to make the drive an active mirror - forcing sync. 
...same then for sdd2 - so it's a 4-drive LIVE mirror - no spares. 
...and reboot... 
= no
did:
mdadm --create --verbose /dev/md0 --level=mirror --raid-devices=2 /dev/sda2 /dev/sdb2
...to recreate the superblock. a mirror with no spare yet
...and sda errors. GAR!!!
...turn off and on... 
...ok ok ok 
I assembled from sdd2, since it was the only partition with the right uuid left, 
then added sd{a,b,c}2 in turn and that gave them all the right uuid
then grew to 2 devices (ie, shrunk it from 4 active) - mdadm --grow
and then added the 2 devices back in - making them now spare, not active
and finally mdadm --assemble --scan works
let's see if it boots now!
WOOT!
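
Pulling those notes back together, the recovery was roughly the sequence below. This is reconstructed after the fact, so treat it as a sketch rather than a paste of my shell history:

mdadm --assemble --run /dev/md0 /dev/sdd2
	start the degraded mirror from the one member with the surviving uuid
mdadm --add /dev/md0 /dev/sda2
mdadm --add /dev/md0 /dev/sdb2
mdadm --add /dev/md0 /dev/sdc2
	each added partition picks up the array's uuid as it joins
mdadm --grow /dev/md0 -n 2
	back down to a 2-device mirror; the two members this dropped were then --add'ed back as spares
mdadm --assemble --scan
	confirm the array assembles cleanly before rebooting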

yet more notes?

  • what about sdc5 and sdd5?
...a mirror? and then added to vg_shared? or left as the overflow vg?
(benefit of this is we restrict the space used, which will make true expansion easier later on :)
	benefit of adding to vg_shared is having space to snapshot straight away
	problem is removing that pv later on when we wish to expand the original raid...
	...otoh, separate raid = no room for snapshots, but no pvreduce headaches...
  • Decision: make it a mirror. The space will be reclaimed later (when all four drives can expand /shared across it), so for now the simplest thing is to make it a new mirror and not within LVM. (Does this undermine the point of an LVM on /shared? Maybe. In the long run I want to snapshot /shared anyway. We'll see :)
# mdadm --create --level=mirror -n 2 /dev/md4 /dev/sdc5 /dev/sdd5
mdadm: array /dev/md4 started.
# mkfs.ext3  /dev/md4
# tune2fs -r 1 /dev/md4
...and mounted :)
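
Being a brand new array, md4 presumably also wants an entry in mdadm.conf (so it assembles at boot under a stable name) and in fstab. Something along these lines - the /spare mountpoint is just a made-up example:

mdadm --detail --scan | grep /dev/md4 >> /etc/mdadm/mdadm.conf
	append the ARRAY line for md4 (Debian keeps this in /etc/mdadm/mdadm.conf)
echo '/dev/md4 /spare ext3 defaults 0 2' >> /etc/fstab
	mount it at boot; adjust the mountpoint to taste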

Still TODO

  • Extend lv_home into the new spare space, and embiggen the filesystem...
lvextend -L +50G /dev/vg_home/lv_home
resize2fs /dev/vg_home/lv_home
  • script snapshots of lv_home for backup improvement :)
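
A minimal sketch of what such a snapshot-backup script might look like - the snapshot name, size, mountpoint and rsync destination are all placeholders:

#!/bin/sh
# snapshot lv_home, copy it somewhere safe, then drop the snapshot
lvcreate -s -L 5G -n homesnap /dev/vg_home/lv_home
mkdir -p /mnt/homesnap
mount -o ro /dev/vg_home/homesnap /mnt/homesnap
rsync -a /mnt/homesnap/ /backup/home/
umount /mnt/homesnap
lvremove -f /dev/vg_home/homesnap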

Other notes

  • expanding a lv snapshot is as simple as
lvresize  -L+5G /dev/vg_home/homesnapshot

Nothing else needs to be done.
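
Related: the Snap% column in lvs (visible in the lvs output near the top) shows how full a snapshot's copy-on-write space is, so it's the thing to watch to know when a resize like the above is due - if it ever hits 100% the snapshot is invalidated.

lvs vg_home
	check Snap% for homesnapshot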


