RAID upgrade

From ThorxWiki

For my home server, I have filesystems on LVMs on RAIDs on partitions. Upgrading a disk is then, arguably, rather non-trivial. However, the intended benefit of this setup is that I can roll over the oldest disk in the array every year or so, and so the whole lot grows incrementally as needed. "Live" expansion at all levels means I should never have to create a new filesystem and copy data over, as in past efforts.

These are my notes-to-self as of the time leading up to my first hardware change. Prior to this all disks are identical in size. There will be no significant size benefit until the fourth disk (smallest) is upgraded. After that, every upgrade (of the smallest disk - presumably replacing it to become the new 'largest') will yield a size increase - based upon the limits set by the 'new' smallest (oldest) disk.

This is not the optimal use of the available disk space on any given drive over its life. However, it is hopefully rather nice in terms of budgetary upgrade requirements! :)
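The growth rule described above can be sketched as a little shell arithmetic. This is only an illustration: the function name raid5_usable is made up for this page, sizes are whole GB, and it assumes every RAID5 member partition is limited to the size of the smallest disk.

```shell
# Usable size of a RAID5 whose members are all limited to the smallest disk:
# (number of disks - 1) * smallest disk. Sizes are whole numbers (e.g. GB).
raid5_usable() (
    smallest=$1
    for size in "$@"; do
        if [ "$size" -lt "$smallest" ]; then smallest=$size; fi
    done
    echo $(( ($# - 1) * $smallest ))
)

raid5_usable 1000 1000 1000 1000   # four original disks: prints 3000
raid5_usable 1000 1000 2000 2000   # two upgraded, no gain yet: prints 3000
raid5_usable 2000 2000 2000 2000   # last small disk replaced: prints 6000
```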

Pros

  • Rolling upgrades are win.
    • response: rolling upgrades are the planning headache

Cons

  • Each size increase is predicated on the drive purchased 3 drives back! So 'instant embiggening' is difficult.
    • response: you don't use this system unless you plan to think ahead anyway. Also, the old drive being taken out could be put into an external USB caddy for additional space
    • response two: the 'unused' space on each drive (ie, the size difference between it and the smallest/oldest) could be partitioned into non-raid usable emergency spaces too!

My system

Software

# cat /etc/debian_version 
lenny/sid
# uname -a
Linux falcon 2.6.18-6-amd64 #1 SMP Mon Jun 16 22:30:01 UTC 2008 x86_64 GNU/Linux
# mdadm --version
mdadm - v2.5.6 - 9 November 2006
# lvm version
  LVM version:     2.02.07 (2006-07-17)
  Library version: 1.02.08 (2006-07-17)
  Driver version:  4.7.0

Setup


# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md3 : active raid5 sda4[0] sdd4[3] sdc4[2] sdb4[1]
      2637302400 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md2 : active raid1 sdc3[0] sdd3[1]
      87891520 blocks [2/2] [UU]

md0 : active raid1 sdb2[0] sdc2[2](S) sdd2[3](S) sda2[1]
      8787456 blocks [2/2] [UU]

md1 : active raid1 sda3[0] sdb3[1]
      87891520 blocks [2/2] [UU]

unused devices: <none>

# pvs
  PV         VG        Fmt  Attr PSize  PFree 
  /dev/md1   vg_home   lvm2 a-   83.82G     0 
  /dev/md2   vg_home   lvm2 a-   83.82G     0
  /dev/md3   vg_shared lvm2 a-    2.46T     0 

# vgs
  VG        #PV #LV #SN Attr   VSize   VFree 
  vg_home     2   1   0 wz--n- 167.63G     0
  vg_shared   1   1   0 wz--n-   2.46T     0

# lvs
  LV        VG        Attr   LSize   Origin Snap%  Move Log Copy% 
  lv_home   vg_home   -wi-ao 167.63G
  lv_shared vg_shared -wi-ao   2.46T

The plan

Upgrading each disc in turn to a larger physical disc... (1.5TB or 2TB, etc), all levels can be grown and expanded...

I'll start with /dev/sdd and work backwards (sda and sdb are the enterprise and the quiet drive, respectively), making each partition larger as needed. Due to a change from MBR to GUID (EFI-type) partition tables, I'm no longer limited to four partitions, so instead of having unusable space in an oversize partition on /shared, I'll create an extra partition out of that space to use, and repartition into this space when all drives are available to do so. I'll also be increasing the size of md2, and so vg_home will increase in size.

So - like this

  1. within mdadm
    1. Remove sdd2 from the md0 raid1 (this array has 2 spares)
    2. Remove sdd3 from the md2 raid1 (this has NO SPARE)
    3. Remove sdd4 from the md3 raid5
  2. Hardware setup
    1. Replace drive physically
    2. Partition
  3. in mdadm again
    1. join sdd2, sdd3 and sdd4 into their respective MD devices (sdd1 is swap and belongs to no array)
  4. within LVM
    1. enlarge the respective VG and LVs in turn
  5. Finally, enlarge the filesystem.
  6. any spare partitions do "stuff" (possibly mirror and

Implementation

Removing partitions from the RAIDs

(this is for /dev/sdd - I repeated for sdc also)

mdadm --fail /dev/md0 /dev/sdd2
mdadm --fail /dev/md2 /dev/sdd3
mdadm --fail /dev/md3 /dev/sdd4

mdadm --remove /dev/md0 /dev/sdd2
mdadm --remove /dev/md2 /dev/sdd3
mdadm --remove /dev/md3 /dev/sdd4
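Since the same steps get repeated per disk, they can be generated rather than typed. The sketch below only prints commands, it does not run them. The helper name retire_disk_cmds is hypothetical, the partition-to-array mapping is the one for sdc and sdd in the /proc/mdstat listing above, and it uses mdadm's manage-mode form that takes --fail and --remove in one call.

```shell
# Print (not run) the fail/remove commands for one disk, e.g. "sdd".
# Mapping for sdc/sdd: partition 2 -> md0, partition 3 -> md2, partition 4 -> md3.
retire_disk_cmds() (
    disk=$1
    for pair in md0:2 md2:3 md3:4; do
        md=${pair%%:*}      # text before the colon: array name
        part=${pair##*:}    # text after the colon: partition number
        echo "mdadm /dev/$md --fail /dev/$disk$part --remove /dev/$disk$part"
    done
)

retire_disk_cmds sdd
```

Once the printed lines look right they can be piped to sh. If a remove is refused because the device is still busy, running the --remove step again a moment later is the usual fallback.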

Note: You can watch a spare in the mirrors take over with

# cat /proc/mdstat

This process starts immediately after the --fail. No, you don't get to choose which spare will be used. There is an internal order (/proc/mdstat shows it inside the [] brackets).
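To see at a glance which members are currently spares, the (S) tags in /proc/mdstat can be filtered out with awk. A minimal sketch follows. The function name list_spares is made up here, and it assumes the mdstat layout shown in the Setup section.

```shell
# Print each md array followed by its spare members, i.e. those tagged "(S)".
# Reads mdstat-format text on stdin, e.g.:  list_spares < /proc/mdstat
list_spares() {
    awk '$2 == ":" {
        spares = ""
        for (i = 4; i <= NF; i++)
            if ($i ~ /\(S\)$/) spares = spares " " $i
        if (spares != "") print $1 ":" spares
    }'
}
```

Fed the listing from the Setup section, this prints md0: sdc2[2](S) sdd2[3](S) and nothing for the other arrays.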

Change the drive physically

Timeframe: ~20minutes

  • shut down the machine
  • remove drive physically and replace with new
  • powerup
  • partition new drive as required

In my system, my new (2TB) drives have GUID partition tables with 5 partitions each. The original 1TB drives have 4 partitions each on MBR partition tables. My linux system (lenny/sid) handles this fine. I used gdisk for GUID partition table editing.

Adding partitions to the RAIDs

mdadm --add /dev/md0 /dev/sdd2
	added as spare. ~instant
mdadm --add /dev/md2 /dev/sdd3
	syncing the mirror. ~20min
mdadm --add /dev/md3 /dev/sdd4
	syncing the raid5...  ~3h40min
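While the arrays sync, progress can be pulled out of /proc/mdstat instead of reading the whole file each time. The sketch below is an assumption-laden helper (the name sync_progress is made up). It relies on the kernel printing progress as "recovery = NN%" or "resync = NN%", and a queued sync as "resync=DELAYED".

```shell
# Print "<array> <action> <percent>" for every array that is currently
# resyncing or recovering, and "<array> <action> DELAYED" for queued ones.
# Reads mdstat-format text on stdin, e.g.:  sync_progress < /proc/mdstat
sync_progress() {
    awk '$2 == ":" { array = $1 }
         {
             for (i = 1; i <= NF; i++) {
                 if (($i == "recovery" || $i == "resync") && $(i + 1) == "=")
                     print array, $i, $(i + 2)
                 else if ($i ~ /^(recovery|resync)=DELAYED$/)
                     print array, substr($i, 1, index($i, "=") - 1), "DELAYED"
             }
         }'
}
```

Wrapping it as watch -n 60 'cat /proc/mdstat' is the simpler option if a full view is wanted.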

finishing up

(note that I swapped /dev/sdc also, using the identical process as above)

  • mkswap for sdc1 and sdd1, and swapon for each
  • grow md2 for /home (this is due to my partitions for md2 now being doubled in size)

mdadm --grow /dev/md2 --size=max

  • executed while sdc4 was still syncing into the raid5. The md2 sync was then DELAYED until that had finished...
  • approx 20 min.

pvresize /dev/md2

  • pvs now shows PV md2 for vg_home as having 96gig free :)

reboot and finish

or is it?

problems?

Here are my notes...

...md0 fails due to sdc2 and sdd2 superblocks looking far too similar! :(
...so I tried this:
mdadm --zero-superblock /dev/sdd2
...and it works. but lacking that one partition now! gar!
...ok, I think it's cos whilst sdc2 and sdd2 are technically set as mirrors, they have nothing sync'd to them, no filesystem... (tune2fs was claiming it had nothing on there, even though it was spare, and still the same even after it was sync'd to the live mirror by failing the other drives!) grrrrr
...so, using mdadm --add to add the drive to the md0
...and then using mdadm --grow -n 3 /dev/md0 - to make the drive an active mirror - forcing sync. 
...same then for sdd2 - so it's a 4drive LIVE mirror - no spares. 
...and reboot... 
= no
did:
mdadm --create --verbose /dev/md0 --level=mirror --raid-devices=2 /dev/sda2 /dev/sdb2
...to recreate the superblock. a mirror with no spare yet
...and sda errors. GAR!!!
...turn off and on... 
...ok ok ok 
I assembled from sdd2, since it was the only partition with the right uuid left, 
then added sd{a,b,c}2 in turn and that gave them all the right uuid
then grew to 2 devices (ie, shrunk it from 4 active) - mdadm --grow
and then added the 2 devices back in - making them now spare, not active
and finally mdadm --assemble --scan works
let's see if it boots now!
WOOT!

yet more notes?

  • what about sdc5 and sdd5

They are a mirror outside the LVM structure. Mounted as 'overflow' for shared data.

Some data shuffling will be required when all drives are ready to expand the /shared/ raid5 however!

# mdadm --create --level=mirror -n 2 /dev/md5 /dev/sdc5 /dev/sdd5
mdadm: array /dev/md5 started.
# mkfs.ext3  /dev/md5
# tune2fs -r 1 /dev/md5
...and mounted :)

Still TODO

  • Extend lv_home into the new spare space, and embiggen the filesystem...
lvextend -L +50G /dev/vg_home/lv_home
resize2fs /dev/vg_home/lv_home
  • script snapshots of lv_home for backup improvement? :)
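Rather than picking a fixed +50G, the LV can be grown into whatever the VG has free. The sketch below only prints the two commands. The helper name grow_lv_cmds is hypothetical, and whether the older LVM listed under Software accepts the +100%FREE form has not been checked here, so treat that as an assumption.

```shell
# Print (not run) the commands to grow an LV into all free space in its VG
# and then grow the ext3/ext4 filesystem on it to match.
grow_lv_cmds() {
    echo "lvextend -l +100%FREE $1"
    echo "resize2fs $1"
}

grow_lv_cmds /dev/vg_home/lv_home
```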

Other notes

  • expanding a lv snapshot is as simple as
lvresize  -L+5G /dev/vg_home/homesnapshot

Nothing else needs be done.
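For the snapshot-backup idea in the TODO list, one possible shape is: create a snapshot, mount it read-only, copy from it, then drop it. The sketch below only prints the commands. The 5G snapshot size, the /mnt mount point, the use of rsync and the helper name snapshot_backup_cmds are all assumptions for illustration, not something already set up on this system.

```shell
# Print (not run) a snapshot-based backup sequence for one LV.
# Snapshot size, mount point and the rsync step are illustrative choices.
snapshot_backup_cmds() (
    origin=$1                          # e.g. /dev/vg_home/lv_home
    dest=$2                            # backup destination directory
    snap_name=homesnapshot
    snap_dev=${origin%/*}/$snap_name   # same VG as the origin LV
    echo "lvcreate -s -L 5G -n $snap_name $origin"
    echo "mount -o ro $snap_dev /mnt/$snap_name"
    echo "rsync -a /mnt/$snap_name/ $dest/"
    echo "umount /mnt/$snap_name"
    echo "lvremove -f $snap_dev"
)

snapshot_backup_cmds /dev/vg_home/lv_home /backup/home
```

The snapshot only needs to hold the blocks that change on lv_home while the copy runs, which is why it can be much smaller than the LV itself.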


2011 January upgrade

In which I swapped a 1TB drive to 3TB.

The partitions were created to match the existing ones, the only point of note being that swap was rounded to exactly 1GB, system to 10GB, and home to 190GB. (/home/ cannot embiggen until the next drive is upgraded so that the mirror can grow. Also, home was originally 200GB, but an error in partition sizes meant it had to be reduced to create space, as the later partitions were locked in with data.)
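For repeatable sizes, the first three partitions could be created non-interactively with sgdisk, the scriptable sibling of gdisk. This is a sketch only. It prints the command instead of running it, /dev/sdX is a placeholder, the helper name partition_cmd is made up, and the remaining partitions are left out because their sizes are not recorded here. Note that sgdisk's G suffix means GiB.

```shell
# Print (not run) an sgdisk call creating swap (1G), system (10G), home (190G).
# Type codes: 8200 = Linux swap, fd00 = Linux RAID.
# A start of 0 means "use the default start of the free space".
partition_cmd() {
    echo "sgdisk -n 1:0:+1G -t 1:8200 -n 2:0:+10G -t 2:fd00 -n 3:0:+190G -t 3:fd00 $1"
}

partition_cmd /dev/sdX
```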

A RAID1-to-RAID5 conversion was attempted on the 1TB raid1 on the two 2TB drives, but power was lost partway through reshaping to three devices and the raid was lost. Fortunately its data had been backed up to the new spare TB partition on the 3TB drive, so a RAID5 is now being created from scratch, at which point the data will be copied back.

Future plans

I should make /home sit on a RAID6, rather than a VG over a pair of mirrors. That would protect me from ANY two drive failures, instead of only ⅔ protection after the first failure. RAID6 should also make adding drives simpler (and can move to an odd number of drives smoothly). The downside is that the size is 2x the smallest partition, whereas the current layout gives smallest + 3rd smallest.
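The size trade-off can be put into numbers. In this sketch the function name home_layouts is made up, the four partition sizes must be given smallest first, and the mirrors are assumed to pair the two smallest and the two largest partitions. The example sizes borrow the 190G and 240G home sizes mentioned elsewhere on this page.

```shell
# Compare usable space for four partitions (whole GB), given smallest first:
#   RAID6 over all four        -> (4 - 2) * smallest
#   two mirrors joined in a VG -> smallest + third smallest
home_layouts() {
    echo "raid6 $(( 2 * $1 ))"
    echo "mirrors $(( $1 + $3 ))"
}

home_layouts 190 190 240 240   # prints "raid6 380" then "mirrors 430"
```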

July 2012 raid rebuild

6 months of failing drives... finally identified as the 3TB from January... and I added a fifth drive in. Time to rebuild it all. (ok, except the root raid1 - md0)

My notes...

md3 : active raid5 sda5[0] sdd5[3] sdc5[2] sdb5[1]

repartition so that the 5th partition is as follows:

3TB disks, shared starts at sector 526387240 (240G home)
2TB disks, shared starts at sector 421529640 (190G home)

mdadm --create --verbose /dev/md3 --level=5 --raid-devices=5 /dev/sda5 /dev/sdb5 /dev/sdc5 /dev/sdd5 missing
mkfs.ext4 -v -m .1 -b 4096 -E stride=128,stripe-width=512  /dev/md3
mdadm --grow --bitmap=internal /dev/md3


mdadm --create /dev/md1  -l 6 -n 5 -b internal  /dev/sda4 /dev/sdb4 /dev/sdc4 /dev/sdd4 /dev/sde4
mkfs.ext4 -v -m .1 -b 4096 -E stride=128,stripe-width=384 /dev/md1

mdadm --grow --bitmap=internal --bitmap-chunk=256M /dev/md1
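The stride and stripe-width values passed to mkfs.ext4 above follow from the array geometry: stride is the chunk size divided by the filesystem block size, and stripe-width is stride times the number of data-bearing disks. The sketch below reproduces both sets of numbers. The 512 KiB chunk size is an assumption (it is not recorded in these notes, so check it with mdadm --detail), and the function name stripe_opts is made up.

```shell
# ext4 stripe hints for an md array:
#   stride       = chunk size / filesystem block size
#   stripe-width = stride * number of data disks
# Data disks = members minus parity (1 for RAID5, 2 for RAID6).
# Arguments: chunk size in KiB, block size in bytes, members, parity disks.
stripe_opts() (
    stride=$(( $1 * 1024 / $2 ))
    echo "stride=$stride,stripe-width=$(( $stride * ($3 - $4) ))"
)

stripe_opts 512 4096 5 1   # 5-member RAID5: prints stride=128,stripe-width=512
stripe_opts 512 4096 5 2   # 5-member RAID6: prints stride=128,stripe-width=384
```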

when >1 drives are out of a raid5 ...

root@falcon:~# mdadm --assemble /dev/md3 /dev/sda5 /dev/sdb5 /dev/sdc5 /dev/sdd5
 
mdadm: /dev/md3 assembled from 2 drives - not enough to start the array.
root@falcon:~# mdadm --assemble --force /dev/md3 /dev/sda5 /dev/sdb5 /dev/sdc5 /dev/sdd5
mdadm: cannot open device /dev/sda5: Device or resource busy
mdadm: /dev/sda5 has no superblock - assembly aborted
root@falcon:~# mdadm --stop /dev/md3                                            
mdadm: stopped /dev/md3
root@falcon:~# mdadm --assemble --force /dev/md3 /dev/sda5 /dev/sdb5 /dev/sdc5 /dev/sdd5
mdadm: forcing event count in /dev/sda5(0) from 2107 upto 2109
mdadm: forcing event count in /dev/sdb5(1) from 2107 upto 2109
mdadm: Marking array /dev/md3 as 'clean'
mdadm: /dev/md3 has been started with 4 drives (out of 5).

(4 out of 5 cos in this example I was only working with 4 drives in a deliberately degraded raid5, with the 5th drive to come later.) The above has also worked with two drives failed out of a 4 drive raid5, returning all 4 to a working raid5 with no apparent data loss.
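Before forcing an assembly it helps to know which members are behind and by how much, since a small gap in event counts is what makes --force reasonably safe. The sketch below takes "device event-count" pairs and lists the stale ones. The function name stale_members is made up, and the commented loop for collecting the counts assumes mdadm --examine prints a line of the form "Events : N".

```shell
# Read "device event-count" pairs on stdin and list members whose count is
# behind the highest one, i.e. the ones --assemble --force would bump.
stale_members() {
    awk '{ dev[NR] = $1; ev[NR] = $2 + 0; if (ev[NR] > max) max = ev[NR] }
         END { for (i = 1; i <= NR; i++)
                   if (ev[i] < max) print dev[i], ev[i], "->", max }'
}

# Possible way to collect the pairs (assumed output layout, not run here):
# for d in /dev/sd[abcd]5; do
#     printf '%s %s\n' "$d" "$(mdadm --examine "$d" | awk '/Events/ {print $NF}')"
# done | stale_members
```

With the counts from the session above (sda5 and sdb5 at 2107, and the other two presumably at 2109), it would list sda5 and sdb5 as two events behind.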

