

This page describes setting up RAID 1 (mirror drives) on the NSLU2. The process starts with a blank drive and an unslung drive and ends with two mirrored drives containing all the data stored on the originally-unslung drive. This is achieved by creating a raid array with just the blank second drive at first, copying the entire contents of the unslung drive onto it, and then hot-adding the unslung drive to the raid array.

Diversion scripts have been written to start the raid array during the boot process and stop it before shutdown. Unfortunately it is not possible to stop the array containing the root filesystem during the shutdown process. This means that array is marked as 'dirty' on shutdown and will resync when restarted. Resyncing can take hours for a large partition and no other process can run while it is happening. To get around this problem, this howto splits the root partition into a small partition containing the system files and a much larger partition to hold public files.

This howto uses custom partitions on the attached hard drives, making it impossible to return to the normal Unslung setup without repartitioning the drives and wiping all data. (Of course you can transfer the data off the drives before re-partitioning.)


I'm just curious about the use of mirrored swap partitions. I would have set up swap in striped (RAID 0) mode for performance, but if you take a look at http://linas.org/linux/Software-RAID/Software-RAID-8.html (Question 18) you'll see that you don't need to set up RAID 0 for swap, because the Linux kernel stripes across multiple swap devices automatically.

Initial Setup

  • Back up all your data before beginning.
  • Start with two identical drives, one of them unslung. It probably doesn't matter, but I marked the USB cable of the unslung drive so that I always plugged it into the DISK 1 slot.
  • Copy any data you want to keep onto the unslung drive.
  • Format the second drive if not already formatted by the NSLU2. Note: due to a bug in the Linksys firmware the second drive may say 0MB capacity on the NSLU2 webpage, but if it is listed as 'Ready' then all is well.
  • This howto completely wipes the unslung-drive after all data has been copied to the raid array. If you want to maximise your ability to return to your original configuration (should something go wrong) then prepare a second blank, formatted drive, identical to the first and leave your unslung-drive untouched.
  • Telnet/SSH into the box and download the required software packages. You need mdadm, the updated version of busybox (installing busybox sometimes needs --force-depends) and two kernel modules:
# ipkg update
# ipkg install busybox-base
# ipkg install mdadm
# ipkg install kernel-module-md
# ipkg install kernel-module-raid1
  • Load the modules so the kernel can use them:
# /sbin/insmod md.o
# /sbin/insmod raid1.o
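To confirm that the modules actually loaded, you can grep /proc/modules. A small sketch (the optional file argument is only there so the check can be tested against a saved copy):

```shell
# Sketch: check that the md and raid1 modules are loaded.
# Reads /proc/modules by default; an optional file argument
# makes the function testable on a saved copy.
modules_loaded() {
    src="${1:-/proc/modules}"
    grep -q '^md ' "$src" && grep -q '^raid1 ' "$src"
}
```

If `modules_loaded` returns non-zero, re-run the insmod commands above.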

Change Hard Drive Partitions

The standard NSLU2-formatted disk has three partitions: A 50MB swap partition, a 100MB config partition (mounted as /share/hdd/conf) and the rest of the disk as a data partition (mounted as /share/hdd/data). Unslung 5.5 also uses the data partition as the root filesystem. When the NSLU2 is switched off or reboots it will try to stop all the raid arrays. Any arrays that are not cleanly stopped will resync during the startup process. The resyncing process can take up to 10 hours for a 300GB hard drive and no other processes can run during that time. That would mean that your system is out of action for up to 10 hours every time you reboot; to avoid this I split the data partition into a 1GB root partition and a 299GB data partition.

This howto mirrors all four partitions. I had thought that mirroring the swap partition could have a performance cost when writing to disk (and indeed it does) but I believe there are more read operations (where there is a performance gain) than write operations on swap space. The files in the conf partition are used by Samba and the passwd utility so the conf partition must be mirrored if mirroring the data partition.

In order for the RAID arrays to work on a reboot we also need to change the partition types from 83 (Linux) and 82 (Linux swap) to fd (raid autodetect). We use the busybox version of fdisk to accomplish all this (the standard version of fdisk has been heavily modified by Linksys) as follows:

 # /opt/bin/busybox fdisk /dev/sdb

Be careful to use /dev/sdb (the blank second drive): the repartitioning process wipes all existing data from the drive.

Then press 'p' to see the partition table. You should see screen output something like:

 Disk /dev/sdb: 300.0 GB, 300090728448 bytes
 255 heads, 63 sectors/track, 36483 cylinders
 Units = cylinders of 16065 * 512 = 8225280 bytes

 Device Boot    Start       End    Blocks   Id  System
 /dev/sdb1               1       36461   292872951   83  Linux
 /dev/sdb2           36462       36476      120487+  83  Linux
 /dev/sdb3           36477       36483       56227+  82  Linux swap

Use option 'd' to delete all three partitions.

Now use option 'n' to add the four partitions. On my system the first partition was 150 blocks, the second was 14 blocks, the third was 6 blocks and the final partition (/dev/sdb4) took up the remainder of the disk. The sizes of the second and third partitions match the standard Linksys sizes.

Now use option 't' to reset the partition types to 'fd' (these codes are in hex in case you're wondering). You have to do that for each partition: 1-4. Using 'p' again should show something like:

 Device Boot    Start       End     Blocks   Id  System
 /dev/sdb1               1       150     1204843+  fd  Linux raid autodetect
 /dev/sdb2             151       165      120487+  fd  Linux raid autodetect
 /dev/sdb3             166       172       56227+  fd  Linux raid autodetect
 /dev/sdb4             173     36483   291668107+  fd  Linux raid autodetect

The sizes of the first three partitions should match this table exactly, the size of the fourth partition will depend on the size of the hard drive. My hard drive is 300GB.

Now use 'w' to write the new partition table to the disk and exit fdisk.

If fdisk requests a reboot for the changes to the partition table to become effective, do so. After rebooting, re-enabling telnet and logging in, don't forget to install the kernel modules again:

 # /sbin/insmod md.o
 # /sbin/insmod raid1.o

Create and Mount RAID arrays

Create one array for each partition (four in total). The 'missing' parameter indicates that the array is incomplete and that we will supply the second device later. This is referred to as starting the raid in 'degraded' mode. Full descriptions of the option parameters can be found in the mdadm man page.

 # mknod /dev/md4 b 9 4 2>/dev/null
 # /opt/sbin/mdadm --create --level=1 --raid-devices=2 /dev/md4 /dev/sdb4 missing 
 # /opt/sbin/mdadm --create --level=1 --raid-devices=2 /dev/md3 /dev/sdb3 missing
 # /opt/sbin/mdadm --create --level=1 --raid-devices=2 /dev/md2 /dev/sdb2 missing
 # /opt/sbin/mdadm --create --level=1 --raid-devices=2 /dev/md1 /dev/sdb1 missing

Run cat /proc/mdstat to check on the raid status:

 # cat /proc/mdstat
 Personalities : [raid1]
 read_ahead 1024 sectors
 md1 : active raid1 scsi/host0/bus0/target0/lun0/part1[0]
       1204736 blocks [2/1] [U_]
 md2 : active raid1 scsi/host0/bus0/target0/lun0/part2[0]
       120384 blocks [2/1] [U_]
 md3 : active raid1 scsi/host0/bus0/target0/lun0/part3[0]
       56128 blocks [2/1] [U_]
 md4 : active raid1 scsi/host0/bus0/target0/lun0/part4[0]
       291668032 blocks [2/1] [U_] 
 unused devices: <none>

There are other monitor functions that you can play around with such as "mdadm --examine /dev/sdb1" or "mdadm --detail /dev/md1".
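A [U_] (or [_U]) marker means the mirror is running on one device out of two, which is exactly what we expect at this stage. The helper below is a sketch that pulls the names of any degraded arrays out of /proc/mdstat (the file argument exists only for testing):

```shell
# Sketch: print the names of raid1 arrays that are running degraded.
# In /proc/mdstat the "[U_]" / "[_U]" status sits on the line after
# the "mdN : active raid1 ..." line, hence grep -B1.
degraded_arrays() {
    src="${1:-/proc/mdstat}"
    grep -B1 '\[U_\]\|\[_U\]' "$src" | grep -o '^md[0-9]*'
}
```

Right now `degraded_arrays` should list all four arrays; once the second drive has been added and resynced it should print nothing.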

Now we create the file systems on each of the four partitions, starting with the swap partition:

 # /sbin/mkswap /dev/md3
 # /sbin/swapon /dev/md3

Then the conf, root and data partitions. On my 300GB hard drive it takes 10 minutes to create the filesystem on /dev/md4, even with an overclocked slug.

 # /usr/bin/mke2fs -j /dev/md2
 # /usr/bin/mke2fs -j /dev/md1
 # /usr/bin/mke2fs -j /dev/md4

Mount the new partitions under the 'flash' directory temporarily. We will remount them in their rightful place (/share/hdd/data) on reboot.

 # mount -t ext3 /dev/md2 /share/flash/conf
 # mount -t ext3 /dev/md1 /share/flash/data
 # mkdir /share/flash/data/public
 # chown admin.everyone /share/flash/data/public
 # chmod 775 /share/flash/data/public
 # mount -t ext3 /dev/md4 /share/flash/data/public

Copy entire file system to RAID partitions

This code was nabbed from the unsling script. I'm sure the authors of that script will be quick to disclaim all responsibility for what you are about to do ;)

 # cd /share/hdd/conf
 # /usr/bin/find ./ -print0 -mount | /usr/bin/cpio -p -0 -d -m -u /share/flash/conf
 # cd /
 # /usr/bin/find . -path './public' -prune -o -print0 -mount | /usr/bin/cpio -p -0 -d -m -u /share/flash/data
 # cd /public
 # /usr/bin/find ./ -print0 -mount | /usr/bin/cpio -p -0 -d -m -u /share/flash/data/public

The slug can manage about 10 Mbytes/sec at most so this last command could take a long time if you have a lot of data. There might be a quicker way using the dd command but this way works for me. (I have found the dd command useful for files that are too large for the cp command e.g. DVD ISO files)
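Before moving on it is worth a sanity check that the copies are complete. Comparing entry counts between source and destination trees is a crude but quick test (a sketch only; matching counts do not prove the file contents are identical):

```shell
# Sketch: compare the number of filesystem entries under two trees.
# A mismatch means the copy is incomplete; a match is only a
# rough indication that all went well.
count_entries() {
    (cd "$1" && find . -mount | wc -l)
}

compare_trees() {
    a=$(count_entries "$1")
    b=$(count_entries "$2")
    echo "$1: $a entries  $2: $b entries"
    [ "$a" -eq "$b" ]
}
```

For example, `compare_trees /share/hdd/conf /share/flash/conf` should report equal counts.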

Diversion Scripts

In Unslung 5.5 the root filesystem is mounted from one of the hard drives on reboot. The script that actually does the mounting is called 'linuxrc' and can be found in /initrd. (There is another copy of linuxrc in the root directory but that one is not used.) We have to modify this script to start the RAID arrays (with only one disk at first, then two later).

  • Take a copy of the original file
# cp /initrd/linuxrc /initrd/linuxrc.orig
  • Using a text editor on /initrd/linuxrc, replace "/bin/mount -rt ext3 /dev/$prefroot /mnt" with these 9 lines:
/bin/mknod /dev/md4 b 9 4 2>/dev/null
/sbin/insmod /unslung/md.o
/sbin/insmod /unslung/raid1.o
/unslung/mdadm -A /dev/md4 -R /dev/sdb4
/unslung/mdadm -A /dev/md3 -R /dev/sdb3
/unslung/mdadm -A /dev/md2 -R /dev/sdb2
/unslung/mdadm -A /dev/md1 -R /dev/sdb1
/bin/sleep 5
/bin/mount -rt ext3 /dev/md1 /mnt
  • Copy mdadm, md.o and raid1.o into /initrd/unslung. Ensure that the new mdadm file has execute permissions (i.e. chmod 755 /initrd/unslung/mdadm). You'll find md.o and raid1.o in the /lib/modules/2.4.22-xfs/kernel/drivers/md directory.
  • Here is a listing of my /initrd/unslung directory (file sizes may differ slightly from yours; mine have extra debugging lines):
# ls -l /initrd/unslung
-rw-rw-r-- 1 root root 53392 Jul 19 22:15 md.o
-rwxrwxr-x 1 root root 121368 Jul 19 22:15 mdadm
-rw-rw-r-- 1 root root 20192 Jul 19 22:15 raid1.o
  • Copy rc.sysinit, rc.1, rc.halt and rc.reboot into /share/flash/data/unslung. Exact permissions of the diversion scripts don't seem to matter. The diversion scripts can be found at HowTo.Raid1DiversionScripts. This is a listing of my unslung directory:
# ls -l /share/flash/data/unslung
-rw-r--r-- 1 root root 1902 Aug 29 11:34 rc.1
-rw-r--r-- 1 root root 1488 Aug 29 14:50 rc.halt
-rw-r--r-- 1 root root 1140 Aug 29 14:50 rc.reboot
-rw-r--r-- 1 root root 1437 Aug 29 11:37 rc.sysinit
  • If you are using SSH to access your slug (and I recommend you do) then please examine the rc.1 script closely. It creates a link to the root user's home directory called /root. If your root user's home directory is not /opt/user/root then you need to edit or comment out this line.
  • Unmount the raid arrays:
# cd /
# umount /share/flash/conf
# umount /share/flash/data/public
# umount /share/flash/data
  • Stop swapping on /dev/md3 so the array can be stopped:
# swapoff /dev/md3
  • Stop the raid arrays:
# /opt/sbin/mdadm -S /dev/md4
# /opt/sbin/mdadm -S /dev/md3
# /opt/sbin/mdadm -S /dev/md2
# /opt/sbin/mdadm -S /dev/md1
  • Switch off slug
  • Unplug DISK 1 (The disk that is unslung)
  • Switch on slug and wait for it to reboot. It should work identically in every way to your old unslung setup. A full check for me is:
    1. NSLU2 web pages operational
    2. Samba works ok
    3. I have access to the box via openssh/telnet
    4. Twonkyvision media server working
    5. etc., etc.
  • If all is well then continue. If not then skip to the Troubleshooting section
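Before the reboot test above, it can save a lot of head-scratching to verify that every file the modified linuxrc and the diversion scripts depend on is actually in place. A sketch (directory paths as used in this howto; the arguments exist so the check can be tested elsewhere):

```shell
# Sketch: verify that the files needed by the modified linuxrc and the
# diversion scripts are present, and that mdadm is executable.
check_raid_files() {
    initrd="${1:-/initrd/unslung}"
    data="${2:-/share/flash/data/unslung}"
    ok=0
    for f in "$initrd/mdadm" "$initrd/md.o" "$initrd/raid1.o" \
             "$data/rc.sysinit" "$data/rc.1" "$data/rc.halt" "$data/rc.reboot"; do
        [ -e "$f" ] || { echo "MISSING: $f" >&2; ok=1; }
    done
    [ -x "$initrd/mdadm" ] || { echo "NOT EXECUTABLE: $initrd/mdadm" >&2; ok=1; }
    return $ok
}
```

`check_raid_files` returns non-zero and names the offending file if anything is missing.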

Resyncing the RAID Arrays

At this stage we have a working RAID array containing just a single drive. The second drive is still unslung and you could still return to your original configuration. The next step is to add the unslung drive to the raid array to take it out of degraded mode.

This step in the process wipes all data from your unslung drive. If you're not confident about doing that for any reason then just perform these steps on another blank drive and leave your unslung drive untouched. That way, in order to return to your original configuration you just have to replace the original linuxrc file and reboot with the unslung drive attached.

While still running the slug from the second drive, prepare the partitions on the unslung drive in exactly the same way as it was done for the second drive before:

  • If there are any instances of usb_detect running (check with ps -ef) then kill them as they will attempt to mount your drives before we can sync them.
  • Plug in the first drive (unslung drive) and wait 30sec for it to be recognised
  • Use /opt/bin/busybox fdisk /dev/sda to delete the existing partitions, create new partitions (same layout/sizes as done before for the second drive) and change the partition types to 'fd'. Finally write the new partition table to the disk and exit fdisk.

Now the array can be prepared to use both disks:

  • Switch off slug
  • Unplug both drives and switch back on
  • Telnet into slug (after using web interface to enable telnet)
  • If there are any instances of usb_detect running (check with ps -ef) then kill them as they will attempt to mount your drives before we can sync them. Also, you might get locked out from the slug because the 'root' password will change.
  • Plug in both drives and wait 30sec for them to be recognised
  • Create the 4 raid arrays in degraded form:
# mknod /dev/md4 b 9 4 2>/dev/null
# /unslung/mdadm -A /dev/md4 -R /dev/sdb4
# /unslung/mdadm -A /dev/md3 -R /dev/sdb3
# /unslung/mdadm -A /dev/md2 -R /dev/sdb2
# /unslung/mdadm -A /dev/md1 -R /dev/sdb1
  • Hot-Add the unslung drive:
# /unslung/mdadm -a /dev/md3 /dev/sda3
# /unslung/mdadm -a /dev/md2 /dev/sda2
# /unslung/mdadm -a /dev/md1 /dev/sda1
# /unslung/mdadm -a /dev/md4 /dev/sda4

The order is important here: the /dev/md4 array will take hours to resync, so you should do the other three first.

  • Run cat /proc/mdstat to see that the disks are re-syncing
# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md1 : active raid1 scsi/host1/bus0/target0/lun0/part1[2] sdb1[0]
      1204736 blocks [2/2] [UU]
md2 : active raid1 scsi/host1/bus0/target0/lun0/part2[1] sdb2[0]
      120384 blocks [2/2] [UU]
md3 : active raid1 scsi/host1/bus0/target0/lun0/part3[1] sdb3[0]
      56128 blocks [2/2] [UU]
md4 : active raid1 scsi/host1/bus0/target0/lun0/part4[2] sdb4[0]
      291668032 blocks [2/1] [U_]
      [>....................] recovery = 0.0% (25664/291668032) finish=569.7min speed=8554K/sec
unused devices: <none>

Yes, it really will take 569 minutes (almost 10 hours) to resync my two 300GB drives. For $70 you get 10MByte/sec throughput and that's it! While the disks are resyncing it's best not to touch the slug. I have found that if I try to perform any I/O tasks the speed of resyncing falls precipitously and never rises back up, so I just leave it alone (say, overnight) until it's finished. There are parameters that can be adjusted (Google for mdadm speed_limit_max) but I have found them to be ineffective. This is also the reason why the resyncing is done on the standard Linksys filesystem rather than the unslung filesystem: the slug is constantly writing to /var/log and running other cron jobs, and this would cause the resyncing to die.

If you prematurely switch off the slug while this resync is writing to the disks (drive lights flashing) nasty things can happen. I lost one of my hard drives by switching the slug off this way (I got impatient). Luckily I didn't lose any data as the other drive was ok, but the damaged drive was rendered useless.
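If you want to keep an eye on the resync without adding I/O load, a simple polling loop over /proc/mdstat is enough. A sketch (the file argument exists only for testing; the interval defaults to five minutes):

```shell
# Sketch: print the recovery progress line until the resync finishes.
# Polling infrequently avoids adding I/O load that would slow the resync.
watch_resync() {
    interval="${1:-300}"
    src="${2:-/proc/mdstat}"
    while grep -q 'recovery' "$src"; do
        grep 'recovery' "$src"
        sleep "$interval"
    done
    echo "resync complete"
}
```

For example, `watch_resync 600` prints one progress line every ten minutes.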

Once the resyncing has completed we create a mdadm.conf file, then stop the raid arrays, edit linuxrc once again and reboot:

 # /bin/echo "DEVICE    /dev/sd[ab][1234]" > /unslung/mdadm.conf
 # /unslung/mdadm --detail --scan >> /unslung/mdadm.conf
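The resulting /unslung/mdadm.conf should look something like the fragment below. The UUIDs here are placeholders for illustration only; --detail --scan fills in the real values from your arrays, and the exact fields may vary with your mdadm version:

```
DEVICE    /dev/sd[ab][1234]
ARRAY /dev/md4 level=raid1 num-devices=2 UUID=00000000:00000000:00000000:00000000
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=00000000:00000000:00000000:00000000
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=00000000:00000000:00000000:00000000
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=00000000:00000000:00000000:00000000
```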

Mount the new root filesystem and copy mdadm.conf into place:

 # mount -t ext3 /dev/md1 /share/hdd/data
 # cp /unslung/mdadm.conf /share/hdd/data/opt/etc/mdadm.conf
 # umount /share/hdd/data

Stop the raid arrays:

 # /unslung/mdadm --stop --scan --config=/unslung/mdadm.conf

Edit linuxrc (at this point located in the filesystem root /) as follows: Replace

/unslung/mdadm -A /dev/md4 -R /dev/sdb4
/unslung/mdadm -A /dev/md3 -R /dev/sdb3
/unslung/mdadm -A /dev/md2 -R /dev/sdb2
/unslung/mdadm -A /dev/md1 -R /dev/sdb1

with these two lines:

/unslung/mdadm --assemble --scan --config=/unslung/mdadm.conf
/bin/sleep 140

I picked 140 seconds as that is enough time for my (overclocked) slug to resync the boot (1GB) partition. Switch off the slug and switch it back on again, telnet/ssh in and run cat /proc/mdstat to check that all four partitions are active and that none are resyncing.

Err... that's it! You should have a working raid 1 array.

Troubleshooting

Reboot with one drive attached failed: If you were getting cat /proc/mdstat results similar to the examples listed above then the most likely cause of error is in the diversion scripts. To check them, follow these steps:

  • Switch off the slug
  • Unplug all disks
  • Switch back on and wait for it to reboot, then telnet in (password will be uNSLUng because no drives attached)
  • Check linuxrc for any errors
  • Check the unslung directory for the presence of mdadm, md.o and raid1.o. If these files are missing you can run the following commands to create them (I have tested this and it doesn't fill the flash):
# /usr/bin/ipkg-cl update
# /usr/bin/ipkg-cl install mdadm
# /usr/bin/ipkg-cl install kernel-module-md
# /usr/bin/ipkg-cl install kernel-module-raid1
# mv /opt/sbin/mdadm /unslung/mdadm
# chmod 755 /unslung/mdadm
# cd /
# mv /lib/modules/2.4.22-xfs/kernel/drivers/md/md.o /unslung/md.o
# mv /lib/modules/2.4.22-xfs/kernel/drivers/md/raid1.o /unslung/raid1.o
  • Check that mdadm has the executable bit set (maybe run chmod 755 /unslung/mdadm to be sure)
  • If you can't find any errors then plug in the drive that was blank (and now is part of the degraded raid array) and wait for 30secs.
  • Restart the raid arrays
# /sbin/insmod /unslung/md.o
# /sbin/insmod /unslung/raid1.o
# /unslung/mdadm -A /dev/md4 -R /dev/sdb4
# /unslung/mdadm -A /dev/md3 -R /dev/sdb3
# /unslung/mdadm -A /dev/md2 -R /dev/sdb2
# /unslung/mdadm -A /dev/md1 -R /dev/sdb1
  • Mount the root partition
# /bin/mount -t ext3 /dev/md1 /share/hdd/data
  • Check your diversion scripts in /share/hdd/data/unslung
  • If you still can't find anything wrong and you want to go back to your original configuration then continue to the next section

Return to original unslung configuration:

  • Stop raid arrays
# cd /
# /bin/umount /share/hdd/data
# /unslung/mdadm -S /dev/md4
# /unslung/mdadm -S /dev/md3
# /unslung/mdadm -S /dev/md2
# /unslung/mdadm -S /dev/md1
  • Erase raid superblocks:
# /unslung/mdadm --zero-superblock /dev/sdb4
# /unslung/mdadm --zero-superblock /dev/sdb3
# /unslung/mdadm --zero-superblock /dev/sdb2
# /unslung/mdadm --zero-superblock /dev/sdb1
  • Delete all four partitions using busybox fdisk
  • Return linuxrc to the original format, perhaps using "mv linuxrc.orig linuxrc"
  • Switch off the slug, plug in your two disks and switch back on. Use the web interface to reformat the second drive.

Slug doesn't reboot with no drives attached: This is probably caused by an error in the linuxrc script. To fix it you will need to flash the 5.5 firmware onto the slug again. No data should have been lost at this stage and you can still return to your original unslung configuration. This section will not work unless you were unslung on 5.x (or maybe 4.x at a pinch).

  • Switch slug off and put into upgrade mode using the reset button
  • Flash 5.5 firmware using Upslug or Sercomm upgrade tool
  • Telnet into the slug with no drives attached
  • Create the boot flag file (touch /.sda1root)
  • At this stage you can return to your original unslung configuration by switching off the slug, plugging in your unslung drive and starting up again.
  • However if you want to continue setting up the raid array don't switch off the slug. Instead edit the linuxrc file as described in the Diversion Scripts section.
  • Then install the required software. I have tested this and it doesn't fill the flash memory:
# /usr/bin/ipkg-cl update
# /usr/bin/ipkg-cl install mdadm
# /usr/bin/ipkg-cl install kernel-module-md
# /usr/bin/ipkg-cl install kernel-module-raid1
  • Find and move (not copy) required mdadm, md.o and raid1.o to /unslung
  • Switch off the slug, leave all drives unattached and switch it back on again. Hopefully it should reboot this time and you can continue with the howto.

failed to RUN_ARRAY: If, when you try to create the raid array, you get an error message that says "mdadm: failed to add /dev/sdb2 to /dev/md2: Invalid argument mdadm: failed to RUN_ARRAY /dev/md2: Invalid argument", the problem could be that you forgot to kill the usb_detect process and the slug has mounted the /dev/sd[ab][12] partitions. Unmount these and try again.

Failed Drive: Repeat the steps in this howto starting at "Resyncing the RAID arrays".

Power Loss: Raid 1 obviously won't help if you lose both drives simultaneously but unless you're very unlucky you shouldn't lose any data. The worst that might happen is that one of the disks will have to be replaced if the slug was performing an I/O operation on it when the power went.

After a power loss you must restart the slug with no drives attached and then repeat the steps in this howto starting at "Resyncing the RAID arrays". The reason that you cannot simply reboot is that all 4 raid arrays will try to resync and, because of the load on the CPU during startup, they will never finish the resync.

If you suspect that the slug was performing a disk read or write operation when you lost power then the filesystems on the disks might be corrupted. Read http://www.tldp.org/HOWTO/Software-RAID-0.4x-HOWTO-4.html and follow steps 1 to 3 of method 2 before repeating the steps in this howto starting at "Resyncing the RAID arrays". P.S. the fsck utility on the slug has been renamed fsck.ext3.
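A sketch of running those checks across the ext3 partitions of one drive (partition numbers as laid out in this howto; partition 3 is swap, so fsck does not apply to it — pass echo as the second argument for a dry run that only prints the commands):

```shell
# Sketch: fsck the three ext3 partitions of one drive before re-adding
# it to the arrays. Partition 3 (swap) is skipped. Passing "echo" as
# the second argument prints the commands instead of running them.
check_drive() {
    drive="$1"          # e.g. /dev/sda
    runner="${2:-}"     # set to "echo" for a dry run
    for part in 1 2 4; do
        $runner /sbin/fsck.ext3 -fy "${drive}${part}" || return 1
    done
}
```

For example, `check_drive /dev/sda` checks /dev/sda1, /dev/sda2 and /dev/sda4 in turn, stopping at the first failure.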

This leads to the obvious warning: Never switch the slug off while it is reading/writing to a raid array (Drive lights flashing).

Last edited by nsc.
Based on work by PatrickSchneider, nsc, Torsten Bitz, and dcordes.
Originally by nsc.
Page last modified on June 03, 2007, at 03:33 AM