This is the software side of Fattening Your Slug.
In order to actually use the extra RAM, the kernel needs to be able to see it. To do that, the RAM must be enabled during bootup. The bootloader does this by writing a hardware register (SDR_CONFIG) during boot. It is best not to fiddle with a bootloader, as it is the piece of software you need to recover from a disaster. The RedBoot loader which comes with the NSLU2 sets the hardware for 32M of memory (there is some dispute about how much). This cannot be changed without rebuilding the software, and that is risky. In addition, early builds (OpenSlug 2.7 and earlier) had the command line hardcoded into the kernel and ignored any information passed to them by the bootloader. There are two alternatives: replace the bootloader, or add a second bootloader that can be changed. The ultimate goal, in any case, is to configure the hardware and pass the correct memory size to a kernel that is listening.
An alternate bootloader is the APEX bootloader. Its advantages are that the size of memory can be adjusted and that it boots a bit faster. It can be compiled from source and supports the NSLU2. Current versions include a command to scan for hardware and configure the SDRAM controller as needed, and a memory scanning command to determine the amount of contiguous memory available. These can be used to configure the RAM and pass the memory size to the kernel. Previous versions required that you specify the number of banks and their size in the config file. APEX can be used as the primary bootloader or, as you will note in recent SlugOS releases, as the secondary bootloader.
Changing the Bootloader
Changing to the APEX bootloader is very serious business. Essentially, you get it right the first time or you start putting together a JTAG interface to start over. A serial connection is an absolute must, as you must communicate with the bootloader to do things. The wise person would confirm that their JTAG rig works before changing bootloaders. This means, specifically, confirming that you can detect the processor and flash with these instructions.
To change to APEX, you must boot into Redboot using your serial connection. You then use Redboot to download the APEX bootloader to RAM. You then start APEX by jumping to the beginning of the downloaded file. If all is well, you will see APEX boot up. For testing, you can now use APEX to boot into linux to see if it works. Of course this will overwrite APEX and you have to start over if you want to actually replace Redboot.
To replace Redboot, you get APEX running and then use it to copy itself into Flash memory. After this is done, Redboot will no longer be there and APEX will begin execution on powerup. You can use APEX to download other copies of APEX via XMODEM (over the serial cable) and can test these and write them to Flash. You can also download other images and execute them from APEX. You will notice that APEX automatically copies the kernel to RAM when it starts. The APEX website has the info for doing these steps. See http://wiki.buici.com/wiki/Main_Page
If you want to return to Redboot, you must download the entire image and then put your slug's MAC address in the right place before telling APEX to overwrite itself (a one-way street again). This carries all the risks described above. One way to do this would be to copy APEX to RAM, run it from RAM, download the entire flash backup (you did back it up!) into RAM, and then copy just the Redboot part back into the flash. Some instructions can be found here.
Modifying Redboot in Place
In my case I had to edit:
I finally managed to get my 64M (2 chips 16Mx16) configuration working!
- Using Apex built for 32MB or 64MB is exactly the same when testing Apex in RAM :-(
- But if I flash a 64MB version of Apex all is well. It puts 0x1A in the SDR_CONFIG
Update Nov 19 '06: I just used the information on BuildYourOwnRedBoot to make a version of RedBoot that will boot my 64M slug. Actually I got a binary RedBoot by doing this on my other (Unslung) slug.
NOTE: I used the 20061119 information from Steve G to dump my firmware, hexedit the memory controller (for 64MB), and flash my bootloader. I only had 32MB of memory -- I was thinking of upgrading soon. Worked great EXCEPT I couldn't boot into my OS anymore. RedBoot worked great, but the device would just sit there flashing all day long. I tried everything I could think of, and then finally went back to the original firmware. Rebooted and everything worked fine. I guess you can't set the memory controller to 64MB when you only have 32MB. Anyone have thoughts on this? --Mannkind
See also FatSlug
Things have gotten much easier with this version of SlugOS. The principal reason is that the APEX bootloader has been added as a second-stage bootloader. The boot sequence is: Redboot loads and launches APEX, which loads and launches Linux. It is therefore possible to control the boot parameters via APEX without risking corruption of the initial bootloader (Redboot). Very safe and very configurable. You have upslug2 to flash new images, which is much better and safer than the serial port downloads required if APEX is the only bootloader.
The default startup command used by apex does not do any memory scanning to check for more than the default amount of SDRAM (32MB total). You must add these scanning commands to the apex startup command. The program apex-env can be used if you don't have a serial terminal. Apex saves these changes to the parameters in the flash chip and they are not lost between resets or power cycles.
For my fatslug, I interrupted the apex boot by pressing ^C at the appropriate time. I then entered the command 'setenv startup sdram-init; memscan -u 0x0+256m; copy $kernelsrc $bootaddr; wait 10 Type ^C key to cancel autoboot; boot'. This stored the startup command string at nor:0x7c000 which is the configured env(ironment) storage area.
This version uses the 126.96.36.199 kernel. At the moment, there is a problem with ARM kernels and more than 64M of RAM. You might want to configure for 64M until it is fixed. Maybe 'memscan -u 0x0+64M' but I haven't tried it. More details here.
To use the additional memory, you must modify the startup string in apex by adding the commands:
sdram-init; memscan -u 0x0+256M;
to the beginning of the string. Otherwise it will default to 32M.
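As a rough illustration of the edit described above, the following Python sketch prepends the two commands to an existing startup string unless they are already there. It only manipulates text and does not talk to APEX or apex-env. The function name is hypothetical, and the sample default string is assumed from the example command shown earlier on this page, so check what your own APEX actually reports before relying on it.

```python
# Commands quoted from the text above; adjust the size if you configure for 64M.
MEM_COMMANDS = "sdram-init; memscan -u 0x0+256M;"


def add_memory_scan(startup: str, mem_commands: str = MEM_COMMANDS) -> str:
    """Return the startup string with the memory commands at the front.

    Hypothetical helper: if the string already begins with sdram-init,
    it is returned unchanged so the edit can be repeated safely.
    """
    stripped = startup.strip()
    if stripped.startswith("sdram-init"):
        return stripped
    return f"{mem_commands} {stripped}".strip()


# Assumed default startup string (taken from the example earlier on this page).
default = "copy $kernelsrc $bootaddr; wait 10 Type ^C key to cancel autoboot; boot"
print(add_memory_scan(default))
```

The resulting string is what would then be given to setenv startup at the APEX prompt, or written with apex-env.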
The program apex-env can be installed to manipulate the APEX environment without requiring a serial port. Each time you flash a new image, any changes to the APEX environment strings are lost. The reason for this is to allow recovery from a possibly damaged APEX, so rewriting the info is a small price to pay for the recovery ability.
It is possible to modify the defconfig file in openembedded/packages/apex/apex-nslu2-1.5.13/ to add those commands as part of the default startup environment.
You must delete the apex-nslu2-1.5.13 stamps as well as the contents of the /tmp/work/armv5teb-linux/apex-nslu2-1.5.13-r1 directory and rebuild the image. You do not have to rebuild the kernel as it is all put together in 'slugos-image' and each time you run slugos-image, it builds a new image so no need to delete stamps there either.
This version runs linux kernel 188.8.131.52 and seems as solid as all of the recent 'beta/alpha' versions, that is, it works as expected and is very reliable. It can be installed using upgrade mode and upslug2 and it automatically configures itself for available memory. This is done using apex as the second bootloader as described above. One thing to consider though is to modify /etc/profile to include /usr/sbin and /sbin in the path to avoid some package problems. Unslung packages can be installed as well if you add the /opt/bin and /opt/sbin directories to the path. This has not been tested recently so please edit this if necessary.
The >64M Memory Problem
There has been an ongoing discussion regarding the inability of ARM processors to support memory sizes larger than 64M. Under heavy disk access, the USB subsystem will die and leave the fatslug effectively hung. It is actually still sane, but it can't do much of anything since it relies completely on the USB subsystem for its rootfs, swap etc. One of the symptoms of this state is that you can still ping the thing.
The problem is apparently in the kernel code that requests DMA buffers beyond the pre-allocated pools. It has a problem (please correct me if my simplification is wrong) that if things get so busy that the kernel either runs out of buffers (unlikely) or cannot supply one of the requested size, the response is 'not ideal', so to speak. Some testing revealed that while the kernel was requesting USB buffers ranging in size from 4k to 64k, the subsystem was only offering sizes of 2k and 4k. A simple fix offered by Levente Nagy ( http://bugzilla.kernel.org/show_bug.cgi?id=7760 ) was to increase the size of the buffers offered by the subsystem. He used 16k and 128k. Testing shows that 32k and 64k work fine, especially since the kernel does not request more than 64k anyway.
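To make the size mismatch concrete, here is a deliberately simplified Python model. It is not the kernel's code: the function names are hypothetical, and the list of request sizes is just a sample within the 4k to 64k range mentioned above. It only checks which requests can be served from a pool whose buffers are large enough.

```python
from typing import List, Optional, Sequence

KIB = 1024

# Sample request sizes within the 4k..64k range reported above (an assumption).
REQUESTS = [4 * KIB, 8 * KIB, 16 * KIB, 32 * KIB, 64 * KIB]


def pick_pool(request: int, pool_sizes: Sequence[int]) -> Optional[int]:
    """Return the smallest pool buffer size that can hold the request, or None."""
    fitting = [size for size in sorted(pool_sizes) if size >= request]
    return fitting[0] if fitting else None


def unserved(pool_sizes: Sequence[int]) -> List[int]:
    """Return the request sizes (in KiB) that no pool can satisfy."""
    return [r // KIB for r in REQUESTS if pick_pool(r, pool_sizes) is None]


print("2k/4k pools cannot serve (KiB):", unserved([2 * KIB, 4 * KIB]))
print("32k/64k pools cannot serve (KiB):", unserved([32 * KIB, 64 * KIB]))
```

In this model the original 2k and 4k pools leave every request above 4k without a pre-allocated buffer, while the 32k and 64k pools cover the whole range, which matches the reasoning behind the patch.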
This is not a complete fix of the issue but one that ensures that there is always a buffer available to the USB subsystem. This is probably a good idea anyway since this subsystem must work for the operating system to function. Since it is only necessary on ARM processors that have a lot of memory anyway, it seems a reasonable accommodation.
Since making the modification to the kernel suggested, I have seen no problems or errors. I have done multiple rsync transfers of approximately 1 million files and 14G and, while it takes a while, it works. Earlier reports of a serious loss of performance on an Ext3 filesystem are also resolved by this kernel modification.
The real problem is that the ixp4xx processor series has a DMA (Direct Memory Access) unit which can only do transfers to the first 64MB of memory. DMA transfers to and from higher memory regions need the "DMA trampoline" code, which moves the data to and from these regions by other means. If a lot of memory/DMA space is in use, situations can arise where this hook rejects a DMA transfer request, and if that request was a data transfer involving the USB drive (via the internal PCI bus), it's doomed. There seems to be no easy fix; it is an inherent problem in the DMA code.
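The 64MB limit described above can be pictured with a small Python sketch. It is an illustration only, not the kernel's logic: the function name is hypothetical, and it assumes RAM starts at physical address 0 so that "the first 64MB" means addresses below 0x04000000.

```python
# First 64MB of memory, assuming RAM starts at physical address 0.
DMA_LIMIT = 64 * 1024 * 1024


def needs_bounce(phys_addr: int, length: int, limit: int = DMA_LIMIT) -> bool:
    """Return True if any byte of the buffer lies at or beyond the DMA limit.

    Such a buffer cannot be reached directly by the DMA unit and would have
    to be copied through memory below the limit instead.
    """
    if phys_addr < 0 or length <= 0:
        raise ValueError("address must be non-negative and length positive")
    return phys_addr + length > limit


# A 4k buffer ending exactly at the 64MB boundary is still reachable.
print(needs_bounce(0x03FFF000, 0x1000))
# A 4k buffer at 128MB is not.
print(needs_bounce(0x08000000, 0x1000))
```

On a slug with only 32MB or 64MB of RAM every buffer passes this check, which is consistent with the trouble only appearing on larger memory configurations.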
Published patches try to lower the risk of getting into such situations, but are more or less hacks. Since problems happen mostly under high memory usage conditions, a slug with more than 64MB of memory can run perfectly fine and then suddenly crash if, for example, an fsck on the filesystem starts and uses lots of memory.