
Synology NAS - Bootrom recovery

By Tim Butler

We use a number of Synology branded Network Attached Storage (NAS) devices to store backups of some of our data. They’re a great platform and we have a number of them deployed in both the office and the Data Centre (DC).

Synology have been rolling out some great updates recently, most notably the DSM 5.0 release. After upgrading the office system, I started the firmware upgrade process for the systems deployed in the DC. All of them upgraded without error… except one. There’s always one!

Normally, the system reboots after a firmware update. When you’re doing this remotely it’s a waiting game: you sit and wait for ping replies to start returning. So, I waited. And waited. And waited. The reboot normally takes less than 5 minutes, but after 20 minutes the system still hadn’t returned. Ahh, the joys of doing things remotely.

The first step was to power cycle the system remotely, which unfortunately didn’t resolve the problem. Thankfully this system was only holding non-critical data, so the extended downtime had no impact on our customers.

Finding the Fault

So, it was time to diagnose the issue further and see what I could do to resolve it. After arriving in the DC, I could see the status light continually flashing. Reading through the Synology forums, I tried some of the suggestions, such as removing all the drives and resetting the unit via the pinhole switch at the back. Still no luck. This is where the fun begins!

I had seen the console port on the back of the failed RS-812. The Synology documentation lists this as “Manufacturer use only”, but that wasn’t going to deter me. The unit was now out of warranty so the only thing I had to lose was time.

Another quick scour of the web and Synology forums suggested that the serial port could be used at 115200 baud with 8 data bits, no parity and 1 stop bit (8N1), which is quite common. I connected the serial cable, opened a terminal program (PuTTY) and was greeted with the following:

Marvell>> ¦
         __  __                      _ _
        |  \/  | __ _ _ ____   _____| | |
        | |\/| |/ _` | '__\ \ / / _ \ | |
        | |  | | (_| | |   \ V /  __/ | |
        |_|  |_|\__,_|_|    \_/ \___|_|_|
 _   _     ____              _
| | | |   | __ )  ___   ___ | |_
| | | |___|  _ \ / _ \ / _ \| __|
| |_| |___| |_) | (_) | (_) | |_
 \___/    |____/ \___/ \___/ \__|  ** LOADER **
 ** MARVELL BOARD: Synology Disk Station LE

U-Boot 1.1.4 (Jul  8 2011 - 12:09:44) Marvell version: 3.5.9

U-Boot code: 00600000 -> 0067FFF0  BSS: -> 0068B3D4

Soc: 88F6282 A1CPU running @ 1600Mhz L2 running @ 533Mhz
SysClock = 533Mhz , TClock = 200Mhz

DRAM (DDR3) CAS Latency = 7 tRP = 7 tRAS = 20 tRCD=7
DRAM CS[0] base 0x00000000   size 256MB
DRAM CS[1] base 0x10000000   size 256MB
DRAM Total size 512MB  16bit width
Addresses 8M - 0M are saved for the U-Boot usage.
Mem malloc Initialization (8M - 7M): Done
Using default environment

[4096kB@f8000000] Flash:  4 MB

CPU : Marvell Feroceon (Rev 1)

Streaming disabled
Write allocate disabled


USB 0: host mode
PEX 0: PCI Express Root Complex Interface
PEX interface detected Link X1
PEX 1: interface detected no Link.

Synology Model: RS812
Fan Status: Good

## Booting image at f8080000 ...
   Image Name:   Linux-2.6.32.12
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    1664912 Bytes =  1.6 MB
   Load Address: 00008000
   Entry Point:  00008000
   Verifying Checksum ... Bad Data CRC

The good news is the system is alive! The bad news is that the “Bad Data CRC” error means the kernel image stored in flash failed its checksum, so there’s obviously some corruption in the boot ROM. So what can we do in the little U-Boot interface? Here’s the help output:

Marvell>> help
?        - alias for 'help'
WOLTest  - Wake On Lan u-boot testing
base     - print or set address offset
bootm    - boot application image from memory
bootp    - boot image via network using BootP/TFTP protocol
bubt     - Burn an image on the Boot Flash.
cmp      - memory compare
cp       - memory copy
cpumap   - Display CPU memory mapping settings.
crc32    - checksum calculation
echo     - echo args to console
erase    - erase FLASH memory
flinfo   - print FLASH memory information
go       - start application at address 'addr'
help     - print online help
icrc32   - checksum calculation
iloop    - infinite loop on address range
imd      - i2c memory display
imm[.b, .s, .w, .l]     - i2c memory modify (auto-incrementing)
imw      - memory write (fill)
inm      - memory modify (constant address)
iprobe   - probe to discover valid I2C chip addresses
loop     - infinite loop on address range
md       - memory display
mm       - memory modify (auto-incrementing)
mtest    - simple RAM test
mw       - memory write (fill)
nm       - memory modify (constant address)
pci      - list and access PCI Configuration Space
phyRead  - Read Phy register
phyWrite - Write Phy register
ping     - send ICMP ECHO_REQUEST to network host
printenv - print environment variables
protect  - enable or disable FLASH write protection
rarpboot - boot image via network using RARP/TFTP protocol
reset    - Perform RESET of the CPU
resetenv - Return all environment variable to default.
setenv   - set environment variables
sflash   - read, write or erase the external SPI Flash.
sg       - scanning the PHYs status
Temp     - read chip Tj temp
tftpboot - boot image via network using TFTP protocol
version  - print monitor version

Ok, so we have some options! Seeing the option to use Trivial File Transfer Protocol (TFTP) at least gave hope that recovery would be possible.

On the road to recovery

To get the system into a working state again, I first needed to get it to actually boot. Having previously worked with embedded Linux systems, I at least had some experience fiddling with TFTP and compressed Linux kernels.

The first step was to get a copy of the Synology firmware to extract a boot image from. Synology name these .PAT files, but they’re actually just standard TAR files. Since the last firmware update had gone wrong, I double-checked the file size and even generated an MD5 hash to compare against the file I originally used. The new download and the original were identical, so a corrupt download wasn’t the cause.
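
Both checks described above can be scripted. This is a minimal sketch rather than the exact procedure used here: it assumes the .PAT file really is a plain TAR archive, and the function names are made up for illustration.

```python
import hashlib
import tarfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_archive_members(path):
    """Return the member names of a TAR archive (a .PAT file is assumed to be one)."""
    with tarfile.open(path) as archive:
        return archive.getnames()
```

Calling `md5_of_file` on the old and new downloads and comparing the two strings confirms whether they are identical, and `list_archive_members` shows whether a kernel image such as zImage is present before anything is extracted.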

So, first we extract the files from the PAT file using 7-Zip and end up with these files:

[Image: Synology NAS - PAT file extraction]

In this tar was the familiar zImage, which is a compressed Linux kernel. This is exactly what I was hoping to find and at least gave hope of recovery.

The next step was to configure a TFTP server. Running the "printenv" command in the U-Boot interface showed the following:

Marvell>> printenv
bootcmd=bootm F8080000 F8280000
baudrate=115200
loads_echo=0
ipaddr=192.168.1.154
serverip=192.168.1.155
rootpath=/mnt/ARM_FS/
netmask=255.255.254.0
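
U-Boot will look for the TFTP server at the `serverip` address, so the workstation running the server needs to answer on that address and share a subnet with the NAS. A small sketch to sanity-check the values from the environment above (the helper name is hypothetical):

```python
import ipaddress


def same_subnet(ip_a, ip_b, netmask):
    """Return True when both IPv4 addresses fall inside the same network."""
    # strict=False lets us pass a host address instead of the network address.
    network = ipaddress.ip_network(f"{ip_a}/{netmask}", strict=False)
    return ipaddress.ip_address(ip_b) in network


# Values taken from the printenv output above.
print(same_subnet("192.168.1.154", "192.168.1.155", "255.255.254.0"))  # True
```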

As I was running the recovery from a Windows system, I downloaded Tftpd32 and configured it to point to the files extracted from the PAT file:

[Image: Synology recovery - tftpd32 server]

Looking through the available commands for the Marvell U-Boot I could see that it contained a “tftpboot” command. Giving this a go gave the following:

Marvell>> tftpboot
*** Warning: no boot file name; using 'C0A8019A.img'
Using egiga0 device
TFTP from server 192.168.1.155; our IP address is 192.168.1.154
Filename 'C0A8019A.img'.
Load address: 0x800000
Loading: *
TFTP error: 'File not found' (1)
Starting again
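
The fallback filename in the warning is not random: C0A8019A is the unit's IP address, 192.168.1.154, written as eight hexadecimal digits. A short sketch that reproduces the name from an address (the function name is hypothetical, and other U-Boot builds may pick their default filename differently):

```python
import ipaddress


def default_tftp_filename(ip):
    """Build the '<IP in hex>.img' name seen in the U-Boot warning."""
    packed = ipaddress.IPv4Address(ip).packed  # four bytes, network order
    return packed.hex().upper() + ".img"


print(default_tftp_filename("192.168.1.154"))  # C0A8019A.img
```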

It was looking for a particular filename so I made a copy of the zImage file and called it C0A8019A.img to match. This was closer again:

TFTP from server 192.168.1.155; our IP address is 192.168.1.154
Filename 'C0A8019A.img'.
Load address: 0x0
Loading: #################################################################
         #################################################################
         #################################################################
         #################################################################
         #################################################################
         #
done
Bytes transferred = 1664976 (1967d0 hex)
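
The transfer is 64 bytes larger than the "Data Size" of 1664912 bytes reported by the earlier boot attempt, which matches the size of a legacy U-Boot image header placed in front of the kernel. The same checksum that failed earlier can be checked on the workstation before an image is served. The following is a sketch only, assuming the file uses the legacy uImage layout (magic 0x27051956, big-endian fields, standard CRC32); `check_uimage` is a hypothetical helper, not part of any Synology or U-Boot tooling.

```python
import struct
import zlib

UIMAGE_MAGIC = 0x27051956
# magic, header CRC, timestamp, data size, load address, entry point,
# data CRC, then OS / arch / type / compression bytes and a 32-byte name.
HEADER = struct.Struct(">7I4B32s")  # 64 bytes, all fields big-endian


def check_uimage(blob):
    """Parse a legacy U-Boot image and verify its header and data CRC32."""
    if len(blob) < HEADER.size:
        raise ValueError("file is shorter than a 64-byte uImage header")
    (magic, header_crc, _time, size, load, entry, data_crc,
     _os, _arch, _type, _comp, name) = HEADER.unpack_from(blob)
    if magic != UIMAGE_MAGIC:
        raise ValueError("not a legacy uImage (bad magic)")
    # The header CRC is calculated with its own field set to zero.
    zeroed = blob[:4] + b"\x00" * 4 + blob[8:HEADER.size]
    payload = blob[HEADER.size:HEADER.size + size]
    return {
        "name": name.rstrip(b"\x00").decode("ascii", "replace"),
        "data_size": size,
        "load_address": load,
        "entry_point": entry,
        "header_crc_ok": zlib.crc32(zeroed) == header_crc,
        "data_crc_ok": len(payload) == size and zlib.crc32(payload) == data_crc,
    }
```

Reading the file with `open(path, "rb").read()` and passing the bytes to `check_uimage` returns the same fields U-Boot prints (image name, data size, load address, entry point) along with the two checksum results.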

There's now an embedded Linux kernel transferred to memory. Finally let’s see if that boots using the “bootm” command:

Marvell>> bootm
## Booting image at 00000000 ...
   Image Name:   Linux-2.6.32.12
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    1664912 Bytes =  1.6 MB
   Load Address: 00008000
   Entry Point:  00008000
   Verifying Checksum ... OK
OK

Starting kernel ...

Uncompressing Linux... done, booting the kernel.
[    0.000000] Linux version 2.6.32.12 (root@build3) (gcc version 4.6.4 (Linaro GCC branch-4.6.4. Marvell GCC Dev 201309-2126.3d181f66 64K MAXPAGESIZE ALIGN) ) #4458 Thu Mar 6 14:16:04 CST 2014
[    0.000000] CPU: Feroceon 88FR131 [56251311] revision 1 (ARMv5TE), cr=00053977
[    0.000000] CPU: VIVT data cache, VIVT instruction cache
[    0.000000] Machine: Synology 6282 board
[    0.000000] Using UBoot passing parameters structure
[    0.000000] Sys Clk = 200000000, Tclk = 166666667
[    0.000000] Synology Board ID: 25
....

I could ping the system again and the web management interface was loading. Success! Or, so I thought...

Another Stumbling Block

Of course, after such a tough road so far, it couldn’t all be back to 100% now, could it?! Nope, it seems not. Although the web interface was loading, attempting to log in generated the error message “System is getting ready. Please log in later”.

Using the Synology Discovery software showed the same problem and still didn’t allow me to run any management commands. The system was stuck in a bit of a loop. This was compounded by the fact that some standard system libraries appeared to be missing: if I ran a command like “ps” or “top”, I was simply greeted with an error about the “libproc-3.2.8.so” file being missing.

My only hope was to try and run a full firmware update again. I couldn’t see how to do it via the shell, so I needed to be able to log in to the web interface to trigger it. First, I had to resolve the fact that the system didn’t believe it had finished booting. After more searching through the Synology forums (Google is your friend!) I managed to find someone else with a similar problem. I used the following to correct this:

synobootseq --set-boot-done
synobootseq --is-ready

Finally, I was at a point where the web login worked. Everything appeared to be mostly working, so I went through the process of updating the firmware. A full recovery was very close.

And one last thing…

I triggered the system to reboot after the update and waited. And waited. This time was at least different: the machine was still responding to pings. With broken system libraries, the NAS simply couldn’t complete a clean reboot. Using the serial console I could see that it was stuck, but at least still responding to commands. To force a reset, I used the following:

echo 1 > /proc/sys/kernel/sysrq
echo b > /proc/sysrq-trigger

This is a “hard” reset and a last resort, as there’s no final sync of file systems or clean unmounting. However, it carries just the same risk as a power cycle and at this point I still had nothing to lose. Thankfully, the Magic SysRq keys were enabled in this embedded Linux kernel and the system rebooted.

The joy of success

It’s working! Pings started to respond and a few minutes later I could log in to the web interface. I now had a fully working NAS with the latest firmware running, all without any data loss. There were no more boot ROM errors, nor were there any problems with missing library files. I scanned through the system logs for any sign of ongoing niggly issues and couldn’t find any. Finally, it was all back to normal.

I certainly don’t think it was any bad coding or bad implementation on Synology’s part but simply a case of bad luck. With many other systems it would simply be a case of throwing the system out and starting again (since it was out of warranty).

The fact that I could further debug the problem using standard tools actually increases my faith in the products. After dozens of updates across many of their products, this is the only time I’ve had an issue. Thankfully with a bit of fiddling and learning, it all ended with a working system again.