Initial play with my new NAS

I recently played with a home NAS and found that it was great fun - actually so much fun, that I decided to buy one myself. The fun thing about it is the built-in Linux computer, on which you can install a lot of optional software packages - like apps on a smartphone. Also the fun thing is to see hardware perform well and get the best performance using available options.

 

Don't go for the cheapest model

There are several price classes when it comes to external storage. Don't expect anything from the cheapest products; investigate the technical specs and find out where they differ and what they cost. Then settle for one with the desired technical specs. Spend the cash and buy a proper product. There is no reason (except financial) not to get full use of hardware nowadays.

 

What's so important about the built-in Linux?

The most important is, that if your NAS burns out and you have your harddrives intact, then you can mount them on a standard PC and still read them. Data is not lost because of some proprietary closed protocol, the open standards of Linux apply to this product. Secondly, you can run a lot of software on the NAS as it is a stand-alone computer running a widely used operating system.

 

So what makes a great model?

Well, a great model is not a NAS but a SAN, but that's not for home use yet. It will get cheaper when gigabit goes obsolete. A NAS gives you a server, where all disk operations are performed internally - there will be network shares available and an administration interface. A SAN gives you a block device on the network, on top of which you can create volumes with file systems.

 

So what makes a great NAS?

A stationary PC (power monster) will give you all you need and much more. So you want a power-saving NAS which still has enough power to handle the load you expect. Some cheaper NAS products deliver inferior performance even for a single user copying one file at a time.

A NAS contains

  • A CPU (preferably dual-core or high frequency)
  • A NIC or two (two gigabit network interfaces should give up to 2 gigabit if configured that way).
  • A number of harddrive bays (preferably four or more).
  • A lot of cache RAM to speed up internal performance.
  • A ROM file area for the built-in Linux (may be copied to RAM during boot).

The CPU must be able to handle the interrupts. If the CPU runs at 1 GHz and the interrupt routine takes 1000 cycles to execute, then servicing one interrupt takes 1 microsecond. If the NIC raises an interrupt for every frame it receives (taking a frame of 1536 bytes * 8 bits + some header bits = about 12500 bits), then at 1 gigabit per second it raises an interrupt every 12.5 microseconds. In other words, interrupt servicing takes about 8% of the CPU time per gigabit interface in this scenario, which is not far from reality. If you can increase the frame size, this may give you much better performance. Remember to increase the frame size (Maximum Transmission Unit - MTU) at both ends.
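
The arithmetic can be replayed with a small shell sketch. It assumes one interrupt per received frame and uses the illustrative numbers from the paragraph above (1000 cycles per interrupt, 1 GHz CPU, 1 gigabit link); the function name irq_cpu_percent is made up for this example. At 1 gigabit per second a 12500-bit frame occupies the wire for 12.5 microseconds, which is the interval between interrupts.

```shell
# Sketch: share of CPU time spent servicing one interrupt per received frame.
# All inputs are illustrative assumptions, not measurements from the NAS.
irq_cpu_percent() {
    # $1 = CPU cycles per interrupt routine
    # $2 = CPU clock in Hz
    # $3 = bits on the wire per frame
    # $4 = link speed in bit/s
    awk -v c="$1" -v f="$2" -v b="$3" -v r="$4" \
        'BEGIN { printf "%.1f\n", 100 * c * r / (f * b) }'
}

# Standard frames: about 12500 bits each
irq_cpu_percent 1000 1000000000 12500 1000000000

# Jumbo frames: roughly 9000 bytes * 8 = 72000 bits each (headers ignored)
irq_cpu_percent 1000 1000000000 72000 1000000000
```

With these assumptions the first call gives 8.0 percent and the second 1.4 percent, which is the whole argument for larger frames in one number.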

Performance tweaks - Jumbo frames and large MTUs

Increasing the MTU may help you, because the interrupts aren't generated so fast with larger MTUs, but the optimal size of the MTU is not easy to establish - in some cases a large MTU is bad (for example with lots of retransmissions of lost packets). The optimal system should be fast enough to handle two gigabit NICs without having to set the MTU. And if your laptop does not support large MTUs, then nothing is gained - the lowest MTU sets the largest possible size.
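
A minimal sketch of the "lowest MTU wins" rule, plus the payload size to use when checking a large MTU with ping. The function names, the interface name eth0 and the host name nas are placeholders for this example; the commented commands assume a Linux machine with iproute2 and iputils ping, which may differ from what a given NAS firmware ships.

```shell
# The usable MTU on a path is the smallest MTU of any device on it.
path_mtu() {
    min=$1
    for mtu in "$@"; do
        if [ "$mtu" -lt "$min" ]; then
            min=$mtu
        fi
    done
    echo "$min"
}

# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes IPv4 header minus 8 bytes ICMP header.
ping_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

path_mtu 9000 9000 1500       # NAS, switch, laptop (illustrative values)
ping_payload_for_mtu 9000

# A manual check could then look like this (needs root; names are placeholders):
#   ip link set dev eth0 mtu 9000
#   ping -M do -s 8972 nas
```

If the ping with the "do not fragment" flag fails while a smaller payload works, some device on the path is still using a smaller MTU.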

What to conclude from that?

Just note that 1 gigabit on the network matches 1 gigahertz on the CPU, so the CPU doesn't have plenty of time to handle the NIC. Do not underestimate the load from handling two gigabit interfaces as well as a number of harddrives. All goes well as long as the protocol is simple, but even unencrypted protocols can be slow if the CPU spends most of its time handling interrupts.

That's why you see the simplest protocol, FTP, have the best performance, while other protocols such as NFS, which do more than just set up a single data stream, yield a lower network bandwidth.

The rest of the hardware (network controller, harddrive controller, memory controller) is made from standard chipsets with little variation. The decisive factors are the CPU and the amount of installed memory. And of course proper firmware which is well-written, rock-solid, bug-free and efficient.

 

So which one did I buy?

I won't mention the price, but I found a good offer on a QNAP TS-439 PRO II+, not because of its poetic name, but because it featured a high-frequency dual-core CPU, one gigabyte of internal RAM, four drive bays and a lot of other options which all made it the best buy at the moment. I want to see "very good" performance from my NAS; it wasn't in any way cheap, and they still cost a fair amount of money. You can easily buy a much more advanced NAS if you want to handle a large load (industry use - not home use). And you can even buy a SAN. So actually it wasn't that expensive when seen from the correct point of view. I still want a SAN; I was only modest and settled for a good NAS.

 

Playing with the NAS first time

Let's throw a number of tests at it. All of the test results should be close to optimal performance. The reference system is a full-blown power-consuming PC.

I would have liked to have a new set of hard drives, but the postman had a NAS for me and the harddrives weren't in his bag, so I had to make do with two old 320G drives and a 1T drive I had lying around.

As the two 320G drives are almost identical, they will be the target for the RAID tests. The 1T drive will be a single drive on the NAS. The setup of the four bays is

  • A: 320 GB
  • B: 320 GB
  • C: 1 TB
  • D: Empty

The two 320G drives are configured as RAID0 - stripe. This should deliver a harddrive data rate higher than the highest possible network bandwidth, and thus network efficiency can be tested properly. The 1T drive is considerably newer than the 320G, and thus has significantly better performance. However, it should be outperformed by the stripe.

 

Raw device performance

The raw devices are /dev/sda, /dev/sdb and /dev/sdc. The raid device is /dev/md0. No filesystem is involved yet; let's see the raw device performance:

[~] # hdparm -tT /dev/sd[abc] /dev/md0

/dev/sda:
 Timing cached reads:   3436 MB in  2.00 seconds = 1717.95 MB/sec
 Timing buffered disk reads:  226 MB in  3.01 seconds =  75.12 MB/sec

/dev/sdb:
 Timing cached reads:   3464 MB in  2.00 seconds = 1731.93 MB/sec
 Timing buffered disk reads:  232 MB in  3.02 seconds =  76.76 MB/sec

/dev/sdc:
 Timing cached reads:   3452 MB in  2.00 seconds = 1727.25 MB/sec
 Timing buffered disk reads:  280 MB in  3.01 seconds =  92.94 MB/sec

/dev/md0:
 Timing cached reads:   3472 MB in  2.00 seconds = 1735.64 MB/sec
 Timing buffered disk reads:  452 MB in  3.01 seconds = 150.40 MB/sec
[~] #

The cached reads are actually memory timings rather than disk timings, the buffered reads are the disk timings.

The results are significantly slower than a PC for the cached reads, because the memory is slower. The buffered reads are the same - it is what the interface can deliver.

Test OK

These results are almost the same as a PC motherboard would have delivered. Also note that the stripe gives almost twice the speed of the individual drives they are made from. The only way to increase the numbers is getting faster disks. If two old disks in a stripe can outperform a new disk as single disk, then what about two new disks in a stripe? Still awaiting the postman ...
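
To compare devices at a glance, the hdparm output can be reduced to one line per device. This is only a sketch: the function name buffered_rates is made up here, and it assumes the output layout shown above. The sample input is a copy of two of the results above so that the snippet runs without hdparm.

```shell
# Reduce hdparm -t style output to "device MB/sec" pairs.
buffered_rates() {
    awk '/^\/dev\// { dev = $1; sub(/:$/, "", dev) }        # remember the device name
         /buffered disk reads/ { print dev, $(NF-1) }       # rate is the next-to-last field
    '
}

# Sample input copied from the results above
buffered_rates <<'EOF'
/dev/sda:
 Timing cached reads:   3436 MB in  2.00 seconds = 1717.95 MB/sec
 Timing buffered disk reads:  226 MB in  3.01 seconds =  75.12 MB/sec

/dev/md0:
 Timing cached reads:   3472 MB in  2.00 seconds = 1735.64 MB/sec
 Timing buffered disk reads:  452 MB in  3.01 seconds = 150.40 MB/sec
EOF
```

On the NAS itself the same function could be fed directly, for example with hdparm -t /dev/sd[abc] /dev/md0 | buffered_rates.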

 

Filesystem performance

The two drives (sda, sdb) that make up the stripe cannot be accessed from the file system layer; only the raid can be accessed. The single drive (sdc), which is not in any raid configuration, can be accessed as sdc.

[~] # df -h | grep share
/dev/md0                583.9G    200.5M    583.7G   0% /share/MD0_DATA
/dev/sdc3               915.4G    199.7M    915.2G   0% /share/HDC_DATA
[~] #

Now let's see the performance for read and write. A 16GB file will be created, giving write performance, then it can later be read, giving read performance. The test can be repeated with a very large file to reduce effects of caching. The zero-device (/dev/zero) delivers a constant never-ending stream of zeroes as fast as the receiver can consume them.

[~] # time dd if=/dev/zero of=/share/HDC_DATA/testfile.tmp bs=1048576 count=16384
16384+0 records in
16384+0 records out

real    2m48.300s
user    0m0.114s
sys    1m6.379s
[~] # time dd if=/dev/zero of=/share/MD0_DATA/testfile.tmp bs=1048576 count=16384
16384+0 records in
16384+0 records out

real    1m53.529s
user    0m0.109s
sys    1m9.311s
[~] #

A translation of these numbers into other numbers gives a WRITE transfer speed of

  • sdc : 16 GB in 168.3 seconds = 97 MB / sec,
  • md0: 16 GB in 113.5 seconds = 144 MB / sec.
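
The translation is just size divided by elapsed time. As a sketch, a helper (the name mb_per_sec is made up here) can take the size in MB and the "real" value exactly as printed by time:

```shell
# Throughput in MB/s from a size in MB and a time value such as 2m48.300s.
mb_per_sec() {
    awk -v mb="$1" -v t="$2" 'BEGIN {
        split(t, part, "m")                 # part[1] = minutes, part[2] = "48.300s"
        seconds = part[1] * 60 + part[2]    # awk reads the leading number of "48.300s"
        printf "%.0f\n", mb / seconds
    }'
}

mb_per_sec 16384 2m48.300s    # sdc write
mb_per_sec 16384 1m53.529s    # md0 write
```

The size 16384 is bs=1048576 times count=16384 expressed in MB, so the two calls reproduce the 97 and 144 MB / sec figures.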

All in all very well. The read results should be the same:

[~] # time dd if=/share/HDC_DATA/testfile.tmp of=/dev/null
33554432+0 records in
33554432+0 records out

real    2m46.833s
user    0m20.556s
sys    1m9.419s
[~] # time dd if=/share/MD0_DATA/testfile.tmp of=/dev/null
33554432+0 records in
33554432+0 records out

real    1m48.466s
user    0m20.211s
sys    1m11.942s
[~] #

The read speed is roughly the same as the write speed. Any differences must be because of caching and other issues.

  • sdc: 16 GB in 166.8 seconds = 98 MB / sec
  • md0: 16 GB in 108.5 seconds = 151 MB / sec

Test OK

The file system gives roughly the same performance as the raw device. Also, there is no difference in read and write speed. This is expected when operating on a single large file on an empty filesystem. True file system performance is difficult to test.

 

FTP performance

Optimal performance can be achieved by copying a single large file to/from the NAS using the simplest protocol of all - FTP. Both ends of the FTP process are stripes, so the network sets the bandwidth limit. The MTU is the default 1536 bytes and has not been changed. This should give full bandwidth usage; there should be no limiting factor. The image below shows a 16 GB file being transferred twice, first put to the NAS, then fetched from the NAS.

 

Not shown on the graph is the system load average, which slowly settles around 1 while the FTP transfer runs. The FTP program takes up 10% of the CPU time when sending data, but eats up all the CPU while receiving data. This may have a simple explanation, as may the fact that data is not sent and received at the same speed.

Test OK

The FTP transfer shows full bandwidth usage. No MTU change necessary (yet).

 

NFS performance

The NFS test is performed exactly as the FTP test, with the only difference that the NFS protocol is used instead of the FTP protocol. Thus an NFS share (from the stripe) is mounted on a directory, and files are copied to/from it using another stripe as the destination. This should give the same result as FTP.

 

Not shown on the graph is the system load average, which is around 3 while the NAS receives data and around 1 while sending. There are a number of nfsd processes running, and they seem to have more to do while receiving data than while sending data.

Test OK

The NFS performance is the same as the FTP performance. The CPU is fast enough to handle all the work and make full use of the bandwidth.

 

File copy test

The NFS share used for the test above is now used for a simple file copy test. A representative set of files is used, containing a few very large files (archives), some medium-size files (images and other media) and a lot of very small files (documents, source code, html files). A total of approx. 30,000 files summing to approx. 40 GB is copied.

The copy process consists of

  • Opening a file in source and destination
  • The actual transfer of file data
  • Setting of permissions, date, ownership etc on the destination file

Thus copying a large number of small files is very much slower than copying a small set of large files.
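
A toy model makes the effect visible: each file costs its transfer time plus a fixed overhead for open, close and metadata. The function name effective_rate, the 100 MB/s wire rate and the 0.03 second per-file overhead are made-up illustration values, not measurements from this NAS.

```shell
# Effective copy rate in MB/s when every file carries a fixed overhead.
effective_rate() {
    # $1 = file size in MB
    # $2 = wire rate in MB/s (assumed)
    # $3 = per-file overhead in seconds (assumed)
    awk -v s="$1" -v r="$2" -v o="$3" \
        'BEGIN { printf "%.0f\n", s / (s / r + o) }'
}

effective_rate 2 100 0.03        # many 2 MB files
effective_rate 16384 100 0.03    # one 16 GB file
```

With these assumed values the small files end up at 40 MB/s while the single large file stays at the wire rate, because the fixed overhead is longer than the transfer time of a 2 MB file.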

First the set of files is copied to the NAS (green graph), then it is copied from the NAS (red graph).

This result can then be compared against a fully powered PC; the important thing to notice is that large files are copied at full speed. I actually tried copying a large set of camera images from a PC to the NAS, then the same set to another PC over NFS. Same result: average speed = 40 MB/sec, average file size a couple of megabytes. I then tried changing the MTU; no change in result.

File copy tests are always slower than the optimal large-single-file-ftp test, and a huge number of small files is simply something to avoid. On the other hand, that's why I want a SAN.

 

Is it worth the money?

Yes. It delivers performance to the hardware limit without bottlenecks in my single-user tests. The 1.8 GHz dual-core Atom CPU seems to be powerful enough to handle everything without having to tweak anything or search the net for other possible optimisations. It performs like a stationary PC out of the box, and that was what I wanted.

 

Next thing to test

  • Trunking of two NICs for up to 2 Gbit/sec.
  • Various apps

[ the end for now ]