NFS+IPsec Performance

We’ve recently moved to having our filestore NFS exported from a cluster. This provides almost complete resilience from hardware failures, and moves us away from depending on individual end-user systems with locally attached filestore.

Given the inherent insecurities with NFS we opted to use IPsec authentication (but not encryption) between the hosts involved. The NFS server only accepts connections from a list of hosts, and we know those hosts are who they say they are by relying on the IPsec authentication. We’ve also made it use privileged ports to ensure local users don’t try any spoofing :-)
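The acceptance rule described above can be sketched in a few lines. This is only an illustration of the logic, not how any particular NFS server implements it: the function name, the example addresses and the allowed-host list are all hypothetical. It also assumes the source address can be trusted, which is the part IPsec authentication provides.

```python
import ipaddress

# On Unix-like systems only root can bind ports below 1024, so a request
# from such a port cannot have been sent by an ordinary local user.
PRIVILEGED_PORT_LIMIT = 1024


def is_request_allowed(src_ip, src_port, allowed_hosts):
    """Hypothetical check: accept only listed hosts using a privileged port.

    The check trusts src_ip, so it is only meaningful when something else
    (here, IPsec authentication) has already verified the sender.
    """
    addr = ipaddress.ip_address(src_ip)
    host_ok = any(addr == ipaddress.ip_address(h) for h in allowed_hosts)
    port_ok = 0 < src_port < PRIVILEGED_PORT_LIMIT
    return host_ok and port_ok


# Example addresses from the documentation range, purely for illustration.
allowed = ["192.0.2.10", "192.0.2.11"]
print(is_request_allowed("192.0.2.10", 1023, allowed))   # listed host, privileged port
print(is_request_allowed("192.0.2.10", 40000, allowed))  # listed host, unprivileged port
print(is_request_allowed("192.0.2.99", 1023, allowed))   # host not on the list
```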

The trade-off here appears to be latency. I’ve done some completely unscientific tests that involved shovelling UDP data at a fixed rate between two machines. These are the “jitter” figures they produced:

  • 0.10ms – direct
  • 0.30ms – via router
  • 0.70ms – via router with IPsec

Bear in mind that those figures might not bear any relation to the latencies involved with NFS packets, but they should give an idea of the relative delays added by routing and IPsec.
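For anyone wanting to reproduce this kind of test, here is a minimal sketch of one common way to turn packet timestamps into a jitter figure: the smoothed interarrival estimator defined in RFC 3550. The post does not say which tool or definition produced the numbers above, so treat this as an assumption. The timestamps below are made up.

```python
def interarrival_jitter(send_times, recv_times):
    """Return the RFC 3550 smoothed interarrival jitter.

    The result is in the same units as the input timestamps.
    """
    if len(send_times) != len(recv_times):
        raise ValueError("need one receive time per send time")
    jitter = 0.0
    for i in range(1, len(send_times)):
        # Change in transit time between consecutive packets. A constant
        # clock offset between sender and receiver cancels out here.
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        # Exponential smoothing with gain 1/16, as in RFC 3550.
        jitter += (abs(d) - jitter) / 16.0
    return jitter


# Hypothetical timestamps in milliseconds: packets sent every 10 ms,
# with the third one arriving 1 ms later than the fixed rate predicts.
sent = [0.0, 10.0, 20.0, 30.0]
received = [0.5, 10.5, 21.5, 30.5]
print(f"jitter: {interarrival_jitter(sent, received):.4f} ms")
```

Because the estimator only looks at differences between consecutive packets, the two machines do not need synchronised clocks, which suits a quick two-host test like this one.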

We could, to some extent, reduce those figures by replacing hardware. Quicker routers would undoubtedly remove some of the routing latency, and quicker machines could perform the IPsec calculations faster. But this probably isn’t the cheapest solution.

The first test I want to try is adding a private network between the NFS server and NFS client, with no routing involved. Seeing as it’s private, we can reasonably trust that people won’t be able to spoof packets on that network, so we can remove the IPsec authentication. In theory, these changes could significantly reduce the latencies involved.
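Taking the jitter figures above at face value, the split between routing and IPsec can be worked out as a rough estimate. The arithmetic assumes the two costs simply add up, and the caveat about these figures not necessarily reflecting NFS latencies still applies.

```python
# Jitter figures from the tests above, in milliseconds.
jitter_ms = {
    "direct": 0.10,
    "via router": 0.30,
    "via router with IPsec": 0.70,
}

# Assumption: each step adds its cost on top of the previous one.
routing_cost = jitter_ms["via router"] - jitter_ms["direct"]
ipsec_cost = jitter_ms["via router with IPsec"] - jitter_ms["via router"]
total_saving = jitter_ms["via router with IPsec"] - jitter_ms["direct"]
ratio = jitter_ms["via router with IPsec"] / jitter_ms["direct"]

print(f"routing adds about {routing_cost:.2f} ms")
print(f"IPsec adds about {ipsec_cost:.2f} ms")
print(f"a direct link without IPsec would save about {total_saving:.2f} ms")
print(f"current figure is roughly {ratio:.1f}x the direct one")
```

On these numbers IPsec accounts for about twice as much of the added delay as routing does, so the private network only gets the full benefit if the authentication is dropped as well.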

We’ll continue to monitor this for a while first, though. We need to keep an eye on the load on the NFS server, network usage, and so on. But, at the moment, it seems likely the problems are in the network part of the NFS communication process.
