EXT4 vs. XFS for HDFS

HDFS sits on top of a local filesystem.  The filesystem you choose can affect both the performance and the resilience of the HDFS cluster, so picking the right one matters.

According to the Hadoop Wiki, either of these is acceptable.  Yahoo tends to use XFS (or at least they did in 2009).

After some intense soul searching, I’ve decided we’ll use EXT4.  We’re expecting to have a couple of prod clusters: one tweaked for maximum performance and one for maximum data preservation.  I expect to tweak the format (mkfs) and mount options to dangerous levels. 🙂
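As a rough sketch of what those two profiles might look like, the Python below only builds the mkfs and fstab strings; it runs nothing.  The device name, the mount point, and the way options are split between the two profiles are placeholders of mine, not tested settings.  The flags themselves (-m for the reserved-block percentage, -O ^has_journal to create the filesystem without a journal, and the noatime/nodiratime mount options) are standard mke2fs and mount options.

```python
# Sketch only: builds command strings, never executes them.
# Device, mount point and profile contents are hypothetical placeholders.

PROFILES = {
    # Performance cluster: no reserved blocks, no journal, no atime updates.
    "performance": {"reserved_pct": 0, "journal": False,
                    "mount_opts": ["noatime", "nodiratime"]},
    # Preservation cluster: keep the journal, still skip atime updates.
    "preservation": {"reserved_pct": 0, "journal": True,
                     "mount_opts": ["noatime"]},
}


def mkfs_command(device, profile):
    """Return the mkfs.ext4 argument list for the given profile."""
    cfg = PROFILES[profile]
    cmd = ["mkfs.ext4", "-m", str(cfg["reserved_pct"])]
    if not cfg["journal"]:
        # "^" in front of a feature name turns that feature off.
        cmd += ["-O", "^has_journal"]
    cmd.append(device)
    return cmd


def fstab_line(device, mount_point, profile):
    """Return an /etc/fstab entry using the profile's mount options."""
    opts = ",".join(PROFILES[profile]["mount_opts"])
    return f"{device} {mount_point} ext4 {opts} 0 0"


if __name__ == "__main__":
    for name in PROFILES:
        print(name, " ".join(mkfs_command("/dev/sdb1", name)))
        print(name, fstab_line("/dev/sdb1", "/data/1", name))
```

Keeping the profiles in one table makes it easy to diff the "fast" and "safe" clusters before anything touches a real disk.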

My thinking is that a single data node failure should not put our data at risk.  The “slowness” of EXT4 comes with the benefit of no loss due to power failures.  But since I have at least 3 copies of each block in the cluster, why should I care about one node corrupting a block or two?  If I can, I may disable the journal entirely on the data nodes.  I’m still pondering that.
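The back-of-the-envelope version of that argument: if each node holding a replica damages a given block with some small probability, and the failures are independent, then losing every copy takes all of them to fail at once.  The sketch below just does that multiplication.  The 1% figure is made up for illustration, and independence is an assumption that a shared power event (exactly the case a journal guards against) would break.

```python
def all_replicas_lost(p_node, replication=3):
    """Chance that every replica of one block is bad.

    Assumes each replica fails independently with probability p_node,
    which is an illustrative simplification, not a measured value.
    """
    if not 0.0 <= p_node <= 1.0:
        raise ValueError("p_node must be a probability between 0 and 1")
    return p_node ** replication


if __name__ == "__main__":
    # Hypothetical per-node corruption chance of 1% for a given block.
    for copies in (1, 2, 3):
        print(copies, all_replicas_lost(0.01, copies))
```

With three copies the per-block risk drops by orders of magnitude compared with one, which is why a single node losing a block or two looks tolerable; correlated failures are the part that still needs thought.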

All of this is yet to be tweaked and tested.  Our hardware arrives in mid-January 2013, then we’ll have some serious fun. 😉

4 Responses to EXT4 vs. XFS for HDFS

  1. Interesting discussion on this subject found here.

  2. JB says:

    Sounds good – but I thought you were a raging XFS fan!!!!

    • XFS is good in theory. It has fewer support tools and recently had a LOT of bug fixes (data loss type). For as long as it’s been around it shouldn’t have these problems, IMHO.

      EXT4 can do big data files too and has better support.

  3. jbattisti says:

    The SUSE guys really hate EXT4 and think you should use either BTRFS or XFS.

