HDFS sits atop a local filesystem. The filesystem type affects both the performance and the resilience of the HDFS cluster, so picking the right one matters.
According to the Hadoop Wiki, either EXT4 or XFS is acceptable. Yahoo tends to use XFS (or at least it did in 2009).
After some intense soul-searching, I've decided we'll use EXT4. We're expecting to have a couple of prod clusters: one tuned for maximum performance and one for maximum data preservation. I expect to push the format (mkfs) and mount options to dangerous levels.
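To make "format and mount options" concrete, here is a minimal sketch of what the two cluster profiles might look like. The option choices are my assumptions, not tested settings: `-m 0` drops the blocks ext4 reserves for root, `-O ^has_journal` formats without a journal, `noatime` skips access-time writes, and `data=journal` journals file data as well as metadata. The `Ext4Profile` class, the device `/dev/sdb1` and the mount point `/data/1` are hypothetical names for illustration. The script only prints command lines and fstab entries; it formats nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ext4Profile:
    """Hypothetical pairing of mkfs.ext4 flags with ext4 mount options."""
    name: str
    mkfs_flags: tuple[str, ...]
    mount_options: tuple[str, ...]

    def mkfs_command(self, device: str) -> str:
        # Build (but do not run) the format command for the given device.
        return " ".join(("mkfs.ext4", *self.mkfs_flags, device))

    def fstab_line(self, device: str, mount_point: str) -> str:
        # fstab fields: device, mount point, type, options, dump, fsck pass.
        return f"{device} {mount_point} ext4 {','.join(self.mount_options)} 0 0"


# Performance cluster (assumed): no journal, no reserved blocks, no atime updates.
PERFORMANCE = Ext4Profile(
    name="performance",
    mkfs_flags=("-m", "0", "-O", "^has_journal"),
    mount_options=("noatime",),
)

# Preservation cluster (assumed): keep the journal and journal file data too.
PRESERVATION = Ext4Profile(
    name="preservation",
    mkfs_flags=("-m", "0"),
    mount_options=("noatime", "data=journal"),
)

if __name__ == "__main__":
    for profile in (PERFORMANCE, PRESERVATION):
        print(f"# {profile.name}")
        print(profile.mkfs_command("/dev/sdb1"))
        print(profile.fstab_line("/dev/sdb1", "/data/1"))
```

Keeping the mkfs flags and mount options together in one object makes it harder to pair a journal-less format with mount options that assume a journal.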
My thinking is that a single DataNode failure should not put our data at risk. The "slowness" of EXT4 comes from its journal, which protects the filesystem from corruption when the power fails. But since I have at least 3 copies of each block in the cluster, why should I care about one node corrupting a block or two? HDFS checksums every block, so a corrupt replica gets detected and replaced with a copy of a healthy one from another node. If I can get away with it, I may disable the journal entirely on the DataNodes. I'm still pondering that.
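The replication argument can be checked with a little arithmetic. This is a minimal sketch under a simplifying assumption: each block's replicas sit on distinct nodes chosen uniformly at random (real HDFS placement is rack-aware, so treat this as an approximation). With n nodes, k of them failed, and r replicas per block, the chance that one particular block loses every replica is C(k, r) / C(n, r). The function name `block_loss_probability` and the 20-node cluster size are hypothetical.

```python
from math import comb


def block_loss_probability(nodes: int, failed: int, replicas: int = 3) -> float:
    """Chance that every replica of one block sits on a failed node.

    Simplifying assumption: replicas live on `replicas` distinct nodes chosen
    uniformly at random (real HDFS placement is rack-aware).
    """
    if not 0 <= failed <= nodes or not 1 <= replicas <= nodes:
        raise ValueError("need 0 <= failed <= nodes and 1 <= replicas <= nodes")
    # comb(failed, replicas) is 0 whenever failed < replicas, i.e. no loss.
    return comb(failed, replicas) / comb(nodes, replicas)


if __name__ == "__main__":
    # Hypothetical 20-node cluster with the default replication factor of 3.
    for failed in (1, 2, 3):
        print(failed, block_loss_probability(nodes=20, failed=failed))
```

With one or two bad nodes the result is exactly 0; even three simultaneous failures hit a given block with probability 1/1140, and HDFS would normally re-replicate long before a third node went bad.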
All of this is yet to be tuned and tested. Our hardware arrives in mid-January 2013, and then we'll have some serious fun.