Silly Admin, you forgot to Rack Aware Enable your Hadoop cluster.  Now you’ve got all of your data in Rack 1.  Lucky for you, there’s a way to fix it.
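In case you have never written one: a rack aware script is just an executable that Hadoop calls with one or more DataNode IPs or hostnames as arguments, and that prints one rack path per argument. Here is a minimal sketch. The subnets, the rack names and the `rack_for` helper are made up for illustration, so swap in your own layout. You point Hadoop at the script with the `topology.script.file.name` property in core-site.xml (`net.topology.script.file.name` on Hadoop 2 and later).

```shell
#!/usr/bin/env bash
# Hypothetical topology script. The subnets and rack names below are
# assumptions for illustration only; replace them with your real layout.

# Map a single IP or hostname to a rack path.
rack_for() {
  case "$1" in
    10.1.1.*) echo "/rack1" ;;
    10.1.2.*) echo "/rack2" ;;
    *)        echo "/default-rack" ;;   # fallback for unknown nodes
  esac
}

# Hadoop may pass several nodes in one call; print one rack per argument.
for node in "$@"; do
  rack_for "$node"
done
```

Run it by hand with a couple of your DataNode IPs before restarting anything, so you know it prints the racks you expect.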

Create and configure your rack aware script and restart your cluster.  Have no fear, re-balancing will not start immediately.  So how do you get your blocks safely scattered?  Thanks to John Meagher we have a solution:

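for f in $(hadoop fsck / | grep "Replica placement policy is violated" \
    | head -n80000 | awk -F: '{print $1}'); do
    # Raise replication to 4 and wait (-w) until the extra replica exists
    hadoop fs -setrep -w 4 "$f"
    # Drop back to 3 so the surplus replica is removed
    hadoop fs -setrep 3 "$f"
done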

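This handy “for” loop finds the files that fsck reports as violating the replica placement policy (the first 80,000 matches per run), adds a new, “safe” replica by raising the replication factor to 4 and waiting for it to be written, and then removes the “unsafe” extra replica by dropping the factor back to 3.  This assumes that you want a replication factor of 3.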
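To see how much work is left, you can count the violations fsck still reports before and after each pass. A small sketch, assuming the same fsck message the loop greps for; `count_violations` is a hypothetical helper name, not a Hadoop command:

```shell
# Hypothetical helper: reads fsck output on stdin and prints how many
# lines report a replica placement violation.
count_violations() {
  grep -c "Replica placement policy is violated"
}

# Usage on a live cluster (not executed here):
#   hadoop fsck / | count_violations
```

If the number keeps shrinking between runs, the loop is doing its job; once it reaches 0, fsck no longer reports any placement violations.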

