Silly Admin, you forgot to enable rack awareness on your Hadoop cluster. Now all of your data is sitting in Rack 1. Lucky for you, there’s a way to fix it.
Create and configure your rack aware script and restart your cluster. Have no fear: HDFS will not immediately start moving existing blocks around on its own. So how do you get your existing blocks safely scattered across racks? Thanks to John Meagher we have a solution:
for f in `hadoop fsck / | grep "Replica placement policy is violated" | head -n80000 | awk -F: '{print $1}'`; do
    hadoop fs -setrep -w 4 $f
    hadoop fs -setrep 3 $f
done
This handy “for” loop uses fsck to find every file that violates the replica placement policy, temporarily raises its replication factor to 4 (the -w flag waits until the new, rack-safe replica is actually written), and then drops it back to 3, at which point HDFS deletes the replica that violates the placement policy. This assumes that you want a replication factor of 3.
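For reference, the rack aware script mentioned above is just any executable that takes a list of hostnames or IPs as arguments and prints one rack path per argument; you point Hadoop at it with the topology.script.file.name property (net.topology.script.file.name on newer releases). Here is a minimal sketch, assuming the subnets and rack names are placeholders you would replace with your own network layout:

```shell
#!/bin/bash
# Hypothetical topology script sketch: maps each host/IP argument to a rack.
# The subnets and rack names below are made-up examples; adapt to your network.

resolve_rack() {
    case "$1" in
        10.1.1.*) echo "/rack1" ;;
        10.1.2.*) echo "/rack2" ;;
        *)        echo "/default-rack" ;;  # fallback for unknown hosts
    esac
}

# Hadoop may pass several hosts at once; answer each one on its own line.
for host in "$@"; do
    resolve_rack "$host"
done
```

After restarting with the script in place, `hadoop dfsadmin -report` should show each DataNode tagged with its rack, which is a quick sanity check before you run the re-replication loop above.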
Read the whole thread here.
