You forgot to WHAT?!

Silly Admin, you forgot to make your Hadoop cluster rack aware. Now, as far as HDFS knows, all of your data lives in a single rack. Lucky for you, there’s a way to fix it.

Create and configure your rack aware script and restart your cluster.  Have no fear, re-balancing will not start immediately.  So how do you get your blocks safely scattered?  Thanks to John Meagher we have a solution:

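# Collect the files fsck flags as mis-placed (paths containing spaces will break this loop)
for f in `hadoop fsck / | grep "Replica placement policy is violated" |
    head -n80000 | awk -F: '{print $1}' | sort -u`; do
    hadoop fs -setrep -w 4 "$f"   # add a 4th replica and wait for it to land
    hadoop fs -setrep 3 "$f"      # drop back to 3 so HDFS removes the excess replica
done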

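This handy “for” loop finds every file that has blocks violating the replica placement policy (up to the first 80,000 lines of fsck output), raises its replication factor to 4 so that HDFS adds a new, “safe” replica on another rack, and then drops it back to 3 so that HDFS removes the excess, “unsafe” replica. This assumes that you want a replication factor of 3.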
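Before and after running the loop, it can help to see which files are still flagged. The following is only a sketch: violating_files is a hypothetical helper name, and it assumes that fsck prefixes each report line with the file path followed by a colon, which is the same assumption the awk call in the loop makes.

```shell
# Hypothetical helper: list each file that fsck reports as violating the
# replica placement policy, once per file. Reads fsck output on stdin so it
# can also be tried against a saved report.
violating_files() {
    grep "Replica placement policy is violated" | awk -F: '{print $1}' | sort -u
}

# Usage against a live cluster (assumes the hadoop CLI is on your PATH):
#   hadoop fsck / | violating_files | wc -l
```

If the count stops shrinking between passes, check that the rack aware script really returns more than one rack.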

Read the whole thread here.
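As for the first step, creating the rack aware script, here is a minimal sketch. Hadoop calls the script with one or more IP addresses or hostnames as arguments and expects one rack path per argument on standard output. The address ranges, the rack names and the rack_for function name are all made up for illustration, so substitute your own layout. The script is wired in through the topology.script.file.name property in core-site.xml on older releases (net.topology.script.file.name on newer ones).

```shell
#!/usr/bin/env bash
# Hypothetical rack map: the subnets and rack names below are placeholders.
rack_for() {
    case "$1" in
        10.1.1.*) echo "/rack1" ;;
        10.1.2.*) echo "/rack2" ;;
        *)        echo "/default-rack" ;;   # anything we do not recognise
    esac
}

# Print one rack path per argument, in the order the arguments were given.
for host in "$@"; do
    rack_for "$host"
done
```

Run it by hand with a couple of your DataNode addresses before restarting anything; every node should land in the rack you expect.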

 

About Grease Monkey

30+ Years of IT Geekiness, Linux Fanboy and Open Source patriot.