Rack Awareness can be stale

Rack Topology/Awareness is very cool, but I’m a little disappointed that you can’t update it without taking down the NameNode service.

We’re using DHCP for DataNodes and sometimes they change IP addresses.  The NameNode caches the rack it resolved for each DataNode, and it won’t re-evaluate a DataNode via the Rack Topology Script until the NameNode itself has been recycled.  WTF?  How hard is it to refresh a cache?  Why would you EVER assume that nothing changes in your cluster?  Love you HDFS, but someone dropped the ball on this.  It should be an EASY fix to implement a Rack Flush.

Worse, the documentation says this script should resolve either a hostname or an IP address; however, only the IP address is used for DataNodes.  None of this would be a problem if the NameNode topology lookup could actually use the hostname.
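For anyone who hasn’t written one, here is a minimal sketch of a topology script that answers for both hostnames and IP addresses, so it keeps working whichever form the NameNode hands it. It assumes the usual contract: the script receives one or more node identifiers as command-line arguments and prints one rack path per argument. The `RACK_MAP` entries, hostnames, and rack names are made up for illustration; only `/default-rack` is Hadoop’s own fallback name. Note that this does nothing about the stale cache itself; if a DataNode comes back with a new IP, the NameNode still won’t ask again until it is restarted.

```python
#!/usr/bin/env python3
"""Sketch of a rack topology script that accepts hostnames or IP addresses."""
import sys

# Hypothetical mapping for illustration only. A real cluster would load this
# from a file or inventory system and list BOTH the IP and the hostname.
RACK_MAP = {
    "10.0.1.11": "/dc1/rack1",
    "datanode01.example.com": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
    "datanode02.example.com": "/dc1/rack2",
}

# Rack path returned for any node we do not recognise.
DEFAULT_RACK = "/default-rack"


def resolve_rack(node):
    """Return the rack path for a hostname or IP, or the default rack."""
    return RACK_MAP.get(node.strip().lower(), DEFAULT_RACK)


def main(argv):
    # One rack path per argument, in the same order the arguments arrived.
    for node in argv:
        print(resolve_rack(node))


if __name__ == "__main__":
    main(sys.argv[1:])
```

You can try it by hand with something like `./topology.py 10.0.1.11 datanode02.example.com 10.9.9.9` (the file name is also just an example); it should print one rack path per argument, with the unknown address falling back to the default rack.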

It’s coming in 2.x, but… REALLY????????

PS:  See the note in my comment below; there’s already a patch.

About Grease Monkey

30+ Years of IT Geekiness, Linux Fanboy and Open Source patriot.

One Response to Rack Awareness can be stale

  1. Wow! As if to validate my Grease Monkey thoughts, a patch has been submitted to enable dfsadmin -refreshTopology. (HADOOP-8928.txt)

    Thanks Liang. :)
