Rack Topology/Awareness is very cool. I’m a little disappointed that you can’t ever update it without taking down NameNode Services.
We’re using DHCP for DataNodes, and sometimes they change IP addresses. The NameNode won’t re-evaluate a DataNode via the Rack Topology Script until the NameNode itself has been recycled. WTF? How hard is it to refresh a cache? Why would you EVER assume that nothing changes in your cluster? Love you HDFS, but someone dropped the ball on this. It should be an EASY fix to implement a Rack Flush.
Worse, the documentation says this script should resolve either a hostname or an IP; however, only the IP address is actually used for DataNodes. None of this would be a problem if the NN topology lookup could actually use the hostname.
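For anyone who hasn't written one, here's a rough sketch of what a topology script that handles both hostnames and IPs could look like. It assumes the script is handed one or more node identifiers as arguments and prints one rack path per identifier. The table entries, hostnames, and rack names below are made up for illustration; a real script would load its mapping from wherever you keep it.

```python
#!/usr/bin/env python3
"""Sketch of a rack topology script that accepts hostnames or IPs."""
import sys

# Rack returned for any node we don't know about.
DEFAULT_RACK = "/default-rack"

# Hypothetical mapping for illustration only; each node is listed
# under both its hostname and its IP so either form resolves.
RACK_TABLE = {
    "10.0.1.11": "/dc1/rack1",
    "datanode01.example.com": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
    "datanode02.example.com": "/dc1/rack2",
}


def resolve_rack(node):
    """Return the rack path for a hostname or IP, or the default rack."""
    return RACK_TABLE.get(node.strip().lower(), DEFAULT_RACK)


def main(argv):
    # One rack path is printed per argument, in the same order.
    for arg in argv:
        print(resolve_rack(arg))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Note that a script like this doesn't fix the problem I'm complaining about: if only the IP is ever looked up and the result is cached, the hostname entries never get a chance to help when DHCP hands a DataNode a new address.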
It’s coming in 2.x, but… REALLY????????
PS: See the note in my comment, there’s already a patch.