I’m about to tell you how we build Hadoop clusters without installing an OS on the DataNodes. We simply PXE boot them, assign them to a cluster, and they join automatically. Read on (and on) to see how we do it. This is not quite a how-to, but the astute admin will be able to divine the obligatory details.
There is a pretty cool open source project, oneSIS, that helps you create RAMFS images and NFS boot systems. Previous NFS root projects have required an individual directory structure for each server. oneSIS helps solve this problem by creating a RAM FS to host all of the “change-y” stuff that each server needs to be unique. It works on the concept that you manage your cluster by owning the DHCP and TFTP servers on the same oneSIS-enabled system. It extends DHCP attributes to boot specific server classes based on MAC address.
As cool as that is, it didn’t suit my needs. The author – Josh England – has done an excellent job on this project. He not only solved his problems, he added some really cool hooks into the code to allow me to solve mine.
I work at a large “enterprise-y” company and I’m not allowed to own DHCP, even on my dedicated Hadoop VLAN. At least I’m able to tell DHCP that I own the “next server” in the list for TFTPBOOT files. Because oneSIS’s init script is so very well written, I was able to easily modify it to deal with my needs.
My needs: assign static IP addresses from a database (a MySQL table in this case) and assign a specific boot system based on MAC address. I only need static IP addresses because the NameNode is picky about IPs changing. I use the MAC address to determine which OS version (including Hadoop configs) to boot. That required a minor change to the init script: I just used wget to fetch the values for the NFS-hosted rootfs. We may look to include glusterfs and use that later. 🙂
To summarize: during the RAMFS boot, the init script calls home using the postnetconfig hook already in oneSIS, and then I use wget to grab the full URL of the NFS root. Keep in mind that oneSIS could deal with this if I also owned DHCP.
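As a rough illustration of the call-home step, here is a minimal Python sketch of what the server side of that wget might do: normalize the MAC address and hand back the values the init script needs. The node table, the `IP=`/`NFSROOT=` reply format, and every address and path are assumptions made up for the example; the real setup keeps this data in MySQL and is not shown in this post.

```python
import re

# Hypothetical node records keyed by MAC address (stand-in for the MySQL table).
NODES = {
    "00:25:90:aa:bb:01": {"ip": "10.20.0.11", "nfsroot": "10.20.0.2:/export/prod-v2"},
}


def normalize_mac(raw):
    """Return a MAC as lowercase colon-separated hex, whatever separators it came with."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", raw).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {raw!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def boot_reply(raw_mac):
    """Build the text body a booting node would fetch, or None for an unknown MAC."""
    node = NODES.get(normalize_mac(raw_mac))
    if node is None:
        return None
    return f"IP={node['ip']}\nNFSROOT={node['nfsroot']}\n"


if __name__ == "__main__":
    print(boot_reply("00-25-90-AA-BB-01"), end="")
```

A plain key=value reply is used here only because it is easy for a shell init script to parse; any format the script can read would do.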
So now I’m booting an operating system version of my choice. I inject another init.d script that does another wget to the database to determine which of the other services I should start. I could have those all in the NFS image, but I require database verification in order to start some services. In my database I have an “Active” flag that tells a system if it’s really supposed to start some services. I use this in the event that a system has been offline more than an acceptable amount of time. I just set the active flag to “no” and nothing runs. I could do this by changing the boot image, but that seemed like more work for my needs.
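The “Active” flag check boils down to a small gate. A minimal sketch in Python, assuming the database answer has already been parsed into a dictionary; the field names and the service names are hypothetical and only there for illustration:

```python
def services_to_start(node):
    """Return the services a node may start; an inactive node starts none."""
    # Anything other than an explicit "yes" is treated as inactive.
    if str(node.get("active", "no")).strip().lower() != "yes":
        return []
    return list(node.get("services", []))


if __name__ == "__main__":
    healthy = {"active": "yes", "services": ["datanode", "tasktracker"]}
    stale = {"active": "no", "services": ["datanode", "tasktracker"]}
    print(services_to_start(healthy))
    print(services_to_start(stale))
```

Defaulting to “no” when the flag is missing means a node that cannot get a clear answer from the database stays quiet, which matches the intent of keeping stale systems from rejoining on their own.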
So how does this help? Say I roll in a rack of servers. I boot them all up and they call home to my database, because of the VLAN they’re cabled with. Since I don’t know their MAC addresses (they’re not in my database), I assign them a generic OS that still knows to check in with me. When a node checks in, it tells me its MAC address, system serial number, RAM and CPU values. As soon as that rack of servers is confirmed, I add their rack ID to the database. Something like “update nodes set rackid='newrack' where rackid IS NULL;” (it has to be IS NULL, since a comparison with = NULL never matches). Season to taste.
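Here is a small sketch of that check-in and rack-tagging flow. It uses Python’s built-in sqlite3 module as a stand-in for MySQL so it can run anywhere, and the table layout and column names are assumptions, not the real schema. Note the `IS NULL` test in the update: in SQL, comparing a column with `= NULL` never matches any row.

```python
import sqlite3


def make_db():
    """Create an in-memory stand-in for the nodes table (columns are assumptions)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes ("
        "mac TEXT PRIMARY KEY, serial TEXT, ram_mb INTEGER, cpus INTEGER, rackid TEXT)"
    )
    return conn


def record_checkin(conn, mac, serial, ram_mb, cpus):
    """Store what a node reports on first check-in; repeat check-ins are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO nodes (mac, serial, ram_mb, cpus) VALUES (?, ?, ?, ?)",
        (mac, serial, ram_mb, cpus),
    )


def confirm_rack(conn, rack_id):
    """Tag every node that has no rack yet and return how many rows changed."""
    cur = conn.execute("UPDATE nodes SET rackid = ? WHERE rackid IS NULL", (rack_id,))
    return cur.rowcount


if __name__ == "__main__":
    db = make_db()
    record_checkin(db, "00:25:90:aa:bb:01", "SN0001", 65536, 16)
    record_checkin(db, "00:25:90:aa:bb:02", "SN0002", 65536, 16)
    print(confirm_rack(db, "newrack"))
```

`INSERT OR IGNORE` is SQLite syntax; on MySQL the rough equivalent would be `INSERT IGNORE`.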
Now I can assign any of the nodes in that rack to a particular role in life, including environment. I simply update my database to say that SN XXXX should be in PROD, and that links to another table that tells the node which path to use for root. You need to apply some database linkage knowledge here. I can map it out if required, but I think our audience can follow the plan so far.
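For anyone who does want the linkage mapped out, one possible shape is a nodes table that points at an environments table holding the root path. The sketch below again uses sqlite3 in place of MySQL, and the table names, columns, serial numbers and paths are all invented for the example:

```python
import sqlite3


def make_db():
    """Build a tiny two-table schema: nodes reference environments by name."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE environments (env TEXT PRIMARY KEY, root_path TEXT NOT NULL);
        CREATE TABLE nodes (serial TEXT PRIMARY KEY, env TEXT REFERENCES environments(env));
        INSERT INTO environments VALUES ('GENERIC', '/export/generic');
        INSERT INTO environments VALUES ('PROD', '/export/prod-v2');
        INSERT INTO nodes VALUES ('SN0001', 'GENERIC');
        """
    )
    return conn


def assign_env(conn, serial, env):
    """Move a node, identified by serial number, into another environment."""
    conn.execute("UPDATE nodes SET env = ? WHERE serial = ?", (env, serial))


def root_path_for(conn, serial):
    """Follow the node -> environment link and return its root path, or None."""
    row = conn.execute(
        "SELECT e.root_path FROM nodes n JOIN environments e ON e.env = n.env "
        "WHERE n.serial = ?",
        (serial,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = make_db()
    assign_env(db, "SN0001", "PROD")
    print(root_path_for(db, "SN0001"))
```

Keeping the path in the environments table means that rolling PROD to a new image is a one-row change rather than an update to every node.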
I run some black magic scripts in the background that update the rack-awareness scripts, update the NFS-hosted /etc/hosts, and handle all of the other things that really matter. All of these scripts require updated database info. 🙂
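The black magic is less mysterious than it sounds. Hadoop’s rack awareness works by calling an admin-supplied script that maps node addresses to rack paths, so the background jobs mostly regenerate flat files from database rows. A minimal sketch, assuming the rows have already been fetched into dictionaries; the field names, hostnames and the “ip /rack” file layout are assumptions for illustration, not the actual scripts:

```python
def _by_hostname(nodes):
    """Sort nodes by hostname so regenerated files are stable between runs."""
    return sorted(nodes, key=lambda n: n["hostname"])


def render_hosts(nodes):
    """Render /etc/hosts-style lines, one 'ip<TAB>hostname' pair per node."""
    return "".join(f"{n['ip']}\t{n['hostname']}\n" for n in _by_hostname(nodes))


def render_topology(nodes):
    """Render 'ip /rack' lines that a topology script could look up."""
    return "".join(f"{n['ip']} /{n['rackid']}\n" for n in _by_hostname(nodes))


if __name__ == "__main__":
    rows = [
        {"ip": "10.20.0.12", "hostname": "dn02", "rackid": "rack01"},
        {"ip": "10.20.0.11", "hostname": "dn01", "rackid": "rack01"},
    ]
    print(render_hosts(rows), end="")
    print(render_topology(rows), end="")
```

Sorting the output keeps the generated files identical when nothing in the database has changed, which makes it easy to skip needless rewrites.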
At the end of the day, I can boot any node to any environment and run any version of my stack of stuff. I rely on really cool open source projects and take full advantage of all they offer. And finally, I have a very flexible, stable and rapidly deployed cluster. Next week I tackle world peace. 😉
~~ GM