We are usually left with bash for scripting Hadoop functions. It's the default shell on Linux and it's usually good enough.
But there are enough bash-isms to cause your Java, Pig, and database people serious heartache. If you're new to Hadoop, go ahead and let the developers develop. After a few months you will have solved some common problems, and that's the time to regroup. Take a couple of weeks to "sharpen the saw": find the best of what's been written and standardize on it. Life is so much better when every Hadoop developer doesn't have to solve common problems such as:
- How do I know which cluster I'm in?
- How do I use config files so I'm not hard-coding paths, nodes, etc.?
- How do I notify on failure or success?
- When do I notify?
- How should I structure my processing, processed, and archive directories?
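As a concrete starting point, the first few questions above could be answered once by a small shared library that every job script sources. This is only a sketch: the config path, variable names, and `notify` policy here are assumptions, not a standard, and in real use you'd wire `notify` to mail or your alerting system instead of just writing to stderr.

```shell
#!/usr/bin/env bash
# Hypothetical shared library (lib.sh) that every Hadoop job script sources,
# so cluster name, base paths, and notification live in ONE place.
set -euo pipefail

# Assumed per-environment config file, e.g. /etc/hadoop-jobs/env.conf:
#   CLUSTER_NAME=prod-east
#   BASE_DIR=/data/jobs
#   NOTIFY_EMAIL=hadoop-ops@example.com
CONF_FILE="${CONF_FILE:-/etc/hadoop-jobs/env.conf}"

load_config() {
  # Source the config if present; otherwise fall back to safe defaults.
  # shellcheck disable=SC1090
  [ -r "$CONF_FILE" ] && . "$CONF_FILE"
  : "${CLUSTER_NAME:=unknown}"
  : "${BASE_DIR:=/tmp/jobs}"
}

notify() {
  # One place to decide WHEN to notify -- here, only on failure.
  local status=$1 job=$2
  if [ "$status" -ne 0 ]; then
    echo "[$CLUSTER_NAME] job '$job' FAILED (exit $status)" >&2
    # mail -s "[$CLUSTER_NAME] $job failed" "$NOTIFY_EMAIL" </dev/null
  fi
}

run_job() {
  # Wrap any command so every job gets the same failure handling.
  local job=$1; shift
  local status=0
  "$@" || status=$?
  notify "$status" "$job"
  return "$status"
}
```

A job script then reduces to `. lib.sh; load_config; run_job ingest-clicks hdfs dfs -put ...` and nobody hard-codes the cluster name again.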
There are many more common questions to ask and answer, so plan on a reset like this every 3 to 6 months.
If you don't take the time to consolidate, you'll end up supporting dozens of different solutions to the same problem. I don't know about you, but I'd rather have one process to understand.
Sharpen the saw, or spend your life supporting bash scripts created by Java devs! I should have saved that horror story for Halloween!