Category Archives: Administration

Discussions around administration of Hadoop

GlusterFS and Hadoop, not replacing HDFS

Enterprise Hadoop must cooperate with many other forms of data transmission and ingestion. Any form of MFT, Mqueue, or file landing zone requires disk space. Not HDFS disk, just disk that we can mount, MFT, SFTP, etc. to until we …

Posted in Administration, Deployment, Tuning

Replication FAIL

We’ve had our clusters running for a few months without significant issues. Or at least so we thought. I’m not sure of the why and how yet, but it seems that even with rack topology scripts running and a replication factor of 3 …

Posted in Administration, Deployment, Tuning

Hadoop Hindsight #1 Start Small

I thought we would start a weekly series on some lessons we’ve learned. We learned many of them the hard way, so we thought they might help those a few steps behind us. YMMV, but we wish …

Posted in Administration, Development, Opinions

Cinderella has left the Hadoop Cluster

It’s Friday evening before our Hadoop Administrator leaves for a week of vacation in New Hampshire, and about an hour before he leaves, he says, “It’s turning into a pumpkin in an hour.” Of course we wanted to go live …

Posted in Administration, Career

Building Clusters without installing an Operating System

I’m about to tell you how we build Hadoop clusters without installing an OS on the DataNodes. We simply PXE boot them, assign them to a cluster, and they join automatically. Read on (and on) to see how we do …

Posted in Administration, Deployment

Working without a net

In a previous post, I mentioned that we’d be using a radically tuned EXT4 file system for our Hadoop DataNodes. Well, we did it. Then I turned off journaling! It was scary and exciting, and it seems to be working pretty well. …

Posted in Administration, Tuning

It’s scary… but sometimes things go really well!

If you’ve been in IT for more than a few minutes, you know that things seldom go as planned. Sometimes they go sideways and you’re left to clean up an abomination. Been there, done that, expect it at least 15% of …

Posted in Administration, Deployment, Development, Tuning

Rack Awareness can be stale

Rack Topology/Awareness is very cool. I’m a little disappointed that you can never update it without taking down NameNode services. We’re using DHCP for DataNodes, and sometimes they change IP addresses. Unless the NameNode has been recycled, it won’t re-evaluate …
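For readers who have not written a topology script, here is a minimal sketch of what one does. Hadoop invokes the script named by `net.topology.script.file.name` (`topology.script.file.name` in Hadoop 1.x) with one or more DataNode IP addresses or host names as arguments, and reads back one rack path per argument. The subnet-to-rack table, the `/dc1/rackN` names, and the `resolve_rack` helper are hypothetical, assuming each rack sits on its own /24 subnet.

```python
#!/usr/bin/env python3
"""Minimal Hadoop rack topology script sketch.

Hadoop passes host names or IP addresses as arguments and reads back one
rack path per argument. The subnet-to-rack table is a hypothetical example.
"""
import ipaddress
import sys

# Hypothetical mapping: each /24 subnet corresponds to one physical rack.
SUBNET_TO_RACK = {
    ipaddress.ip_network("10.1.1.0/24"): "/dc1/rack1",
    ipaddress.ip_network("10.1.2.0/24"): "/dc1/rack2",
}

# Hadoop's conventional fallback rack for hosts we cannot place.
DEFAULT_RACK = "/default-rack"


def resolve_rack(host: str) -> str:
    """Return the rack path for one IP address, or DEFAULT_RACK."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP (e.g. a host name); this sketch does no DNS lookup.
        return DEFAULT_RACK
    for network, rack in SUBNET_TO_RACK.items():
        if addr in network:
            return rack
    return DEFAULT_RACK


def main(argv):
    # Emit one rack path per argument, in the order Hadoop passed them.
    print(" ".join(resolve_rack(host) for host in argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running `./topology.py 10.1.1.17 10.1.2.200 192.168.0.5` would print `/dc1/rack1 /dc1/rack2 /default-rack`. Keying on subnets rather than individual addresses means a DHCP lease change within a rack's subnet still maps to the same rack. It does not remove the staleness problem, though: the NameNode caches the answers it gets from the script.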

Posted in Administration

Life on the edge of data node writes

If you’re serious about using Hadoop, you should subscribe to the user mailing lists. They are a great source of insight into performance, new features, and common problems. I’m currently working on a JIRA to clarify documentation …

Posted in Administration, Development

HCatalog – Embrace the independence

Codd’s Rule 9: Logical data independence: Changes to the logical level (tables, columns, rows, and so on) must not require a change to an application based on the structure. Logical data independence is more difficult to achieve than physical data …

Posted in Administration, Data