Monthly Archives: December 2012

Life on the edge of data node writes

If you’re serious about using Hadoop, you should subscribe to the user mailing lists. They are a great source of insight into how things are performing, new features, and common problems. I’m currently working on a JIRA to clarify documentation …


HCatalog – Embrace the independence

Codd’s Rule 9: Logical data independence: Changes to the logical level (tables, columns, rows, and so on) must not require a change to an application based on the structure. Logical data independence is more difficult to achieve than physical data …
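
To make Rule 9 concrete, here is a minimal Python sketch, assuming a hypothetical schema layer that resolves fields by name instead of by position. The `read_rows` and `customer_names` helpers, the table versions, and the column names are made up for illustration; this is not HCatalog's API, just the principle it aims for: the application keeps working when a column is added or the column order changes.

```python
from typing import Dict, Iterator, List


def read_rows(schema: List[str], raw_rows: List[List[str]]) -> Iterator[Dict[str, str]]:
    """Hypothetical schema layer: map positional storage to named fields."""
    for raw in raw_rows:
        yield dict(zip(schema, raw))


def customer_names(rows: Iterator[Dict[str, str]]) -> List[str]:
    """The 'application': depends only on the logical column name, never on position."""
    return [row["name"] for row in rows]


# Version 1 of a hypothetical customer table.
schema_v1 = ["id", "name"]
data_v1 = [["1", "Alice"], ["2", "Bob"]]

# Version 2: a column was added and the physical column order changed.
schema_v2 = ["region", "name", "id"]
data_v2 = [["east", "Alice", "1"], ["west", "Bob", "2"]]

print(customer_names(read_rows(schema_v1, data_v1)))  # ['Alice', 'Bob']
print(customer_names(read_rows(schema_v2, data_v2)))  # ['Alice', 'Bob']
```

The design choice is that only the schema layer knows the physical layout; `customer_names` never changes between table versions.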


Trivia for Christmas

To start the ball rolling: what was the quote on the wall in “It’s a Wonderful Life” at the beginning of the “run on the banks”? It’s a fairly famous quote and should be much more popular than it is. …


The Ten Commandments of Hadoop (Work In Progress – feel free to edit)

1. Thy NameNode shall always persist (We will have multiple recovery methods).
2. We accept data as is (Come as you are).
3. We love and expect metadata (Nothing enters, changes, or exits without metadata).
4. The family tree will always be maintained. (We …
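
One way to picture how “we accept data as is” and “nothing enters without metadata” fit together is a gate at the ingest point. This is a toy Python sketch under stated assumptions: the `ingest` function, the `REQUIRED_METADATA` keys, and the in-memory catalog list are all hypothetical and not part of any Hadoop API. The payload is stored unchanged, but it is rejected unless its metadata is complete, and a lineage record is kept alongside it.

```python
from typing import Dict, List

# Hypothetical minimum metadata every incoming dataset must carry.
REQUIRED_METADATA = {"source", "owner", "loaded_at"}


class MissingMetadataError(ValueError):
    """Raised when a dataset arrives without its required metadata."""


def ingest(payload: bytes, metadata: Dict[str, str], catalog: List[Dict[str, str]]) -> bytes:
    """Accept the payload unchanged, but only if its metadata is complete."""
    missing = REQUIRED_METADATA - set(metadata)
    if missing:
        # "Nothing enters ... without metadata": reject before anything is stored.
        raise MissingMetadataError(f"missing metadata: {sorted(missing)}")
    # Keep a lineage record (the "family tree") next to the data.
    catalog.append(dict(metadata, size_bytes=str(len(payload))))
    # "Come as you are": the payload is returned byte-for-byte as received.
    return payload
```

For example, `ingest(b"abc", {"source": "crm", "owner": "datag", "loaded_at": "2012-12-01"}, catalog)` returns `b"abc"` and appends one record to `catalog`, while a call missing `owner` and `loaded_at` raises `MissingMetadataError` and leaves the catalog untouched.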


EXT4 vs. XFS for HDFS

HDFS sits atop a local filesystem. The filesystem type can affect the performance and resilience of the HDFS cluster, so picking the right one is important. According to the Hadoop Wiki, either of these is acceptable. Yahoo tends to use XFS (or …


Thought of the day: storing Confio detailed data in Hadoop?

If you manage a lot of Oracle, M$SQL, or DB2, you need to check out Confio. It’s an agentless performance-trending tool that will make your life much easier. It tracks every SQL run by user/client machine/SQL statement/etc. and …


Hello, McFly?

Should you really trust a data integration team who spends 30 minutes arguing that they SHOULDN’T include all source system customer attribute data in MDM?


Hadoop cluster failure SS style.

Learn from history or you’ll be doomed to repeat it.


Confessions of a data architect

My name is DataG, and I’m a data modeler. It’s been 6 weeks since my last star schema. Let’s face it: Codd, Imhoff, Inmon, and Kimball paved the way for almost every data analyst and app-dev professional since the relational model …


You forgot to WHAT?!

Silly Admin, you forgot to enable rack awareness on your Hadoop cluster. Now you’ve got all of your data in Rack 1. Lucky for you, there’s a way to fix it: create and configure your rack-awareness script and restart your …
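
A rack-awareness script is simply an executable that Hadoop calls with one or more DataNode IP addresses or hostnames as arguments, and that prints one rack path per argument, in the same order. Below is a minimal Python sketch of such a script. The `RACK_MAP` entries, the `/dc1/rackN` naming, and the `resolve_racks` helper are hypothetical placeholders for your own layout; unknown hosts fall back to `/default-rack`. In Hadoop 1.x the script is referenced by the `topology.script.file.name` property (renamed `net.topology.script.file.name` in Hadoop 2), but check the documentation for your version.

```python
#!/usr/bin/env python3
"""Hypothetical rack topology script for Hadoop rack awareness.

Hadoop invokes this script with one or more DataNode IPs or hostnames as
arguments and reads back one rack path per argument, in the same order.
"""
import sys
from typing import List

# Hypothetical host-to-rack mapping: replace with your own hosts and racks.
# If Hadoop passes hostnames rather than IPs, add entries for those too.
RACK_MAP = {
    "10.1.1.11": "/dc1/rack1",
    "10.1.1.12": "/dc1/rack1",
    "10.1.2.11": "/dc1/rack2",
    "10.1.2.12": "/dc1/rack2",
}

DEFAULT_RACK = "/default-rack"


def resolve_racks(hosts: List[str]) -> List[str]:
    """Return one rack path per host, falling back to the default rack."""
    return [RACK_MAP.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Print the rack paths space-separated, preserving argument order.
    print(" ".join(resolve_racks(sys.argv[1:])))
```

Run by hand as `./rack_topology.py 10.1.1.11 10.1.2.12`, this sketch prints `/dc1/rack1 /dc1/rack2`. Keeping the mapping in a plain dictionary makes it easy to eyeball, but a larger cluster would more likely generate it from an inventory file.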
