Monthly Archives: December 2012
Life on the edge of data node writes
If you’re serious about using Hadoop you should subscribe to the User Mailing Lists. They are a great source of insight as to how things are performing, new features and common problems. I’m currently working on a JIRA to clarify documentation … Continue reading
HCatalog – Embrace the independence
Codd’s Rule 9: Logical data independence: Changes to the logical level (tables, columns, rows, and so on) must not require a change to an application based on the structure. Logical data independence is more difficult to achieve than physical data … Continue reading
Trivia for Christmas
To start the ball rolling, what was the quote on the wall in “It’s a Wonderful Life” during the beginning of the “Run on the banks?” It’s a fairly famous quote and should be much more popular than it is. … Continue reading
The Ten Commandments of Hadoop (Work In Progress – feel free to edit)
Thy namenode shall always persist (We will have multiple recovery methods) We accept data as is (Come as you are) We love and expect metadata (Nothing enters, changes, or exits without metadata) The family tree will always be maintained. (We … Continue reading
EXT4 vs. XFS for HDFS
HDFS sits atop a local filesystem. The FS type used can impact performance and resilience of the HDFS cluster, so picking the right one is important. According to the Hadoop Wiki, either of these is acceptable. Yahoo tends to use XFS (or … Continue reading
Thought of the day. Storing Confio detailed data in Hadoop?
If you manage a lot of Oracle, M$SQL, or DB2 – you need to check out Confio. It’s an agent-less performance trending tool that will make your life much easier. It tracks every SQL run by user/client machine/sql statement/etc and … Continue reading
Hello, McFly?
Should you really trust a data integration team who spends 30 minutes arguing that they SHOULDN’T include all source system customer attribute data in MDM?
Confessions of a data architect
My name is DataG and I’m a data modeler. It’s been 6 weeks since my last star-schema. Let’s face it. Codd, Imhoff, Inmon, and Kimball paved the way for almost every data analyst and app-dev professional since the relational model … Continue reading
You forgot to WHAT?!
Silly Admin, you forgot to Rack Aware Enable your Hadoop cluster. Now you’ve got all of your data in Rack 1. Lucky for you, there’s a way to fix it. Create and configure your rack aware script and restart your … Continue reading
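For a rough idea of what a rack aware script can look like, here is a minimal sketch in Python. It is an illustration, not the script from the post: the addresses, rack names, and the `RACK_BY_HOST` table are made up. It relies on the usual topology script contract, where Hadoop passes one or more node addresses as arguments and expects one rack path per argument on stdout. The script is typically wired in through `topology.script.file.name` (or `net.topology.script.file.name` on newer releases) in core-site.xml; check the docs for your version.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop rack topology script (illustrative values only)."""
import sys

# Hypothetical mapping from node address to rack path.
# Replace with your own inventory; these entries are assumptions.
RACK_BY_HOST = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.11": "/rack2",
}

# Rack returned for any node that is not in the table.
DEFAULT_RACK = "/default-rack"


def resolve_racks(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hadoop may pass several addresses in a single call,
    # so print one rack per argument, separated by spaces.
    print(" ".join(resolve_racks(sys.argv[1:])))
```

Running it by hand with a couple of addresses is a quick way to check the mapping before pointing the cluster at it.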