Monthly Archives: December 2012
If you’re serious about using Hadoop, you should subscribe to the User Mailing Lists. They are a great source of insight into how things are performing, new features, and common problems. I’m currently working on a JIRA to clarify documentation … Continue reading
Codd’s Rule 9: Logical data independence: Changes at the logical level (tables, columns, rows, and so on) must not require changes to applications built on that structure. Logical data independence is more difficult to achieve than physical data … Continue reading
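To make the rule concrete, here is a minimal sketch (my own illustration, not from the post): the application reads data only through a small view/mapping layer, so when a column is renamed at the logical level, only the mapping changes and the application code survives untouched. All names (`rows_v1`, `view_v1`, `app_report`, the rename of `cust_name`) are hypothetical.

```python
# Version 1 of the logical schema.
rows_v1 = [{"cust_id": 1, "cust_name": "Acme"}]
view_v1 = {"id": "cust_id", "name": "cust_name"}

# Version 2: a column was renamed at the logical level.
rows_v2 = [{"cust_id": 1, "customer_name": "Acme"}]
view_v2 = {"id": "cust_id", "name": "customer_name"}

def customers(rows, view):
    """Present rows through a stable view interface (alias -> column)."""
    return [{alias: row[col] for alias, col in view.items()} for row in rows]

def app_report(rows, view):
    """Application code: only ever sees 'id' and 'name', never raw columns."""
    return [f'{c["id"]}: {c["name"]}' for c in customers(rows, view)]

# Same application output before and after the rename; only the view changed.
print(app_report(rows_v1, view_v1))  # ['1: Acme']
print(app_report(rows_v2, view_v2))  # ['1: Acme']
```

In a relational database the view layer plays this role; here a dict mapping stands in for it so the sketch runs anywhere.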
To start the ball rolling: what was the quote on the wall in “It’s a Wonderful Life” during the run-on-the-bank scene near the beginning? It’s a fairly famous quote and deserves to be much better known than it is. … Continue reading
Thy namenode shall always persist (We will have multiple recovery methods) We accept data as is (Come as you are) We love and expect metadata (Nothing enters, changes, or exits without metadata) The family tree will always be maintained. (We … Continue reading
HDFS sits atop a local filesystem. The filesystem type used can impact both the performance and the resilience of the HDFS cluster, so picking the right one is important. According to the Hadoop Wiki, either of these is acceptable. Yahoo tends to use XFS (or … Continue reading
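As a small aside on checking this in practice, here is a sketch (my own, not from the post) of how you might verify which local filesystem backs a DataNode data directory by parsing the mount table. The mount table below is a hard-coded example so the snippet runs anywhere; on a real node you would read the actual `/proc/mounts`, and the paths shown are hypothetical.

```python
# Example /proc/mounts contents (hypothetical devices and mount points).
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /data/1 xfs rw,noatime 0 0
/dev/sdc1 /data/2 ext3 rw,noatime 0 0
"""

def fs_type(path, mounts_text):
    """Return the fs type of the longest mount point that prefixes path."""
    best = ("", None)
    for line in mounts_text.splitlines():
        _dev, mount_point, fstype = line.split()[:3]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if len(mount_point) > len(best[0]):
                best = (mount_point, fstype)
    return best[1]

print(fs_type("/data/1/dfs/dn", SAMPLE_MOUNTS))  # xfs
print(fs_type("/var/log", SAMPLE_MOUNTS))        # ext4 (falls back to "/")
```

Longest-prefix matching matters because `/` will match every path; without it the root filesystem would shadow the dedicated data mounts.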
If you manage a lot of Oracle, M$SQL, or DB2, you need to check out Confio. It’s an agentless performance-trending tool that will make your life much easier. It tracks every SQL statement run, by user/client machine/SQL statement/etc., and … Continue reading
Should you really trust a data integration team who spends 30 minutes arguing that they SHOULDN’T include all source system customer attribute data in MDM?
Learn from history or you’ll be doomed to repeat it.
My name is DataG and I’m a data modeler. It’s been six weeks since my last star schema. Let’s face it: Codd, Imhoff, Inmon, and Kimball paved the way for almost every data analyst and app-dev professional since the relational model … Continue reading