Category Archives: Development

Discussions regarding programming, coding, etc. for Hadoop clusters.

T vs. V and W Shaped People

We talk a lot about hiring T shaped people at my current gig and I think it’s a misnomer for a couple of reasons. First, it implies a ratio of depth to width that is askew. Developers and Admins in … Continue reading

Posted in Administration, Career, Development, Opinions | Leave a comment

Experimenting w/ Neo4j

Graph databases are a really neat concept. We’ve started playing with Neo here as we attempt to link customers with visits and actions based on those visits. It seems like a really good fit at first glance. Our challenge is … Continue reading

Posted in Data, Development, Tuning | Leave a comment

Just give it a nudge.

The second definition of nudge, according to Webster, is to “prod lightly: urge into action.” We use that concept in our data environments for various long-running processes; for things that we want to happen frequently, but with an unknown … Continue reading

Posted in Administration, Deployment, Development | Leave a comment

Redshift ups and downs

AWS Redshift has been popular lately around my current gig. We’ve got a couple of clusters in use and a few more in POC mode. For the in-use clusters, pre-paid instances are easy to justify. A few thousand dollars and you … Continue reading

Posted in Administration, Development | 1 Comment

hello woRld!

R is the latest Hadoop darling. It is an open source language that “is widely used among statisticians and data miners for developing statistical software and data analysis. Polls and surveys of data miners are showing R’s popularity has increased … Continue reading

Posted in Administration, Deployment, Development, Tuning | Leave a comment

Weaponizing Hadoop

We are usually left to bash for scripting Hadoop functions. It’s the default in Linux and it’s usually good enough. There are enough “bash-isms” to cause your Java/pig/database people serious heartache. If you’re new to Hadoop, go ahead … Continue reading

Posted in Administration, Deployment, Development, syndicated | Leave a comment

Hadoop Hindsight #2 Keep it simple: more than likely someone else has encountered your problem.

“An adventure is only an inconvenience rightly considered. An inconvenience is an adventure wrongly considered.” -G.K. Chesterton

Sometimes our ego gets the best of us. This seems to occur more often in Hadoop than anywhere else I’ve worked. I’m not … Continue reading

Posted in Development, Hindsight, Opinions | Leave a comment

Intelligent Design – A hindsight lesson.

Our Boot from Network Datanode design was conceived in ignorance of real-world application. Serial Number vs. MAC Address debates ensued in Ivory Tower minds and a schema was built. I’m currently in between designs. I’ve consulted our resident data genius … Continue reading

Posted in Deployment, Development, Hindsight, Opinions | Leave a comment

Hadoop Hindsight #1 Start Small

I thought we would start a weekly series on some lessons we’ve learned. We learned many of these topics the hard way, so we thought it might be helpful for those a few steps behind us. YMMV, but we wish … Continue reading

Posted in Administration, Development, Opinions | Leave a comment

The 3 Pillars of Data Democracy

To promote the use of data within the enterprise, we need to provide a collaborative environment that gives people the freedom and incentive to try new things. This gives everyone the chance to prove great ideas, or at … Continue reading

Posted in Data, Development, Opinions | Leave a comment