Category Archives: Deployment

Kafka on AWS EC2 w/ SSL and External Visibility

I’m truly shocked by how difficult it is to gather this information in one place. Maybe that’s because AWS has its own version of Kafka functionality. At any rate, after much reading and irritation, I have it working. There is still …

Posted in Administration, Deployment

Just give it a nudge.

The second definition of nudge, according to Webster, is to “prod lightly: urge into action.” We use that concept in our data environments for various long-running processes, things that we want to happen frequently but with an unknown …

Posted in Administration, Deployment, Development

Cloud Hadoop? Buzzword Fiesta!

We haven’t quite jumped the shark yet, but this is going to be full of buzzwords. I started a new gig where we’re building Dev, POC, and possibly some Prod clusters on AWS. Once again, the first 80% of this was …

Posted in Deployment, Security

hello woRld!

R is the latest Hadoop darling. It is an open-source language that “is widely used among statisticians and data miners for developing statistical software and data analysis. Polls and surveys of data miners are showing R’s popularity has increased …

Posted in Administration, Deployment, Development, Tuning

Weaponizing Hadoop

We are usually left with bash for scripting Hadoop functions. It’s the default shell on Linux, and it’s usually good enough. However, there are enough “bash-isms” to cause your Java/Pig/database people serious heartache. If you’re new to Hadoop, go ahead …

Posted in Administration, Deployment, Development, syndicated

GlusterFS and Hadoop, not replacing HDFS

Enterprise Hadoop must cooperate with many other forms of data transmission and ingestion. Any form of MFT, Mqueue, or file landing zone requires disk space. Not HDFS disk, just disk that we can mount and push files to via MFT, SFTP, etc. until we …

Posted in Administration, Deployment, Tuning

Replication FAIL

We’ve had our clusters running for a few months without significant issues. Or so we thought. I’m not sure of the why and how yet, but it seems that even with rack topology scripts running and a replication factor of 3 …

Posted in Administration, Deployment, Tuning

Intelligent Design – A hindsight lesson.

Our Boot-from-Network DataNode design was conceived in ignorance of real-world application. Serial Number vs. MAC Address debates ensued in ivory-tower minds, and a schema was built. I’m currently between designs. I’ve consulted our resident data genius …

Posted in Deployment, Development, Hindsight, Opinions

Building Clusters without installing an Operating System

I’m about to tell you how we build Hadoop clusters without installing an OS on the DataNodes. We simply PXE boot them and assign them to a cluster, and they join automatically. Read on (and on) to see how we do …

Posted in Administration, Deployment

It’s scary… but sometimes things go really well!

If you’ve been in IT for more than a few minutes, you know that things seldom go as planned. Sometimes they go sideways, and you’re left to clean up an abomination. Been there, done that, expect it at least 15% of …

Posted in Administration, Deployment, Development, Tuning | 1 Comment