Category Archives: Administration

Discussions around administration of Hadoop

Kafka on AWS EC2 w/ SSL and External Visibility

I’m truly shocked by how difficult this information is to gather up in one place. Maybe it’s because AWS has their own version of Kafka functionality. At any rate, after much reading and irritation I have it working. There is still … Continue reading

Posted in Administration, Deployment | Leave a comment

T vs. V and W Shaped People

We talk a lot about hiring T shaped people at my current gig and I think it’s a misnomer for a couple of reasons. First, it implies a ratio of depth to width that is askew. Developers and Admins in … Continue reading

Posted in Administration, Career, Development, Opinions | Leave a comment

System Administration Rules to Live By

I’ve had a variation of these running around for a while. Tweaks may come and go with trends, but the concepts are the same. When they say “Go Big!” they don’t mean it. Start with optimistic scripts. Finish them defensively. … Continue reading

Posted in Administration, Opinions | Leave a comment

Just give it a nudge.

The second definition of nudge, according to Webster, is to “prod lightly: urge into action.” We use that concept in our data environments for various long running processes; for things that we want to happen frequently, but with an unknown … Continue reading

Posted in Administration, Deployment, Development | Leave a comment

Redshift ups and downs

AWS Redshift has been popular lately around my current gig. We’ve got a couple of clusters in use and a few more in POC mode. For the in-use clusters, pre-paid instances are easy to justify. A few thousand dollars and you … Continue reading

Posted in Administration, Development | 1 Comment

Quick Split to Fix data silliness

We have a vendor sending us daily updates on shipping info. We have a well known and defined structure for each type of data and those types map neatly to tables in our database. We have about 9 tables that … Continue reading

Posted in Administration, Data | Leave a comment

MapR is a better base?

I’ve heard about MapR for a long time and haven’t given it much consideration vs. OSS stacks. I’m reconsidering my position and conducting some evaluations. Why? MapR-FS is a real POSIX file system that runs on raw devices, not atop … Continue reading

Posted in Administration, Market Segment/Growth, Opinions | Leave a comment

Ubuntu sucks and so does Debian

Full disclosure: I cut my teeth on Slackware and Red Hat in the mid-’90s. I even tried Yggdrasil once. That being said… I fully fail to understand the allure of Ubuntu or its Mommy distro, Debian. Yes, I know Ubuntu … Continue reading

Posted in Administration, Opinions | Leave a comment

hello woRld!

R is the latest Hadoop darling. It is an open source language that “is widely used among statisticians and data miners for developing statistical software and data analysis. Polls and surveys of data miners are showing R’s popularity has increased … Continue reading

Posted in Administration, Deployment, Development, Tuning | Leave a comment

Weaponizing Hadoop

We are usually left to bash for scripting Hadoop functions. It’s the default in Linux and it’s usually good enough. There are enough “bash-isms” that will cause your Java/Pig/database people serious heartache. If you’re new to Hadoop, go ahead … Continue reading

Posted in Administration, Deployment, Development, syndicated | Leave a comment