Kafka on AWS EC2 w/ SSL and External Visibility

I'm truly shocked by how difficult this information is to gather in one place. Maybe that's because AWS has its own version of Kafka functionality. At any rate, after much reading and irritation, I have it working. There is still…

Drilling thru Multiple Clusters

…or Using Apache Drill to join data across discrete domains. We've been doing some work with Redshift lately. While it's an effective tool for storing and crunching thru large amounts of structured data, it's limited by a few "-isms" that…

T vs. V and W Shaped People

We talk a lot about hiring T-shaped people at my current gig, and I think it's a misnomer for a couple of reasons. First, it implies a ratio of depth to width that is askew. Developers and Admins in…


System Administration Rules to Live By

I've had a variation of these running around for a while. Tweaks may come and go with trends, but the concepts are the same. When they say "Go Big!" they don't mean it. Start with optimistic scripts. Finish them defensively.

A wonderful, ugly script that just keeps working

Today we're going to look at parts of a complex "nudge" script like the one I've described previously. It has a few more bells and whistles, and it constantly amazes me how well it adapts. I'll show the good bits in sections so…

The 3 Question Test

A burger and fries costs $1.10; the burger costs $1 more than the fries. How much do the fries cost? 5 servers can sort 5 TB of data in 5 minutes; how long would 100 servers take to sort 100…
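For the first question, a quick worked check shows why the answer that jumps to mind, 10 cents, is wrong: it would make the burger $1.10 and the total $1.20. This sketch works in integer cents to sidestep floating-point rounding; the `split_price` helper name is just illustrative.

```python
def split_price(total_cents, difference_cents):
    """Solve burger + fries = total and burger = fries + difference.

    Substituting the second equation into the first gives
    2 * fries + difference = total, so fries = (total - difference) / 2.
    """
    remainder = total_cents - difference_cents
    if remainder < 0 or remainder % 2:
        raise ValueError("no whole-cent solution")
    fries = remainder // 2
    burger = fries + difference_cents
    return burger, fries


burger, fries = split_price(110, 100)
print(f"burger ${burger / 100:.2f}, fries ${fries / 100:.2f}")
```

Running it prints `burger $1.05, fries $0.05`.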

Experimenting w/ Neo4j

Graph databases are a really neat concept. We've started playing with Neo here as we attempt to link customers with visits and actions based on those visits. It seems like a really good fit at first glance. Our challenge is…

Just give it a nudge.

The second definition of nudge, according to Webster, is to "prod lightly: urge into action." We use that concept in our data environments for various long-running processes: things that we want to happen frequently, but with an unknown…
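The excerpt stops before the mechanics, so here is one common way to build a nudge, under stated assumptions: a scheduler such as cron fires the job often, and a non-blocking file lock turns a nudge into a no-op while the previous run is still going. The `nudge` helper and any lock path you pass it are hypothetical, and `fcntl` is Unix-only.

```python
import fcntl


def nudge(lock_path, work):
    """Run work() unless a previous nudge still holds the lock.

    Returns True if work ran, False if this nudge was skipped.
    (Hypothetical helper; assumes a Unix system with flock support.)
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Exclusive, non-blocking lock: raises immediately if another
            # run already holds it instead of waiting.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # previous run still busy; skip this nudge
        try:
            work()
            return True
        finally:
            # Release explicitly; closing the file would also release it.
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Because a busy run simply turns later nudges away, firing them frequently (say, every minute from cron) stays cheap even when the work's runtime is unpredictable.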

Redshift ups and downs

AWS Redshift has been popular lately around my current gig. We've got a couple of clusters in use and a few more in POC mode. For the in-use clusters, pre-paid instances are easy to justify. A few thousand dollars and you…

Quick Split to Fix Data Silliness

We have a vendor sending us daily updates on shipping info. We have a well-known, well-defined structure for each type of data, and those types map neatly to tables in our database. We have about 9 tables that…
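The rest of the post is cut off, so the actual "silliness" isn't shown here. As one hypothetical illustration: if the vendor mixed every record type into a single pipe-delimited file with a type code in the first field (an assumption, not something stated above), a quick split by that code yields one group per target table. The `split_by_record_type` name, the `|` delimiter, and the record codes are all made up for the example.

```python
import csv
from collections import defaultdict


def split_by_record_type(lines, delimiter="|"):
    """Group delimited rows by their first field.

    Assumes (hypothetically) that field 0 is a record-type code that
    maps to one destination table; the remaining fields are the row.
    """
    groups = defaultdict(list)
    for row in csv.reader(lines, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        record_type, fields = row[0], row[1:]
        groups[record_type].append(fields)
    return dict(groups)


sample = ["SHP|1001|2024-01-02", "PKG|1001|A1", "SHP|1002|2024-01-03", ""]
for record_type, rows in split_by_record_type(sample).items():
    print(record_type, len(rows))
```

Each group could then be written to its own file or bulk-loaded into the matching table.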

