Tag Archives: hadoop

MapR is a better base?

I’ve heard about MapR for a long time and haven’t given it much consideration vs. OSS stacks. I’m reconsidering my position and conducting some evaluations. Why? MaprFS is a real POSIX file system that runs on raw devices, not atop … Continue reading

Posted in Administration, Market Segment/Growth, Opinions

hello woRld!

R is the latest Hadoop darling. It is an open source language that “is widely used among statisticians and data miners for developing statistical software and data analysis. Polls and surveys of data miners are showing R’s popularity has increased … Continue reading

Posted in Administration, Deployment, Development, Tuning

Weaponizing Hadoop

We are usually left to bash for scripting Hadoop functions. It’s the default in Linux and it’s usually good enough. There are enough “bash-isms” to cause your Java/Pig/database people serious heartache. If you’re new to Hadoop, go ahead … Continue reading

Posted in Administration, Deployment, Development, syndicated

Hadoop Hindsight #2 Keep it simple: more than likely someone else has encountered your problem.

“An adventure is only an inconvenience rightly considered. An inconvenience is an adventure wrongly considered.” -G.K. Chesterton

Sometimes our ego gets the best of us. This seems to occur more often in Hadoop than anywhere else I’ve worked. I’m not … Continue reading

Posted in Development, Hindsight, Opinions

Building Clusters without installing an Operating System

I’m about to tell you how we build Hadoop clusters without installing an OS on the DataNodes. We simply PXE boot them, assign them to a cluster, and they join automatically. Read on (and on) to see how we do … Continue reading

Posted in Administration, Deployment

Cloudera and VMware Work Together to Accelerate Enterprise Use of Hadoop in Virtual and Cloud Environments

Thought this might be interesting to our readers. It doesn’t apply to our implementation, but it might be useful to someone.

Posted in Uncategorized

Hadoop Security is Like Securing a Thousand Doors

As we look to bring private data into Hadoop, I find myself imagining the management of thousands of separate doors for individual data elements. With regulated data, this means we’ll need someone going around continuously checking doors to make sure … Continue reading

Posted in Security | 1 Comment

I Don’t Want the Chains of Vendor Dependence

I sat in a meeting this week and discussed how a certain company could improve our effectiveness and efficiency and integrate seamlessly with our Hadoop cluster. As I sat through that meeting, the further along I went and the more … Continue reading

Posted in Uncategorized | 1 Comment

What is Hadoop anyway?

Just to get this out of the way, Hadoop is more of a grid computing ecosystem than a platform. HDFS is the underlying cluster filesystem. It has some unique attributes that allow grid processes to know where data resides within … Continue reading

Posted in Uncategorized