Enterprise Hadoop must cooperate with many other forms of data transmission and ingestion. Any form of MFT, Mqueue or file landing zone requires disk space. Not HDFS disk, just disk that we can mount and point MFT, SFTP, etc. at until we actually ingest the data into Hadoop (where life is beautiful all the time).
Traditional “Enterprise” disk space is provided by SAN or NAS mounts, and for good reasons: snapshots, flashcopies, highly available nodes, re-redundant disks and de-duplication, oh my! Most – if not all – of those reasons do not apply to Hadoop landing zones.
Enter GlusterFS: a striped, redundant, multiple-access-point solution. My SPOF Hadoop v. 1.x NameNode can write to a GlusterFS mount, and I can boot my DataNodes to a GlusterFS mount that has a backup server baked right into the mount command. I can point MFT, SFTP, Mqueue, etc. to a mount that has redundancy baked right in. This is sounding redundant.
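For the curious, here is roughly what "a backup server baked right into the mount command" can look like, sketched in Python as a helper that only builds the command line (it does not run it). The host names, volume name and mount point are made up for illustration. The `backupvolfile-server` option is the spelling I know from older GlusterFS mount helpers; newer releases use `backup-volfile-servers`, so treat the option name as an assumption and check it against your version.

```python
def gluster_mount_command(primary, backup, volume, mount_point):
    """Build the argv for a GlusterFS mount with a fallback volfile server.

    Assumption: the mount helper accepts `backupvolfile-server`; newer
    GlusterFS releases spell this option `backup-volfile-servers`.
    """
    return [
        "mount", "-t", "glusterfs",
        "-o", f"backupvolfile-server={backup}",
        f"{primary}:/{volume}",
        mount_point,
    ]


# Hypothetical hosts, volume and mount point, for illustration only.
cmd = gluster_mount_command("gluster1", "gluster2", "landing", "/mnt/landing")
print(" ".join(cmd))
```

If the first server is down when the client mounts, the backup server is asked for the volume layout instead, which is the whole point of a landing zone that MFT and SFTP can rely on.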
My point is that GlusterFS meets the multi-node, replicated storage requirements enterprises demand, but uses local SATA disk at roughly 1/34th the cost of SAN. That SWAG is based on our internal cost of SAN @ $7.50/GB vs. $0.22/GB for local SATA.
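To show the back-of-the-envelope math, here is a small Python sketch using the two per-GB figures quoted above. The 10 TB landing zone size is a hypothetical example, not a number from our environment, and the sketch compares raw per-GB cost only; it does not account for how many replicas you keep.

```python
SAN_COST_PER_GB = 7.50   # internal SAN cost quoted above, in $/GB
SATA_COST_PER_GB = 0.22  # local SATA cost quoted above, in $/GB


def cost_ratio(san_per_gb, sata_per_gb):
    """How many times more expensive SAN is than local SATA, per GB."""
    return san_per_gb / sata_per_gb


def landing_zone_cost(size_gb, cost_per_gb):
    """Raw disk cost of a landing zone of the given size (no replication)."""
    return size_gb * cost_per_gb


ratio = cost_ratio(SAN_COST_PER_GB, SATA_COST_PER_GB)
print(f"SAN costs about {ratio:.1f}x as much per GB as local SATA")

# Hypothetical 10 TB landing zone, just to put dollar figures on it.
size_gb = 10_000
print(f"SAN:  ${landing_zone_cost(size_gb, SAN_COST_PER_GB):,.0f}")
print(f"SATA: ${landing_zone_cost(size_gb, SATA_COST_PER_GB):,.0f}")
```

The ratio works out to a little over 34, in the same ballpark as the estimate above. Keeping two or three replicas on SATA narrows the gap but still leaves it at an order of magnitude.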
Good, Fast & Cheap — It’s a brave new world.