Insights for Articles from the Hadoop Summit 2013

I just left the Hadoop Summit 2013, so my next series of articles will cover some of the insights I picked up there. For this post I’m just going to list the future topics – let me know which ones are the most interesting and I’ll prioritize them:

  1. The biggest complaint about Hadoop is that it is still immature and needs more enterprise capabilities
  2. Only 30% of companies are doing Hadoop today
  3. Definition of big data – high volume, velocity, and variety of data
  4. Definition of Hadoop – a collection of tools in a framework for processing data across a distributed network
  5. Amazon (AWS) started 5 ½ million Hadoop clusters last year
  6. Traditional IT vs Big Data Style
  7. Tractor vendors are becoming analytics vendors
  8. It takes a community to raise an elephant
  9. Yahoo is massive in the Hadoop space (365 PB and 40k nodes) – here is what I learned from them
  10. APM GPU, Atom – not ready for big data
  11. Solid State Storage, In Memory, SATA, SAS – when to use which in Hadoop clusters
  12. YARN – just Yet Another Resource Negotiator, or a necessary component for solidifying Hadoop and moving it to the next level?
  13. Top 10 Things to Make Your Cluster Run Better (this one is my favorite)
  14. Why LinkedIn and Yahoo are going to kick butt with big data
  15. How to create good performance tests for a cluster
  16. Hadoop Cluster Security – authorization, authentication, encryption, etc.
  17. Automating Hadoop Clusters
  18. Hadoop and OpenStack, and why you need to consider using them together
  19. Email Archiving with Hadoop – the perfect use case…maybe…maybe not
  20. File Ingesting
  21. Lustre
  22. Apache Falcon and data life cycle management
  23. Storm
  24. In Memory DB as a component of Hadoop
  25. Tez – game changer – I think so…
  26. Knox Security Gateway
  27. NFS into Hadoop directly
  28. HDFS Snapshots
  29. Why we don’t use Ambari and why we should use the metrics
  30. HBase and all of the things you need to consider before deploying (it’s different)
  31. Excel Data Explorer and GeoFlow, and how they might displace more expensive data mining solutions
  32. Hadoop Scaling to Internal Cloud
  33. YARN – is this just virtualization on top of Hadoop?
  34. Hadoop Infrastructure rethought
  35. Cluster Segmentation – when one big cluster just won’t do…

Let me know which topics are most intriguing and I’ll post those first.



One Response to Insights for Articles from the Hadoop Summit 2013

  1. Top 10 things to make your clusters faster
  2. How to create performance tests
  3. Cluster Segmentation
  4. YARN / Tez
  5. Knox
  6. Lustre
  7. Ambari stats
