241 properties have been deprecated in Hadoop 3.0. Is that enough change for you? Dozens of sites have dedicated space to discovering and explaining all the new goodness in Hadoop 3.0. I haven’t read any that discuss the problems this massive overhaul brings to Hadoop stability. Maybe I just don’t read enough?
Way back in December 2017, The Apache Software Foundation announced the GA of Hadoop 3.0. Hortonworks waited until v3.1 before releasing HDP 3.0 in the summer of 2018. Now vendors are trying to decide when the massive changes are worth implementing. Apache Hadoop may be on 3.1, but HDP is at 3.0, and I know very few enterprise-class users who will run it in production. That is part of the problem.
I’m personally excited about advancements in software, and especially in Hadoop. YARN now supports Docker containers (sketched below), the NameNode role can now be split (is that different from federation?), and there are other good things you can read about on other blogs. I am not, however, an enterprise. I also don’t have to develop software that needs to contend with all of these changes.
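For the curious, here is a minimal sketch of what opting a MapReduce job into YARN’s Docker runtime looks like. The image name is a placeholder, and it assumes the cluster’s NodeManagers already have the Docker runtime enabled; treat it as an illustration, not a recipe.

```java
import org.apache.hadoop.conf.Configuration;

public class DockerJobConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Ask YARN to launch this job's task containers through the Docker
        // runtime. The image is a placeholder; the NodeManagers must have
        // Docker enabled (yarn.nodemanager.runtime.linux.allowed-runtimes)
        // for any of this to take effect.
        String dockerEnv = "YARN_CONTAINER_RUNTIME_TYPE=docker,"
                + "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=library/openjdk:8";
        conf.set("mapreduce.map.env", dockerEnv);
        conf.set("mapreduce.reduce.env", dockerEnv);

        System.out.println("map env: " + conf.get("mapreduce.map.env"));
    }
}
```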
I have friends who develop software for Hadoop, and I do not envy them. They must now incorporate and test 241 property name changes in their code while maintaining backward compatibility (see the sketch below). That does not include the myriad changes required to deal with other deprecated systems. For example, the MapReduce engine for Hive is no longer an option, which flat-out breaks a piece of software I use. I’m sure the Hadoop devs had good reasons for removing it, but it does cause issues, and we’ve seen this kind of thing before.
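To make that concrete, here is a minimal sketch of the kind of compatibility shim a vendor ends up writing, assuming nothing beyond Hadoop’s standard Configuration API. The key pair shown (fs.default.name vs. fs.defaultFS) is an old, well-known rename used here as a stand-in for the 3.0 list.

```java
import org.apache.hadoop.conf.Configuration;

public class ConfigCompat {

    /**
     * Prefer the current key, but fall back to the legacy name so the
     * same binary works against both old and new cluster configs.
     */
    static String getWithFallback(Configuration conf, String newKey,
                                  String oldKey, String defaultValue) {
        String value = conf.get(newKey);
        if (value == null) {
            value = conf.get(oldKey); // legacy configs still in the field
        }
        return value != null ? value : defaultValue;
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // fs.defaultFS replaced fs.default.name long ago; it stands in
        // here for any of the 241 renamed properties.
        String fsUri = getWithFallback(conf, "fs.defaultFS",
                "fs.default.name", "hdfs://localhost:8020");
        System.out.println("Filesystem URI: " + fsUri);
    }
}
```

To be fair, Hadoop’s Configuration class keeps its own deprecation table that maps many old keys to their new names automatically, which softens the blow; but code that hard-codes property names still has to be audited key by key. The Hive breakage has no such shim: software that sets hive.execution.engine to mr has to move to another engine such as Tez and be re-tested, not just renamed.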
Surely Hadoop could not fall into this trap. The Python community has been battling the broken Python 2-to-3 upgrade path for years. Lots of years. I just hope Hadoop can overcome this divide; otherwise I feel its market share will continue to erode and deepen the trough of disillusionment.
You may ask, “What can I do to help prevent this erosive divide?” I guess the only thing you can do is push for 3.x migrations. Hadoop definitely needs to grow or it will fade away. Application developers and integrators are going to need to bite the bullet and spend the time and money to refactor their code, and the only way that will happen is through the pressure of customer demand. Customers need to tell their vendors that they have a timeline for migrating to Hadoop 3.x, and that they expect a functional version by then or the vendor will be replaced.
I’m sure your mileage will vary, but my first experience connecting software to an HDP 3.0 stack did not go well. Nothing exploded, but nothing worked either.
~~ GM