Since its inception, Hadoop has been a leader in accessible, open-source software, but its role in computing has been reshaped by the rise of big data and the demand for systems that handle applications in real time.
Where it all began
When Hadoop was first built, its creators sought to replicate an infrastructure introduced by Google in a paper on MapReduce. This approach allowed several computers to work on a problem simultaneously, which solved many of the server coordination problems of the time. Before Hadoop’s scale-out storage model, computers tended to “starve” each other in operations: to complete a task, one computer had to wait for another, which was itself waiting on a third, and this chain of waiting often led to lock-ups and poor synchronization. Monte Zweben, the CEO of Splice Machine, told TDWI that to get all these computers to work together you “pretty much needed a PhD in distributed systems in computer science.”
Google’s paper on the MapReduce model provided a technique to solve this issue, and the days of needing a PhD to figure out how to synchronize different computers faded fast. This innovation was bolstered by Hadoop’s creators in the open-source community who wanted to make this method accessible to anyone who needed it. With Hadoop, they were able to recreate what Google had accomplished in its MapReduce computation engine and in its Bigtable database, which is a distributed storage system for structured data. This allowed programmers to take advantage of this new computing structure to solve their synchronization issues. These days, there are new issues to tackle.
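The MapReduce model mentioned above is simple enough to sketch in a few lines. The following is a minimal, single-process illustration of the pattern (the classic word-count example); the function names are illustrative, not part of any Hadoop API, and a real cluster would run the map and reduce steps on many machines in parallel.

```python
# Illustrative sketch of the MapReduce pattern: map, shuffle, reduce.
# Names like map_phase/shuffle/reduce_phase are hypothetical, not Hadoop APIs.
from collections import defaultdict

def map_phase(documents):
    """Map: split each document into (word, 1) pairs. Each document is
    processed independently, so workers never wait on one another."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key before reducing."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's values into a single result."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts["the"] is 3 and counts["fox"] is 2
```

The key property is that each phase only depends on the output of the previous one, which is why the framework, rather than the programmer, can handle the synchronization.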
A new generation
Zweben explained that Hadoop must now look to bring IT “out of the dark.” He said that while IT departments are using these platforms, they aren’t developing the programs and software. These systems serve a different purpose now, shaped by the Internet of Things. IT departments know what they need from a Hadoop server, but they don’t necessarily have the means to get there. With new demands on infrastructure come new strategies to meet them. David Richards, the CEO of WANdisco, is looking to a new generation of Hadoop to make it happen.
Now that more Hadoop users need it to support applications in real time, Richards explained to TechRepublic that it’s important to think of it as an applications platform and not just a storage system. The era of big data is pushing Hadoop away from the small lab environments it came from toward large-scale deployment, and the second generation of the software will need to focus on supporting real-time applications. Richards said that because Hadoop is being used for large-scale data processing, its customers have to think more critically about service-level agreements as continuous availability and recovery become increasingly important to enterprise computing. According to Richards, this new era of Hadoop is an adjustment to different priorities and the “new normal” of mission-critical cloud applications.