As a professional technologist and amateur futurologist, I always find it exciting to be on the cutting edge of technology. Watching driverless cars turn from a Google project into a virtual inevitability is a thrilling example of technology’s everyday impact. Cryptocurrencies such as Bitcoin promise to change how we use and interact with money. Across the board, technology is changing all of our lives and jobs. Sometimes this seems to be happening faster than anyone can keep track of, and the field of analytics is both a cause of this and deeply affected by it.
Cloud technologies are perhaps not quite as novel as they were five years ago—I’m drafting this blog post on my home computer and it’s flawlessly syncing to my OneDrive without me even thinking about it. Hadoop, too, is moving away from the labs of technology giants and further into the corporate mainstream, but it is still a challenging platform to work with that relies heavily on command-line interfaces. The analytics field has been promising to change the world for a long time, and is only now reaching the completeness of data required to achieve that goal.
HDInsight represents such an exciting intersection between the fields of cloud, big data, and analytics that it is hard for any analytics professional to resist playing with.
So what is HDInsight?
Microsoft has brought their passion for technology to the Hadoop platform to make it accessible and easy to use. They started by partnering with a leading Hadoop distribution provider, Hortonworks, to put the Hadoop platform (a native Linux citizen) onto Windows Server technologies. Next they used their considerable experience in cloud services to make it available on the Windows Azure platform as an on-demand service. As an aside, if you are an infrastructure geek, some of the things they have done in their data centers to get it to perform as well as, if not better than, on-site servers are mind-boggling. Finally, they took their impressive experience in delivering business intelligence capabilities and made it a key part of their Analytics Platform.
Now it may not be immediately obvious to the non-expert why this represents a significant dash of AwesomeSauce, so let me spell it out. From a PowerShell command line you can spin up a Hadoop cluster in a matter of minutes, execute Hadoop jobs against your data stored in Azure Blob Storage, and then either analyze the results on the spot using Excel or dump the results to SQL Server for later use or incorporation into something broader. Then you can shut down the whole cluster again via PowerShell, so that you pay only for the infrastructure you actually needed.
In summary: big data is now an on-demand service.
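To put rough numbers on the pay-only-for-what-you-use point, here is a minimal Python sketch comparing an always-on cluster with one that exists only while jobs are running. The hourly rate, node count, and job schedule are hypothetical figures chosen for illustration; they are not real Azure pricing, and the function name `cluster_cost` is likewise made up for this example.

```python
# Hypothetical figures for illustration only; not real Azure pricing.
HOURLY_RATE_PER_NODE = 0.50  # assumed cost per node per hour
NODES = 4                    # assumed cluster size
HOURS_IN_MONTH = 24 * 30     # a 30-day month


def cluster_cost(nodes: int, hours: float, rate: float = HOURLY_RATE_PER_NODE) -> float:
    """Cost of running `nodes` nodes for `hours` hours at `rate` per node-hour."""
    return nodes * hours * rate


# An always-on cluster is billed for every hour of the month.
always_on = cluster_cost(NODES, HOURS_IN_MONTH)

# An on-demand cluster is billed only while it exists:
# here, ten jobs in the month, each keeping the cluster up for two hours.
on_demand = cluster_cost(NODES, hours=10 * 2)

print(f"always-on: {always_on:.2f}")
print(f"on-demand: {on_demand:.2f}")
```

Under these assumed numbers the on-demand cluster costs a small fraction of the always-on one. The real ratio depends entirely on how often jobs run and how long the cluster has to stay up for each one.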
How can I learn more?
As I mentioned at the start of the post, Hadoop is a challenging platform to work with, but its scope for performing amazing analytics across huge volumes of data presents too big an opportunity to let that put you off. The platform is continually improving and the community is growing, so someone starting with Hadoop today has an easier ride than they would have had only a few years ago.
I was lucky enough to get my hands dirty with the HDInsight platform nearly a year before it was released to the public in October 2013, doing some text analytics for a customer service system. That process redefined for me the concept of a steep learning curve!
The HDInsight platform is now considerably more stable and easier to use, but many of the lessons I learned still apply. Syncfusion has graciously given me the opportunity to share those experiences by writing an e-book. To find out more and get a feel for working with HDInsight, please download the e-book HDInsight Succinctly today.
James Beresford can be found at his blog “BI Monkey” at http://www.bimonkey.com and also on Twitter @BI_Monkey. The subject material in the e-book is also covered in part by his TechEd presentation, which is viewable online.