Wednesday, October 21, 2015

How to run HDP on Azure

There are a few options if you are looking at Azure to host Hadoop:

  1. HDInsight, which is Microsoft's flavor of Hadoop (built on top of HDP). It provides good separation of storage (Azure Blob storage) from compute.
  2. HDP on Azure is a new option, where you can get the real Hortonworks distribution spun up as VMs. Each data node both serves data and manages compute. However, in this model you cannot easily use Azure Blob storage. If you shut down the cluster, the data is gone unless you script a copy of it out to Azure Blob storage, plus the reverse for when you want to bring the data back into Hadoop.
  3. Azure also supports another model that is useful for scenarios such as an on-premises Hadoop cluster for day-to-day operations, with backup to Azure Blob storage. This uses the HDFS abstraction. The demo below will show you how to set up a feed with two data paths: primary and secondary. The secondary is on Azure Blob storage.
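
To make the "script the data out and back" idea from option 2 a bit more concrete, here is a minimal sketch in Python. It only builds the `hadoop distcp` command lines for a backup and a restore; it does not run them. The container name, storage account name, and HDFS path are made up for illustration, and the sketch assumes the cluster already has the Azure storage connector (the `wasb://` scheme) and the storage account key configured.

```python
import shlex
from typing import List


def wasb_uri(container: str, account: str, path: str = "/") -> str:
    """Build a wasb:// URI of the form
    wasb://<container>@<account>.blob.core.windows.net/<path>."""
    return "wasb://{}@{}.blob.core.windows.net/{}".format(
        container, account, path.lstrip("/")
    )


def distcp_command(src: str, dst: str) -> List[str]:
    """Return the argument list for a plain `hadoop distcp <src> <dst>` copy."""
    return ["hadoop", "distcp", src, dst]


def backup_command(hdfs_path: str, container: str, account: str) -> List[str]:
    """Copy an HDFS path out to the same path inside a blob container."""
    return distcp_command(hdfs_path, wasb_uri(container, account, hdfs_path))


def restore_command(hdfs_path: str, container: str, account: str) -> List[str]:
    """The reverse direction: copy from the blob container back into HDFS."""
    return distcp_command(wasb_uri(container, account, hdfs_path), hdfs_path)


if __name__ == "__main__":
    # "backups", "examplestore" and "/data/feed" are hypothetical values.
    print(shlex.join(backup_command("/data/feed", "backups", "examplestore")))
    print(shlex.join(restore_command("/data/feed", "backups", "examplestore")))
```

You would run the printed backup command before shutting the cluster down and the restore command after bringing a new one up. This is just one way to do it, not necessarily how the demo handles the secondary path.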

 
