Install Cloudera CDH4 Hadoop in Microsoft Windows 8 Hyper-V

Microsoft is pushing its own HDInsight server and has put a lot of resources behind it. That said, Cloudera is probably one of the best-known Hadoop shops out there, and its “free” platform is where a huge number of Hadoop developers got their start. This guide shows how to run a Cloudera CDH4 virtual machine in Windows 8 Hyper-V. This is certainly not something to put into production; it is something that can be done quickly in order to start playing with Hadoop on a Windows 8 desktop. Read on to see how easy it is.

Test Configuration for Windows 8 Hyper-V

For this guide we are using the Windows 8 X79 test bed. The Windows 8 iSCSI initiator is installed on this system in order to support Hyper-V virtual machines.

  1. CPU(s): Intel Core i7-3930K
  2. Motherboard: ASUS P9X79 WS
  3. Memory: 32GB (8x 4GB) G.Skill Ripjaws X DDR3 1600
  4. Drives: Corsair Force3 120GB, OCZ Vertex 3 120GB
  5. Power Supply: Corsair AX850 850w 80 Plus Gold

Let’s see how easy it is to install CDH4 in Windows 8 Hyper-V.

How to Install the Cloudera CDH4 Hadoop platform in Microsoft Windows 8 Hyper-V

Download the VMware image. It is about 1.2GB, so depending on your network speed it may take a few minutes. Since we are in Windows 8, use 7-Zip to unpack the tar.gz file.
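
If you would rather script the unpacking step than use 7-Zip, here is a minimal sketch using Python's standard tarfile module. The function name and the paths in the usage note are placeholders for illustration, not anything shipped with the Cloudera download, and the sketch assumes you trust the archive you downloaded.

```python
import tarfile
from pathlib import Path


def unpack_vm_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return any .vmdk files found."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:gz" opens the archive as a gzip-compressed tarball.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest)
    # The VMDK is the file needed for the conversion step later on.
    return sorted(dest.rglob("*.vmdk"))
```

Calling something like unpack_vm_archive("cdh4-vm.tar.gz", "C:/vms/cdh4") with your own file name and destination should return the path of the extracted VMDK.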

Cloudera CDH4 Hadoop in Windows 8 Hyper-V Download VM

Next, we need to convert the VMware VMDK to a Hyper-V VHD. I used the StarWind converter, which worked well and was free with registration. First, select the downloaded VMDK.

Cloudera CDH4 Hadoop in Windows 8 Hyper-V VMDK to VHD Conversion

At this point, you have a few conversion options. For Hyper-V, you will likely want either the growable or pre-allocated option.
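
For those who prefer a command line over the GUI converter, a different tool, qemu-img, can also write VHD files (it calls the format "vpc"). This is a sketch of an alternative, not the StarWind workflow used in this guide. The helper names are hypothetical, and it assumes qemu-img is installed and on the PATH. The mapping of "growable" and "pre-allocated" onto qemu-img's "dynamic" and "fixed" subformats is my reading of the two options.

```python
import subprocess

# Map the two Hyper-V-friendly choices onto qemu-img's VHD ("vpc") subformats.
SUBFORMATS = {"growable": "dynamic", "pre-allocated": "fixed"}


def build_convert_command(vmdk_path, vhd_path, allocation="growable"):
    """Return the qemu-img argument list for a VMDK to VHD conversion."""
    subformat = SUBFORMATS[allocation]  # raises KeyError for unknown choices
    return [
        "qemu-img", "convert",
        "-f", "vmdk",                     # input format
        "-O", "vpc",                      # output format (VHD)
        "-o", f"subformat={subformat}",   # dynamic (growable) or fixed
        str(vmdk_path), str(vhd_path),
    ]


def convert(vmdk_path, vhd_path, allocation="growable"):
    """Run the conversion; raises CalledProcessError if qemu-img fails."""
    subprocess.run(build_convert_command(vmdk_path, vhd_path, allocation), check=True)
```

A growable disk keeps the VHD small on the data store, while a pre-allocated one takes its full size up front.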

Cloudera CDH4 Hadoop in Windows 8 Hyper-V VMDK to VHD Conversion Format

After a few minutes, the converter should report that the conversion was successful.

Cloudera CDH4 Hadoop in Windows 8 Hyper-V VMDK to VHD Conversion Format Success

Next, save the VHD version of Cloudera CDH4 to the Hyper-V data store. In this case, I used an iSCSI target on the Synology DS1812+ that we have been testing.

Save Cloudera CDH4 Disk Image to Data Store

Once this is completed, create a Hyper-V VM for the Cloudera CDH4 installation.

Cloudera CDH4 Hadoop in Windows 8 Hyper-V Create VM

Much of the virtual machine creation portion is the same as the Ubuntu on Hyper-V installation. The big difference is that instead of creating a new volume and attaching the installation ISO, with this installation you just need to attach the VHD created earlier.
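
The same step can also be scripted instead of clicked through. The sketch below builds a call to the Hyper-V PowerShell cmdlet New-VM with an existing VHD attached. The Python helper names, the VM name, the path, and the memory size are illustrative assumptions. The guide itself used the wizard, so treat this as an untested alternative.

```python
import subprocess


def ps_quote(value):
    """Quote a value as a PowerShell single-quoted string literal."""
    return "'" + str(value).replace("'", "''") + "'"


def build_new_vm_command(name, vhd_path, memory_bytes):
    """Return the argument list that creates a VM using an existing VHD."""
    script = (
        f"New-VM -Name {ps_quote(name)} "
        f"-MemoryStartupBytes {int(memory_bytes)} "
        f"-VHDPath {ps_quote(vhd_path)}"
    )
    return ["powershell.exe", "-NoProfile", "-Command", script]


def create_vm(name, vhd_path, memory_bytes):
    """Run the command; needs an elevated session with Hyper-V enabled."""
    subprocess.run(build_new_vm_command(name, vhd_path, memory_bytes), check=True)
```

For example, build_new_vm_command("cdh4", "D:/vms/cdh4.vhd", 4 * 1024**3) describes a VM with 4GB of startup memory that boots from the converted disk.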

Cloudera CDH4 Hadoop in Windows 8 Hyper-V Connect VM to VHD

Once the wizard is done, you can easily fire up the virtual machine. This may take a few minutes but soon you will be greeted by the home screen, including the GUI!

Cloudera CDH4 Hadoop in Windows 8 Hyper-V Boot Screen

Now there is one small catch that you will run into: the Cloudera CDH4 image does not have Hyper-V integration components installed. Stepping back, this makes sense, since the image was built for VMware.

Need Windows 8 Hyper-V Integration Components

There are a few options:

  1. Leave as-is (not so good).
  2. Use compatibility mode hardware.
  3. Do manual install on a Linux flavor with integration components installed.
  4. Install integration components yourself.
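
As a rough sketch of option 2, assuming "compatibility mode hardware" here means Hyper-V's emulated Legacy Network Adapter, the adapter can be added with the Add-VMNetworkAdapter cmdlet and its -IsLegacy parameter. The Python wrapper, the VM name, and the switch name are placeholders, and the VM most likely needs to be powered off when the adapter is added.

```python
import subprocess


def build_legacy_adapter_command(vm_name, switch_name):
    """Return the argument list that adds an emulated (legacy) NIC to a VM."""
    def quote(value):
        # PowerShell single-quoted literal; embedded quotes are doubled.
        return "'" + str(value).replace("'", "''") + "'"

    script = (
        f"Add-VMNetworkAdapter -VMName {quote(vm_name)} "
        f"-SwitchName {quote(switch_name)} -IsLegacy $true"
    )
    return ["powershell.exe", "-NoProfile", "-Command", script]


def add_legacy_adapter(vm_name, switch_name):
    """Run the command; needs an elevated session with Hyper-V enabled."""
    subprocess.run(build_legacy_adapter_command(vm_name, switch_name), check=True)
```

The emulated adapter is slower than the synthetic one, but it does not depend on the integration components being present in the guest.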

Of the options listed above, the second is the easiest. Again, this is not something I would run in a production environment. That said, for those curious about Hadoop, this is a great way to work locally. One other cool thing is that you can run more than one VM and potentially have a mini virtualized Hadoop cluster to work with while on an airplane with a Windows 8 device. Hope that helps those who are interested but do not have a dedicated test machine.

1 COMMENT

  1. Thanks for the informative article Patrick.

    I tried using StarWind V2V Converter on Cloudera QuickStart VM (https://www.cloudera.com/content/support/en/downloads.html).

    V2V converter reports the following error:
    Invalid file format (10) [0]
    Not all descriptor fields are present

    Basic internet surfing indicates that V2V converter may not be able to handle the file format.

    Did you run into something similar during VMware VMDK to Hyper-V VHD conversion for CDH4? Any other tips to address this issue?

    Thanks!
    Manoj
