How to set up Hadoop on Mac OS X 10.9 Mavericks

Brief

Hadoop is an open-source Apache project that enables processing of extremely large datasets in a distributed computing environment. It can be run in three different modes:

Standalone Mode

Hadoop runs everything in a single JVM with no daemons. This mode is only suitable for testing and debugging MapReduce programs during development.

Pseudodistributed Mode

Hadoop daemons run on the local machine, simulating a small cluster.

Fully Distributed Mode

Hadoop daemons run on a cluster of machines.

This tutorial covers setting up the Hadoop 1.2.1 stable release in Pseudodistributed mode. Before getting started with the installation and configuration of Hadoop, there are some prerequisites.

Requirements

Java version 1.6.* or higher is required for Hadoop. Running the following command will prompt you for installation if you don’t already have Java installed:
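
    $ java -version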

Homebrew. Though we can go without it, Homebrew will make installing Hadoop on a Mac significantly easier:
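
    # installer one-liner from the Mavericks era; see http://brew.sh for the current command
    $ ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"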

SSH keys. First, ensure Remote Login is checked under System Preferences -> Sharing to enable SSH. If you already have SSH keys set up, ssh into your localhost machine. If you don’t, set those bad boys up:
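
    # generate an RSA key pair with an empty passphrase, accepting the default location
    $ ssh-keygen -t rsa -P ""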

To authorize your public key and avoid being asked for a password every time you ssh into localhost:
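
    $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys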

Now ssh into your localhost and accept the authorization prompt:
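
    $ ssh localhost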

Installation

This is where Homebrew saves us time. Install Hadoop:
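
    # at the time of writing, this installs Hadoop 1.2.1 under /usr/local/Cellar/hadoop/1.2.1
    $ brew install hadoop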

If for some reason you dislike Homebrew or want a specific version of Hadoop, you can visit http://hadoop.apache.org/releases.html and download the release of your choice. Unpack the tarball to a location of your choice and assign ownership to the user setting up Hadoop.

Configuration

Every component of Hadoop is configured using an XML file located in /usr/local/Cellar/hadoop/1.2.1/libexec/conf. MapReduce properties go in mapred-site.xml, HDFS properties in hdfs-site.xml, and common properties in core-site.xml. The general Hadoop environment properties are found in hadoop-env.sh.

hadoop-env.sh

Assuming Homebrew was used to install Hadoop, add the following line in hadoop-env.sh after the line “# export HADOOP_OPTS=-server”:
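
    # suppresses the "Unable to load realm info from SCDynamicStore" Kerberos
    # warning that Hadoop emits on OS X
    export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="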

If Homebrew was not used, you have to add the following line as well:
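
    # a typical choice: point Hadoop at the JDK; /usr/libexec/java_home
    # resolves the active JDK path on OS X
    export JAVA_HOME=$(/usr/libexec/java_home)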

core-site.xml
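
A minimal configuration for Pseudodistributed mode:

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>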

Note: the fs.default.name value is currently set to localhost for development purposes. If you’re setting up multiple nodes on your network, you will have to set the value to hdfs://<ComputerName>.local:9000. To find out your computer name, go to System Preferences -> Sharing. For the purpose of this tutorial, we will stick to localhost to get a feel for Hadoop in Pseudodistributed mode.

hdfs-site.xml

The Hadoop Distributed File System properties go in this config file. Since we are only setting up one node, we set the value of dfs.replication to 1.
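
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>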

mapred-site.xml

The MapReduce config below sets the JobTracker connection port:
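
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <!-- localhost:9001 is the conventional choice for single-node
             Hadoop 1.x setups; any free port will do -->
        <value>localhost:9001</value>
      </property>
    </configuration>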

Almost Ready!

We must format the newly installed HDFS before we can start running the daemons. Formatting creates an empty filesystem by creating storage directories and initial metadata.
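
    $ hadoop namenode -format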

Unleash the Daemons

Make sure you are still ssh’d into localhost. You can start HDFS by:
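
    $ start-dfs.sh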

and start MapReduce by:
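
    $ start-mapred.sh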

or alternatively, start all:
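
    $ start-all.sh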

You now have Hadoop installed! Try running an example!
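
For instance, the pi estimator bundled with Hadoop makes a quick smoke test (the jar path below assumes the Homebrew 1.2.1 layout):

    $ hadoop jar /usr/local/Cellar/hadoop/1.2.1/libexec/hadoop-examples-1.2.1.jar pi 10 100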

Monitoring

You can monitor your HDFS, MapReduce and Tasks:

HDFS Administrator: http://localhost:50070

Task Tracker: http://localhost:50060

MapReduce Administrator: http://localhost:50030


Finally, run the following command to stop all daemons:
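
    $ stop-all.sh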

Source: http://shop.oreilly.com/product/9780596521981.do

shayan

13 Comments

  1. Thanks for this dude, it really helped me get set up on Mac. There are a couple of things that have changed now with hadoop 2.3.0 so I’m not fully set up yet. You should do an updated version for 2.3.0! Thanks again brother; best of luck with school

  2. Can’t believe I completed this in only about 30 mins, and that’s because I kept looking up errors… thanks much for these instructions.

  3. Thank you for the instructions.. Made my life easy.. conf can be found in: /usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop

  4. I have set up everything, but I can’t seem to find the correct directory to do the namenode -format; my Hadoop version is 2.4.0. Thanks

    • Please also note that I can’t start Hadoop; it is saying command not found.

      • I was able to start Hadoop by running this command in the Hadoop-2.4.0 home directory: $ ./sbin/start-all.sh. Although there is a warning that the command has been deprecated, it works fine; you are advised to start all processes separately henceforth. Thanks

  5. This is a great help, thanks for putting this up.
    I’m at the point of formatting for HDFS. I know this might sound stupid, but should I partition my hard drive and format the new partition? Basically, I need a separately formatted drive to play around with Hadoop, correct?

  6. Great help.
    Two things:
    1. As Swetha said, conf files are now under /usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop
    as of version 2.4
    2. As Ridwan said, the sh scripts are under a new folder sbin.

  7. Procedure works great!
    One more thing apart from what Abhijit and others mentioned is the location of the example files is: /usr/local/Cellar/hadoop/2.5.1/libexec/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.5.1-test-sources.jar
