Getting started with Hadoop: My First Try

Given the growing popularity of Hadoop, I decided to give it a try myself. As usual, I searched for a tutorial first and found one by Yahoo, which is based on a Hadoop 0.18.0 virtual machine. I knew the current stable version is 1.x, but that was OK: I just wanted to get the big picture, and I didn’t want to pass up the convenience of a ready-to-use Hadoop virtual machine.

The tutorial is not that long, so I just tried to walk through it. Because I already had Java and Eclipse set up, I simply downloaded the Hadoop virtual machine and ran it on VMware Player. Then I got stuck: the Eclipse plug-in required in the tutorial could not be found, because I didn’t have the CD mentioned in the tutorial. It took me a while, but I found a newer version of the plug-in.


After installing the plug-in, I could add a new Hadoop location in the Map/Reduce Locations view. The Hadoop location also showed up in the Eclipse Project Explorer view under the DFS Locations root node, but when I expanded it I got an error node reading “Error: null.” Later on I found out that the command line can do most of the work.

Then came the WordCount sample code, which was the fun part for me. Before that, I copied the hadoop-0.18.0 directory under the hadoop-user home directory to the machine where my Eclipse runs. I then created a new project using the MapReduce project wizard (which comes with the Hadoop plug-in) and specified the Hadoop library location there. The Hadoop plug-in simply adds all the required libraries (jar files) to the Java build path so you don’t need to worry about them. If you don’t have the Hadoop plug-in installed, you can add them manually, the most important one being hadoop-0.18.0-core.jar.

After the project was created, I typed in the source code from the tutorial. Somehow it didn’t compile right away, so I searched around and found similar code in the Cloudera Hadoop tutorial.

With a few tweaks, the application compiled. The following are the three Java files:

WordCount.java:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

	public static void main(String[] args) throws Exception {
		JobConf conf = new JobConf(WordCount.class);
		conf.setJobName("wordcount");

		conf.setOutputKeyClass(Text.class);
		conf.setOutputValueClass(IntWritable.class);

		conf.setMapperClass(WordCountMapper.class);
		conf.setReducerClass(WordCountReducer.class);

		conf.setInputFormat(TextInputFormat.class);
		conf.setOutputFormat(TextOutputFormat.class);

		FileInputFormat.setInputPaths(conf, new Path("input"));
		FileOutputFormat.setOutputPath(conf, new Path("output"));

		JobClient.runJob(conf);
	}
}

WordCountMapper.java:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordCountMapper extends MapReduceBase
		implements Mapper<LongWritable, Text, Text, IntWritable> {

	private final IntWritable one = new IntWritable(1);
	private Text word = new Text();

	public void map(LongWritable key, Text value,
			OutputCollector<Text, IntWritable> output, Reporter reporter)
			throws IOException {
		String line = value.toString();
		StringTokenizer itr = new StringTokenizer(line.toLowerCase());
		while (itr.hasMoreTokens()) {
			word.set(itr.nextToken());
			output.collect(word, one);
		}
	}
}

WordCountReducer.java:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCountReducer extends MapReduceBase
		implements Reducer<Text, IntWritable, Text, IntWritable> {

	public void reduce(Text key, Iterator<IntWritable> values,
			OutputCollector<Text, IntWritable> output, Reporter reporter)
			throws IOException {
		int sum = 0;
		while (values.hasNext()) {
			IntWritable value = values.next();
			sum += value.get();
		}
		output.collect(key, new IntWritable(sum));
	}
}
I then jarred it up as wordcount.jar and sent it to the Hadoop virtual machine. Finally, I created a new directory in HDFS and copied in a text file so that the sample could read it to count words.

The following are a few commands I used in the virtual machine:

hadoop-user@hadoop-desk:~ $ ./init-hdfs
hadoop-user@hadoop-desk:~ $ ./start-hadoop
hadoop-user@hadoop-desk:~ $ hadoop fs -mkdir input
hadoop-user@hadoop-desk:~ $ hadoop fs -put ../foo.txt /user/hadoop-user/input
hadoop-user@hadoop-desk:~ $ hadoop fs -ls input/
hadoop-user@hadoop-desk:~/hadoop-0.18.0$ hadoop jar wordcount.jar WordCount
hadoop-user@hadoop-desk:~/hadoop-0.18.0$ hadoop fs -ls output/
hadoop-user@hadoop-desk:~/hadoop-0.18.0$ hadoop fs -get output/part-00000
hadoop-user@hadoop-desk:~/hadoop-0.18.0$ hadoop fs -rmr /user/hadoop-user/output
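The part-00000 file that hadoop fs -get pulls down is plain text from TextOutputFormat, one word and its count per line separated by a tab. If you want to post-process it, a hypothetical helper like the following (my own sketch, not from the tutorial) is enough to read it back:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OutputParser {

    // Each line of a TextOutputFormat result looks like "key<TAB>value".
    public static Map<String, Long> parse(String content) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : content.split("\n")) {
            if (line.isEmpty()) continue;
            String[] parts = line.split("\t", 2);
            counts.put(parts[0], Long.parseLong(parts[1]));
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(parse("hadoop\t1\nhello\t2\nworld\t1"));
        // {hadoop=1, hello=2, world=1}
    }
}
```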

After trying the WordCount sample and reading through two tutorials, I got a good understanding of MapReduce and Hadoop at a very high level. To get some real work done, I think I need to study more. That is why I ordered the book Hadoop: The Definitive Guide. I will write more after reading through the book in about one month. Stay tuned.

This entry was posted in Big Data, Software Development.


  1. beginner1010
    Posted July 3, 2012 at 4:24 pm | Permalink

    After two months, I could solve my problem. I’m the happiest man in the world now!
    I was using commons-logging-1.1.1, which doesn’t work with hadoop-0.18.0.
    If you download hadoop-0.20.2, for example, and use its lib, it works.

    Thank god 😀

  2. Vidya
    Posted September 24, 2012 at 6:59 pm | Permalink

    I am trying to run the tutorial by Yahoo, which is based on the Hadoop 0.18.0 virtual machine. I am getting an error in Eclipse – Call to / fail on local exception: – What might be missing in the configuration on the Eclipse side?

  3. Shiva
    Posted April 6, 2013 at 4:59 pm | Permalink

    Hi… Thanks for the brief explanation of your experience. I face the same problem in the Eclipse configuration: I am getting the error “Error: null.” Could you please tell me how you managed to get the configuration as given in the tutorial? Waiting for your valuable feedback.

  4. Gavaskar Rathnam
    Posted June 27, 2013 at 4:45 am | Permalink


    I am also getting the same error in Eclipse – Call to / fail on local exception:

    Please guide us to resolve this issue.





    Me: Steve Jin, VMware vExpert, author of the VMware VI and vSphere SDK book from Prentice Hall, and creator of the de facto open source vSphere Java API while working at VMware engineering. Companies like Cisco, EMC, NetApp, HP, Dell, and VMware are among the users of the API and other tools I developed for their products, internal IT orchestration, and test automation.