Hadoop Development Environment with Eclipse

A typical Hadoop application needs to be packaged as a jar file for distribution to the nodes, and rebuilding that jar for every change during development is tedious. So I have set up Eclipse to run Hadoop applications (MapReduce jobs) in standalone mode. This also makes debugging easier, since you can launch the job with Run As > Debug.

I assume Eclipse is already set up; see here if it is not.
By the way, I do not use the Hadoop Eclipse plugin. I tried it but could not get it to work: the plugin for Hadoop 0.20 fails to run jobs on Hadoop, and the one for Hadoop 0.21 additionally fails to register Hadoop's NameNode.

Create a project for Hadoop

In this article the Hadoop version is 0.20.2, although the latest release is 0.21.0 at the time of writing.
This is because 0.21 has many problems and no migration documentation for the updated API, and because Mahout (0.4 and 0.5-SNAPSHOT) supports only Hadoop 0.20.2.

Now import Hadoop as an Eclipse project as follows. This has to be done manually because Hadoop does not provide a Maven project.

  • Create a new Java Project in Eclipse and name it “hadoop-0.20.2”.
  • Download the Hadoop 0.20.2 archive (hadoop-0.20.2.tar.gz) and import it into the above project.
    Open the Archive File import dialog (File > Import > General > Archive File from the Eclipse menu), specify hadoop-0.20.2.tar.gz as the archive file, and set the above project as the Into folder.
  • Put the Apache Ant library (ant.jar) into the lib folder of the project.
    Download apache-ant-1.8.2-bin.zip from here, extract ant.jar from it, and copy it into $WORKSPACE/hadoop-0.20.2/hadoop-0.20.2/lib/ .
  • Rewrite $WORKSPACE/hadoop-0.20.2/.classpath as follows to add the source folders and the necessary libraries to the project.
    <?xml version="1.0" encoding="UTF-8"?>
    <classpath>
    	<classpathentry kind="src" path="hadoop-0.20.2/src/core"/>
    	<classpathentry kind="src" path="hadoop-0.20.2/src/mapred"/>
    	<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER/org.eclipse.jdt.internal.debug.ui.launcher.StandardVMType/JavaSE-1.6"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-logging-1.0.4.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/xmlenc-0.52.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-net-1.4.1.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/kfs-0.2.2.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/jets3t-0.6.1.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/servlet-api-2.5-6.1.14.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/jetty-6.1.14.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/jetty-util-6.1.14.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-codec-1.3.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/log4j-1.2.15.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-cli-1.2.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/ant.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-httpclient-3.0.1.jar"/>
    	<classpathentry kind="output" path="bin"/>
    </classpath>
    
  • Refresh the project: right-click the hadoop-0.20.2 project and select “Refresh”.

That completes the Hadoop import. Eclipse is already building Hadoop in the background.

Run the Word Count Sample

Let's run a sample application on Hadoop inside Eclipse to confirm that the setup works.

  • Create a Java Project for the sample application and name it appropriately (e.g. wordcount).
    Click Next in the Create Java Project dialog (not Finish!) and set up the project and library references:

    • Add the above hadoop-0.20.2 project on the Projects tab.
    • Add hadoop-0.20.2/hadoop-0.20.2/lib/commons-cli-1.2.jar on the Libraries tab.

    If you have already clicked Finish, open the Java Build Path dialog from the project's Properties instead.

  • Write the classes of the sample application.
    // WordCountDriver.java
    import java.io.IOException;
    import java.util.Date;
    import java.util.Formatter;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;
    
    public class WordCountDriver {
    
        public static void main(String[] args) throws IOException,
                InterruptedException, ClassNotFoundException {
            Configuration conf = new Configuration();
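            // GenericOptionsParser consumes the standard Hadoop options
            // (e.g. -D key=value, -conf <file>) and hands back the remaining
            // application-specific arguments.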
            GenericOptionsParser parser = new GenericOptionsParser(conf, args);
            args = parser.getRemainingArgs();
    
            Job job = new Job(conf, "wordcount");
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
    
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
    
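            // Build a timestamped output directory name; Hadoop refuses to
            // write into an output path that already exists, so each run
            // gets a fresh directory.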
            Formatter formatter = new Formatter();
            String outpath = "Out"
                    + formatter.format("%1$tm%1$td%1$tH%1$tM%1$tS", new Date());
            FileInputFormat.setInputPaths(job, new Path("In"));
            FileOutputFormat.setOutputPath(job, new Path(outpath));
    
            job.setMapperClass(WordCountMapper.class);
            job.setReducerClass(WordCountReducer.class);
    
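            // Submit the job and block until it finishes, printing progress;
            // waitForCompletion returns true on success (the "true" at the
            // end of the log below).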
            System.out.println(job.waitForCompletion(true));
        }
    }
    
    // WordCountMapper.java
    import java.io.IOException;
    import java.util.StringTokenizer;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    
    public class WordCountMapper extends
            Mapper<LongWritable, Text, Text, IntWritable> {
        // Reuse a single Text and IntWritable instance across records to
        // avoid allocating new objects for every input line.
        private Text word = new Text();
        private final static IntWritable one = new IntWritable(1);
    
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }
    
    // WordCountReducer.java
    import java.io.IOException;
    
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    
    public class WordCountReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
    
  • Create $WORKSPACE/wordcount/bin/log4j.properties with the following content.
    log4j.rootLogger=INFO,console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
    

    This makes Hadoop write its logs to the Eclipse console.

  • Put some sample text data into $WORKSPACE/wordcount/In (see the example below this list).
  • Run the application: click Run > Run from the Eclipse menu or press Ctrl+F11.
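
For example, the sample input can be any plain text. A hypothetical In/sample.txt might look like this:

    Hello Hadoop
    Goodbye Hadoop
    Hello again

Every file under the In directory is picked up, because the driver calls FileInputFormat.setInputPaths(job, new Path("In")).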

The run has succeeded if the application writes logs like the following to the Eclipse console.

11/02/18 19:52:39 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/02/18 19:52:39 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
11/02/18 19:52:39 INFO input.FileInputFormat: Total input paths to process : 1
11/02/18 19:52:39 INFO mapred.JobClient: Running job: job_local_0001
11/02/18 19:52:39 INFO input.FileInputFormat: Total input paths to process : 1
11/02/18 19:52:39 INFO mapred.MapTask: io.sort.mb = 100
11/02/18 19:52:39 INFO mapred.MapTask: data buffer = 79691776/99614720
11/02/18 19:52:39 INFO mapred.MapTask: record buffer = 262144/327680
11/02/18 19:52:40 INFO mapred.MapTask: Starting flush of map output
11/02/18 19:52:40 INFO mapred.MapTask: Finished spill 0
11/02/18 19:52:40 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
11/02/18 19:52:40 INFO mapred.LocalJobRunner: 
11/02/18 19:52:40 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
11/02/18 19:52:40 INFO mapred.LocalJobRunner: 
11/02/18 19:52:40 INFO mapred.Merger: Merging 1 sorted segments
11/02/18 19:52:40 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 23563 bytes
11/02/18 19:52:40 INFO mapred.LocalJobRunner: 
11/02/18 19:52:40 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
11/02/18 19:52:40 INFO mapred.LocalJobRunner: 
11/02/18 19:52:40 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
11/02/18 19:52:40 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to Out0218195239
11/02/18 19:52:40 INFO mapred.LocalJobRunner: reduce > reduce
11/02/18 19:52:40 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
11/02/18 19:52:40 INFO mapred.JobClient:  map 100% reduce 100%
11/02/18 19:52:40 INFO mapred.JobClient: Job complete: job_local_0001
11/02/18 19:52:40 INFO mapred.JobClient: Counters: 12
11/02/18 19:52:40 INFO mapred.JobClient:   FileSystemCounters
11/02/18 19:52:40 INFO mapred.JobClient:     FILE_BYTES_READ=73091
11/02/18 19:52:40 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=110186
11/02/18 19:52:40 INFO mapred.JobClient:   Map-Reduce Framework
11/02/18 19:52:40 INFO mapred.JobClient:     Reduce input groups=956
11/02/18 19:52:40 INFO mapred.JobClient:     Combine output records=0
11/02/18 19:52:40 INFO mapred.JobClient:     Map input records=94
11/02/18 19:52:40 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/02/18 19:52:40 INFO mapred.JobClient:     Reduce output records=956
11/02/18 19:52:40 INFO mapred.JobClient:     Spilled Records=4130
11/02/18 19:52:40 INFO mapred.JobClient:     Map output bytes=19431
11/02/18 19:52:40 INFO mapred.JobClient:     Combine input records=0
11/02/18 19:52:40 INFO mapred.JobClient:     Map output records=2065
11/02/18 19:52:40 INFO mapred.JobClient:     Reduce input records=2065
true

The result is output to $WORKSPACE/wordcount/Out**********/part-r-00000 , where each line holds a word and its count separated by a tab (the default TextOutputFormat).
Click Run > Debug if you would like to run it in debug mode. Breakpoints and stepping work just as they do for any other Java application.
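
By the way, the “No job jar file set” warning near the top of the log is harmless in standalone mode, because all classes are on the local classpath. When you later submit the same job to a real cluster, the usual remedy is to package the classes into a jar and tell the job where to find it; a minimal sketch (not needed for running inside Eclipse):

    // in WordCountDriver.main, right after creating the Job:
    job.setJarByClass(WordCountDriver.class); // find the jar that contains this class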

If you encounter an Out of Memory error when running, adjust the heap size setting (-Xmx) in eclipse.ini as needed (thanks to Rick).

-Xmx384m

92 thoughts on “Hadoop Development Environment with Eclipse”

  1. Thanks for the intro. I encountered an Out of Memory error when running the project in Eclipse. This was resolved by editing the run configuration and adding the following JVM parameter: -Xmx128m

  2. Hi, can you explain to me why we need the step
    “Create $WORKSPACE/wordcount/bin/log4j.properties with the following content.”?
    Because it is located in the /bin directory, the file disappears each time I rebuild the project and I get an error. Is there any way to avoid such an error?
    By the way, how can I know that Hadoop has a property named “key.value.separator.in.input.line” and not something else, e.g. “key.value.separator.input.line” or “key.value.separator.in.line”…?

    1. Put it in src; it will get auto-copied to bin and won’t get deleted from src either, plus you can now put it in source control too.

  3. I see.
    That was because I wanted to keep the steps simple (corner-cutting…).
    I didn’t notice that the file is deleted on rebuild, since I rarely rebuild…
    I’ll try to solve it.

    I don’t know about the latter. Sorry.

  4. Hi Shuyo, I tried to build the Hadoop code with Eclipse, but it is showing 2 errors.

    1) Description  Resource  Path  Location  Type
    The type org.eclipse.core.runtime.IProgressMonitor cannot be resolved. It is indirectly referenced from required .class files  HadoopServer.java  /hadoopsimple/src/contrib/eclipse-plugin/src/java/org/apache/hadoop/eclipse/server  line 1  Java Problem

    2) The project was not built since its build path is incomplete. Cannot find the class file for org.eclipse.core.runtime.IProgressMonitor. Fix the build path then try building this project  hadoopsimple  Unknown  Java Problem

    Please help me to successfully build and run some examples. Thank you, Shuyo.

  5. Hello Shuyo:

    I used your code and it ran properly in Eclipse. However, how do I create a jar file to use on the Hadoop cluster? I used Eclipse's jar export, but I am not sure my settings were correct because the job will not run; it keeps asking for the job jar file name. See the error below:

    11/08/28 18:18:22 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).

    Thank you.

    1. Sorry for my late response.
      Could you try a Hadoop tutorial (without Eclipse)? The setup of your Hadoop cluster might not be finished.
      Or the Hadoop versions of your development environment and your cluster might differ. Hadoop's APIs change very, very, verrrrry often!

    2. Hi hikarimay10,
      I am having the same problem right now – have you been able to solve yours?

      Thanks in advance!

  6. I tried everything, but I get the error ‘Expecting a line not the end of stream’.

    Here is the log:
    11/12/22 00:30:48 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    11/12/22 00:30:48 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
    11/12/22 00:30:48 INFO input.FileInputFormat: Total input paths to process : 1
    11/12/22 00:30:48 INFO mapred.JobClient: Running job: job_local_0001
    11/12/22 00:30:48 INFO input.FileInputFormat: Total input paths to process : 1
    11/12/22 00:30:48 INFO mapred.MapTask: io.sort.mb = 100
    11/12/22 00:30:48 INFO mapred.MapTask: data buffer = 79691776/99614720
    11/12/22 00:30:48 INFO mapred.MapTask: record buffer = 262144/327680
    11/12/22 00:30:49 INFO mapred.MapTask: Starting flush of map output
    11/12/22 00:30:49 WARN mapred.LocalJobRunner: job_local_0001
    java.io.IOException: Expecting a line not the end of stream
    at org.apache.hadoop.fs.DF.parseExecResult(DF.java:109)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:179)
    at org.apache.hadoop.util.Shell.run(Shell.java:134)
    at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:329)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
    at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1221)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1129)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:549)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:623)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
    11/12/22 00:30:49 INFO mapred.JobClient: map 0% reduce 0%
    11/12/22 00:30:49 INFO mapred.JobClient: Job complete: job_local_0001
    11/12/22 00:30:49 INFO mapred.JobClient: Counters: 0
    false

      1. I didn’t try it, but thanks for the reply!

        I’ve switched environments from Windows to Linux, and it works like a charm! I can test anything from any angle.
        I really prefer Linux over Windows for such a system, even for development purposes.

      2. I’m glad to hear your problem is solved.
        Hadoop’s Windows support is not very good, so using Linux is better if you can! 😀

      1. Even in standalone mode, you start the datanodes and namenodes. Also, you can view the status of the namenode and datanode at 50030 and 50070 respectively. But in your example, we can’t. I don’t understand how it works.

  7. When I run the main project I get this error; please help me if you have a solution. And I have a question: how do I create simple text data?

    11/12/24 10:58:42 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively

    11/12/24 10:58:44 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:50020. Already tried 0 time(s).

    Exception in thread “main” java.net.ConnectException: Call to localhost/127.0.0.1:50020 failed on connection exception: java.net.ConnectException: Connection refused: no further information
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)
    at org.apache.hadoop.ipc.Client.call(Client.java:743)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
    at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
    at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)
    at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
    at org.apache.hadoop.mapred.JobClient.(JobClient.java:410)
    at org.apache.hadoop.mapreduce.Job.(Job.java:50)
    at org.apache.hadoop.mapreduce.Job.(Job.java:54)
    at wordcount.WordCountDriver.main(WordCountDriver.java:25)
    Caused by: java.net.ConnectException: Connection refused: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304)
    at org.apache.hadoop.ipc.Client$Connection.access$3(Client.java:288)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:860)
    at org.apache.hadoop.ipc.Client.call(Client.java:720)
    … 9 more

    1. Which article did you refer to?
      In 0.20.2, hadoop-site.xml has been deprecated, as the message says.
      Since Hadoop differs greatly from version to version, you should read the documentation for 0.20.2, or for whichever version you are using.
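
      For reference, a minimal core-site.xml on 0.20.2 looks roughly like this (the host and port are placeholders for your own setup):

        <?xml version="1.0"?>
        <configuration>
          <property>
            <name>fs.default.name</name>
            <value>hdfs://localhost:9000</value>
          </property>
        </configuration>

      mapred-site.xml overrides mapred.job.tracker in the same way.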

  8. Hello Shuyo,

    This is Tareq.
    I am trying to configure a hadoop-0.18.0 MapReduce application with Eclipse 3.3.1, but I am not able to create it 😦
    I am using the Yahoo hadoop-0.18.0 Virtual Machine. I haven’t modified anything in the downloaded Hadoop, just followed the instructions at the link below:
    http://developer.yahoo.com/hadoop/tutorial/module3.html

    Can you please help me solve this problem?

    Thank you.

    1. Hi Tareq,
      Hadoop differs greatly from version to version, in particular between 0.18 and earlier versus 0.19 and later.
      And Yahoo's distribution also differs from Apache Hadoop.
      So I'm afraid I cannot solve your trouble.
      I reckon you should use 0.20 or the very new 1.0 (but it may not have enough documentation yet…).

      1. Sure, Shuyo, and thank you for your quick response.
        I will give it a shot with 0.20.0 or 1.0.

        Actually, in Eclipse I cannot see the hadoop.job.ugi param in the hadoop-advanced tab.
        I have modified the hadoop-default.xml file, but the change is not reflected in Eclipse.
        Do I need to re-compile the Hadoop virtual image?
        How do I compile/deploy it?

        Thank you and regards,
        -T

      2. I’ve spent day and night getting the 0.18 plug-in to work in Eclipse, and I succeeded. The problem was an Eclipse version that is not compatible with the plug-in in a Windows environment. As for Tareq’s point that hadoop.job.ugi was not visible: I managed to see it by provoking an error.
        After configuring the plugin, it shows an error while navigating the last tree node of the DFS; after that you can see “hadoop.job.ugi” in the edit mode of the current config. This works only for version 0.18.
        I tested it on Ubuntu with Eclipse Europa. But using it was useless for me: as you said, Hadoop changes with every version, so you can’t find proper docs.

        Even Yahoo’s 0.20 version doesn’t work for me. The plugin shipped with each version doesn’t seem to be maintained.

        I’ve built my own VM image configured with a Hadoop 0.20.2 single node, an Eclipse Helios SR2 dev environment (using your tutorial) and MongoDB.

      3. Thanks for your useful information.
        As Abhishek mentioned, Hadoop differs greatly between versions.
        So I suppose a re-build is needed (however, I have not used 0.18, so I can’t say for sure).

  9. Hey Shuyo, I desperately wanted to ask all of you whether Hadoop runs on the Windows 7 x64 edition?
    Have you done all these steps on Windows 7?
    Or XP?
    It’s really troubling me :-(

  10. Dear Abhishek/Shuyo,

    Thank you very much for your investigation of this.
    I have not tried it yet; I will try today and let you know.
    Here is my understanding of this:
    UNIX/Linux is the best platform to explore Hadoop.
    I should use the latest Eclipse for DFS.
    Info: as per the Yahoo documentation, I have worked with Eclipse 3.3.1 and Hadoop 0.18.0.
    I will go for the latest Hadoop and Eclipse.

    Thank you once again.

      1. Thank you, Tareq,

        and from your discussion I learned that Linux is the best place to explore Hadoop..

        So I just want to ask: in your experience, does it work best within a virtual machine?

        I hope I am not disturbing you..

      2. Amit,
        So far I have explored HDFS with the same setup and it worked perfectly for me.
        About the rest I am not sure.
        For getting started with Hadoop you can refer to this.
        You’re not disturbing me 🙂

  11. Hey Shuyo, I really want to know if Hadoop runs on Windows 7?
    Have you done all the above steps on Windows 7?
    It’s really troubling me, bro..
    HELP!!!!!

    1. My environment is Windows Server 2008, but as far as I know it does not differ much from Windows 7 (although the Windows 7 Home editions and the Professional-and-above editions may differ considerably).
      In any case, I cannot respond without the details of your trouble…

  12. Hi,
    When I try to pass the input file from DFS in Eclipse, it says FILENOTFOUND, even though I can see it present in HDFS. I am using a 3-node cluster. Please help.

  13. Thanks a lot, really. I ran it without errors, but the results are all zeros. Can you figure out why that is?

  14. Hi,
    Nice tutorial for beginners. I have one problem while running the wordcount; please help me solve it:
    WARN mapred.LocalJobRunner: job_local_0001
    java.io.IOException: Cannot run program “bash”: CreateProcess error=2, The system cannot find the file specified
    at java.lang.ProcessBuilder.start(Unknown Source)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
    at org.apache.hadoop.util.Shell.run(Shell.java:134)
    at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:329)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
    at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1221)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1129)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:549)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:623)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
    Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
    at java.lang.ProcessImpl.create(Native Method)
    at java.lang.ProcessImpl.(Unknown Source)
    at java.lang.ProcessImpl.start(Unknown Source)
    … 13 more

    Thanks in Advance.

  15. Yes, I didn’t install Cygwin. I just followed all the steps you described. Please clarify where I am going wrong; I am eagerly waiting to run the program.

    Thanks
    Arjun

  16. One more update from my side: I have some chmod DLLs copied into the workspace, and I am able to run HelloWorld. Something in the Cygwin bash is missing, I guess. Please advise me how to resolve the issue. I tried to install Cygwin, but somehow I am facing a problem getting it to start.

    Thanks
    Arjun

      1. I am happy with Hadoop and Eclipse; I just want to resolve this issue. Tell me one thing: how do I install Cygwin step by step, and then how do I link that Cygwin to the Hadoop Eclipse setup?

        Thanks
        Arjun

  17. Hi,
    Now I am getting this error:

    12/07/12 00:15:56 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    Exception in thread “main” java.lang.NullPointerException
    at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:103)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:184)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
    at org.apache.hadoop.mapred.JobClient.getFs(JobClient.java:463)
    at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:567)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
    at WordCountDriver.main(WordCountDriver.java:41)

    Thanks
    Arjun

  18. Hello,
    How do you build the Hadoop project after importing the zip file? I set the imported hadoop directory as a source folder, but then I get package errors. For example:
    The declared package “org.apache.hadoop.ant” does not match the expected package
    “src.ant.org.apache.hadoop.ant”

    1. I have no answer because I don’t know at which point you get that message.
      I reckon you should try again from the start in a clean environment.

      1. Hi Shuyo,
        The last step in the section “Creating the Hadoop project” is not clear to me. I am OK up to changing the classpath. I did the zip file import and I can see it as a directory in the project, but it does not compile. Now, how do I compile it? Do I make this directory a source directory? In that case I get package errors. If I move the whole directory under the default ‘src’ directory, I also get package errors.
        I appreciate your help.
        EE

  19. Hello Shuyo,
    Yes I did. My issue is how to add the imported zip file to the project and compile it. The way I added it produces the package src.ant.org.apache.hadoop.ant, but the code is actually in org.apache.hadoop.ant.
    Thanks!

    1. I guess there was some mistake in your process, because that package name is obviously wrong. Why do you need the ant source package?

  20. Hi, I followed your steps and configured this on Hadoop 1.0.1, but when I execute it I get the following problem:

    12/09/02 18:04:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    12/09/02 18:04:12 ERROR security.UserGroupInformation: PriviledgedActionException as:devikiran.setty cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-devikiran.setty\mapred\staging\devikiran.setty1483959294\.staging to 0700
    Exception in thread “main” java.io.IOException: Failed to set permissions of path: \tmp\hadoop-devikiran.setty\mapred\staging\devikiran.setty1483959294\.staging to 0700
    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:682)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:655)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:1)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at WordCountDriver.main(WordCountDriver.java:44)

    Please suggest how I can get rid of this problem…

    1. I cannot tell you what to do, because I have not tried the current version of Hadoop.
      This article was written for version 0.20.2, so there are some differences.

  21. I have followed the above procedure on Windows, but I’m getting the following exceptions. Please suggest a solution:
    12/09/07 12:25:19INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    12/09/07 12:25:19WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
    12/09/07 12:25:19INFO input.FileInputFormat: Total input paths to process : 0
    12/09/07 12:25:20INFO mapred.JobClient: Running job: job_local_0001
    12/09/07 12:25:20INFO input.FileInputFormat: Total input paths to process : 0
    12/09/07 12:25:20WARN mapred.LocalJobRunner: job_local_0001
    java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.RangeCheck(Unknown Source)
    at java.util.ArrayList.get(Unknown Source)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:125)
    12/09/07 12:25:21INFO mapred.JobClient: map 0% reduce 0%
    12/09/07 12:25:21INFO mapred.JobClient: Job complete: job_local_0001
    12/09/07 12:25:21INFO mapred.JobClient: Counters: 0
    false

  22. @Abhishek: Could you please be more precise? I’m dealing with the same problem, but no matter how the plugin is configured, the hadoop.job.ugi property is not shown in the “Advanced” tab. Thanks in advance.

  23. Thanks for the tutorial. I followed your steps but I got the following errors. Do you have an idea of how to solve them? Thanks

    Exception in thread “main” java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
    at org.apache.hadoop.conf.Configuration.(Configuration.java:139)
    at com.cloudsen.hadoop.WordCountDriver.main(WordCountDriver.java:22)
    Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    … 2 more

    1. Hi Shuyo,

      In the first paragraph you mentioned that “A typical Hadoop application needs to be packaged as a jar file for distribution to the nodes, and rebuilding that jar for every change during development is tedious.”
      Can you please specify the steps to recreate a new jar file after modifying the source code in hadoop-0.20.2.tar.gz?

  24. Hi there – I just updated your tutorial for the setup notes of our university’s Hadoop cluster. I will send you the updated files (the wordcount classes and the classpath) if you send me a mail 🙂 Please reply to this comment in case you cannot see my mail address.

  25. By the way, for those who are getting the ‘Expecting a line not the end of stream’ IOException in Eclipse on Windows, a solution is:

    1) Install cygwin
    2) open cygwin console
    3) Execute:
    /cygdrive/c/eclipse/eclipse.exe
    This assumes eclipse.exe is under c:\eclipse. Change as needed for your system.

    Now try running the example code. It should work.

  26. Is there any way, after modifying the source of some core files, to test those changes in a Hadoop environment? I mean building the core jar again and placing it in the Hadoop installation. Please let me know.
