Hadoop Development Environment with Eclipse

A typical Hadoop application needs to be packaged as a jar file for distribution to the nodes, and rebuilding that jar for every change during development is tedious. So I have set up Eclipse to run Hadoop applications (MapReduce jobs) in standalone mode. This also makes debugging easier, since you can launch the job with Run As > Debug.

I assume Eclipse is already set up; see here if it is not.
By the way, I do not use the Hadoop Eclipse plugin. I tried it but could not get it to work: the plugin for Hadoop 0.20 fails to run jobs on Hadoop, and the one for Hadoop 0.21 additionally fails to register Hadoop's NameNode.

Create a project for Hadoop

In this article the Hadoop version is 0.20.2, although the latest release is 0.21.0 at the time of writing.
This is because 0.21 has many problems and no migration documentation for the updated API, and because Mahout (0.4 and 0.5-SNAPSHOT) supports only Hadoop 0.20.2.

Now import Hadoop as an Eclipse project as follows. This has to be done manually because Hadoop does not provide a Maven project.

  • Create a new Java Project in Eclipse and name it “hadoop-0.20.2”.
  • Download the Hadoop 0.20.2 archive (hadoop-0.20.2.tar.gz) and import it into the above project.
    Open the Archive File import dialog (File > Import > General > Archive File from the Eclipse menu), specify hadoop-0.20.2.tar.gz as the archive file, and set the above project as the Into folder.
  • Put the Apache Ant library (ant.jar) into the lib folder of the project.
    Download apache-ant-1.8.2-bin.zip from here, extract ant.jar from it, and copy it into $WORKSPACE/hadoop-0.20.2/hadoop-0.20.2/lib/ .
  • Rewrite $WORKSPACE/hadoop-0.20.2/.classpath as follows to add the source folders and the necessary libraries to the project.
    <?xml version="1.0" encoding="UTF-8"?>
    <classpath>
    	<classpathentry kind="src" path="hadoop-0.20.2/src/core"/>
    	<classpathentry kind="src" path="hadoop-0.20.2/src/mapred"/>
    	<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER/org.eclipse.jdt.internal.debug.ui.launcher.StandardVMType/JavaSE-1.6"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-logging-1.0.4.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/xmlenc-0.52.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-net-1.4.1.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/kfs-0.2.2.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/jets3t-0.6.1.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/servlet-api-2.5-6.1.14.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/jetty-6.1.14.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/jetty-util-6.1.14.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-codec-1.3.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/log4j-1.2.15.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-cli-1.2.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/ant.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-httpclient-3.0.1.jar"/>
    	<classpathentry kind="output" path="bin"/>
    </classpath>
    
  • Refresh the project: right-click the hadoop-0.20.2 project and select “Refresh”.

That completes the Hadoop import. Eclipse is already building Hadoop in the background.

Run the Word Count Sample

Let's run a sample application on Hadoop inside Eclipse to confirm that the setup works.

  • Create a Java Project for the sample application and name it appropriately (e.g. wordcount).
    Click Next in the Create Java Project dialog (not Finish!) and set up the project and library references:

    • Add the above hadoop-0.20.2 project on the Projects tab.
    • Add hadoop-0.20.2/hadoop-0.20.2/lib/commons-cli-1.2.jar on the Libraries tab.

    If you have already clicked Finish, open the Java Build Path dialog from the project's Properties instead.

  • Write the classes of the sample application.
    // WordCountDriver.java
    import java.io.IOException;
    import java.util.Date;
    import java.util.Formatter;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;
    
    public class WordCountDriver {
    
        public static void main(String[] args) throws IOException,
                InterruptedException, ClassNotFoundException {
            Configuration conf = new Configuration();
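            // GenericOptionsParser consumes the standard Hadoop options
            // (e.g. -D key=value, -conf <file>) and hands back the remaining
            // application-specific arguments.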
            GenericOptionsParser parser = new GenericOptionsParser(conf, args);
            args = parser.getRemainingArgs();
    
            Job job = new Job(conf, "wordcount");
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
    
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
    
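            // Build a timestamped output directory name; Hadoop refuses to
            // write into an output path that already exists, so each run
            // gets a fresh directory.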
            Formatter formatter = new Formatter();
            String outpath = "Out"
                    + formatter.format("%1$tm%1$td%1$tH%1$tM%1$tS", new Date());
            FileInputFormat.setInputPaths(job, new Path("In"));
            FileOutputFormat.setOutputPath(job, new Path(outpath));
    
            job.setMapperClass(WordCountMapper.class);
            job.setReducerClass(WordCountReducer.class);
    
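            // Submit the job and block until it finishes, printing progress;
            // waitForCompletion returns true on success (the "true" at the
            // end of the log below).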
            System.out.println(job.waitForCompletion(true));
        }
    }
    
    // WordCountMapper.java
    import java.io.IOException;
    import java.util.StringTokenizer;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    
    public class WordCountMapper extends
            Mapper<LongWritable, Text, Text, IntWritable> {
        // Reuse a single Text and IntWritable instance across records to
        // avoid allocating new objects for every input line.
        private Text word = new Text();
        private final static IntWritable one = new IntWritable(1);
    
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }
    
    // WordCountReducer.java
    import java.io.IOException;
    
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    
    public class WordCountReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
    
  • Create $WORKSPACE/wordcount/bin/log4j.properties with the following content.
    log4j.rootLogger=INFO,console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
    

    This makes Hadoop write its logs to the Eclipse console.

  • Put some sample text data into $WORKSPACE/wordcount/In (see the example below this list).
  • Run the application: click Run > Run from the Eclipse menu or press Ctrl+F11.
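
For example, the sample input can be any plain text. A hypothetical In/sample.txt might look like this:

    Hello Hadoop
    Goodbye Hadoop
    Hello again

Every file under the In directory is picked up, because the driver calls FileInputFormat.setInputPaths(job, new Path("In")).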

The run has succeeded if the application writes logs like the following to the Eclipse console.

11/02/18 19:52:39 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/02/18 19:52:39 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
11/02/18 19:52:39 INFO input.FileInputFormat: Total input paths to process : 1
11/02/18 19:52:39 INFO mapred.JobClient: Running job: job_local_0001
11/02/18 19:52:39 INFO input.FileInputFormat: Total input paths to process : 1
11/02/18 19:52:39 INFO mapred.MapTask: io.sort.mb = 100
11/02/18 19:52:39 INFO mapred.MapTask: data buffer = 79691776/99614720
11/02/18 19:52:39 INFO mapred.MapTask: record buffer = 262144/327680
11/02/18 19:52:40 INFO mapred.MapTask: Starting flush of map output
11/02/18 19:52:40 INFO mapred.MapTask: Finished spill 0
11/02/18 19:52:40 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
11/02/18 19:52:40 INFO mapred.LocalJobRunner: 
11/02/18 19:52:40 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
11/02/18 19:52:40 INFO mapred.LocalJobRunner: 
11/02/18 19:52:40 INFO mapred.Merger: Merging 1 sorted segments
11/02/18 19:52:40 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 23563 bytes
11/02/18 19:52:40 INFO mapred.LocalJobRunner: 
11/02/18 19:52:40 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
11/02/18 19:52:40 INFO mapred.LocalJobRunner: 
11/02/18 19:52:40 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
11/02/18 19:52:40 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to Out0218195239
11/02/18 19:52:40 INFO mapred.LocalJobRunner: reduce > reduce
11/02/18 19:52:40 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
11/02/18 19:52:40 INFO mapred.JobClient:  map 100% reduce 100%
11/02/18 19:52:40 INFO mapred.JobClient: Job complete: job_local_0001
11/02/18 19:52:40 INFO mapred.JobClient: Counters: 12
11/02/18 19:52:40 INFO mapred.JobClient:   FileSystemCounters
11/02/18 19:52:40 INFO mapred.JobClient:     FILE_BYTES_READ=73091
11/02/18 19:52:40 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=110186
11/02/18 19:52:40 INFO mapred.JobClient:   Map-Reduce Framework
11/02/18 19:52:40 INFO mapred.JobClient:     Reduce input groups=956
11/02/18 19:52:40 INFO mapred.JobClient:     Combine output records=0
11/02/18 19:52:40 INFO mapred.JobClient:     Map input records=94
11/02/18 19:52:40 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/02/18 19:52:40 INFO mapred.JobClient:     Reduce output records=956
11/02/18 19:52:40 INFO mapred.JobClient:     Spilled Records=4130
11/02/18 19:52:40 INFO mapred.JobClient:     Map output bytes=19431
11/02/18 19:52:40 INFO mapred.JobClient:     Combine input records=0
11/02/18 19:52:40 INFO mapred.JobClient:     Map output records=2065
11/02/18 19:52:40 INFO mapred.JobClient:     Reduce input records=2065
true

The result is output to $WORKSPACE/wordcount/Out**********/part-r-00000 , where each line holds a word and its count separated by a tab (the default TextOutputFormat).
Click Run > Debug if you would like to run it in debug mode. Breakpoints and stepping work just as they do for any other Java application.
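
By the way, the “No job jar file set” warning near the top of the log is harmless in standalone mode, because all classes are on the local classpath. When you later submit the same job to a real cluster, the usual remedy is to package the classes into a jar and tell the job where to find it; a minimal sketch (not needed for running inside Eclipse):

    // in WordCountDriver.main, right after creating the Job:
    job.setJarByClass(WordCountDriver.class); // find the jar that contains this class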

If you encounter an Out of Memory error when running, adjust the heap size setting (-Xmx) in eclipse.ini as needed (thanks to Rick).

-Xmx384m

92 thoughts on “Hadoop Development Environment with Eclipse”

  1. Thanks for the intro. I encountered an Out of Memory error when running the project in Eclipse. This was resolved by editing the run configuration and adding the following JVM parameter: -Xmx128m

  2. Hi, can you explain to me why we need the step
    “Create $WORKSPACE/wordcount/bin/log4j.properties with the following content.”?
    Because it is located in the /bin directory, the file disappears each time I rebuild the project and I get an error. Is there any way to avoid such an error?
    By the way, how can I know that Hadoop has a property named “key.value.separator.in.input.line” and not something else, e.g. “key.value.separator.input.line” or “key.value.separator.in.line”…?

    1. Put it in src; it will get auto-copied to bin and won’t get deleted from src either, plus you can now put it in source control too.

  3. I see.
    That was because I wanted to keep the steps simple (corner-cutting…).
    I didn’t notice that the file is deleted on rebuild, since I rarely rebuild…
    I’ll try to solve it.

    I don’t know about the latter. Sorry.

  4. Hi Shuyo, I tried to build the Hadoop code with Eclipse, but it is showing 2 errors.

    1) Description  Resource  Path  Location  Type
    The type org.eclipse.core.runtime.IProgressMonitor cannot be resolved. It is indirectly referenced from required .class files  HadoopServer.java  /hadoopsimple/src/contrib/eclipse-plugin/src/java/org/apache/hadoop/eclipse/server  line 1  Java Problem

    2) The project was not built since its build path is incomplete. Cannot find the class file for org.eclipse.core.runtime.IProgressMonitor. Fix the build path then try building this project  hadoopsimple  Unknown  Java Problem

    Please help me to successfully build and run some examples. Thank you, Shuyo.

  5. Hello Shuyo:

    I used your code and it ran properly in Eclipse. However, how do I create a jar file to use on the Hadoop cluster? I used Eclipse's jar export, but I am not sure my settings were correct because the job will not run; it keeps asking for the job jar file name. See the error below:

    11/08/28 18:18:22 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).

    Thank you.

    1. Sorry for my late response.
      Could you try a Hadoop tutorial (without Eclipse)? The setup of your Hadoop cluster might not be finished.
      Or the Hadoop versions of your development environment and your cluster might differ. Hadoop's APIs change very, very, verrrrry often!

    2. Hi hikarimay10,
      I am having the same problem right now – have you been able to solve yours?

      Thanks in advance!

  6. I tried everything, but I get the error ‘Expecting a line not the end of stream’.

    Here is the log:
    11/12/22 00:30:48 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    11/12/22 00:30:48 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
    11/12/22 00:30:48 INFO input.FileInputFormat: Total input paths to process : 1
    11/12/22 00:30:48 INFO mapred.JobClient: Running job: job_local_0001
    11/12/22 00:30:48 INFO input.FileInputFormat: Total input paths to process : 1
    11/12/22 00:30:48 INFO mapred.MapTask: io.sort.mb = 100
    11/12/22 00:30:48 INFO mapred.MapTask: data buffer = 79691776/99614720
    11/12/22 00:30:48 INFO mapred.MapTask: record buffer = 262144/327680
    11/12/22 00:30:49 INFO mapred.MapTask: Starting flush of map output
    11/12/22 00:30:49 WARN mapred.LocalJobRunner: job_local_0001
    java.io.IOException: Expecting a line not the end of stream
    at org.apache.hadoop.fs.DF.parseExecResult(DF.java:109)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:179)
    at org.apache.hadoop.util.Shell.run(Shell.java:134)
    at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:329)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
    at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1221)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1129)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:549)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:623)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
    11/12/22 00:30:49 INFO mapred.JobClient: map 0% reduce 0%
    11/12/22 00:30:49 INFO mapred.JobClient: Job complete: job_local_0001
    11/12/22 00:30:49 INFO mapred.JobClient: Counters: 0
    false

      1. I didn’t try it, but thanks for the reply!

        I’ve switched environments from Windows to Linux, and it works like a charm! I can test anything from any angle.
        I really prefer Linux over Windows for such a system, even for development purposes.

      2. I’m glad to hear your problem is solved.
        Hadoop’s Windows support is not very good, so using Linux is better if you can! 😀

      1. Even in standalone mode, you start the datanodes and namenodes. Also, you can view the status of the namenode and datanode at 50030 and 50070 respectively. But in your example, we can’t. I don’t understand how it works.

  7. When I run the main project I get this error; please help me if you have a solution. And I have a question: how do I create simple text data?

    11/12/24 10:58:42 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively

    11/12/24 10:58:44 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:50020. Already tried 0 time(s).

    Exception in thread “main” java.net.ConnectException: Call to localhost/127.0.0.1:50020 failed on connection exception: java.net.ConnectException: Connection refused: no further information
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)
    at org.apache.hadoop.ipc.Client.call(Client.java:743)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
    at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
    at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)
    at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
    at org.apache.hadoop.mapred.JobClient.(JobClient.java:410)
    at org.apache.hadoop.mapreduce.Job.(Job.java:50)
    at org.apache.hadoop.mapreduce.Job.(Job.java:54)
    at wordcount.WordCountDriver.main(WordCountDriver.java:25)
    Caused by: java.net.ConnectException: Connection refused: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304)
    at org.apache.hadoop.ipc.Client$Connection.access$3(Client.java:288)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:860)
    at org.apache.hadoop.ipc.Client.call(Client.java:720)
    … 9 more

    1. Which article did you refer to?
      In 0.20.2, hadoop-site.xml has been deprecated, as the message says.
      Since Hadoop differs greatly from version to version, you should read the documentation for 0.20.2, or for whichever version you are using.
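
      For reference, a minimal core-site.xml on 0.20.2 looks roughly like this (the host and port are placeholders for your own setup):

        <?xml version="1.0"?>
        <configuration>
          <property>
            <name>fs.default.name</name>
            <value>hdfs://localhost:9000</value>
          </property>
        </configuration>

      mapred-site.xml overrides mapred.job.tracker in the same way.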

  8. Hello Shuyo,

    This is Tareq.
    I am trying to configure a hadoop-0.18.0 MapReduce application with Eclipse 3.3.1, but I am not able to create it 😦
    I am using the Yahoo hadoop-0.18.0 Virtual Machine. I haven’t modified anything in the downloaded Hadoop, just followed the instructions at the link below:
    http://developer.yahoo.com/hadoop/tutorial/module3.html

    Can you please help me solve this problem?

    Thank you.

    1. Hi Tareq,
      Hadoop differs greatly from version to version, in particular between 0.18 and earlier versus 0.19 and later.
      And Yahoo's distribution also differs from Apache Hadoop.
      So I'm afraid I cannot solve your trouble.
      I reckon you should use 0.20 or the very new 1.0 (but it may not have enough documentation yet…).

      1. Sure, Shuyo, and thank you for your quick response.
        I will give it a shot with 0.20.0 or 1.0.

        Actually, in Eclipse I cannot see the hadoop.job.ugi param in the hadoop-advanced tab.
        I have modified the hadoop-default.xml file, but the change is not reflected in Eclipse.
        Do I need to re-compile the Hadoop virtual image?
        How do I compile/deploy it?

        Thank you and regards,
        -T

      2. I’ve spent day and night getting the 0.18 plug-in to work in Eclipse, and I succeeded. The problem was an Eclipse version that is not compatible with the plug-in in a Windows environment. As for Tareq’s point that hadoop.job.ugi was not visible: I managed to see it by provoking an error.
        After configuring the plugin, it shows an error while navigating the last tree node of the DFS; after that you can see “hadoop.job.ugi” in the edit mode of the current config. This works only for version 0.18.
        I tested it on Ubuntu with Eclipse Europa. But using it was useless for me: as you said, Hadoop changes with every version, so you can’t find proper docs.

        Even Yahoo’s 0.20 version doesn’t work for me. The plugin shipped with each version doesn’t seem to be maintained.

        I’ve built my own VM image configured with a Hadoop 0.20.2 single node, an Eclipse Helios SR2 dev environment (using your tutorial) and MongoDB.

      3. Thanks for your useful information.
        As Abhishek mentioned, Hadoop differs greatly between versions.
        So I suppose a re-build is needed (however, I have not used 0.18, so I can’t say for sure).

  9. Hey Shuyo, I desperately wanted to ask all of you whether Hadoop runs on the Windows 7 x64 edition?
    Have you done all these steps on Windows 7?
    Or XP?
    It’s really troubling me :-(

  10. Dear Abhishek/Shuyo,

    Thank you very much for your investigation of this.
    I have not tried it yet; I will try today and let you know.
    Here is my understanding of this:
    UNIX/Linux is the best platform to explore Hadoop.
    I should use the latest Eclipse for DFS.
    Info: as per the Yahoo documentation, I have worked with Eclipse 3.3.1 and Hadoop 0.18.0.
    I will go for the latest Hadoop and Eclipse.

    Thank you once again.

      1. Thank you, Tareq,

        and from your discussion I learned that Linux is the best place to explore Hadoop..

        So I just want to ask: in your experience, does it work best within a virtual machine?

        I hope I am not disturbing you..

      2. Amit,
        So far I have explored HDFS with the same setup and it worked perfectly for me.
        About the rest I am not sure.
        For getting started with Hadoop you can refer to this.
        You’re not disturbing me 🙂

  11. Hey Shuyo, I really want to know if Hadoop runs on Windows 7?
    Have you done all the above steps on Windows 7?
    It’s really troubling me, bro..
    HELP!!!!!

    1. My environment is Windows Server 2008, but as far as I know it does not differ much from Windows 7 (although the Windows 7 Home editions and the Professional-and-above editions may differ considerably).
      In any case, I cannot respond without the details of your trouble…

  12. Hi,
    When I try to pass the input file from DFS in Eclipse, it says FILENOTFOUND, even though I can see it present in HDFS. I am using a 3-node cluster. Please help.

  13. Thanks a lot, really. I ran it without errors, but the results are all zeros. Can you figure out why that is?

  14. Hi,
    Nice tutorial for beginners. I have one problem while running the wordcount; please help me solve it:
    WARN mapred.LocalJobRunner: job_local_0001
    java.io.IOException: Cannot run program “bash”: CreateProcess error=2, The system cannot find the file specified
    at java.lang.ProcessBuilder.start(Unknown Source)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
    at org.apache.hadoop.util.Shell.run(Shell.java:134)
    at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:329)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
    at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1221)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1129)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:549)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:623)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
    Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
    at java.lang.ProcessImpl.create(Native Method)
    at java.lang.ProcessImpl.(Unknown Source)
    at java.lang.ProcessImpl.start(Unknown Source)
    … 13 more

    Thanks in Advance.

  15. Yes, I didn’t install Cygwin. I just followed all the steps you described. Please clarify where I am going wrong; I am eagerly waiting to run the program.

    Thanks
    Arjun

  16. One more update from my side: I have some chmod DLLs copied into the workspace, and I am able to run HelloWorld. Something in the Cygwin bash is missing, I guess. Please advise me how to resolve the issue. I tried to install Cygwin, but somehow I am facing a problem getting it to start.

    Thanks
    Arjun

      1. I am happy with Hadoop and Eclipse; I just want to resolve this issue. Tell me one thing: how do I install Cygwin step by step, and then how do I link that Cygwin to the Hadoop Eclipse setup?

        Thanks
        Arjun

  17. Hi,
    Now I am getting this error:

    12/07/12 00:15:56 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    Exception in thread “main” java.lang.NullPointerException
    at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:103)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:184)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
    at org.apache.hadoop.mapred.JobClient.getFs(JobClient.java:463)
    at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:567)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
    at WordCountDriver.main(WordCountDriver.java:41)

    Thanks
    Arjun

  18. Hello,
    How do you build the Hadoop project after importing the zip file? I set the imported hadoop directory as a source folder, but then I get package errors. For example:
    The declared package “org.apache.hadoop.ant” does not match the expected package
    “src.ant.org.apache.hadoop.ant”

    1. I have no answer because I don’t know at which point you get that message.
      I reckon you should try again from the start in a clean environment.

      1. Hi Shuyo,
        The last step in the section “Creating the Hadoop project” is not clear to me. I am OK up to changing the classpath. I did the zip file import and I can see it as a directory in the project, but it does not compile. Now, how do I compile it? Do I make this directory a source directory? In that case I get package errors. If I move the whole directory under the default ‘src’ directory, I also get package errors.
        I appreciate your help.
        EE

  19. Hello Shuyo,
    Yes I did. My issue is how to add the imported zip file to the project and compile it. The way I added it produces the package src.ant.org.apache.hadoop.ant, but the code is actually in org.apache.hadoop.ant.
    Thanks!

    1. I guess there was some mistake in your process, because that package name is obviously wrong. Why do you need the ant source package?

  20. Hi, I followed your steps and configured this on Hadoop 1.0.1, but when I execute it I get the following problem:

    12/09/02 18:04:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    12/09/02 18:04:12 ERROR security.UserGroupInformation: PriviledgedActionException as:devikiran.setty cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-devikiran.setty\mapred\staging\devikiran.setty1483959294\.staging to 0700
    Exception in thread “main” java.io.IOException: Failed to set permissions of path: \tmp\hadoop-devikiran.setty\mapred\staging\devikiran.setty1483959294\.staging to 0700
    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:682)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:655)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:1)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at WordCountDriver.main(WordCountDriver.java:44)

    Please suggest how I can get rid of this problem…

    1. I cannot tell you what to do, because I have not tried the current version of Hadoop.
      This article was written for version 0.20.2, so there are some differences.

  21. I have followed the above procedure on Windows, but I’m getting the following exceptions. Please suggest a solution:
    12/09/07 12:25:19INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    12/09/07 12:25:19WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
    12/09/07 12:25:19INFO input.FileInputFormat: Total input paths to process : 0
    12/09/07 12:25:20INFO mapred.JobClient: Running job: job_local_0001
    12/09/07 12:25:20INFO input.FileInputFormat: Total input paths to process : 0
    12/09/07 12:25:20WARN mapred.LocalJobRunner: job_local_0001
    java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.RangeCheck(Unknown Source)
    at java.util.ArrayList.get(Unknown Source)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:125)
    12/09/07 12:25:21INFO mapred.JobClient: map 0% reduce 0%
    12/09/07 12:25:21INFO mapred.JobClient: Job complete: job_local_0001
    12/09/07 12:25:21INFO mapred.JobClient: Counters: 0
    false

  22. @Abhishek: Could you please be more precise? I’m dealing with the same problem, but no matter how the plugin is configured, the hadoop.job.ugi property is not shown in the “Advanced” tab. Thanks in advance.

  23. Thanks for the tutorial. I followed your steps but I got the following errors. Do you have an idea of how to solve them? Thanks

    Exception in thread “main” java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
    at org.apache.hadoop.conf.Configuration.(Configuration.java:139)
    at com.cloudsen.hadoop.WordCountDriver.main(WordCountDriver.java:22)
    Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    … 2 more

    1. Hi Shuyo,

      In the first paragraph you mentioned that “A typical Hadoop application needs to be packaged as a jar file for distribution to the nodes, and rebuilding that jar for every change during development is tedious.”
      Can you please specify the steps to recreate a new jar file after modifying the source code in hadoop-0.20.2.tar.gz?

  24. Hi there – I just updated your tutorial for the setup notes of our university’s Hadoop cluster. I will send you the updated files (the wordcount classes and the classpath) if you send me a mail 🙂 Please reply to this comment in case you cannot see my mail address.

  25. By the way, for those who are getting the ‘Expecting a line not the end of stream’ IOException in Eclipse on Windows, a solution is:

    1) Install cygwin
    2) open cygwin console
    3) Execute:
    /cygdrive/c/eclipse/eclipse.exe
    This assumes eclipse.exe is under c:\eclipse. Change as needed for your system.

    Now try running the example code. It should work.

  26. Is there any way, after modifying the source of some core files, to test those changes in a Hadoop environment? I mean building the core jar again and placing it in the Hadoop installation. Please let me know.
