Hadoop Development Environment with Eclipse

A typical Hadoop application needs a jar file to be distributed to the nodes, and rebuilding that jar over and over during development is tedious…
So I’ve set Eclipse up to run a Hadoop application (a MapReduce job) in standalone mode. This makes debugging easier, since the job can be launched directly in Debug mode as well.

I assume Eclipse is already set up; see here if it isn’t.
By the way, the Hadoop Eclipse plugin is not used here. I tried it but could not get it to work: the plugin for Hadoop 0.20 fails at “Run on Hadoop”, and the one for Hadoop 0.21 cannot even register Hadoop’s NameNode.

Create a project for Hadoop

In this article the Hadoop version is 0.20.2, although the latest release at the time of writing is 0.21.0.
This is because 0.21 has many troubles and no documentation for the updated API, and because Mahout (0.4 and 0.5-SNAPSHOT) supports only Hadoop 0.20.2.

Now import Hadoop as an Eclipse project with the following steps. This has to be done manually, since Hadoop does not ship as a Maven project.

  • Create a new Java Project in Eclipse and name it “hadoop-0.20.2”.
  • Download an archive file of Hadoop 0.20.2 (hadoop-0.20.2.tar.gz) and import it into the above project.
    Open the Archive File import dialog (File > Import > General > Archive File from the Eclipse menu) and specify hadoop-0.20.2.tar.gz as the archive file. Set the above project as the “Into folder”.
  • Put the Apache Ant library (ant.jar) into the library folder of the project.
    Download apache-ant-1.8.2-bin.zip from here, extract ant.jar from it, and copy it into $WORKSPACE/hadoop-0.20.2/hadoop-0.20.2/lib/ .
  • Rewrite $WORKSPACE/hadoop-0.20.2/.classpath to add the source folders and the necessary libraries to the project:
    <?xml version="1.0" encoding="UTF-8"?>
    <classpath>
    	<classpathentry kind="src" path="hadoop-0.20.2/src/core"/>
    	<classpathentry kind="src" path="hadoop-0.20.2/src/mapred"/>
    	<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER/org.eclipse.jdt.internal.debug.ui.launcher.StandardVMType/JavaSE-1.6"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-logging-1.0.4.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/xmlenc-0.52.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-net-1.4.1.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/kfs-0.2.2.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/jets3t-0.6.1.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/servlet-api-2.5-6.1.14.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/jetty-6.1.14.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/jetty-util-6.1.14.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-codec-1.3.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/log4j-1.2.15.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-cli-1.2.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/ant.jar"/>
    	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-httpclient-3.0.1.jar"/>
    	<classpathentry kind="output" path="bin"/>
    </classpath>
    
  • Refresh the project: right-click the hadoop-0.20.2 project and select “Refresh”.

That completes the Hadoop import. Eclipse is already building Hadoop in the background.

Run the Word Count Sample

Let’s run a sample application on Hadoop inside Eclipse to confirm that the setup works.

  • Create a Java Project for the sample application and name it appropriately (e.g. wordcount).
    Click Next in the Create Java Project dialog (not Finish!) and set up the project and library references.

    • Add the above hadoop-0.20.2 project on the Projects tab.
    • Add hadoop-0.20.2/hadoop-0.20.2/lib/commons-cli-1.2.jar on the Libraries tab.

    If you have already clicked Finish, open the Java Build Path dialog from the project’s Properties instead.

  • Write classes of the sample application.
    // WordCountDriver.java
    import java.io.IOException;
    import java.util.Date;
    import java.util.Formatter;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;
    
    public class WordCountDriver {
    
        public static void main(String[] args) throws IOException,
                InterruptedException, ClassNotFoundException {
            Configuration conf = new Configuration();
            GenericOptionsParser parser = new GenericOptionsParser(conf, args);
            args = parser.getRemainingArgs();
    
            Job job = new Job(conf, "wordcount");
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
    
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
    
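            // Formatter.format() returns the Formatter itself, and the string
            // concatenation below invokes its toString(), producing a path like
            // "Out0218195239" (month, day, hour, minute, second).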
            Formatter formatter = new Formatter();
            String outpath = "Out"
                    + formatter.format("%1$tm%1$td%1$tH%1$tM%1$tS", new Date());
            FileInputFormat.setInputPaths(job, new Path("In"));
            FileOutputFormat.setOutputPath(job, new Path(outpath));
    
            job.setMapperClass(WordCountMapper.class);
            job.setReducerClass(WordCountReducer.class);
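            // No combiner is set, so the "Combine input records" counter in the
            // log below stays at 0. Local pre-aggregation could optionally be
            // enabled with: job.setCombinerClass(WordCountReducer.class);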
    
            System.out.println(job.waitForCompletion(true));
        }
    }
    
    // WordCountMapper.java
    import java.io.IOException;
    import java.util.StringTokenizer;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    
    public class WordCountMapper extends
            Mapper<LongWritable, Text, Text, IntWritable> {
        private Text word = new Text();
        private final static IntWritable one = new IntWritable(1);
    
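        // Called once per input line; with TextInputFormat the key is the byte
        // offset of the line within the file and the value is the line text.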
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }
    
    // WordCountReducer.java
    import java.io.IOException;
    
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    
    public class WordCountReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {
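        // Called once per distinct word, with an iterator over all the counts
        // emitted for it; the sum is the word's total frequency.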
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
    
  • Create $WORKSPACE/wordcount/bin/log4j.properties as follows.
    log4j.rootLogger=INFO,console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
    

    This makes Hadoop write its logs to the Eclipse console.

  • Create some sample text data in $WORKSPACE/wordcount/In , as shown below.
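    Any plain text will do; for instance, a hypothetical file In/sample.txt could contain:

    Hello Hadoop World
    Goodbye Hadoop World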
  • Run the application: click Run > Run from the Eclipse menu or press Ctrl+F11.

The run has succeeded if the application writes logs like the following to the Eclipse console.

11/02/18 19:52:39 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/02/18 19:52:39 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
11/02/18 19:52:39 INFO input.FileInputFormat: Total input paths to process : 1
11/02/18 19:52:39 INFO mapred.JobClient: Running job: job_local_0001
11/02/18 19:52:39 INFO input.FileInputFormat: Total input paths to process : 1
11/02/18 19:52:39 INFO mapred.MapTask: io.sort.mb = 100
11/02/18 19:52:39 INFO mapred.MapTask: data buffer = 79691776/99614720
11/02/18 19:52:39 INFO mapred.MapTask: record buffer = 262144/327680
11/02/18 19:52:40 INFO mapred.MapTask: Starting flush of map output
11/02/18 19:52:40 INFO mapred.MapTask: Finished spill 0
11/02/18 19:52:40 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
11/02/18 19:52:40 INFO mapred.LocalJobRunner: 
11/02/18 19:52:40 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
11/02/18 19:52:40 INFO mapred.LocalJobRunner: 
11/02/18 19:52:40 INFO mapred.Merger: Merging 1 sorted segments
11/02/18 19:52:40 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 23563 bytes
11/02/18 19:52:40 INFO mapred.LocalJobRunner: 
11/02/18 19:52:40 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
11/02/18 19:52:40 INFO mapred.LocalJobRunner: 
11/02/18 19:52:40 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
11/02/18 19:52:40 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to Out0218195239
11/02/18 19:52:40 INFO mapred.LocalJobRunner: reduce > reduce
11/02/18 19:52:40 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
11/02/18 19:52:40 INFO mapred.JobClient:  map 100% reduce 100%
11/02/18 19:52:40 INFO mapred.JobClient: Job complete: job_local_0001
11/02/18 19:52:40 INFO mapred.JobClient: Counters: 12
11/02/18 19:52:40 INFO mapred.JobClient:   FileSystemCounters
11/02/18 19:52:40 INFO mapred.JobClient:     FILE_BYTES_READ=73091
11/02/18 19:52:40 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=110186
11/02/18 19:52:40 INFO mapred.JobClient:   Map-Reduce Framework
11/02/18 19:52:40 INFO mapred.JobClient:     Reduce input groups=956
11/02/18 19:52:40 INFO mapred.JobClient:     Combine output records=0
11/02/18 19:52:40 INFO mapred.JobClient:     Map input records=94
11/02/18 19:52:40 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/02/18 19:52:40 INFO mapred.JobClient:     Reduce output records=956
11/02/18 19:52:40 INFO mapred.JobClient:     Spilled Records=4130
11/02/18 19:52:40 INFO mapred.JobClient:     Map output bytes=19431
11/02/18 19:52:40 INFO mapred.JobClient:     Combine input records=0
11/02/18 19:52:40 INFO mapred.JobClient:     Map output records=2065
11/02/18 19:52:40 INFO mapred.JobClient:     Reduce input records=2065
true

The result is output to $WORKSPACE/wordcount/Out**********/part-r-00000 .
Click Run > Debug if you would like to run it in debug mode. Stepping and breakpoints work just as they do for any other Java application.

If you encounter an Out of Memory error when running, increase the heap size setting (-Xmx) in eclipse.ini as needed (thanks to Rick), for example:

-Xmx384m

89 Responses to Hadoop Development Environment with Eclipse

  1. Thanks for the intro. I encountered an Out of Memory error when running the project in Eclipse. This was resolved by editing the run configuration and adding the following JVM parameter: -Xmx128m

  2. shuyo says:

    Thanks for your comment. I’ll add your advice!

  3. Tolis Georgas says:

    Thanks a lot for your post. It was really helpful!

  4. nxhoaf says:

    useful introduction, thanks for your post.

  5. nxhoaf says:

    Hi, can you explain to me why we need the step:
    “Create $WORKSPACE/wordcount/bin/log4j.properties as follows.”
    Because it is located in the /bin directory, each time I rebuild the project the file disappears and I get an error. Is there any way to avoid this?
    By the way, how can I know that Hadoop has a property named “key.value.separator.in.input.line” and not something else, e.g. “key.value.separator.input.line” or “key.value.separator.in.line”?

    • Apurva Siingh says:

      Put it in src. It will get auto-copied to bin and won’t get deleted from src either; plus you can now put it in source control too.

  6. shuyo says:

    I see.
    It is because I wanted simpler steps (corner-cutting…).
    I didn’t notice that the file is deleted on rebuild, since I rarely rebuild…
    I’ll try to solve it.

    I don’t know about the latter. Sorry.

  7. zyh4530 says:

    Hi, can you tell me how to set the number of reducers?

  8. shuyo says:

    Since Hadoop runs in standalone mode in this environment, the number of reducers has no effect, I guess.
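
    For reference, on a real cluster the reducer count would be set on the Job before it is submitted; a minimal sketch (the count of 4 is arbitrary):

    job.setNumReduceTasks(4); // the local runner in 0.20.2 effectively runs a single reducer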

  9. vamshi says:

    Hi shuyo, I tried to build the Hadoop code with Eclipse, but it shows 2 errors.

    1) Description Resource Path Location Type:
    The type org.eclipse.core.runtime.IProgressMonitor cannot be resolved. It is indirectly referenced from required .class files. (HadoopServer.java, /hadoopsimple/src/contrib/eclipse-plugin/src/java/org/apache/hadoop/eclipse/server, line 1, Java Problem)

    2) The project was not built since its build path is incomplete. Cannot find the class file for org.eclipse.core.runtime.IProgressMonitor. Fix the build path then try building this project. (hadoopsimple, Unknown, Java Problem)

    Please help me solve this so I can successfully build and run some examples. Thank you shuyo.

  10. shuyo says:

    I’m glad to hear you solved your problem.

  11. hikarimay10 says:

    Hello Shuyo:

    I used your code and it ran properly in Eclipse. However, how do I create a jar file to use on the Hadoop cluster? I used Eclipse to export a jar, but I am not sure my settings were correct because the job will not run. It keeps asking for the job jar file name. See the error below:

    11/08/28 18:18:22 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).

    Thank you.

  12. S says:

    Very good introduction!

  13. Abhishek says:

    I tried everything, but I get the error ‘Expecting a line not the end of stream’.

    Here is the log:
    11/12/22 00:30:48 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    11/12/22 00:30:48 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
    11/12/22 00:30:48 INFO input.FileInputFormat: Total input paths to process : 1
    11/12/22 00:30:48 INFO mapred.JobClient: Running job: job_local_0001
    11/12/22 00:30:48 INFO input.FileInputFormat: Total input paths to process : 1
    11/12/22 00:30:48 INFO mapred.MapTask: io.sort.mb = 100
    11/12/22 00:30:48 INFO mapred.MapTask: data buffer = 79691776/99614720
    11/12/22 00:30:48 INFO mapred.MapTask: record buffer = 262144/327680
    11/12/22 00:30:49 INFO mapred.MapTask: Starting flush of map output
    11/12/22 00:30:49 WARN mapred.LocalJobRunner: job_local_0001
    java.io.IOException: Expecting a line not the end of stream
    at org.apache.hadoop.fs.DF.parseExecResult(DF.java:109)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:179)
    at org.apache.hadoop.util.Shell.run(Shell.java:134)
    at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:329)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
    at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1221)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1129)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:549)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:623)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
    11/12/22 00:30:49 INFO mapred.JobClient: map 0% reduce 0%
    11/12/22 00:30:49 INFO mapred.JobClient: Job complete: job_local_0001
    11/12/22 00:30:49 INFO mapred.JobClient: Counters: 0
    false

  14. Amit says:

    When do you start the Hadoop nodes in this example? Does the above example not make use of a Hadoop cluster?

  15. Ali says:

    When I run the main project I get the error below; please help if you have a solution. I also have a question: how do I create the simple text data?

    11/12/24 10:58:42 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively

    11/12/24 10:58:44 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:50020. Already tried 0 time(s).

    Exception in thread “main” java.net.ConnectException: Call to localhost/127.0.0.1:50020 failed on connection exception: java.net.ConnectException: Connection refused: no further information
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)
    at org.apache.hadoop.ipc.Client.call(Client.java:743)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
    at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
    at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)
    at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
    at org.apache.hadoop.mapred.JobClient.(JobClient.java:410)
    at org.apache.hadoop.mapreduce.Job.(Job.java:50)
    at org.apache.hadoop.mapreduce.Job.(Job.java:54)
    at wordcount.WordCountDriver.main(WordCountDriver.java:25)
    Caused by: java.net.ConnectException: Connection refused: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304)
    at org.apache.hadoop.ipc.Client$Connection.access$3(Client.java:288)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:860)
    at org.apache.hadoop.ipc.Client.call(Client.java:720)
    … 9 more

    • shuyo says:

      Which article did you refer to?
      In 0.20.2, hadoop-site.xml is deprecated, as the message says.
      Since Hadoop differs considerably between versions, you should read the documentation for 0.20.2 or whichever version you are using.

  16. tareq says:

    Hello Shuyo,

    This is Tareq.
    I am trying to configure a hadoop-0.18.0 MapReduce application with Eclipse 3.3.1, but I am not able to create it 😦
    I am using the Yahoo hadoop-0.18.0 Virtual Machine. I haven’t modified anything in the downloaded Hadoop; I just followed the instructions at the link below:
    http://developer.yahoo.com/hadoop/tutorial/module3.html

    Can you please help me solve this problem?

    Thank you.

    • shuyo says:

      Hi Tareq,
      Hadoop differs considerably between versions, in particular between 0.18 and earlier versus 0.19 and later.
      And vendor distributions also differ from Apache Hadoop.
      So I’m afraid I cannot solve your trouble.
      I reckon you should use 0.20 or the very new 1.0 (but it may not have enough documentation yet…).

      • Tareq says:

        Sure Shuyo, and thank you for your quick response.
        I will give 0.20.0 or 1.0 a shot.

        Actually, in Eclipse I cannot see the hadoop.job.ugi param in the Hadoop Advanced tab.
        I have modified the hadoop-default.xml file, but it is not reflected in Eclipse.
        Do I need to re-compile the Hadoop virtual image?
        How do I compile/deploy it?

        Thank you and regards,
        -T

      • Abhishek says:

        I’ve spent day and night getting the 0.18 plug-in to work in Eclipse, and I succeeded. The problem is that the Eclipse version is not compatible with the plug-in in a Windows environment. Tareq says hadoop.job.ugi was not visible; I managed to make it appear by provoking an error:
        I configured the plugin, and it shows an error while navigating to the last tree node of DFS; after that you can see “hadoop.job.ugi” in edit mode of the current config. This works only for the 0.18 version.
        I tested it on Ubuntu with Eclipse Europa. But using it was useless for me, as Hadoop changes with every version, as you said, so you can’t find proper docs.

        Even Yahoo’s 0.20 version doesn’t work for me. The plugin that comes with each version doesn’t seem to be maintained.

        I’ve built my own VM image configured for a Hadoop 0.20.2 single node, an Eclipse Helios SR2 dev environment (using your tutorial), and MongoDB.

      • shuyo says:

        Thanks for your useful information.
        As Abhishek mentioned, Hadoop differs a lot between versions.
        So I suspect a rebuild is needed (however, I have not used 0.18, so I can’t say for sure).

  17. tareq says:

    Dear Abhishek/Shuyo,

    Thank you very much for your investigation of this.
    I have not tried it yet; I will try today and let you know.
    Here is my understanding of this:
    UNIX/Linux is the best platform on which to explore Hadoop.
    I should use the latest Eclipse for DFS.
    Info: as per the Yahoo documentation, I have worked with Eclipse 3.3.1 and Hadoop 0.18.0.
    I will go for the latest Hadoop and Eclipse.

    Thank you once again.

  18. Amit Chotai says:

    Hey shuyo, I really want to know if Hadoop runs on Windows 7.
    Have you done all the above steps on Windows 7?
    It’s really troubling me, bro.
    HELP!!!!!

    • shuyo says:

      My environment is Windows Server 2008, but it does not differ much from Windows 7 as far as I know; Windows 7 Home and below versus Professional and above may differ more.
      In any case, I cannot respond without the details of your trouble…

  19. Raghu Nallapeta says:

    Hello,

    How to install Hadoop ?

    – Raghu

  20. Rosana says:

    Thanks very much!!!

  21. prazjain says:

    I know your post is very old, but this is how I resolved the “Run on Hadoop” problem: http://prazjain.wordpress.com/2012/02/02/issues-with-hadoop-eclipse-plugin/
    Not a good fix, but it works.

  22. Balakrishna says:

    Can you please provide more examples to work with?

  23. mayur says:

    Will you please post about Hadoop in cluster mode? I really need it, friend…

  24. shuyo says:

    My current work does not involve Hadoop, so I have no plan to update this article. Thanks!

  25. Ananth says:

    Hi,
    When I try to supply the input file from DFS in Eclipse, it says FILENOTFOUND, even though I can see it is present in HDFS. I am using a 3-node cluster. Please help.

  26. eng.m says:

    Thanks a lot, really. I ran it without error, but the results are all zeros. Can you figure out why that is?

  27. geeky says:

    Thanks for the very nice tutorial

  28. Arjun says:

    Hi,
    Nice tutorial for beginners. I have one problem while running the wordcount; please help me solve it.
    WARN mapred.LocalJobRunner: job_local_0001
    java.io.IOException: Cannot run program “bash”: CreateProcess error=2, The system cannot find the file specified
    at java.lang.ProcessBuilder.start(Unknown Source)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
    at org.apache.hadoop.util.Shell.run(Shell.java:134)
    at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:329)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
    at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1221)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1129)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:549)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:623)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
    Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
    at java.lang.ProcessImpl.create(Native Method)
    at java.lang.ProcessImpl.(Unknown Source)
    at java.lang.ProcessImpl.start(Unknown Source)
    … 13 more

    Thanks in Advance.

  29. Arjun says:

    Yes, I didn’t install Cygwin. I just followed all the steps you gave. Please clarify where I am going wrong; I am eagerly waiting to run that program.

    Thanks
    Arjun

  30. Arjun says:

    One more update from my side: I have some chmod dlls copied there in the workspace. I am able to run HelloWorld. Something in the Cygwin bash is missing, I guess. Please advise me how to resolve my issue. I tried to install Cygwin, but somehow I am having problems getting it to start.

    Thanks
    Arjun

  31. Arjun says:

    Hi,
    Now I am getting this error:

    12/07/12 00:15:56 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    Exception in thread “main” java.lang.NullPointerException
    at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:103)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:184)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
    at org.apache.hadoop.mapred.JobClient.getFs(JobClient.java:463)
    at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:567)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
    at WordCountDriver.main(WordCountDriver.java:41)

    Thanks
    Arjun

  32. Arjun says:

    Hi,
    Finally I am able to run wordcount. Thank you so much.

    Thanks
    Arjun

  33. Pingback: Building Hadoop-1.0.3 | Scientific Computing in the Cloud with Apache Hadoop

  34. EE says:

    Hello,
    How do you build the Hadoop project after importing the zip file? I set the imported hadoop directory as a source folder, but then I get package errors. For example:
    The declared package “org.apache.hadoop.ant” does not match the expected package
    “src.ant.org.apache.hadoop.ant”

    • shuyo says:

      I have no answer because I don’t know at what point you get that message.
      I reckon you should try again from scratch in another clean environment.

      • EE says:

        Hi Shuyo,
        The last step in the section “Creating the Hadoop project” is not clear to me. I am OK up to changing the classpath. I did the zip file import and I can see it as a directory in the project, but it does not compile. Now, how do I compile it? Do I make this directory a source directory? In that case I get package errors. If I move the whole directory under the default ‘src’ directory, I also get package errors.
        I appreciate your help.
        EE

      • EE says:

        I’d appreciate it if you can help me with my question above.
        Thanks shuyo.

      • shuyo says:

        Did you set the Apache Ant library as mentioned in the article?

  35. EE says:

    Hello Shuyo,
    Yes, I did. My issue is how to add the imported zip file to the project and compile it. The way I added it produces the package src.ant.org.apache.hadoop.ant, but the code actually declares org.apache.hadoop.ant.
    Thanks!

  36. DeviKiran says:

    Hi, I followed your steps and configured Hadoop 1.0.1, but when I execute it I get the following problem:

    12/09/02 18:04:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    12/09/02 18:04:12 ERROR security.UserGroupInformation: PriviledgedActionException as:devikiran.setty cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-devikiran.setty\mapred\staging\devikiran.setty1483959294\.staging to 0700
    Exception in thread “main” java.io.IOException: Failed to set permissions of path: \tmp\hadoop-devikiran.setty\mapred\staging\devikiran.setty1483959294\.staging to 0700
    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:682)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:655)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:1)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at WordCountDriver.main(WordCountDriver.java:44)

    Please suggest how I can get rid of this problem…

    • shuyo says:

      I cannot tell you what to do because I have not tried the current version of Hadoop.
      This article was written for version 0.20.2, so there are some differences.

  37. KANNIBALA says:

    I have followed the above procedure on Windows, but I’m getting the following exceptions. Please give a solution for this:
    12/09/07 12:25:19 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    12/09/07 12:25:19 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
    12/09/07 12:25:19 INFO input.FileInputFormat: Total input paths to process : 0
    12/09/07 12:25:20 INFO mapred.JobClient: Running job: job_local_0001
    12/09/07 12:25:20 INFO input.FileInputFormat: Total input paths to process : 0
    12/09/07 12:25:20 WARN mapred.LocalJobRunner: job_local_0001
    java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.RangeCheck(Unknown Source)
    at java.util.ArrayList.get(Unknown Source)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:125)
    12/09/07 12:25:21 INFO mapred.JobClient: map 0% reduce 0%
    12/09/07 12:25:21 INFO mapred.JobClient: Job complete: job_local_0001
    12/09/07 12:25:21 INFO mapred.JobClient: Counters: 0
    false

  38. KANNIBALA says:

    Also, in Ubuntu 10.10 the .classpath file is not visible.

  39. vasil says:

    @Abhishek: Could you please be more precise? I’m dealing with the same problem, but no matter how the plugin is configured, the hadoop.job.ugi property is not shown in the Advanced tab. Thanks in advance.

  40. Steve says:

    Thanks for the tutorial. I followed your steps, but I got the following errors. Do you have an idea how to solve them? Thanks.

    Exception in thread “main” java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
    at org.apache.hadoop.conf.Configuration.(Configuration.java:139)
    at com.cloudsen.hadoop.WordCountDriver.main(WordCountDriver.java:22)
    Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    … 2 more

    • infysam says:

      Hi Shuyo,

      In the first paragraph you mentioned that “A typical Hadoop application needs a jar file to be distributed to the nodes, and rebuilding that jar over and over during development is tedious…”
      Can you please specify the steps to recreate a new jar file after modifying the source code in hadoop-0.20.2.tar.gz?

  41. Katja says:

    Hi there! I just updated your tutorial for the setup notes of our university Hadoop cluster. I will send you the updated files (the wordcount classes and the classpath) if you send me a mail 🙂 Please reply to this comment in case you cannot see my mail address.

  42. Katja says:

    (I updated them for Hadoop 1.0.3.)

  43. Pingback: Menjalankan Hadoop MapReduce Mode Pseudo-distributed dengan Linux | Wonogiri Linux Community

  44. misterblinky says:

    By the way, for those who are getting the ‘Expecting a line not the end of stream’ IOException in Eclipse on Windows, a solution to this is:

    1) Install cygwin
    2) open cygwin console
    3) Execute:
    /cygdrive/c/eclipse/eclipse.exe
    This assumes eclipse.exe is under c:\eclipse. Change as needed for your system.

    Now try running the example code. It should work.

  45. misterblinky says:

    Note in my post that the idea is to start eclipse from cygwin. That’s the solution.

  46. Pingback: Menjalankan Aplikasi Hadoop MapReduce dengan Eclipse Java SE « DokterPC14's Blog

  47. Anonym says:

    Thank you so much ! 😀 Very useful !

  48. Amith says:

    Follow these steps to create an Eclipse plugin for any Hadoop version:
    https://docs.google.com/document/d/1yuZ4IjlquPkmC1zXtCeL4GUNKT1uY1xnS_SCBJHps6A/edit?pli=1

  49. Thanks… this helped me get started with Hadoop.

  50. Sandeep Patil says:

    Is there a way, after modifying the source of some core files, to test those changes in the Hadoop environment? I mean re-creating the core jar and placing it in the Hadoop environment. Please let me know.

  51. rama says:

    Good afternoon to all,
    Friends, can anyone please help me with how to run this project? I found this link, and I am using Ubuntu 12.04; I have installed Hadoop successfully. If anyone could tell me how to run this project, with the syntax, it would be appreciated 🙂 Thanks in advance.
    Here is the link. Please help:
    http://sourceforge.net/projects/hadstat/files/latest/download?source=files

  52. Pingback: Fix Jets3t-0.6.1.jar Errors - Windows XP, Vista, 7 & 8
