A typical Hadoop application needs a jar file to distribute to the worker nodes. Repeatedly rebuilding that jar during development is very tedious…
So I set up Eclipse to run a Hadoop application (a MapReduce job) in standalone mode. This makes debugging much easier, because you can simply use Run As > Debug.
I assume Eclipse is already set up. See here if it is not yet.
By the way, Hadoop's eclipse-plugin is not used here. I tried it but could not get it working: the Hadoop 0.20 plugin fails on "Run on Hadoop", and the Hadoop 0.21 plugin cannot even register Hadoop's NameNode.
Create a project for Hadoop
This article uses Hadoop 0.20.2, although the latest release at the time of writing is 0.21.0.
That is because 0.21 has many problems and no documentation for its updated APIs, and because Mahout (0.4 and 0.5-SNAPSHOT) supports only Hadoop 0.20.2.
Now import Hadoop as an Eclipse project with the following steps. This has to be done manually because Hadoop does not provide a Maven project.
- Create a new Java Project in Eclipse and name it "hadoop-0.20.2".
- Download an archive file of Hadoop 0.20.2 (hadoop-0.20.2.tar.gz) and import it into the above project.
Open the Archive File import dialog (File > Import > General > Archive File in the Eclipse menu) and specify hadoop-0.20.2.tar.gz as the archive file. Set the above project as the Into folder.
- Put the Apache Ant library (ant.jar) into the lib folder of the project.
Download apache-ant-1.8.2-bin.zip from here, extract ant.jar from it, and copy it into $WORKSPACE/hadoop-0.20.2/hadoop-0.20.2/lib/ .
- Rewrite $WORKSPACE/hadoop-0.20.2/.classpath to add the source folders and necessary libraries to the project.
<?xml version="1.0" encoding="UTF-8"?>
<classpath>
    <classpathentry kind="src" path="hadoop-0.20.2/src/core"/>
    <classpathentry kind="src" path="hadoop-0.20.2/src/mapred"/>
    <classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER/org.eclipse.jdt.internal.debug.ui.launcher.StandardVMType/JavaSE-1.6"/>
    <classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-logging-1.0.4.jar"/>
    <classpathentry kind="lib" path="hadoop-0.20.2/lib/xmlenc-0.52.jar"/>
    <classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-net-1.4.1.jar"/>
    <classpathentry kind="lib" path="hadoop-0.20.2/lib/kfs-0.2.2.jar"/>
    <classpathentry kind="lib" path="hadoop-0.20.2/lib/jets3t-0.6.1.jar"/>
    <classpathentry kind="lib" path="hadoop-0.20.2/lib/servlet-api-2.5-6.1.14.jar"/>
    <classpathentry kind="lib" path="hadoop-0.20.2/lib/jetty-6.1.14.jar"/>
    <classpathentry kind="lib" path="hadoop-0.20.2/lib/jetty-util-6.1.14.jar"/>
    <classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-codec-1.3.jar"/>
    <classpathentry kind="lib" path="hadoop-0.20.2/lib/log4j-1.2.15.jar"/>
    <classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-cli-1.2.jar"/>
    <classpathentry kind="lib" path="hadoop-0.20.2/lib/ant.jar"/>
    <classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-httpclient-3.0.1.jar"/>
    <classpathentry kind="output" path="bin"/>
</classpath>
- Refresh the project. Right-click the hadoop-0.20.2 project and select "Refresh".
That completes the Hadoop import. Eclipse is already building Hadoop in the background.
Run the Word Count Sample
Let's run a sample application on Hadoop in Eclipse to confirm the setup.
- Create a Java Project for the sample application and name it appropriately (e.g. wordcount).
Click Next in the Create Java Project dialog (not Finish!) and set up the project and library references.
- Add the above hadoop-0.20.2 project on the Projects tab.
- Add hadoop-0.20.2/hadoop-0.20.2/lib/commons-cli-1.2.jar on the Libraries tab.
If you have already clicked Finish, open the Java Build Path dialog from the project's Properties.
- Write the classes of the sample application. The driver gives its output directory a timestamped name because Hadoop refuses to write into an output directory that already exists. (An optional combiner tweak is noted after the three classes.)
// WordCountDriver.java
import java.io.IOException;
import java.util.Date;
import java.util.Formatter;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCountDriver {
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        GenericOptionsParser parser = new GenericOptionsParser(conf, args);
        args = parser.getRemainingArgs();

        Job job = new Job(conf, "wordcount");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        Formatter formatter = new Formatter();
        String outpath = "Out" + formatter.format("%1$tm%1$td%1$tH%1$tM%1$tS", new Date());
        FileInputFormat.setInputPaths(job, new Path("In"));
        FileOutputFormat.setOutputPath(job, new Path(outpath));

        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        System.out.println(job.waitForCompletion(true));
    }
}
// WordCountMapper.java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private Text word = new Text();
    private final static IntWritable one = new IntWritable(1);

    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
// WordCountReducer.java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
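As an aside (optional, and not part of the original steps), the reducer above can also be registered as a combiner so that map output is pre-aggregated locally before the shuffle; this is safe here because summing counts is associative and commutative. A minimal sketch, added to the driver next to the other job.set* calls:

// Optional: reuse the reducer as a combiner to shrink map output.
job.setCombinerClass(WordCountReducer.class);

With this in place, the "Combine input records" counter in the job log would become non-zero.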
- Create $WORKSPACE/wordcount/bin/log4j.properties with the following content.
log4j.rootLogger=INFO,console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
This makes Hadoop output its logs to the Eclipse console.
- Put some sample text data into $WORKSPACE/wordcount/In . (A small worked example of input and output follows below.)
- Run the application. Click Run > Run in the Eclipse menu or press Ctrl+F11.
The run succeeds if the application writes logs like the following to the Eclipse console.
11/02/18 19:52:39 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/02/18 19:52:39 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
11/02/18 19:52:39 INFO input.FileInputFormat: Total input paths to process : 1
11/02/18 19:52:39 INFO mapred.JobClient: Running job: job_local_0001
11/02/18 19:52:39 INFO input.FileInputFormat: Total input paths to process : 1
11/02/18 19:52:39 INFO mapred.MapTask: io.sort.mb = 100
11/02/18 19:52:39 INFO mapred.MapTask: data buffer = 79691776/99614720
11/02/18 19:52:39 INFO mapred.MapTask: record buffer = 262144/327680
11/02/18 19:52:40 INFO mapred.MapTask: Starting flush of map output
11/02/18 19:52:40 INFO mapred.MapTask: Finished spill 0
11/02/18 19:52:40 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
11/02/18 19:52:40 INFO mapred.LocalJobRunner:
11/02/18 19:52:40 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
11/02/18 19:52:40 INFO mapred.LocalJobRunner:
11/02/18 19:52:40 INFO mapred.Merger: Merging 1 sorted segments
11/02/18 19:52:40 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 23563 bytes
11/02/18 19:52:40 INFO mapred.LocalJobRunner:
11/02/18 19:52:40 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
11/02/18 19:52:40 INFO mapred.LocalJobRunner:
11/02/18 19:52:40 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
11/02/18 19:52:40 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to Out0218195239
11/02/18 19:52:40 INFO mapred.LocalJobRunner: reduce > reduce
11/02/18 19:52:40 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
11/02/18 19:52:40 INFO mapred.JobClient: map 100% reduce 100%
11/02/18 19:52:40 INFO mapred.JobClient: Job complete: job_local_0001
11/02/18 19:52:40 INFO mapred.JobClient: Counters: 12
11/02/18 19:52:40 INFO mapred.JobClient:   FileSystemCounters
11/02/18 19:52:40 INFO mapred.JobClient:     FILE_BYTES_READ=73091
11/02/18 19:52:40 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=110186
11/02/18 19:52:40 INFO mapred.JobClient:   Map-Reduce Framework
11/02/18 19:52:40 INFO mapred.JobClient:     Reduce input groups=956
11/02/18 19:52:40 INFO mapred.JobClient:     Combine output records=0
11/02/18 19:52:40 INFO mapred.JobClient:     Map input records=94
11/02/18 19:52:40 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/02/18 19:52:40 INFO mapred.JobClient:     Reduce output records=956
11/02/18 19:52:40 INFO mapred.JobClient:     Spilled Records=4130
11/02/18 19:52:40 INFO mapred.JobClient:     Map output bytes=19431
11/02/18 19:52:40 INFO mapred.JobClient:     Combine input records=0
11/02/18 19:52:40 INFO mapred.JobClient:     Map output records=2065
11/02/18 19:52:40 INFO mapred.JobClient:     Reduce input records=2065
true
The result is written to $WORKSPACE/wordcount/Out**********/part-r-00000 .
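For a concrete picture (a hypothetical run with made-up input), suppose a file in the In directory contains the two lines:

hello hadoop
hello eclipse

Then part-r-00000 would contain the tab-separated counts, sorted by key:

eclipse	1
hadoop	1
hello	2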
Click Run > Debug if you would like to run it in debug mode. You can also use stepping and breakpoints, just as with any other Java application.
If you encounter an OutOfMemory error when running, adjust the heap size setting -Xmx in eclipse.ini suitably (thanks to Rick).
-Xmx384m
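For example (a sketch assuming a standard Eclipse install), JVM options in eclipse.ini must appear after the -vmargs line:

-vmargs
-Xmx384m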
Thanks for the intro. I encountered an Out of Memory error when running the project in Eclipse. This was resolved by editing the run configuration and adding the following JVM parameter: -Xmx128m
Hi Rick,
Where do we add this parameter? Do we add "-Xmx128m" or something else?
Thanks for your comment. I’ll add your advice!
Thanks a lot for your post. It was really helpful!
Useful introduction, thanks for your post.
Hi, can you explain to me why we need the step:
"Create $WORKSPACE/wordcount/bin/log4j.properties with the following content."?
Because it is located in the /bin directory, each time I rebuild the project the file disappears and I get an error. Is there any way to avoid this?
By the way, how can I know that Hadoop has a property named "key.value.separator.in.input.line" and not something else, e.g. "key.value.separator.input.line" or "key.value.separator.in.line"…?
Put it in src: it will get auto-copied to bin and won't get deleted from src either, plus you can now put it in source control too.
I see.
It is because I wanted simpler steps (corner-cutting…).
I didn't notice that the file is deleted on rebuild, because I rarely rebuild…
I’ll try to solve it.
I don’t know about the latter. Sorry.
Hi, can you tell me how to set the number of reducers?
Since Hadoop executes in standalone mode in this environment, the number of reducers is meaningless, I guess.
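For reference, on a real (pseudo-)distributed cluster you would set it in the driver. A minimal sketch, assuming the same new-API Job object as in WordCountDriver above:

// Request four reduce tasks; the LocalJobRunner used in standalone
// mode runs at most a single reducer regardless of this setting.
job.setNumReduceTasks(4);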
Hi shuyo, I tried to build the Hadoop code with Eclipse, but it is showing 2 errors.
1)Description Resource Path Location Type
The type org.eclipse.core.runtime.IProgressMonitor cannot be resolved. It is indirectly referenced from required .class files HadoopServer.java /hadoopsimple/src/contrib/eclipse-plugin/src/java/org/apache/hadoop/eclipse/server line 1 Java Problem
2) The project was not built since its build path is incomplete. Cannot find the class file for org.eclipse.core.runtime.IProgressMonitor. Fix the build path then try building this project — hadoopsimple — Unknown — Java Problem
Please help me solve this problem so I can successfully build and run some examples. Thank you shuyo.
Hey shuyo, I rectified the above problem by adding the jar downloaded from the link below:
http://www.java2s.com/Code/Jar/DEF/Downloadeclipsejar.htm
I'm glad to hear you solved your problem.
Hello Shuyo:
I used your code and it ran properly in Eclipse. However, how do I create a jar file to use on the Hadoop cluster? I used Eclipse's jar export, but I am not sure my settings were correct because the job will not run. It keeps asking for the job jar file name. See the error below:
11/08/28 18:18:22 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
Thank you.
Sorry for my late response.
Could you try a Hadoop tutorial (without Eclipse)? Your Hadoop cluster setup might not be complete.
Or the Hadoop versions in development and on the cluster might be different. Hadoop's APIs are altered very very verrrrry often!
Hi hikarimay10,
I am having the same problem right now – have you been able to solve yours?
Thanks in advance!
For me it helped to set the jar in the driver as follows:
job.setJarByClass(YourDriver.class); [1]
Then, after creating a .jar and running it on Hadoop, all other classes are found and the warning is not shown anymore.
[1] http://stackoverflow.com/questions/5803445/hadoop-not-running-in-the-multinode-cluster
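To connect this to the driver in the article above (a sketch, not part of the original code), the call would go right after the Job is constructed:

Job job = new Job(conf, "wordcount");
// Identify the jar containing the job classes by a class inside it;
// needed when the job is submitted to a real cluster.
job.setJarByClass(WordCountDriver.class);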
Very good introduction!
I tried everything, but I get the error 'Expecting a line not the end of stream'.
Here is the log:
11/12/22 00:30:48 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/12/22 00:30:48 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
11/12/22 00:30:48 INFO input.FileInputFormat: Total input paths to process : 1
11/12/22 00:30:48 INFO mapred.JobClient: Running job: job_local_0001
11/12/22 00:30:48 INFO input.FileInputFormat: Total input paths to process : 1
11/12/22 00:30:48 INFO mapred.MapTask: io.sort.mb = 100
11/12/22 00:30:48 INFO mapred.MapTask: data buffer = 79691776/99614720
11/12/22 00:30:48 INFO mapred.MapTask: record buffer = 262144/327680
11/12/22 00:30:49 INFO mapred.MapTask: Starting flush of map output
11/12/22 00:30:49 WARN mapred.LocalJobRunner: job_local_0001
java.io.IOException: Expecting a line not the end of stream
at org.apache.hadoop.fs.DF.parseExecResult(DF.java:109)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:179)
at org.apache.hadoop.util.Shell.run(Shell.java:134)
at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:329)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1221)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1129)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:549)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:623)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
11/12/22 00:30:49 INFO mapred.JobClient: map 0% reduce 0%
11/12/22 00:30:49 INFO mapred.JobClient: Job complete: job_local_0001
11/12/22 00:30:49 INFO mapred.JobClient: Counters: 0
false
I couldn't reproduce it, but I found the same situation described in a Japanese blog article.
http://www.ne.jp/asahi/hishidama/home/tech/apache/hadoop/pseudo.html (in Japanese)
It says the problem is solved by adding the df command's directory (i.e. c:\cygwin\bin ) to PATH.
Could you try it?
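For example (assuming Cygwin is installed at c:\cygwin), you could prepend its bin directory to PATH in the console you start Eclipse from:

set PATH=c:\cygwin\bin;%PATH%

(That is a Windows cmd line; alternatively, add the directory in the system environment variables.)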
I didn't try it, but thanks for the reply!
I've switched my environment from Windows to Linux, and it works like a charm! I can test anything from any angle.
I really prefer Linux over Windows for such systems, even for development purposes.
I'm glad to hear your problem is solved.
Hadoop's Windows support is not that good, so using Linux is better if you can! 😀
When do you start the Hadoop nodes in this example? The above example does not make use of a Hadoop cluster?
No, it does not. It covers standalone mode only.
Even in standalone mode, you start the datanodes and namenodes. Also, you can view the status of the namenode and datanode at 50030 and 50070 respectively. But in your example, we can't. I don't understand how it works.
No, invoking name and data nodes is not standalone mode but pseudo-distributed mode.
When I run the main project I get this error; please help me if you have a solution. And I have a question: how do I create the simple text data?
11/12/24 10:58:42 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
11/12/24 10:58:44 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:50020. Already tried 0 time(s).
Exception in thread “main” java.net.ConnectException: Call to localhost/127.0.0.1:50020 failed on connection exception: java.net.ConnectException: Connection refused: no further information
at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)
at org.apache.hadoop.ipc.Client.call(Client.java:743)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
at org.apache.hadoop.mapreduce.Job.<init>(Job.java:50)
at org.apache.hadoop.mapreduce.Job.<init>(Job.java:54)
at wordcount.WordCountDriver.main(WordCountDriver.java:25)
Caused by: java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304)
at org.apache.hadoop.ipc.Client$Connection.access$3(Client.java:288)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:860)
at org.apache.hadoop.ipc.Client.call(Client.java:720)
… 9 more
Which article did you refer to?
In 0.20.2, hadoop-site.xml has been deprecated, as the message mentions.
Since Hadoop differs quite a lot between versions, you should read the documents for 0.20.2 or whichever version you are using.
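For what it's worth, here is a sketch of the standalone-mode values (the property names are from the 0.20 documentation; standalone runs typically need no site files at all, so these only matter if a stray configuration on the classpath points at a cluster):

<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>file:///</value>  <!-- use the local filesystem -->
  </property>
</configuration>

<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>  <!-- run jobs in-process, no JobTracker -->
  </property>
</configuration>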
Hello Shuyo,
This is Tareq.
I am trying to configure a hadoop-0.18.0 MapReduce application with Eclipse 3.3.1, but I am not able to get it working 😦
I am using the Yahoo hadoop-0.18.0 virtual machine. I haven't modified anything in the downloaded Hadoop, just followed the instructions at the link below:
http://developer.yahoo.com/hadoop/tutorial/module3.html
Can you please help me to solve this problem?
Thank you.
Hi Tareq,
Hadoop differs quite a lot between versions, in particular between 0.18 and earlier versus 0.19 and later.
And Yahoo's distribution also differs from Apache Hadoop.
So I'm afraid I cannot solve your trouble.
I reckon you should use 0.20 or the very new 1.0 (but it may not have enough documentation…).
Sure Shuyo, and thank you for your quick response.
I will give 0.20.0 or 1.0 a shot.
Actually, in Eclipse I cannot see the hadoop.job.ugi parameter in the Hadoop Advanced tab.
I have modified the hadoop-default.xml file, but the change is not reflected in Eclipse.
Do I need to re-compile the Hadoop virtual image?
How do I compile/deploy it?
Thank you and regards,
-T
I've spent day and night getting the 0.18 plug-in to work on Eclipse, and I succeeded. The problem is that the Eclipse version is not compatible with the plug-in in a Windows environment. And Tareq says hadoop.job.ugi was not visible; I managed to see it by producing an error.
I configured the plugin and it shows an error while navigating the last tree node of DFS; after that you can see "hadoop.job.ugi" in edit mode of the current config. This works only for the 0.18 version.
I've tested it on Ubuntu with Eclipse Europa. But using it was useless for me: as you said, Hadoop changes with every version, so you can't find proper docs.
Even Yahoo's 0.20 version doesn't work for me. The plugin that comes with each version doesn't seem to be maintained.
I've built my own VM image that is configured for a Hadoop 0.20.2 single node, an Eclipse Helios SR2 dev environment (using your tutorial), and MongoDB.
Thanks for your useful information.
As Abhishek mentioned, Hadoop differs a lot between versions.
So I suspect a re-build is needed (however, I have not used 0.18, so I can't know exactly).
Hey shuyo, I wanted to ask all of you, quite desperately, whether Hadoop runs on the Windows 7 64-bit edition?
Have you done all these steps on Windows 7?
Or XP?
It's really troubling me :-(
Dear Abhishek/Shuyo,
Thank you very much for your investigation on this.
I have not tried it yet; I will try it today and let you know.
Here is my understanding on this:
UNIX/Linux is the best platform to explore Hadoop.
I should use the latest Eclipse for DFS.
Info: as per the Yahoo documentation, I have worked with Eclipse 3.3.1 and Hadoop 0.18.0.
I will go for the latest Hadoop and Eclipse.
Thank you once again.
Hey Tareq, I'm new to Hadoop…
and I really want to start with it..
Please help me:
can it be installed on my Windows 7?
With a good tutorial..
Waiting!!!!
Amit,
You can refer to the link below to learn Hadoop.
http://developer.yahoo.com/hadoop/tutorial/
You can download Hadoop + a virtual machine + a tutorial.
The Yahoo folks have given everything in this bundle.
Here is a link to get the Yahoo Hadoop bundle:
http://ydn.zenfs.com/site/hadoop/hadooptutorial_v1.zip
I hope this will help you learn it on Windows 7.
Thank you Tareq,
and as per your discussion I gather that Linux is the best place to explore Hadoop..
So I just want to ask: does it work best with a virtual machine, as per your experience?
Hope I am not disturbing you..
Amit,
So far I have explored HDFS with the same setup and it worked perfectly for me.
About the rest I am not sure.
To get started with Hadoop you can refer to this.
You're not disturbing me 🙂
Hey shuyo, I really want to know if Hadoop runs on Windows 7?
Have you done all the above steps on Windows 7?
It's really troubling me, bro..
HELP!!!!
My environment is Windows Server 2008, but the two do not differ much as far as I know; Windows 7 Home and below versus Professional and above may differ more.
In any case, I cannot respond without the details of your trouble…
Hello,
How do I install Hadoop?
– Raghu
Raghu,
Refer to the link below to install Hadoop:
http://hadoop.apache.org/common/docs/r0.15.2/quickstart.html#Download
This article is not about Hadoop installation. You should read other tutorials.
Thanks very much!!!
I know your post is very old, but this is how I resolved not being able to "Run on Hadoop": http://prazjain.wordpress.com/2012/02/02/issues-with-hadoop-eclipse-plugin/
Not a good fix, but it works.
Can you please provide more examples to work with?
Will you please post about Hadoop in cluster mode? I really need it, friend…
My current work does not involve Hadoop, so I have no plan to update this article. Thanks!
Hi,
When I try to give the input file in the DFS from Eclipse, it says FILENOTFOUND, even though I can see it present in HDFS. I am using a 3-node cluster. Please help.
I have no idea based on your report. Sorry.
Thanks a lot, really. I ran it without error, but the results are all zeros. Can you figure out why that is?
Thanks for the very nice tutorial
Hi,
Nice tutorial for beginners. I have one problem while running the wordcount; please help me solve it.
WARN mapred.LocalJobRunner: job_local_0001
java.io.IOException: Cannot run program “bash”: CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessBuilder.start(Unknown Source)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
at org.apache.hadoop.util.Shell.run(Shell.java:134)
at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:329)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1221)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1129)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:549)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:623)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessImpl.create(Native Method)
at java.lang.ProcessImpl.<init>(Unknown Source)
at java.lang.ProcessImpl.start(Unknown Source)
… 13 more
Thanks in Advance.
Are you trying it on Windows?
I guess you forgot some Cygwin settings.
Yes, I didn't install Cygwin. I just followed all the steps you described. Please clarify where I am going wrong; I am eagerly waiting to run that program.
Thanks
Arjun
One more update from my side: I have some chmod DLLs copied there in the workspace. I am able to run HelloWorld. Something in the Cygwin bash is missing, I guess. Please advise me how to resolve my issue. I tried to install Cygwin but somehow I am facing a problem getting it to start.
Thanks
Arjun
You should read https://shuyo.wordpress.com/2011/02/01/mahout-development-environment-with-maven-and-eclipse-1/ and do step by step on another clean environment. Thanks.
I am happy with Hadoop in Eclipse; I just want to resolve the issue. Tell me one thing: how do I install Cygwin step by step, and then how do I link that Cygwin to Hadoop and Eclipse?
Thanks
Arjun
I cannot do such a thing.
You must read Cygwin's documentation by yourself.
Thanks.
Hi,
Now I am getting this error:
12/07/12 00:15:56 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
Exception in thread “main” java.lang.NullPointerException
at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:103)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:184)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
at org.apache.hadoop.mapred.JobClient.getFs(JobClient.java:463)
at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:567)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
at WordCountDriver.main(WordCountDriver.java:41)
Thanks
Arjun
Hi,
Finally I am able to run wordcount. Thank you so much.
Thanks
Arjun
Hello,
How do you build the Hadoop project after importing the zip file? I set the imported hadoop directory as a source folder, but then I get package errors. For example:
The declared package “org.apache.hadoop.ant” does not match the expected package
“src.ant.org.apache.hadoop.ant”
I have no answer because I don't know at what point you get that message.
I reckon you should try again from the start in another clean environment.
Hi Shuyo,
The last step in the section "Create a project for Hadoop" is not clear to me. I am OK up to changing the classpath. I did the zip file import and I can see it as a directory in the project, but it is not compiling. Now, how do I compile it? Do I make this directory a source directory? In that case I get package errors. If I move the whole directory under the default 'src' directory, I also get package errors.
I appreciate your help.
EE
I'd appreciate it if you could help me with my question above.
Thanks shuyo.
Did you set the Apache Ant library as the article mentioned?
Hello Shuyo,
Yes, I did. My issue is how to add the imported zip file to the project and compile it. The way I added it makes the package src.ant.org.apache.hadoop.ant, but the code is actually in org.apache.hadoop.ant.
Thanks!
I guess there were some mistakes in your process, because that package name is obviously wrong. Why do you need the source package of ant?
Hi, I followed your steps and configured Hadoop 1.0.1, but when I execute it I get the following problem:
12/09/02 18:04:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
12/09/02 18:04:12 ERROR security.UserGroupInformation: PriviledgedActionException as:devikiran.setty cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-devikiran.setty\mapred\staging\devikiran.setty1483959294\.staging to 0700
Exception in thread “main” java.io.IOException: Failed to set permissions of path: \tmp\hadoop-devikiran.setty\mapred\staging\devikiran.setty1483959294\.staging to 0700
at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:682)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:655)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:1)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
at WordCountDriver.main(WordCountDriver.java:44)
Please suggest how I can get rid of this problem…
I cannot tell you what to do because I have not tried the current version of Hadoop.
This article was written for version 0.20.2, so there are some differences.
I have followed the above procedure in Windows, but I'm getting the following exceptions. Please suggest a solution:
12/09/07 12:25:19 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/09/07 12:25:19 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
12/09/07 12:25:19 INFO input.FileInputFormat: Total input paths to process : 0
12/09/07 12:25:20 INFO mapred.JobClient: Running job: job_local_0001
12/09/07 12:25:20 INFO input.FileInputFormat: Total input paths to process : 0
12/09/07 12:25:20 WARN mapred.LocalJobRunner: job_local_0001
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.RangeCheck(Unknown Source)
at java.util.ArrayList.get(Unknown Source)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:125)
12/09/07 12:25:21 INFO mapred.JobClient: map 0% reduce 0%
12/09/07 12:25:21 INFO mapred.JobClient: Job complete: job_local_0001
12/09/07 12:25:21 INFO mapred.JobClient: Counters: 0
false
Also, in Ubuntu 10.10 the .classpath file is not visible (files whose names start with a dot are hidden by default on Linux).
@Abhishek: Could you please be more precise? I'm dealing with the same problem, but no matter how the plugin is configured, the property hadoop.job.ugi is not shown in the Advanced tab. Thanks in advance.
Thanks for the tutorial. I followed your steps but I got the following errors. Do you have an idea of how to solve them? Thanks.
Exception in thread “main” java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:139)
at com.cloudsen.hadoop.WordCountDriver.main(WordCountDriver.java:22)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
… 2 more
Hi Shuyo,
In the first paragraph you mentioned that "A typical Hadoop application needs a jar file to distribute to the worker nodes. Repeatedly rebuilding that jar during development is very tedious…"
Can you please specify the steps to recreate a new jar file after modifying the source code in hadoop-0.20.2.tar.gz?
Hi there – I just updated your tutorial for the setup notes of our university Hadoop cluster. I will send you the updated files (the wordcount classes and the classpath) if you send me a mail 🙂 Please reply to this comment in case you cannot see my mail address.
Though I don’t use Hadoop 1.0 now, I am interested in your update. So I’ll send a mail to you 🙂 Thanks!!
(I updated them for Hadoop 1.0.3.)
By the way, for those who are getting the ‘Expecting a line not the end of stream’ IOException in Eclipse on Windows, a solution to this is:
1) Install Cygwin
2) Open a Cygwin console
3) Execute:
/cygdrive/c/eclipse/eclipse.exe
This assumes eclipse.exe is under c:\eclipse. Change as needed for your system.
Now try running the example code. It should work.
Note that the idea in my post is to start Eclipse from Cygwin. That's the solution.
Thank you so much ! 😀 Very useful !
Follow these steps to create the Eclipse plugin for any Hadoop version:
https://docs.google.com/document/d/1yuZ4IjlquPkmC1zXtCeL4GUNKT1uY1xnS_SCBJHps6A/edit?pli=1
Thanks… this helped me get started with Hadoop.
Is there any way, after modifying the source of some core files, to test those changes in a Hadoop environment? I mean rebuilding the core jar and placing it in the Hadoop environment. Please let me know.
Good afternoon to all,
Friends, can anyone please help me with how to run this project? I found the link below, and I am using Ubuntu 12.04 with Hadoop installed successfully. If anyone could tell me how to run this project, with the exact syntax, it would be appreciated 🙂 Thanks in advance.
Here is the link. Please help:
http://sourceforge.net/projects/hadstat/files/latest/download?source=files
Hello, can you give a tutorial on installing Hadoop with Eclipse?