Category Archives: Java

Should language profiles be bundled or not?

I’m going to add Maven support to language-detection, but I’m having some trouble with the language profiles… language-detection keeps its language profiles separate from the jar file because there are so many of them (the jar without profiles is only around 100KB, while the jar with them has … Continue reading

Posted in Java, Language Detection | 4 Comments

Hadoop Development Environment with Eclipse

A typical Hadoop application needs a jar file for the distributed nodes. Rebuilding the jar over and over during development is very troublesome… So I’ve set up Eclipse to run a Hadoop application (MapReduce job) in standalone mode. It makes debugging easier so … Continue reading

Posted in Development, Eclipse, Hadoop, Java | 89 Comments

Mahout Development Environment with Maven and Eclipse (2)

Sample Code of “Mahout in Action”: The sample code for “Mahout in Action”, a Mahout book from Manning, is published here. It includes the source code for Chapters 2 to 6. Now we’ll build it on the Eclipse … Continue reading

Posted in Eclipse, Java, Machine Learning, Mahout, Maven | 27 Comments

Mahout Development Environment with Maven and Eclipse (1)

I’m reading the MEAP edition of “Mahout in Action”, but it doesn’t explain how to set up a development environment for Mahout… So I wrote up how to do it while testing the sample code of “Mahout in Action”. Install: my examination is based on Windows … Continue reading

Posted in Development, Eclipse, Java, Machine Learning, Mahout, Maven | 10 Comments

Updated LangDetect Library (4x faster)

I’ve updated LangDetect (Language Detection Library for Java). http://code.google.com/p/language-detection/ Download the package “langdetect-01-24-2011.zip” from the Download List. This update makes detection 4x faster, based on improvement code posted by Elmer Garduno. Many thanks!! Table: time for 100 detections on the test data (ms). … Continue reading

Posted in i18n, Java, Language Detection, NLP | 4 Comments

How to develop Apache Nutch’s plugin (5) Sample Code (Language Detection Plugin)

Now, as sample code for a Nutch plugin, we’ll look at a Language Detection plugin built on our LangDetect library. Of the 3 extensions in Apache Nutch’s Language Identification plugin, we will replace only the IndexingFilter extension (see the previous post). So … Continue reading

Posted in Java, Nutch, Plugin, Search Engine | 1 Comment

How to develop Apache Nutch’s plugin (4) IndexingFilter extension-point

In the previous post, I explained that Nutch’s Language Identification plugin has 3 extensions, on HtmlParseFilter, IndexingFilter and QueryFilter. In particular, the IndexingFilter extension handles the language identification procedure. So we’ll look into how to develop an extension plugin on … Continue reading

Posted in Java, Nutch, Search Engine | 2 Comments