Category Archives: i18n

Updated LangDetect Library (4x faster)

I’ve updated LangDetect (Language Detection Library for Java). http://code.google.com/p/language-detection/ Download the package “langdetect-01-24-2011.zip” from the Download List. This update makes detection 4x faster, based on improvement code posted by Elmer Garduno. Many thanks! Table: time for 100 detection runs on the test data (ms). … Continue reading

Posted in i18n, Java, Language Detection, NLP | 4 Comments

Language Detection Plugin for Apache Nutch

I developed a Language Detection plugin for Apache Nutch using our LangDetect library. Download (bundled in the LangDetect library). Setup manual. Compatible with the standard language identification plugin of Nutch. Over 99% accuracy. Supports 49 languages: Afrikaans, Albanian, Arabic, Bengali, … Continue reading

Posted in i18n, NLP, Nutch, Plugin, Search Engine, text analysis

Language Detection Library for Java

I developed a Language Detection library for Java that can detect 49 languages in a given text (English, Japanese, Chinese, …). http://code.google.com/p/language-detection/ This library achieves over 99% accuracy on a news corpus (see the presentation below). I’ll try to substitute Apache … Continue reading

Posted in i18n, Java, NLP, text analysis | 3 Comments