Hi,
I’m currently writing a thesis about text analysis and want to cite your
“Language Detection Library for Java”. Have you written any publication about it? And how should I cite your project?
Hi, I’m glad to hear that.
Since there is no publication about the library, could you cite its author (Shuyo Nakatani), title (Language Detection Library for Java) and URL ( http://code.google.com/p/language-detection/ )?
Thanks.
Ok, thanks.
Are you a student? I guess so. I am a student researching topic models. However, I am a newcomer to this field. Your blog posts have taught me much. Please accept my thanks.
Thank you, too!
I am a web engineer and started studying machine learning about 2 years ago.
Please tell me if you find any mistakes!
Hi Shuyo,
Are you still actively developing your language-detection Java library?
I’m currently maintaining the library and experimenting with another idea for language detection.
What are you concerned about?
Is there any intention to port the library to C++? Is there any documentation regarding how to generate language profiles? I am very impressed with the abilities of the library, thank you for committing the time and effort. Would you be interested in some kind of assistance in furthering development and/or financial support?
I have no plans to port this library to C++.
I also have no documentation for language profile generation, but the code is very short and straightforward, so I guess you could probably understand it by reading it.
> Would you be interested in some kind of assistance in furthering development and/or financial support?
Thanks for your proposal. You are welcome to fork this library!
I am an engineer at Cybozu Labs, a research subsidiary of a Japanese groupware company.
I’m interested in various methods of natural language processing and am developing another language detection method!
Hello Mr. Shuyo,
I want to try the language detector code you’ve written, and I see a library
is required.
What is this library? I couldn’t find any documentation about it. What does it do?
Thanks!
The langdetect library is distributed at the URL below.
http://code.google.com/p/language-detection/
Please download and try. Thanks!
Somehow the library name was not added to my question (perhaps because it was a URL),
so I meant JSONIC.
I see.
JSONIC is available here: http://sourceforge.jp/projects/jsonic/devel/ .
Hi again,
Another question regarding the language detector:
The detector returns two-letter strings identifying the language.
Where could I find the mapping to the languages themselves? Are these standard language codes?
The two-letter codes are part of ISO 639-1; see http://en.wikipedia.org/wiki/List_of_ISO_639-2_codes
How about enhancement to the library to use the ISO639-3 3 character codes for identified languages as it is more inclusive?
Thanks for your answer.
As the previous comment mentioned, the language codes used in langdetect are quite standard.
> How about enhancement to the library to use the ISO639-3 3 character codes for identified languages as it is more inclusive?
The language code is decided when the profile is generated, so three-character codes would probably work with corresponding profiles.
But I haven’t verified it.
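For readers who want the code-to-language mapping in a program, here is a minimal sketch. It assumes the detector returns ISO 639-1 style tags such as "ja" or "en", as discussed above, and it uses only the JDK's `java.util.Locale`, not langdetect itself. The class and method names (`LanguageCodes`, `englishName`, `iso3Code`) are hypothetical and exist only for this illustration.

```java
import java.util.Locale;

// Hypothetical helper, not part of langdetect: maps a detected language tag
// to an English display name and a three-letter code using only the JDK.
final class LanguageCodes {

    // English name of the language part of the tag, e.g. "ja" -> "Japanese".
    static String englishName(String tag) {
        return Locale.forLanguageTag(tag).getDisplayLanguage(Locale.ENGLISH);
    }

    // Three-letter ISO 639-2 code known to the JDK, e.g. "ja" -> "jpn".
    static String iso3Code(String tag) {
        return Locale.forLanguageTag(tag).getISO3Language();
    }

    public static void main(String[] args) {
        // Sample tags chosen for illustration only.
        for (String tag : new String[] {"en", "ja", "fr"}) {
            System.out.println(tag + " -> " + englishName(tag) + " (" + iso3Code(tag) + ")");
        }
    }
}
```

Note that the JDK reports three-letter ISO 639-2 codes, which cover fewer languages than ISO 639-3, so this only converts the output for display; it does not change which languages the profiles can identify.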
Hello Shuyo
I have been researching language identification as well. I recently published a paper about it at IJCNLP2011. I investigated training language identification models using training data from multiple domains; you can find my paper at http://www.ijcnlp2011.org/proceeding/IJCNLP2011-MAIN/pdf/IJCNLP-2011062.pdf . There is also an implementation of my system available at http://www.csse.unimelb.edu.au/research/lt/resources/langid/ . I hope that you will be able to include it in future comparisons! My testing indicates that it is very competitive with CLD, as well as with your Java language-detection software.
Cheers
Marco
Hi Marco,
Thank you very much! I’ll try your system.
Hi,
I’m new to Apache Mahout.
Please give me the detailed steps to install it in Eclipse.
Nakatani-san, your libraries are very nicely done! We are working on closely related stuff, also in Python. Please send me an email; perhaps we can meet and exchange ideas. Are you in Tokyo?